Sometimes it's hard to give
clear instruction, especially if you are not inclined to be
exact with your instruction:
var persons =
from p in session.Query<Person>()
select new
{
p.BusinessEntityId, p.FirstName, p.LastName, Count = p.PayHistories.Count()
};
What's the SQL generated by that Linq?
NHibernate-generated SQL:
select
person0_.BusinessEntityId as col_0_0_,
person0_.FirstName as col_1_0_,
person0_.LastName as col_2_0_,
(select
cast(count(*) as INT)
from
HumanResources.EmployeePayHistory payhistori1_
where
person0_.BusinessEntityId=payhistori1_.BusinessEntityId) as col_3_0_
from
Person.Person person0_
Entity Framework-generated SQL:
SELECT
[Extent1].[BusinessEntityId] AS [BusinessEntityId],
[Extent1].[LastName] AS [LastName],
[Extent1].[FirstName] AS [FirstName],
(SELECT
COUNT(1) AS [A1]
FROM
[HumanResources].[EmployeePayHistory] AS [Extent2]
WHERE
[Extent1].[BusinessEntityId] = [Extent2].[BusinessEntityId]) AS [C1]
FROM
[Person].[Person] AS [Extent1]
"That's not efficient! That's not what I wanted" you told yourself
I've often felt that much of the frustration with ORMs is about inflated expectations -- http://martinfowler.com/bliki/OrmHate.html
There are developers who insists that the above should generate a GROUP BY COUNT. Don't expect a GROUP BY will be generated on the Linq you've made above. Who told you that computers will do what you
wanted or
hoped them to do? You cannot wish your way around on solving a problem, especially on giving instructions to computers
With that being said, let's be exact with our
instructions. Let's cut to the chase, following is how to do efficient aggregation in NHibernate and Entity Framework. Initial step, flatten the hierarchical objects via SelectMany; final step, do a group by on the flattened object. Easy-peasy!
var persons =
from p in session.Query<Person>()
.SelectMany(x => x.PayHistories, (person, payHistory) => new {person, payHistory})
group p by new {p.person.BusinessEntityId, p.person.FirstName, p.person.LastName} into g
select new {g.Key, Count = g.Count()};
SQL output:
NHibernate-generated SQL:
select
person0_.BusinessEntityId as col_0_0_,
person0_.FirstName as col_1_0_,
person0_.LastName as col_2_0_,
cast(count(*) as INT) as col_3_0_
from
Person.Person person0_
inner join
HumanResources.EmployeePayHistory payhistori1_
on person0_.BusinessEntityId=payhistori1_.BusinessEntityId
group by
person0_.BusinessEntityId ,
person0_.FirstName ,
person0_.LastName
Entity Framework-Generated SQL. Go home Entity Framework, you're drunk!
SELECT
[GroupBy1].[K1] AS [BusinessEntityId],
[GroupBy1].[K4] AS [C1],
[GroupBy1].[K3] AS [FirstName],
[GroupBy1].[K2] AS [LastName],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Join1].[K1] AS [K1],
[Join1].[K2] AS [K2],
[Join1].[K3] AS [K3],
[Join1].[K4] AS [K4],
COUNT([Join1].[A1]) AS [A1]
FROM ( SELECT
[Extent1].[BusinessEntityId] AS [K1],
[Extent1].[LastName] AS [K2],
[Extent1].[FirstName] AS [K3],
1 AS [K4],
1 AS [A1]
FROM [Person].[Person] AS [Extent1]
INNER JOIN [HumanResources].[EmployeePayHistory] AS [Extent2] ON [Extent1].[BusinessEntityId] = [Extent2].[BusinessEntityId]
) AS [Join1]
GROUP BY [K1], [K2], [K3], [K4]
) AS [GroupBy1]
With the exact instructions, the computer will do what we told it to do, NHibernate generated what we wanted, a COUNT via efficient GROUP BY, instead of subquery in SELECT. However, even giving an exact instruction is not a guarantee that the computer will do what we
told it to do, look at how stubborn Entity Framework is; though also it doesn't generate a subquery for COUNT, the approach and efficiency of Entity Framework-generated SQL is questionable and hard to reason about
A keen developer shall observe, the GROUP BY we made is not exactly the same as the subquery approach; the subquery approach could report persons without employee pay history while the INNER JOIN could not. We should do a LEFT JOIN and make a conditional count on non-null. Entity Framework solution:
var query =
from p in db.Set<Person>()
.SelectMany(x => x.PayHistories.DefaultIfEmpty(), (person, payHistory) => new {person, payHistory })
group p by new { p.person.BusinessEntityId, p.person.FirstName, p.person.LastName } into g
select new { g.Key, Count = g.Sum(x => x.payHistory != null ? 1 : 0) };
Entity Framework-generated SQL:
SELECT
[GroupBy1].[K1] AS [BusinessEntityId],
[GroupBy1].[K4] AS [C1],
[GroupBy1].[K3] AS [FirstName],
[GroupBy1].[K2] AS [LastName],
[GroupBy1].[A1] AS [C2]
FROM ( SELECT
[Join1].[K1] AS [K1],
[Join1].[K2] AS [K2],
[Join1].[K3] AS [K3],
[Join1].[K4] AS [K4],
SUM([Join1].[A1]) AS [A1]
FROM ( SELECT
[Extent1].[BusinessEntityId] AS [K1],
[Extent1].[LastName] AS [K2],
[Extent1].[FirstName] AS [K3],
1 AS [K4],
CASE WHEN ( NOT (([Extent2].[BusinessEntityId] IS NULL) AND ([Extent2].[RateChangeDate] IS NULL))) THEN 1 ELSE 0 END AS [A1]
FROM [Person].[Person] AS [Extent1]
LEFT OUTER JOIN [HumanResources].[EmployeePayHistory] AS [Extent2] ON [Extent1].[BusinessEntityId] = [Extent2].[BusinessEntityId]
) AS [Join1]
GROUP BY [K1], [K2], [K3], [K4]
) AS [GroupBy1]
Unfortunately, though it's a relatively easy fix on Entity Framework, we just added a DefaultIfEmpty and do a conditional count on non-null, the same won't work on NHibernate. As of the time of this writing, NHibernate Linq doesn't support DefaultIfEmpty, it can't produce a LEFT JOIN on collection nor on manual joins. NHibernate has an API for that trivial task, and its API is more complex and longer than the problem it is solving :-)
EmployeePayHistory payHistory = null;
var persons =
session.QueryOver<Person>()
.Left.JoinAlias(p => p.PayHistories, () => payHistory)
.SelectList(x => x
.SelectGroup(p => p.BusinessEntityId)
.SelectGroup(p => p.FirstName)
.SelectGroup(p => p.LastName)
.SelectCount(p => payHistory.EmployeePayHistoryCompositePK.BusinessEntityId)
)
.List<object[]>()
.Select(cols => new
{
Id = (int)cols[0],
FirstName = (string)cols[1],
LastName = (string)cols[2],
Count = (int)cols[3]
});
NHibernate-generated SQL:
SELECT
this_.BusinessEntityId as y0_,
this_.FirstName as y1_,
this_.LastName as y2_,
count(payhistory1_.BusinessEntityId) as y3_
FROM
Person.Person this_
left outer join
HumanResources.EmployeePayHistory payhistory1_
on this_.BusinessEntityId=payhistory1_.BusinessEntityId
GROUP BY
this_.BusinessEntityId,
this_.FirstName,
this_.LastName
With clear instructions and reasonable expectations, computers will do what you wanted it to do
Happy Coding! ツ