Let's take an example on http://stackoverflow.com/questions/497241/how-do-i-perform-a-group-by-on-an-aliased-column-in-ms-sql-server/497251#497251
The question:
"I'm trying to perform a group by action on an aliased column (example below) but can't determine the proper syntax."
SELECT LastName + ', ' + FirstName AS 'FullName'
FROM customers
GROUP BY 'FullName'
Then one stacker answer this:
SELECT LastName + ', ' + FirstName AS 'FullName'
FROM customers
GROUP BY LastName + ', ' + FirstName
Then I suggested to him to change the concatenation in grouping: GROUP BY LastName + ', ' + FirstName
To its natural grouping: GROUP BY Lastname, Firstname
Then somebody defended his answer:
You should leave it in, at least to differentiate between 'x, yz' and 'xy, z' which would roll up to the same string without the comma. – ConcernedOfTunbridgeWells
Now, to refute the argument that "x, yz" and "xy, z" will roll up to same string when concatenation is removed from GROUP BY clause, let's produce sample data that is purported to roll up when there is no comma in GROUP BY clause
create table customers(lastname varchar(50),firstname varchar(50)); insert into customers select 'robot', 'share' union select 'robots', 'hare';
If by virtue of not concatenating strings in GROUP BY will roll up those two fields to same thing, that is this query...
select lastname + firstname as fullname from customers group by lastname, firstname
...will produce one row only:
Output: fullname ----------- robotshare
But no, the above query did not produce one row, despite the two entities have same caption, they are still grouped to two rows:
Output: fullname ----------- robotshare robotshare
And how about this:
select lastname + ', ' + firstname as fullname from customers group by lastname, firstname
Output:
fullname --------------- robot, share robots, hare
Was it reduced to one row? Again No
In fact, the only requirement in GROUP BY clause is to repeat the field(s) from SELECT clause to GROUP BY clause, but if it is not a field, and just an auxiliary information, let's say a caption:
select 'yikes' from customers group by lastname, firstname
Output:
(No column name) ---------------- yikes yikes
your RDBMS won't deem that as invalid query, and result will not be reduced to one row.
Logic-wise, grouping by concatenated result is the one that has the chances of having two or more dissimilar rows be reduced to one row when grouping, the following query is a contrived example but you'll get the point:
select lastname + firstname as fullname from customers group by lastname + firstname
fullname ---------------- robotshare
Now, that is a faulty query, there are two entities, yet they are reduced to one row when grouping.
The following is the correct and proper query. And performant query, not because we don't repeat the concatenation expression, it's because RDBMS properly use the table's index, any RDBMS will cease to use index on function or expression results (unless your RDBMS can put index on function or expression), see the figures below:
select lastname + ', ' + firstname as fullname, 'yikes' as reaction from customers group by lastname, firstname
Output:
fullname reaction ----------------- -------- robot, share yikes robots, hare yikes
Did the above query reduce the result to one row? Again NO
And in fact, you don't need to repeat non-field values in GROUP BY as seen on query above, the 'yikes' is not repeated in GROUP BY
So this is the correct query(and performant query):
select lastname + ', ' + firstname as fullname from customers group by lastname, firstname
Outputs:
fullname ----------------- robot, share robots, hare
Now let's put some index on fields Lastname, Firstname to see how those query differs in execution when run.
create index ux_customers on customers (lastname, firstname)
And let's see how the RDBMS will execute the GROUP BY concatenation:
And how it will execute the natural grouping:
As we can see from the execution plan of concatenated grouping, it spent some time on sorting(78%), while the natural grouping will just rip through the indexed fields directly when it executes the query
No comments:
Post a Comment