If you will try to date a DBA, perhaps you can use this pickup line:
NOT IN is faster than LEFT JOIN + IS NULL combo
Why?
Because 135 millisecond vs 2,194 millisecond
The best wingman for dating DBAs is http://explainextended.com/
NOT IN: 135 milliseconds only:
/*------------------------ SELECT l.id, l.value FROM [20090915_anti].t_left l WHERE l.value NOT IN ( SELECT value FROM [20090915_anti].t_right r ) ------------------------*/ SQL Server parse and compile time: CPU time = 6 ms, elapsed time = 6 ms. (10 row(s) affected) Table 't_left'. Scan count 1, logical reads 499, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 't_right'. Scan count 1, logical reads 3112, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 125 ms, elapsed time = 128 ms.
LEFT JOIN IS NULL: 2,194 milliseconds:
/*------------------------ SELECT l.id, l.value FROM [20090915_anti].t_left l LEFT JOIN [20090915_anti].t_right r ON r.value = l.value WHERE r.value IS NULL ------------------------*/ SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 10 ms. (10 row(s) affected) Table 't_left'. Scan count 9, logical reads 1984, physical reads 27, read-ahead reads 498, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 't_right'. Scan count 9, logical reads 3166, physical reads 37, read-ahead reads 3106, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. Table 'Worktable'. Scan count 9999, logical reads 2239770, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0. SQL Server Execution Times: CPU time = 14646 ms, elapsed time = 2184 ms.
NOT IN is fast and efficient in CPU and Reads
NOT IN -- CPU: 128 Reads: 3,619 Duration: 135 LEFT JOIN IS NULL -- CPU: 14,642 Reads: 2,254,967 Duration: 2,194
Even we are using an smart RDBMS (we are using SQL Server though) and it can discern that our LEFT JOIN IS NULL is doing an anti-join(i.e., a NOT IN) instead, hence our RDBMS gives our LEFT JOIN IS NULL the same fast performance as NOT IN, there's no reason for us to use LEFT JOIN IS NULL for anti-joins. Don't obscure the code's NOT IN intent, don't replace NOT IN with LEFT JOIN IS NULL. Use NOT IN when we need to detect if a column's value is NOT IN another table. See how intuitive is that? Do not complicate a simple concept with LEFT JOIN IS NULL combo.
LEFT JOIN IS NULL is too highfalutin-a-programmin', LEFT JOIN IS NULL is not the place where we should showboat our programming prowess on SQL Server or any RDBMS for that matter. I'm seeing a few (no, not many, fortunately our company hires the best devs in town) highfalutin codes not just in our SQL codes but on C# codes too, I'll put them on a blog when the list is comprehensive enough.
Do not confuse motion and progress. A rocking horse keeps moving but does not make any progress. -- Alfred Montapert
SELECT p.*, e.IsTerminated FROM dbo.Person p join dbo.Employee e on p.PersonID = e.EmployeePersonId left join dbo.PersonResponsibility pr on p.PersonId = pr.ReviewerPersonId -- Hey IS NULL buddy! what ya doin' down far there? where e.AllowLogin = 1 /* other humongous filters here competing for your attention */ -- Hey LEFT JOIN! keeps movin' here buddy, just chillin' and filterin' your dumb join, makin' queries slow and pr.ReviewerPersonId is null
With LEFT JOIN and IS NULL, those two star-crossed lovers could be far apart from each other, thus making code analysis a little harder; contrast that to NOT IN which is pretty much a one-stop-shop syntax, everything is in one place, thus making NOT IN easier to analyze if there's even analysis at all, the code's intent is self-describing. And oh, did I mention NOT IN is fast? Ironically, the humble-looking NOT IN approach is fast:
SELECT p.*, e.IsTerminated FROM dbo.Person p join dbo.Employee e on p.PersonID = e.EmployeePersonId where e.AllowLogin = 1 /* other humongous filters here competing for your attention */ -- Yay! no more LEFT JOIN buddy that can ruin the party! and p.PersonId not in (select pr.ReviewerPersonId from dbo.PersonResponsibility pr)
Unless we really found a corner case where LEFT JOIN IS NULL combo could be faster than NOT IN, we should just keep using NOT IN, aside from it's faster, it is easier to read. And before someone suggest NOT EXISTS, have I forgot to mention IN / NOT IN is easier to read
Happy Coding! ツ
No comments:
Post a Comment