Some devs who have a fervent belief that a CTE is just a cursor got me worked up.
Hearing that broken theory from them, I decided to create a proof-of-concept that would prove otherwise. As we only have limited keystrokes left in our hands, I decided to practice first at play.typeracer.com; being able to type fast is certainly a win when one is worked up enough to quickly prove something by quickly making a proof-of-concept and quickly blogging it afterwards, all in one sitting. That's how seriously worked up I am to prove something lol
Heheh, scratch that. I already made an analysis and a proof an eon ago that a CTE is not slow.
A CTE is just an inline view, which in turn is just a table-deriving query; I blogged it here: http://www.ienablemuch.com/2013/07/debunking-myth-that-cte-is-slow.html
A CTE is not slow; not understanding the intent and how the query works is what makes a query slow. I blogged it here: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Please do note that a **recursive** CTE does run sequentially, hence the "cursor-like" belief of some devs; however, the recursive CTE's sequential steps are not done with a cursor. The CTE's loop happens in the engine's C++ code, which makes the CTE faster, while a CURSOR's loop is done via T-SQL, and a T-SQL loop is slow. Even a C# loop is an order of magnitude faster than a T-SQL loop. Push the loop down to C++, i.e., push the loop onto CTEs. For a non-recursive CTE, rest assured, the query still operates in a set-based manner, not in a cursor-like manner.
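To make the distinction concrete, here is a minimal sketch (not from the original proof-of-concept; the names Numbers and @result are made up for illustration) that builds the numbers 1 to 10 both ways. The recursive CTE lets the engine drive the iteration internally, while the WHILE-loop version below it, just like a cursor, grinds through one interpreted T-SQL statement at a time.

-- Recursive CTE: the looping is handled inside the engine.
with Numbers as
(
    select 1 as n
    union all
    select n + 1 from Numbers where n < 10
)
select n from Numbers;

-- T-SQL loop (the same mechanism a cursor relies on): every iteration is interpreted T-SQL.
declare @n int = 1;
declare @result table (n int);
while @n <= 10
begin
    insert into @result (n) values (@n);
    set @n = @n + 1;
end
select n from @result;

The point is not the tiny example itself but where the iteration lives: the CTE version is a single statement with a single plan, while the loop version executes the INSERT ten separate times.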
Time to stop spewing the nonsense that a CTE is just a cursor.
Happy Coding!
"Simplicity can't be bought later, it must be earned from the start" -- DB
Monday, September 16, 2013
Using Partial Index On SQL Server To Enforce Uniqueness On Denormalized Design
In my last post, using a partial index (aka filtered index), I showed you how we can still enforce a unique constraint even when soft deletion is the chosen mechanism for deleting records.
We will again put a unique filtered index to good use. Given this table:
create table PeriodReviewers
(
    PeriodId int not null references Period(PeriodId),
    RevieweePersonId int not null references Person(PersonId),
    ReviewerPersonId int not null references Person(PersonId),
    IsPrimaryReviewer bit not null default(0),
    constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId)
);
These are all valid data:
PeriodId  RevieweePersonId  ReviewerPersonId  IsPrimaryReviewer
1         Marcus            Buddy             1
1         Marcus            Raymund           0
1         Marcus            Ely               0
1         Buddy             Raymund           1
1         Buddy             Ely               0
1         Raymund           Ely               1
Invalid data such as the following can be blocked:
PeriodId  RevieweePersonId  ReviewerPersonId  IsPrimaryReviewer
1         Marcus            Buddy             1
1         Marcus            Raymund           0
1         Marcus            Raymund           0    <-- this is blocked, two Raymunds
1         Marcus            Ely               0
1         Buddy             Raymund           1
1         Buddy             Ely               0
1         Raymund           Ely               1
However, the above unique constraint can't defend the system against invalid data such as the following:
PeriodId  RevieweePersonId  ReviewerPersonId  IsPrimaryReviewer
1         Marcus            Buddy             1
1         Marcus            Raymund           1    <-- should error, only one primary reviewer for each person
1         Marcus            Ely               0
1         Buddy             Raymund           1
1         Buddy             Ely               0
1         Raymund           Ely               1
There should be only one primary reviewer for each reviewee, but since our unique constraint is applied on these three fields: PeriodId + RevieweePersonId + ReviewerPersonId, invalid data such as the above could creep into our system.
This is the ideal solution:
create table PeriodReviewers
(
    PeriodId int not null references Period(PeriodId),
    RevieweePersonId int not null references Person(PersonId),
    ReviewerPersonId int not null references Person(PersonId),
    constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId)
);

create table PeriodPrimaryReviewers
(
    PeriodId int not null references Period(PeriodId),
    RevieweePersonId int not null references Person(PersonId),
    ReviewerPersonId int not null references Person(PersonId),
    constraint pk_PeriodPrimaryReviewers primary key(PeriodId, RevieweePersonId)
);
That's a fully normalized design. The advantage of normalization is that we save space on redundant data. In our business case there can be only one primary reviewer for each person in a given period, so if we have 20 reviewers for each person in a given period, the other 19 are not primary reviewers and their IsPrimaryReviewer values are all false; those 19 IsPrimaryReviewer fields are just wasting space, so the IsPrimaryReviewer field should be removed from the table. To normalize, just create a separate table that maintains each person's primary reviewer for a given period, as illustrated in the DDL above.
Normalize until it hurts, denormalize until it works -- Jeff Atwood
However, fully normalizing the database might not be the norm nor an option, so a denormalized schema like the first one shown in this article might be present in your system. As we all know, a denormalized design can be faster for queries. Say, for example, we want to list all of a person's reviewers and indicate whether each reviewer is the person's primary reviewer. It's just this simple with the denormalized table:
select PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer from PeriodReviewers
Now, to do that on fully normalized tables:
select
    emp.PeriodId,
    emp.RevieweePersonId,
    emp.ReviewerPersonId,
    IsPrimaryReviewer =
        case when pri.ReviewerPersonId is not null then convert(bit, 1)
             else convert(bit, 0)
        end
from PeriodReviewers emp
left join PeriodPrimaryReviewers pri
    on emp.PeriodId = pri.PeriodId
    and emp.RevieweePersonId = pri.RevieweePersonId
    and emp.ReviewerPersonId = pri.ReviewerPersonId;
With a highly normalized design, queries could become slower. A denormalized table makes for faster queries, as there's no need to join tables. However, a denormalized design can cause data integrity problems: concurrent updates or a not-properly-implemented application could let bad data slip in. So in our denormalized design, this bad data could slip in:
PeriodId  RevieweePersonId  ReviewerPersonId  IsPrimaryReviewer
1         Marcus            Buddy             1
1         Marcus            Raymund           1
1         Marcus            Ely               0
1         Buddy             Raymund           1
1         Buddy             Ely               0
1         Raymund           Ely               1
As we can see, Marcus now has two primary reviewers. This kind of error doesn't happen with a normalized database design. But if the denormalized design is already in place, it could be far more costly to redesign the database, and it's not guaranteed that fully normalizing the design won't affect query performance. So how can we prevent bad data such as the above from happening if we keep the denormalized design?
Enter the partial index, er... filtered index. With SQL Server's filtered index (available since version 2008), we can create a unique filtered index to prevent two or more primary reviewers for each person.
Note that the following is still allowed even after we add the unique filtered index, which is a valid business case.
PeriodId  RevieweePersonId  ReviewerPersonId  IsPrimaryReviewer
1         Marcus            Buddy             1
1         Marcus            Ely               0
1         Marcus            Robin             0
However, adding another reviewer for the same person in the same period whose IsPrimaryReviewer field is also set to true should be blocked by the database.
PeriodId  RevieweePersonId  ReviewerPersonId  IsPrimaryReviewer
1         Marcus            Buddy             1
1         Marcus            Ely               0
1         Marcus            Robin             0
1         Marcus            Raymund           1    <-- adding this fourth row will not be allowed by the database
To facilitate a robust database design for the scenario above, we just need to add a unique index on the primary reviewers only, hence the filtered index nomenclature. In most RDBMSes this is called a partial index. To wit, this will be the table definition:
create table PeriodReviewers
(
    PeriodId int not null references Period(PeriodId),
    RevieweePersonId int not null references Person(PersonId),
    ReviewerPersonId int not null references Person(PersonId),
    IsPrimaryReviewer bit not null default(0),
    constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId)
);

create unique index ix_PeriodReviewers_PrimaryReviewer
    on PeriodReviewers(PeriodId, RevieweePersonId)
    where IsPrimaryReviewer = 1;
We just need to create a unique index on PeriodId and RevieweePersonId for primary reviewers only. When we try to add another primary reviewer for the same person in the same period, it will be stopped by the database; no bad data can creep in. That's how easy it is to prevent bad data on SQL Server. Validating business logic should be done in the application layer, but it should be done in the database too.
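To see the filtered index in action, here is a minimal sketch (the numeric period and person IDs are made-up values for illustration, and the referenced Period and Person rows are assumed to already exist): the first two inserts go through, while the third, a second primary reviewer for the same reviewee in the same period, should be rejected with a duplicate key error on ix_PeriodReviewers_PrimaryReviewer.

-- First primary reviewer for reviewee 1 in period 1: succeeds.
insert into PeriodReviewers (PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer)
values (1, 1, 2, 1);

-- A non-primary reviewer for the same reviewee: still allowed, the index only covers IsPrimaryReviewer = 1.
insert into PeriodReviewers (PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer)
values (1, 1, 3, 0);

-- A second primary reviewer for the same reviewee in the same period:
-- blocked by the unique filtered index with a duplicate key error.
insert into PeriodReviewers (PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer)
values (1, 1, 4, 1);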
Happy Computing! ツ