We will put again unique filtered index to good use. Given this table:
create table PeriodReviewers ( PeriodId int not null references Period(PeriodId), RevieweePersonId int not null references Person(PersonId), ReviewerPersonId int not null references Person(PersonId), IsPrimaryReviewer bit not null default(0), constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId) );
These are all valid data:
PeriodId RevieweePersonId ReviewerPersonId IsPrimaryReviewer 1 Marcus Buddy 1 1 Marcus Raymund 0 1 Marcus Ely 0 1 Buddy Raymund 1 1 Buddy Ely 0 1 Raymund Ely 1
Invalid data such as the following can be blocked:
PeriodId RevieweePersonId ReviewerPersonId IsPrimaryReviewer 1 Marcus Buddy 1 1 Marcus Raymund 0 1 Marcus Raymund 0 <-- this is blocked, two Raymunds 1 Marcus Ely 0 1 Buddy Raymund 1 1 Buddy Ely 0 1 Raymund Ely 1
However the above unique constraint can't defend the system on invalid data such as the following:
PeriodId RevieweePersonId ReviewerPersonId IsPrimaryReviewer 1 Marcus Buddy 1 1 Marcus Raymund 1 <-- should error,only one primary reviewer to each person 1 Marcus Ely 0 1 Buddy Raymund 1 1 Buddy Ely 0 1 Raymund Ely 1
There should be only one primary reviewer for each reviewee, but since our unique constraint is applied on these three fields: PeriodId + RevieweePersonId + ReviewerPersonId, invalid data such as above could creep in to our system.
This is the ideal solution:
create table PeriodReviewers ( PeriodId int not null references Period(PeriodId), RevieweePersonId int not null references Person(PersonId), ReviewerPersonId int not null references Person(PersonId), constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId) ); create table PeriodPrimaryReviewers ( PeriodId int not null references Period(PeriodId), RevieweePersonId int not null references Person(PersonId), ReviewerPersonId int not null references Person(PersonId), constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId) );
That's a fully normalized design. The advantage of normalization is we can save space on redundant data. On our business case, there can be only one primary reviewer for each person on a given period, so let's say we have 20 reviewers for each person on a given period, those other 19 reviewers, who are not primary reviewers, whose IsPrimaryReviewer are all set to false, those 19 IsPrimary fields are just wasting space, IsPrimaryReviewer field should be removed from the table. To normalize, just create a table that maintains each person's primary reviewer on a given period, as illustrated on the DDL above.
Normalize until it hurts, denormalize until it works -- Jeff Atwood
However fully normalizing the database might not be the norm nor an option, hence denormalized schema might be present in your system like the first one shown in this article. As we all know, denormalized design can be faster for queries. So for example, we wanted to list all the person's reviewer and indicate if that reviewer is a primary reviewer of the person. It's just this simple with denormalized table:
select PeriodId, RevieweePersonId, ReviewerPersonId, IsPrimaryReviewer from PeriodReviewers
Now, to do that on fully normalized tables:
select emp.PeriodId, emp.RevieweePersonId, emp.ReviewerPersonId, emp.IsPrimaryReviewer = case when pri.ReviewerPersonId is not null then convert(bit, 1) else convert(bit, 0) end from PeriodReviewers emp left join PeriodPrimaryReviewers pri on emp.PeriodId = pri.PeriodId and emp.RevieweePersonId = pri.RevieweePersonId and emp.ReviewerPersonId = pri.ReviewerPersonId
Using very normalized design, queries could become slower. Denormalized table makes for faster queries, as there's no need to join tables. However, denormalized design can cause data integrity problem. On denormalized design, concurrent update or not properly implemented application could let bad data slip in. So in our denormalized design, this bad data could slip in:
PeriodId RevieweePersonId ReviewerPersonId IsPrimaryReviewer 1 Marcus Buddy 1 1 Marcus Raymund 1 1 Marcus Ely 0 1 Buddy Raymund 1 1 Buddy Ely 0 1 Raymund Ely 1
As we can see, Marcus has two primary reviewers now. This kind of errors doesn't happen on normalized database design. If the denormalized design is already in place, it could be far more costly to redesign the database, and it's not guaranteed that fully normalizing the design won't affect query's performance. But how can we prevent bad data such as above from happening if we maintain the denormalized design?
Enter partial index, er.. filtered index. With SQL Server's filtered index (available since version 2008), we can create unique filtered index to prevent two or more primary reviewers for each person.
To note, the following is still allowed even we add unique filtered index, which is a valid business case.
PeriodId RevieweePersonId ReviewerPersonId IsPrimaryReviewer 1 Marcus Buddy 1 1 Marcus Ely 0 1 Marcus Robin 0
However, upon further adding a reviewer for the same person on the same period, whose IsPrimaryReviewer field value is also set to true, that should be blocked by the database.
PeriodId RevieweePersonId ReviewerPersonId IsPrimaryReviewer 1 Marcus Buddy 1 1 Marcus Ely 0 1 Marcus Robin 0 1 Marcus Raymund 1 <-- adding this fourth row will not be allowed by the database.
To facilitate robust database design for the desired scenario above, we just need to add a unique index on those primary reviewers only, hence the filtered index nomenclature. In most RDBMSes this is called partial index. To wit, this will be the table definition:
create table PeriodReviewers ( PeriodId int not null references Period(PeriodId), RevieweePersonId int not null references Person(PersonId), ReviewerPersonId int not null references Person(PersonId), constraint pk_PeriodReviewers primary key(PeriodId, RevieweePersonId, ReviewerPersonId) ); create unique index ix_PeriodReviewers_PrimaryReviewer on PeriodReviewers(PeriodId, RevieweePersonId) where IsPrimaryReviewer = 1;
We just need to create a unique index on Period and RevieweePersonId for primary reviewers only. When we try to add another primary reviewer for the same person on the same period, it will be stopped by the database, no bad data could creep in. That's how easy it is to prevent bad data on SQL Server. Validating business logic should be done on application layer, but it should also be done on the database too
Happy Computing! ツ
No comments:
Post a Comment