Given a student has this attendance:
July 3 Tuesday July 4 Holiday -- Independence Day July 5 Thursday July 6 Friday July 9 Monday July 12 Thursday July 13 Friday
The query should show the student had achieved 4 consecutive attendance on these dates:
July 3 Tuesday July 5 Thursday July 6 Friday July 9 Monday
July 4 is a holiday and should not affect or penalize the student's attendance performance, the query should show the student has good four consecutive attendance from July 3 to July 9; likewise, the weekends(Saturday and Sunday) should not have an effect on attendance performance.
Student gets an attendance performance of 11 from July 18 to August 1 when she attend classes on following dates:
July 18 Wednesday July 19 Thursday July 20 Friday July 23 Monday July 24 Tuesday July 25 Wednesday July 26 Thursday July 27 Friday July 30 Monday July 31 Tuesday August 1 Wednesday August 3 Friday
How to determine if the given dates are consecutive? This is a solved problem, and it has a terminology to it, it's called islands and gaps algorithm.
The core thing with islands and gaps problem is we must have a mechanism to group consecutive rows so in the final query we can count them. One mechanism is to use row_number windowing function, though row numbering approach will fail if there's duplicate rows, row_number would suffice if the rows are unique, e.g. attendance.
So given this numbers, we have 5 consecutive numbers(1 to 5) and another 3 consecutive numbers(8 to 10):
1 2 3 4 5 8 9 10 12
With row_number we can determine the consecutive-ness of those numbers
N R# 1 1 2 2 3 3 4 4 5 5 8 6 9 7 10 8 12 9
Subtract row number from N, this will give us related rows
N R# GRP 1 1 0 2 2 0 3 3 0 4 4 0 5 5 0 8 6 2 9 7 2 10 8 2 12 9 3
Given the GRP column, we can do this query to count the longest attendance streak a given student has:
with a as ( select n, n - row_number() over(order by n) as grp from tbl ) select * ,dense_rank() over(order by grp) as nth_streak ,count(*) over(partition by grp) as streak from a;
| N | GRP | NTH_STREAK | STREAK | ---------------------------------- | 1 | 0 | 1 | 5 | | 2 | 0 | 1 | 5 | | 3 | 0 | 1 | 5 | | 4 | 0 | 1 | 5 | | 5 | 0 | 1 | 5 | | 8 | 2 | 2 | 3 | | 9 | 2 | 2 | 3 | | 10 | 2 | 2 | 3 | | 12 | 3 | 3 | 1 |
Now, for the final hurdle to tackle, if 5 is Friday, 6 and 7 are Saturday and Sunday respectively. So on the above data, the student should have an 8 consecutive attendance instead of separate 5 and 3. To do that, we need to shift all the date if it is over 7.
Here's the formula for that:
n - ((n/7) * 2) - row_number()over(order by n) as grp
To illustrate, use this query:
with a as ( select n, n - ((n/7) * 2) as shift_up, row_number() over(order by n) as rn from tbl ) select * ,n - rn as grp from a;
Here's the output:
| N | SHIFT_UP | RN | GRP | ---------------------------- | 1 | 1 | 1 | 0 | | 2 | 2 | 2 | 0 | | 3 | 3 | 3 | 0 | | 4 | 4 | 4 | 0 | | 5 | 5 | 5 | 0 | | 8 | 6 | 6 | 2 | | 9 | 7 | 7 | 2 | | 10 | 8 | 8 | 2 | | 12 | 10 | 9 | 3 |
Notice that the value Monday's value(8) shifted to 6, likewise with Tuesday(9, this shifted to 7) and so on.
With this logic at hand, we can now easily determine the consecutive-ness of adjacent rows. We just need to subtract row_number from SHIFT_UP in order to make the current Monday consecutive to last week's Friday. This is the query:
with a as ( select n, n - ((n/7) * 2) - row_number()over(order by n) as grp from tbl ) select * ,dense_rank() over(order by grp) as nth_streak ,count(*) over(partition by grp) as streak from a;
This is the output:
| N | GRP | NTH_STREAK | STREAK | ---------------------------------- | 1 | 0 | 1 | 8 | | 2 | 0 | 1 | 8 | | 3 | 0 | 1 | 8 | | 4 | 0 | 1 | 8 | | 5 | 0 | 1 | 8 | | 8 | 0 | 1 | 8 | | 9 | 0 | 1 | 8 | | 10 | 0 | 1 | 8 | | 12 | 1 | 2 | 1 |
It now yields the date from last week's Monday to Friday(1 to 5) and this week's Monday to Friday(8 to 10) as consecutive 8 days(indicated by GRP=0). Hence yielding a 5+3 consecutive attendance, this yields 8 consecutive attendance.
Another way to visualize how the date are being shifted(to determine date consecutive-ness) two days back:
Original data:
Weekends are not in attendance(e.g. 6,7,13,14) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
If a given date falls on 15th, this will be computed as 15/7 * 2 == 4; then subtract 4 from the original number, 15 - 4 == 11. 15 will become the 11th day. Likewise the 8th day becomes the 6th day; 8 - (8/7 * 2) == 6. The above data yields the following output when the computation is applied:
1 2 3 4 5 6 7 8 9 10 11
The above attendance yields 11 consecutive good attendance. For holidays, you need to slot them on attendance, so the consecutive-ness of dates could be easily determined, then just remove them from the final query.
To apply the above logic on determining consecutive attendance even there are holidays and weekends, use this:
drop table tx; drop table holiday; create table tx ( i int identity(1,1) not null primary key, n varchar(10), d date, constraint ux_tx unique(n,d) ); insert into tx(n,d) values ('john','2012-7-3'), ('john','2012-7-5'), ('john','2012-7-6'), ('john','2012-7-9'), ('john','2012-7-12'), ('john','2012-7-13'), ('john','2012-7-16'), ('john','2012-7-17'), ('john','2012-7-18'), ('john','2012-7-20'), ('john','2012-7-30'), ('john','2012-7-31'), ('paul','2012-7-3'), ('paul','2012-7-5'), ('paul','2012-7-18'), ('paul','2012-7-19'), ('paul','2012-7-20'), ('paul','2012-7-23'), ('paul','2012-7-24'), ('paul','2012-7-25'), ('paul','2012-7-26'), ('paul','2012-7-27'), ('paul','2012-7-30'), ('paul','2012-7-31'), ('paul','2012-8-1'), ('paul','2012-8-3'), ('paul','2012-8-6'), ('paul','2012-8-7'); create table holiday(d date); insert into holiday(d) values ('2012-7-4'); --- query with first_date as ( select dateadd( ww, datediff(ww,0,min(d)), 0 ) as first_date -- get the monday of the earliest date from tx ) ,shifted as ( select tx.n, tx.d, diff = datediff(day, fd.first_date, tx.d) - (datediff(day, fd.first_date, tx.d)/7 * 2) from tx cross join first_date fd union select xxx.n, h.d, diff = datediff(day, fd.first_date, h.d) - (datediff(day, fd.first_date, h.d)/7 * 2) from holiday h cross join first_date fd cross join (select distinct n from tx) as xxx ) ,grouped as ( select * , grp = diff - row_number() over(partition by n order by d) from shifted ) select -- remove staging columns from the output... -- * -- ...just output what the user will see: d, n ,dense_rank() over (partition by n order by grp) as nth_streak ,count(*) over (partition by n, grp) as streak from grouped where d not in (select d from holiday) -- remove the holidays
Output:
| D | N | NTH_STREAK | STREAK | ------------------------------------------- | 2012-07-03 | john | 1 | 4 | | 2012-07-05 | john | 1 | 4 | | 2012-07-06 | john | 1 | 4 | | 2012-07-09 | john | 1 | 4 | | 2012-07-12 | john | 2 | 5 | | 2012-07-13 | john | 2 | 5 | | 2012-07-16 | john | 2 | 5 | | 2012-07-17 | john | 2 | 5 | | 2012-07-18 | john | 2 | 5 | | 2012-07-20 | john | 3 | 1 | | 2012-07-30 | john | 4 | 2 | | 2012-07-31 | john | 4 | 2 | | 2012-07-03 | paul | 1 | 2 | | 2012-07-05 | paul | 1 | 2 | | 2012-07-18 | paul | 2 | 11 | | 2012-07-19 | paul | 2 | 11 | | 2012-07-20 | paul | 2 | 11 | | 2012-07-23 | paul | 2 | 11 | | 2012-07-24 | paul | 2 | 11 | | 2012-07-25 | paul | 2 | 11 | | 2012-07-26 | paul | 2 | 11 | | 2012-07-27 | paul | 2 | 11 | | 2012-07-30 | paul | 2 | 11 | | 2012-07-31 | paul | 2 | 11 | | 2012-08-01 | paul | 2 | 11 | | 2012-08-03 | paul | 3 | 3 | | 2012-08-06 | paul | 3 | 3 | | 2012-08-07 | paul | 3 | 3 |
Live test: http://www.sqlfiddle.com/#!3/709a0/1
No comments:
Post a Comment