
By now many of you have heard that a number of 30G Zunes froze on the last day of 2008... all at the same time. OK, not really, but it did happen while the Zunes were booting up. A sign of the apocalypse? No, we'll have to wait until 2012 for that. Microsoft quickly confirmed that it was a leap year bug. In case you didn't know, 2008 was indeed a leap year (366 days).
Since the problem seems to only affect recently purchased Zune players or certain players with the latest firmware, we can assume that the bug has not been lurking in there for the last few years.
Let's get technical
Why is the 30G Zune failing at boot time? Most computers, or really any device with a processor, need to keep track of the time. Technically your electronic device (your Zune or iPod, for example) is never truly off. System time and settings are stored in something known as RTC/NVRAM or CMOS RAM, which is in essence an embedded chip powered by a long-lasting battery (it can last a few years). Your device is most likely keeping track of the number of seconds since a specific date, Jan 1, 1972 for instance... nonstop.
When the Zune boots up, that constantly increasing number of seconds is converted into a proper calendar date, which takes leap years, leap seconds, and daylight saving time into account. It can't figure out your time zone though; the first time you turn on your Zune player, it might ask you to enter your time zone. That information goes into regular storage (flash or hard drive). If you reset your Zune or your iPod, you may have to specify your time zone again. Anyway, when the 30G Zune attempted to wake up on day 366 of 2008, it encountered a software bug.
Here is the culprit (a snippet from the full source):
1  BOOL ConvertDays(UINT32 days, SYSTEMTIME* lpTime){
2      int dayofweek, month, year;
3      UINT8 *month_tab;
4      //Calculate current day of the week
5      dayofweek = GetDayOfWeek(days);
6      year = ORIGINYEAR;            // 1980
7      while (days > 365)            // Number of days since Jan 1, 1980
8      {
9          if (IsLeapYear(year))     // Yes, 2008 is a leap year
10         {
11             if (days > 366)       // No. Today is day 366, so "days" is not reset
12             {
13                 days -= 366;
14                 year += 1;
15             }
16         }
17         else
18         {
19             days -= 365;
20             year += 1;
21         }
22     }
As you can tell, on December 31 of a leap year this C function cannot get out of the "while (days > 365)" loop. It gets stuck inside the "if (IsLeapYear(year))" branch on every boot... until next year.
Why? Let's dive in:
7: while (days > 365) ---> True. Once the earlier years have been subtracted, "days" is 366: the current date is Dec 31st 2008 (366th day of the year).
9: if (IsLeapYear(year)) ---> True
11: if (days > 366) ---> Not true! At least not until tomorrow, Jan 1st 2009.
Code execution is stuck since there is no alternative branch. It's a leap year, and the day count is greater than 365 but not greater than 366, so neither "days" nor "year" changes and the loop condition stays true forever.
11 if (days > 366)
12 {
13     days -= 366;
14     year += 1;
15 }
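The trace can be reproduced off-device with a minimal sketch. It assumes ORIGINYEAR is 1980 (as the snippet's comment says) and that "days" is 1-based, so Jan 1, 1980 is day 1. IsLeapYear here is a stand-in using the usual Gregorian rule, since the original helper is not shown; the function name buggy_year_loop and the MAX_ITERATIONS cap are hypothetical, added only so the demo reports the hang instead of actually hanging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ORIGINYEAR 1980        /* taken from the snippet's comment */
#define MAX_ITERATIONS 1000    /* hypothetical guard; ~30 passes suffice near 2008 */

/* Stand-in for the driver's helper (not shown in the post): Gregorian rule. */
static bool IsLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

/*
 * Same year loop as the snippet, plus an iteration counter.
 * Returns true if the loop finished, false if it hit the cap,
 * which for dates near 2008 means it would have spun forever.
 */
static bool buggy_year_loop(uint32_t days, int *year_out, uint32_t *days_out)
{
    int year = ORIGINYEAR;
    int iterations = 0;

    assert(year_out && days_out);

    while (days > 365) {
        if (++iterations > MAX_ITERATIONS)
            return false;        /* stuck: neither days nor year changes */
        if (IsLeapYear(year)) {
            if (days > 366) {
                days -= 366;
                year += 1;
            }
            /* days == 366 in a leap year: nothing is updated here */
        } else {
            days -= 365;
            year += 1;
        }
    }
    *year_out = year;
    *days_out = days;
    return true;
}
```

Under those assumptions, Dec 31, 2008 is day 10593 (28 full years from 1980 to 2007 give 10227 days, plus 366). Passing 10593 makes the function return false, while 10592 (Dec 30, 2008) and 10594 (Jan 1, 2009) complete normally.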
Simple rewrite:
while (days > 365) {
    if (IsLeapYear(year)) {
        if (days <= 366)
            break;          // Dec 31 of a leap year: nothing left to subtract
        days -= 366;
    } else {
        days -= 365;
    }
    year += 1;
}
Even shorter rewrite:
while (days > (IsLeapYear(year) ? 366 : 365)) {
    days -= IsLeapYear(year) ? 366 : 365;   // subtract the current year's length first
    year += 1;                              // then move on to the next year
}
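A year loop like this is cheap to check on exactly the days where it can go wrong: the first and last day of every year, and Dec 31 of leap years in particular. The following is a minimal sketch of such a check, not the driver's actual code. The names days_to_year, DaysInYear and count_boundary_failures are hypothetical, IsLeapYear is a stand-in using the Gregorian rule, and "days" is assumed to be 1-based from Jan 1 of ORIGINYEAR (1980).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ORIGINYEAR 1980   /* taken from the snippet's comment */

/* Stand-in for the driver's helper (not shown in the post): Gregorian rule. */
static bool IsLeapYear(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

static uint32_t DaysInYear(int year)
{
    return IsLeapYear(year) ? 366u : 365u;
}

/*
 * Year loop that always subtracts the length of the current year.
 * On return, *days holds the 1-based day within the returned year.
 */
static int days_to_year(uint32_t *days)
{
    int year = ORIGINYEAR;

    assert(days && *days >= 1);

    while (*days > DaysInYear(year)) {
        *days -= DaysInYear(year);
        year += 1;
    }
    return year;
}

/*
 * Convert the first and last day of every year in [ORIGINYEAR, last_year]
 * and count the results that do not match. 0 means every boundary is right.
 */
static int count_boundary_failures(int last_year)
{
    uint32_t elapsed = 0;   /* total days in all years before `year` */
    int failures = 0;

    for (int year = ORIGINYEAR; year <= last_year; ++year) {
        uint32_t first = elapsed + 1;
        uint32_t last = elapsed + DaysInYear(year);

        if (days_to_year(&first) != year || first != 1)
            failures++;
        if (days_to_year(&last) != year || last != DaysInYear(year))
            failures++;

        elapsed += DaysInYear(year);
    }
    return failures;
}
```

Calling count_boundary_failures(2100) walks every year boundary through 2100, including Dec 31, 2008. Running the same check against the original loop would never return, which is exactly the kind of signal a review or a test run should produce before the code ships.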
Basic code reviews would have easily spotted the problem. We may not know whether this bug is the product of one too many pizza-and-Red-Bull-fueled late-night coding binges, but one thing is clear: companies that do not review code, or do not understand the purpose of a proper code review, are doomed to repeat these kinds of trivial mistakes. As we can all see, the consequences were pretty severe. How many customers may give up on the Zune after this? Imagine a similar bug in a medical device, an ATM, a space station...
I have never heard of great programmers who did not introduce bugs. As you innovate, as you push the limits and frankly the more you code, you will be introducing more bugs. If you don't, something is wrong. That's human.
Of course there are those programmers who will indefinitely recycle old code or cut and paste all day long. Those tend to have fewer bugs in their code, of course!
Too often code reviews degenerate into a kind of blame game, and that's usually a pretty good warning sign. Bugs are par for the course in software development. If you are not finding any, then you are not looking hard enough or waiting long enough (every 4 years for the Zune).
Next leap year: 2012...