* Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Linas Vepstas @ 2009-01-02 19:25 UTC
To: linux-kernel
Cc: Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster, burdell

Slashdot reported a story of Linux machines crashing on New Year's Eve.
http://ask.slashdot.org/article.pl?sid=09/01/01/1930202

Below follows a summary of the reported crashes. I'm ignoring the
zillions of "mine didn't crash" reports, or the "you're a paranoid
conspiracy theorist, it's random chance" reports.

So far, 31 users reported 53 hard crashes at/near midnight, New Year's.

Symptoms are:
-- hard hang (systems not pingable)
-- IRQs not serviced (if disk was active at time of crash,
   the disk activity light stays lit)
-- cold reboot (poweroff) required
-- systems work normally after reboot
-- no messages in syslog, no kernel oops, no core file crash dumps
-- not reproducible (simply setting the clock back is not enough
   to reproduce; guessing that a simulation of a stratum 0 ntp server
   is needed to force the leap second)
-- the affected machines seem to be running either 2.6.21, 2.6.26
   or 2.6.27

Suspect it's a kernel race condition triggered by ntp bumping the second.
-- it's the leap second, since this doesn't happen other years,
-- it's a race condition, since some identically configured
   machines didn't go down, while others did.
-- it's a race condition, since the majority of systems were not affected.
-- it's a race condition, since affected systems seem to have
   been mostly non-idle servers, or some non-idle desktops/
   TV set-tops.
-- ntpd is the only service that monkeys with time adjustments.
There is a "well-known" deadlock in 2.6.21 kernels that caused this:
http://www.mail-archive.com/git-commits-head@vger.kernel.org/msg15039.html
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=746976a301ac9c9aa10d7d42454f8d6cdad8ff2b;hp=872aad45d6174570dd2e1defc3efee50f2cfcc72

Here's the synopsis of individual reports. Most folks left very
little info, nor did they leave a way to contact them.

aputerguy
Fedora 8 Linux server 2.6.26.6-49.fc8
Intel p4 2.8GHz. ASUS P4PE
2 1-TB Seagate SATA hard drives, 1 200MB PATA drive. 2GB DDR.
1 pchdtv5500 card, 1 winfast 2000XP tv card, 1 nVidia 6200 graphics card.

MentalMooMan (785571) <slashdotNO@SPAMjameshallam.info>
mythtv box (running mythbuntu) used to be something like 7.10
upgraded to 8.10
2.6.27-9.19-generic
ntpd version 4.2.4p4.
The CPU is an AMD Athlon XP 1700+ or 1800+.
The motherboard is an EPOX EP-8KTA3Pro.
message in /var/log/messages at boot-time:
"warning: `ntpd' uses 32-bit capabilities (legacy support in use)"

Anonymous Coward
MythTV box on Fedora 8 (Athlon XP1700+)

athakur999 (44340)
Mythbuntu-based HTPC

AZPolarBear (661815)
Fedora 8 system

Anonymous Coward
5 of about 70 of our production servers

Anonymous Coward
I did have two 2.6.21 servers crash last night

Anonymous Coward
Ubuntu 8.10 MythTV box.

SanjuroE (131728)
Debian testing and at that time Debian kernel 2.6.26-11.
lukas84 (912874)
internal testing machine that's still on 2.6.21

Anonymous Coward
Debian testing Kernel 2.6.26

zerosumgame (1429741)
kernel 2.6.21 on older Dell 1850's

Wibla (677432)
Both my fileservers running debian etch installed from custom
install media (pre-etchnhalf) running 2.6.21-2 and 2.6.21-6 crashed,

Maow (620678)
Ubuntu 8.04 on AMD64

Burdell (228580) <burdell@iruntheinter.net>
RHEL 4 server
RHEL 4 update 6
kernel-smp-2.6.9-67.0.7.EL.x86_64
ntp-4.2.0.a.20040617-6.el4.x86_64
Penguin Computing Altus 2600
dual dual-core Opteron 2212 HE
4G RAM
nVidia MCP55 chipset
I have 9 servers (mostly different hardware, but one the same as
above), all running the exact same kernel and package set. Only one
crashed; the others logged the leap second and went on fine.

Pretzalzz (577309) Travis Crump <pretzalz@techhouse.org>
http://lists.debian.org/debian-user/2009/01/msg00006.html
debian lenny
ntp=1:4.2.4p4+dfsg-7;
Linux version 2.6.26-1-amd64 (Debian 2.6.26-4[since updated])
Processor : 2x AMD Athlon(tm) 64 X2 Dual Core Processor 3800+

kst (168867)
Ubuntu 8.10

arodland (127775)
Debian 2.6.21

Goodgerster (904325) <goodgerster AT gmail DOT com>
Debian

Morgor (542294)
Fedora 8
Two of our production servers running fedora 8

Lightjumper (532700)
Fedora 9 server

blit (90883)
10 machines running Fedora Core 7

Qwell (684661)
Ubuntu 8.10

Jim Fenton (514449)
Fedora 8 machine (kernel: 2.6.26.6-49.fc8)

Anonymous Coward
My laptop

dmrobbin (560931)
F8 box went down,

Anonymous Coward
F8 server also "hung"

Anonymous Coward
2.6.26-1-amd64. Four of the 20 locked up

Anonymous Coward (actually, me, Linas)
amd64 dual core, ubuntu 8.04
custom-compiled kernel, 2.6.26-64-bit

Anonymous Coward
kernel 2.6.26.3-29.fc9.

Anonymous Coward
3 Ubuntu 8.10 Virtual Box (3 of them)

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Diego Calleja @ 2009-01-02 20:04 UTC
To: linasvepstas
Cc: linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster, burdell

On Fri, 2 Jan 2009 13:25:38 -0600, "Linas Vepstas" <linasvepstas@gmail.com> wrote:

> Suspect it's a kernel race condition triggered by ntp bumping the second.

How could I create a test case that reproduces what ntp does? Just add
a second?

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Robert Hancock @ 2009-01-02 20:25 UTC
To: linux-kernel
Cc: linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster, burdell

Diego Calleja wrote:
> On Fri, 2 Jan 2009 13:25:38 -0600, "Linas Vepstas" <linasvepstas@gmail.com> wrote:
>
>> Suspect it's a kernel race condition triggered by ntp bumping the second.
>
> How could I create a test case that reproduces what ntp does? Just add
> a second?

I'd think that setting the clock to just before midnight on Dec. 31 and
using the adjtimex syscall to set the TIME_INS state on the clock, then
waiting until midnight rolls around would be a reasonable test.
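
A rough sketch of the test Robert Hancock describes, written in Python
with ctypes. It is untested and makes several assumptions: a Linux/glibc
system, a struct timex layout and constant values as listed in the
adjtimex(2) man page, and root privileges. The function name
arm_leap_insert is made up for illustration. Note that the status bit
that userspace sets is STA_INS; TIME_INS is the clock state the kernel
reports back. Setting the clock to just before midnight UTC is a
separate step and is not shown.

```python
import ctypes
import ctypes.util

# Values as documented in adjtimex(2) / <sys/timex.h>.
ADJ_STATUS = 0x0010   # modes bit: write the status word
STA_INS = 0x0010      # status bit: insert a leap second at end of UTC day
TIME_OK, TIME_INS, TIME_DEL, TIME_OOP, TIME_WAIT, TIME_ERROR = range(6)


class Timeval(ctypes.Structure):
    _fields_ = [("tv_sec", ctypes.c_long), ("tv_usec", ctypes.c_long)]


class Timex(ctypes.Structure):
    # Field order follows adjtimex(2); the trailing padding mirrors glibc.
    _fields_ = [
        ("modes", ctypes.c_uint),
        ("offset", ctypes.c_long),
        ("freq", ctypes.c_long),
        ("maxerror", ctypes.c_long),
        ("esterror", ctypes.c_long),
        ("status", ctypes.c_int),
        ("constant", ctypes.c_long),
        ("precision", ctypes.c_long),
        ("tolerance", ctypes.c_long),
        ("time", Timeval),
        ("tick", ctypes.c_long),
        ("ppsfreq", ctypes.c_long),
        ("jitter", ctypes.c_long),
        ("shift", ctypes.c_int),
        ("stabil", ctypes.c_long),
        ("jitcnt", ctypes.c_long),
        ("calcnt", ctypes.c_long),
        ("errcnt", ctypes.c_long),
        ("stbcnt", ctypes.c_long),
        ("tai", ctypes.c_int),
        ("_reserved", ctypes.c_int * 11),
    ]


def arm_leap_insert():
    """Ask the kernel to insert a leap second at the next UTC midnight.

    Needs root. Overwrites the other writable status bits with zero.
    Returns the clock state reported by the kernel (TIME_OK, TIME_INS, ...).
    """
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    tx = Timex()
    tx.modes = ADJ_STATUS
    tx.status = STA_INS
    state = libc.adjtimex(ctypes.byref(tx))
    if state == -1:
        raise OSError(ctypes.get_errno(), "adjtimex failed (not root?)")
    return state
```

Calling arm_leap_insert() on a scratch machine whose clock has been set
to a few seconds before 00:00:00 UTC, then watching whether the box
survives midnight, would approximate the proposed test. Whether this
triggers the same code path as a real ntpd announcement is not
established here.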

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-03 6:32 UTC
To: Robert Hancock
Cc: linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster, burdell

Robert Hancock wrote:
> Diego Calleja wrote:
>> How could I create a test case that reproduces what ntp does? Just add
>> a second?
>
> I'd think that setting the clock to just before midnight on Dec. 31 and
> using the adjtimex syscall to set the TIME_INS state on the clock,
> then waiting until midnight rolls around would be a reasonable test.

I don't understand this idea, nor the patch for the problem. I don't
see why adding a leap second would impact the kernel in any way.
Shouldn't this be a simple zoneinfo change, whereby the last two seconds
of the year (in each timezone) both map to 31dec2008 23:59:59? That's
the way the change has worked in the real world. Why would ntp or the
kernel be involved?

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Ben Goodger @ 2009-01-03 6:37 UTC
To: David Newall
Cc: Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

2009/1/3 David Newall <davidn@davidnewall.com>:
> Robert Hancock wrote:
>> Diego Calleja wrote:
>>> How could I create a test case that reproduces what ntp does? Just add
>>> a second?
>>
>> I'd think that setting the clock to just before midnight on Dec. 31 and
>> using the adjtimex syscall to set the TIME_INS state on the clock,
>> then waiting until midnight rolls around would be a reasonable test.
>
> I don't understand this idea, nor the patch for the problem. I don't
> see why adding a leap second would impact the kernel in any way.
> Shouldn't this be a simple zoneinfo change, whereby the last two seconds
> of the year (in each timezone) both map to 31dec2008 23:59:59? That's
> the way the change has worked in the real world. Why would ntp or the
> kernel be involved?

Actually, the change has worked in the real world with the
introduction of a new second named 23:59:60, or else ignoring the leap
second entirely and correcting the clock (or not) later...

--
Benjamin Goodger

-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/S/M/B d- s++:-- a18 c++$ UL>+++ P--- L++>+++ E- W+++$ N--- K? w--- O? M- V? PS+(++) PE-() Y+ PGP+ t 5? X-- R- !tv() b+++>++++ DI+++ D+ G e>++++ h! !r*(-) y
------END GEEK CODE BLOCK------

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-04 8:43 UTC
To: Ben Goodger
Cc: Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

Ben Goodger wrote:
> 2009/1/3 David Newall <davidn@davidnewall.com>:
>
>> Shouldn't this be a simple zoneinfo change, whereby the last two seconds
>> of the year (in each timezone) both map to 31dec2008 23:59:59? That's
>> the way the change has worked in the real world. Why would ntp or the
>> kernel be involved?
>>
>
> Actually, the change has worked in the real world with the
> introduction of a new second named 23:59:60

Fine. However you want to describe that last second is immaterial. The
point is that diddling the clock is not a true solution. Take seconds
since epoch for January 1 and subtract the seconds since epoch for the
previous day, and if the result isn't 86401 it's wrong. Is Linux wrong?
(I gather it is.)
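
The subtraction proposed above is easy to try. The sketch below uses
Python's calendar.timegm, which applies the same leap-second-free
arithmetic that POSIX specifies for time_t; it is only an illustration
of that arithmetic and does not query any kernel. The one extra second
is added by hand, on the knowledge that 2008-12-31 contained an
inserted leap second.

```python
import calendar

# calendar.timegm interprets the tuple as UTC and counts every day
# as exactly 86400 seconds, as POSIX time_t does.
dec31 = calendar.timegm((2008, 12, 31, 0, 0, 0))
jan01 = calendar.timegm((2009, 1, 1, 0, 0, 0))

posix_day = jan01 - dec31     # what subtracting two time() values gives
elapsed_si = posix_day + 1    # real elapsed seconds: one leap second inserted

print(posix_day, elapsed_si)  # 86400 86401
```

So a POSIX-style clock gives 86400, not 86401; the later replies in
this thread discuss why that is by definition rather than by accident.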

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Kyle Moffett @ 2009-01-04 9:00 UTC
To: David Newall
Cc: Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

On Sun, Jan 4, 2009 at 3:43 AM, David Newall <davidn@davidnewall.com> wrote:
> Ben Goodger wrote:
>> 2009/1/3 David Newall <davidn@davidnewall.com>:
>> Actually, the change has worked in the real world with the
>> introduction of a new second named 23:59:60
>
> Fine. However you want to describe that last second is immaterial. The
> point is that diddling the clock is not a true solution. Take seconds
> since epoch for January 1 and subtract the seconds since epoch for the
> previous day, and if the result isn't 86401 it's wrong. Is Linux wrong?
> (I gather it is.)

Actually, "diddling the clock" is really the only valid solution to
the leap-second problem. The leap second is such a fine adjustment
that it is actually affected by random "noise" introduced into the
solar system from the chaotic gravitational interactions of the
planets with each other. It's impossible to reliably calculate which
future years will have leap seconds, and in which direction they will
occur.

Our calendar year is pretty damn close after we have accounted for the
standard leap-year algorithm, but that algorithm cannot be modified
without breaking a great number of existing date-time systems.

The proper answer (currently implemented in systems all over the
world) coordinates atomic-clock systems across the world with the
measured traversal of the earth (as referenced against the sun and the
stars). If the clocks are slightly "ahead" of where they should be, a
leap second is scheduled to be inserted, and if they're behind, a
second is removed.
The flow of time is then adjusted for the last minute so that it runs
at either 101.6949% of the normal rate (59-second minute) or 98.3607%
of the normal rate (61-second minute). The effective end result is
that time actually flows smoothly but the assumed date of the epoch is
adjusted slightly relative to real time based on subtle fluctuations
of the earth's rotation and orbit.

Cheers,
Kyle Moffett
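
The two percentages in the message above are simply the ratios 60/59
and 60/61. A short check of that arithmetic follows; the helper name
rate_percent is invented for this sketch, and the code says nothing
about whether any kernel actually slews the clock this way.

```python
def rate_percent(actual_seconds, nominal_seconds=60):
    """Clock rate, in percent of normal, needed to fit a nominal
    60-second minute into a minute of the given actual length."""
    return 100.0 * nominal_seconds / actual_seconds


for actual in (59, 61):
    print(f"{actual}-second minute: {rate_percent(actual):.4f}% of normal")
# 59-second minute: 101.6949% of normal
# 61-second minute: 98.3607% of normal
```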

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-04 10:03 UTC
To: Kyle Moffett
Cc: Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

Kyle Moffett wrote:
> Actually, "diddling the clock" is really the only valid solution to
> the leap-second problem. The leap-second is such a fine adjustment
> that it is actually affected by random "noise" introduced into the
> solar-system from the chaotic gravitational interactions of the
> planets with each other. It's impossible to reliably calculate which
> future years will have leap seconds, and in which direction they will
> occur.
>

You're confusing the system of keeping time with those characteristics
of the real world which it represents. They are, in fact, two different
things, hence we regularly adjust the system. Now in the case of UNIX
and derivatives, the system records the number of seconds since an
arbitrary point in time, and presents a "wall time" (i.e. the time
displayed by the clock on the wall) using, amongst other things, a set
of adjustment rules codified by a zoneinfo file. The number of seconds
between 1 minute to- and midnight-ending 31 December is 61. If Linux
does not reflect that, it is wrong and must be fixed. If it isn't fixed
we will increasingly discover a discrepancy between time data that
originates on Linux versus other, correct systems.

I don't understand why such a simple thing was unnecessarily
complicated. And causing crashes! Ha ha ha or what? A simple addition
to zoneinfo was (and still is) all that is required.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: david @ 2009-01-04 11:13 UTC
To: David Newall
Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

On Sun, 4 Jan 2009, David Newall wrote:

> Kyle Moffett wrote:
>> Actually, "diddling the clock" is really the only valid solution to
>> the leap-second problem. The leap-second is such a fine adjustment
>> that it is actually affected by random "noise" introduced into the
>> solar-system from the chaotic gravitational interactions of the
>> planets with each other. It's impossible to reliably calculate which
>> future years will have leap seconds, and in which direction they will
>> occur.
>>
>
> You're confusing the system of keeping time with those characteristics
> of the real-world which it represents. They are, in fact, two different
> things, hence we regularly adjust the system. Now in the case of UNIX
> and derivatives, the system records the number of seconds since an
> arbitrary point-in-time, and presents a "wall time" (i.e. the time
> displayed by the clock on the wall) using, amongst other things, a set
> of adjustment rules codified by a zoneinfo file. The number of seconds
> between 1 minute to- and midnight-ending 31 December is 61. If Linux
> does not reflect that it is wrong and must be fixed. If it isn't fixed
> we will increasingly discover a discrepancy between time-data that
> originates on Linux versus other, correct systems.
>
> I don't understand why such a simple thing was unnecessarily
> complicated. And causing crashes! Ha ha ha or what? A simple addition
> to zoneinfo was (and still is) all that is required.

so are you saying that other 'correct' OS's have patches issued every
time a leap second is declared so that they have an in-kernel table of
them to use to calculate the correct time?

what about systems that have hit end of life?

what about systems that users don't want to have to reboot to install
a new kernel for a 1 second shift (which NTP will take care of as far
as they are concerned anyway)

when the daylight savings time definitions change all the vendors had
to issue patches, I saw those. I didn't see any patches for the leap
second, so how do these other systems deal with it?

David Lang

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-04 23:15 UTC
To: david
Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

david@lang.hm wrote:
> so are you saying that other 'correct' OS's have patches issued every
> time a leap second is declared so that they have an in-kernel table of
> them to use to calculate the correct time?

No. Exactly the contrary. I'm saying that through use of zoneinfo, for
example, no kernel support is required for leap seconds. And! this
provides correct results for seconds-between two dates.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Chris Adams @ 2009-01-04 23:25 UTC
To: David Newall
Cc: linux-kernel

Once upon a time, David Newall <davidn@davidnewall.com> said:
> No. Exactly the contrary. I'm saying that through use of zoneinfo, for
> example, no kernel support is required for leap seconds. And! this
> provides correct results for seconds-between two dates.

Again: zoneinfo provides offset from UTC. Leap seconds are changes in
UTC itself, not time zones, so zoneinfo can't handle that.

Please go read Google, Wikipedia, and NTP lists.

--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-05 0:01 UTC
To: Chris Adams
Cc: linux-kernel

Chris Adams wrote:
> Once upon a time, David Newall <davidn@davidnewall.com> said:
>
>> No. Exactly the contrary. I'm saying that through use of zoneinfo, for
>> example, no kernel support is required for leap seconds. And! this
>> provides correct results for seconds-between two dates.
>>
>
> Again: zoneinfo provides offset from UTC. Leap seconds are changes in
> UTC itself, not time zones, so zoneinfo can't handle that.
>

Yes, but zoneinfo ALSO provides support for leap seconds. Do read man
zic for specific details.

> Please go read Google, Wikipedia, and NTP lists.

I think you particularly mean NTP. I think your reasoning is that
because NTP's timestamp doesn't include leap seconds, and because we all
like to use NTP to synchronise our clocks, Linux has to make up the
difference. But there is an alternative, which is for the NTP client to
insert those missing leap seconds, which number it can get from
zoneinfo. Epoch remains the start of 1970; seconds between any two
dates include leap seconds and no special kernel support is required.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Alan Cox @ 2009-01-05 0:41 UTC
To: David Newall
Cc: Chris Adams, linux-kernel

> zoneinfo. Epoch remains the start of 1970; seconds between any two
> dates include leap seconds and no special kernel support is required.

Your time() values then disagree with the rest of the universe. See POSIX
1003.1 Annex B 2.2.2 if you want the whole story.

For any given time based on the 1970 Epoch there is a single correct
answer for the translation between each value and a UTC time.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-05 8:43 UTC
To: Alan Cox
Cc: Chris Adams, linux-kernel

Alan Cox wrote:
>> zoneinfo. Epoch remains the start of 1970; seconds between any two
>> dates include leap seconds and no special kernel support is required.
>>
>
> Your time() values then disagree with the rest of the universe. See POSIX
> 1003.1 Annex B 2.2.2 if you want the whole story.
>

I can't find this, except possibly (but maybe not) at a cost from IEEE,
and I'm not inclined to pay. If you could post a sentence from this
annex it might help me to find it.

> For any given time based on the 1970 Epoch there is a single correct
> answer for the translation between each value and a UTC time.

This confused me because the sense that I've got from this thread
suggests otherwise. Unless I've misunderstood, the time() value for the
first second of 2009 is one greater than the value for the second to
last second of 2008 (i.e. 23:59:59), which means that there is no
translation for the last second. Put another way, my understanding of
what's been said is that the epoch is effectively increased by one
second for each leap second. Have I got this wrong?

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Alan Cox @ 2009-01-05 19:47 UTC
To: David Newall
Cc: Chris Adams, linux-kernel

> > For any given time based on the 1970 Epoch there is a single correct
> > answer for the translation between each value and a UTC time.
>
> This confused me because the sense that I've got from this thread
> suggests otherwise. Unless I've misunderstood, the time() value for the
> first second of 2009 is one greater than the value for the second to
> last second of 2008 (i.e. 23:59:59), which means that there is no
> translation for the last second. Put another way, my understanding of
> what's been said is that the epoch is effectively increased by one
> second for each leap second. Have I got this wrong?

No, I should have said from a UTC time to a value; the reverse is
slightly ambiguous - as you say, leap seconds cannot be distinguished
(well, unless you are using floating point, but that's a whole can of
worms).

Glibc has /usr/share/zoneinfo/right as well as posix zones, which I guess
is Ulrich's vote on the subject.

In a strictly POSIX environment, then, for 1003.1 post 2001 the definition
is non-leap seconds since (a notional) 1/1/70 UTC 00:00:00. Including
leap seconds in the definition would have caused problems with existing
date stamps, moving them by about half a minute.

The kernel doesn't give a brass monkeys about interpretation on the whole,
with one main exception - the CMOS RTC time conversion is done without
factoring in leap seconds.

Alan
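
The one-way nature of the mapping Alan Cox describes can be seen with
Python's calendar.timegm and time.gmtime, which follow the POSIX
"non-leap seconds since 1970" definition. This is only an illustration
of the arithmetic: UTC to value is well defined, but the inserted
second 23:59:60 and the following 00:00:00 collapse onto the same
value, so the reverse conversion cannot tell them apart.

```python
import calendar
import time

leap_second = calendar.timegm((2008, 12, 31, 23, 59, 60))  # inserted second
first_2009 = calendar.timegm((2009, 1, 1, 0, 0, 0))        # first second of 2009

# Both UTC labels map to the same integer value.
print(leap_second, first_2009, leap_second == first_2009)
# 1230768000 1230768000 True

# Going back from the value yields only one of the two labels.
print(time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(leap_second)))
# 2009-01-01 00:00:00
```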

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: david @ 2009-01-05 0:29 UTC
To: David Newall
Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

On Mon, 5 Jan 2009, David Newall wrote:

> david@lang.hm wrote:
>> so are you saying that other 'correct' OS's have patches issued every
>> time a leap second is declared so that they have an in-kernel table of
>> them to use to calculate the correct time?
>
> No. Exactly the contrary. I'm saying that through use of zoneinfo, for
> example, no kernel support is required for leap seconds. And! this
> provides correct results for seconds-between two dates.

then new zoneinfo files need to be sent out every time there is a leap
second (which from other posts on this thread is potentially every month)

and if it is something to be fixed in zoneinfo, then complaining to the
kernel list and demanding that 'Linux be fixed' is not productive.

David Lang

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-04 23:37 UTC
To: david
Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

david@lang.hm wrote:
> then new zoneinfo files need to be sent out every time there is a leap
> second (which from other posts on this thread is potentially every month)

Not every few months, for goodness sake! Leap seconds aren't that
common! These files do change regularly, however, sometimes on a yearly
basis, because that's how often the date might be changed when daylight
savings transitions. This is to say that leap seconds don't
particularly change the frequency of zoneinfo updates.

> if it is something to be fixed in zoneinfo, then complaining to the
> kernel list and demanding that 'Linux be fixed' is not productive.

Updating zoneinfo is trivial. On the other hand if something has been
done to Linux to support leap seconds (I gather this is the case), then
the point is that it need not have been done, should not have been done,
and needs to be replaced by the standard tools that have worked
satisfactorily for decades, and continue doing so. So yes, I think it
is productive.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: david @ 2009-01-05 1:05 UTC
To: David Newall
Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

On Mon, 5 Jan 2009, David Newall wrote:

> david@lang.hm wrote:
>> then new zoneinfo files need to be sent out every time there is a leap
>> second (which from other posts on this thread is potentially every month)
>
> Not every few months, for goodness sake! Leap seconds aren't that
> common! These files do change regularly, however, sometimes on a yearly
> basis, because that's how often the date might be changed when daylight
> savings transitions. This is to say that leap seconds don't
> particularly change the frequency of zoneinfo updates.

another poster said that NTP packets include information about this
month's leap second, so that implies that they could change monthly.

the zoneinfo files normally do not change every year, they only change
when !$#@$# politicians decide to monkey with things and change the rules
(for the US this is once in the history of Linux IIRC). the aggregate of
some country somewhere changing the rules means that more updates are
created, but most people can ignore those updates if they don't directly
apply to them.

David Lang

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: David Newall @ 2009-01-05 0:14 UTC
To: david
Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

david@lang.hm wrote:
> On Mon, 5 Jan 2009, David Newall wrote:
>
>> david@lang.hm wrote:
>>> then new zoneinfo files need to be sent out every time there is a leap
>>> second (which from other posts on this thread is potentially every
>>> month)
>>
>> Not every few months, for goodness sake! Leap seconds aren't that
>> common! These files do change regularly, however, sometimes on a yearly
>> basis, because that's how often the date might be changed when daylight
>> savings transitions. This is to say that leap seconds don't
>> particularly change the frequency of zoneinfo updates.
>
> another poster said that NTP packets include information about this
> month's leap second, so that implies that they could change monthly.

Not "could change monthly" rather, "could change at any month".

The frequency of zoneinfo updates would therefore be: every time the
zones you care about change; and every time there's a leap second. No
big effort.

* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
From: Ben Goodger @ 2009-01-05 0:21 UTC
To: David Newall
Cc: david, Kyle Moffett, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

2009/1/5 David Newall <davidn@davidnewall.com>:
>> another poster said that NTP packets include information about this
>> month's leap second, so that implies that they could change monthly.
>
> Not "could change monthly" rather, "could change at any month".
>
> The frequency of zoneinfo updates would therefore be: every time the
> zones you care about change; and every time there's a leap second. No
> big effort.

Unfortunately, as has been pointed out, timezones are completely
unrelated to leap seconds.

NB. Leap seconds, positive or negative, potentially occur every six
months (June 30 or Dec 31), but since their introduction this
frequency has happened only once (in 1972); historically they have
been inserted on average every 1.5 years, but there have been only two
since 2000.

--
Benjamin Goodger
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 0:21 ` Ben Goodger @ 2009-01-05 6:34 ` David Newall 2009-01-05 23:03 ` Linas Vepstas 0 siblings, 1 reply; 109+ messages in thread From: David Newall @ 2009-01-05 6:34 UTC (permalink / raw) To: Ben Goodger Cc: david, Kyle Moffett, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Ben Goodger wrote: > 2009/1/5 David Newall <davidn@davidnewall.com>: > >>> another poster said that NTP packets include information about this >>> month's leap second, so that implies that they could change monthly. >>> >> Not "could change monthly" rather, "could change at any month". >> >> The frequency of zoneinfo updates would therefore be: every time the >> zones you care about change; and every time there's a leap second. No >> big effort. >> > > Unfortunately, as has been pointed out, timezones are completely > unrelated to leap seconds. > Zoneinfo files cater for leap seconds. > NB. Leap seconds, positive or negative, potentially occur every six > months (June 30 or Dec 31), but since their introduction this > frequency has happened only once (in 1972); historically they have > been inserted on average every 1.5 years, but there have been only two > since 2000. If you know in advance, you can update zoneinfo files with multiple leap seconds. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 6:34 ` David Newall @ 2009-01-05 23:03 ` Linas Vepstas 0 siblings, 0 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-05 23:03 UTC (permalink / raw) To: David Newall Cc: Ben Goodger, david, Kyle Moffett, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell 2009/1/5 David Newall <davidn@davidnewall.com>: > Zoneinfo files cater for leap seconds. As has been (repeatedly) pointed out, the leap seconds only apply to UTC, so there is no way, given UTC, to use a zoneinfo file to twiddle UTC. Your argument *might* work if the kernel maintained TAI instead of UTC. Then there could, in principle, be a zoneinfo file that converted from TAI to UTC (by subtracting the accumulated leap-second offset, currently 34 seconds, from TAI). However, this requires converting the kernel to track TAI instead of UTC, and reviewing all sorts of code in the kernel, glibc, ntp, and myriads of other libraries to figure out what's affected and what's not. As well as figuring out how to twiddle zoneinfo files so that they're backwards/forwards compatible with the timekeeping change, so that users aren't screwed when they put new kernels on old distros, or new zonefiles on old kernels. This is a fairly big chunk of work, requiring coordination between lots of different parties. > If you know in advance, you can update zoneinfo files with multiple leap > seconds. Heh. You miss the point. The whole point of leap seconds is that they're unknowable in advance. You only know if they already happened, or seem likely to happen real soon now. The previously cited wikipedia article reviews this nicely. --linas p.s. Yes, you could say I'm coming around to your point of view. If you had said, from the beginning, something like "the kernel should keep TAI instead of UTC, and compute UTC in user-space from the TAI time", then you might have met a lot less resistance. But that's not how your argument came across. 
^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 1:05 ` david 2009-01-05 0:14 ` David Newall @ 2009-01-05 0:44 ` Alan Cox 1 sibling, 0 replies; 109+ messages in thread From: Alan Cox @ 2009-01-05 0:44 UTC (permalink / raw) To: david Cc: David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > another poster said that NTP packets include information about this > month's leap second, so that implies that they could change monthly. The current rules don't permit this, but any rule could be changed. However, it's unlikely. Currently the earth rotation people do this twice a year if necessary. It is also too complex to predict significantly (i.e. years) in advance when it will be necessary, because the complexity of the tidal forces involved is beyond simulation. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-04 23:37 ` David Newall 2009-01-05 1:05 ` david @ 2009-01-05 5:48 ` Linas Vepstas 2009-01-05 14:33 ` Nick Andrew 1 sibling, 1 reply; 109+ messages in thread From: Linas Vepstas @ 2009-01-05 5:48 UTC (permalink / raw) To: David Newall Cc: david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell 2009/1/4 David Newall <davidn@davidnewall.com>: > david@lang.hm wrote: >> then new zoneinfo files need to be sent out every time there is a leap >> second (which from other posts on this thread is potentially every month) > > Not every few months, for goodness sake! Leap seconds aren't that > common! But they could be. Apparently, there was a very long, multi-year lull, where no leap-seconds were required. It's not fully understood why, as they used to be common. Maybe it's the melting glaciers :-) There *was* talk of eliminating them forever (so as to avoid this kind of bug, which affects banks, satellites, telecom equipment, etc.) but I guess they didn't do it. --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 5:48 ` Linas Vepstas @ 2009-01-05 14:33 ` Nick Andrew 2009-01-05 16:08 ` Linas Vepstas 2009-01-06 1:59 ` David Newall 0 siblings, 2 replies; 109+ messages in thread From: Nick Andrew @ 2009-01-05 14:33 UTC (permalink / raw) To: Linas Vepstas Cc: David Newall, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Sun, Jan 04, 2009 at 11:48:31PM -0600, Linas Vepstas wrote: > There *was* talk of eliminating them forever (so as to > avoid this kind of bug, which affects banks, satellites, > telecom equipment, etc.) but I guess they didn't do it. I can sympathise with the opinion that linux should be able to accurately distinguish xx:59:60 when a leap second is added (or the missing :59 when one is subtracted) but not at the expense of making a day which is not 86400 seconds long. To fix the problem would require accurately modeling international timekeeping standards such as TAI and use of different syscalls to return time in TAI and UTC-with-leap-seconds represented. It wouldn't be good to change the semantics of time(). * http://en.wikipedia.org/wiki/International_Atomic_Time * http://en.wikipedia.org/wiki/Leap_second Arguably the kernel's responsibility should be to keep track of the most fundamental representation of time possible for a machine (that's probably TAI) and it is a userspace responsibility to map from that value to other time standards including UTC, using control files which are updated as leap seconds are declared. Just so long as the existing behaviour of time() which doesn't recognise leap seconds is preserved. Nick. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 14:33 ` Nick Andrew @ 2009-01-05 16:08 ` Linas Vepstas 2009-01-05 17:51 ` david 2009-01-06 2:31 ` Nick Andrew 2009-01-06 1:59 ` David Newall 1 sibling, 2 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-05 16:08 UTC (permalink / raw) To: Nick Andrew Cc: David Newall, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell 2009/1/5 Nick Andrew <nick@nick-andrew.net>: > On Sun, Jan 04, 2009 at 11:48:31PM -0600, Linas Vepstas wrote: >> There *was* talk of eliminating them forever (so as to >> avoid this kind of bug, which affects banks, satellites, >> telecom equipment, etc.) but I guess they didn't do it. > > I can sympathise with the opinion that linux should be able to accurately > distinguish xx:59:60 when a leap second is added (or the missing :59 when > one is subtracted) but not at the expense of making a day which is not > 86400 seconds long. Careful: This seems to be *exactly* the intent of the maintainers of the UTC definition: some days really will have 86401 seconds in them. That's why there's all this talk of 'solar time' (see e.g. the wikipedia article) > To fix the problem would require accurately modeling international > timekeeping standards such as TAI and use of different syscalls to > return time in TAI and UTC-with-leap-seconds represented. It > wouldn't be good to change the semantics of time(). Now, this is the first proposal that I've heard that makes sense. I believe that the Linux kernel/userspace infrastructure already " accurately models international timekeeping standards", so we're good. Changing the kernel to track TAI instead of UTC seems like an excellent idea -- but not one without a significant amount of work -- maybe new syscalls are needed, as well as new monkeying-about in glibc, maybe in ntpd, etc. 
> * http://en.wikipedia.org/wiki/International_Atomic_Time > * http://en.wikipedia.org/wiki/Leap_second > > Arguably the kernel's responsibility should be to keep track of the > most fundamental representation of time possible for a machine (that's > probably TAI) and it is a userspace responsibility to map from that > value to other time standards including UTC, Yes, this really does seem like the right solution. > using control files > which are updated as leap seconds are declared. Let's be clear on what "control files" means. This does *NOT* mean some config file shipped by some distro for some package. That would be a horrid solution. People don't install updates, patches, etc. Distros ship them late, or never, if the distro is old enough. A more appropriate solution would be to have either the kernel or ntpd track the leap seconds automatically. First, the ntp protocol already provides the needed notification of a leap second to anyone who cares about it (i.e. there is no point in getting a Linux distro involved in this -- a distribution mechanism already exists, and works *better* than having a distro do it). If the kernel needs to track leap seconds, it could do so using a mechanism similar to the "random pool" that is saved across reboots. Alternately, ntpd already stores slew rates &etc. in files, and could track leap seconds likewise. > Just so long as the > existing behaviour of time() which doesn't recognise leap seconds > is preserved. Well, 'man 2 time' is as clear as mud. It talks about leap seconds, but I can't figure out what it's saying. I rather doubt that time() is doing what POSIX.1 seems to want it to do (which is to ignore leap seconds?) The reason I'm guessing that time() is wrong, is because it seems that POSIX wants time() to use TAI time, and we don't have that handy anywhere (because we've lost track of those leap seconds) --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
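For reference on the time() question above: POSIX defines "Seconds Since the Epoch" by a fixed formula in which every day contributes exactly 86400 seconds, so the count is a UTC-derived label that skips over leap seconds rather than a TAI count. A minimal sketch of that formula follows; the function name posix_seconds is made up for illustration, and the result is only cross-checked against Python's calendar.timegm, not against any kernel behaviour.

```python
import calendar
from datetime import date

def posix_seconds(year, month, day, hour, minute, second):
    """Seconds Since the Epoch per the POSIX formula (every day is 86400 s)."""
    tm_year = year - 1900
    # tm_yday is the 0-based day of the year
    tm_yday = date(year, month, day).timetuple().tm_yday - 1
    return (second + minute * 60 + hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)

# The 2008-12-31 leap second: 23:59:59, 23:59:60, then 00:00:00.
before = posix_seconds(2008, 12, 31, 23, 59, 59)
leap = posix_seconds(2008, 12, 31, 23, 59, 60)
after = posix_seconds(2009, 1, 1, 0, 0, 0)

# Two SI seconds pass between `before` and `after`, but the count moves by one,
# and the leap second 23:59:60 gets the same number as the following midnight.
print(before, leap, after)
```

Under this definition the leap second has no number of its own, which is why a kernel that keeps this count has to step or repeat a second when one is inserted.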
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 16:08 ` Linas Vepstas @ 2009-01-05 17:51 ` david 2009-01-05 17:42 ` Linas Vepstas 2009-01-06 2:31 ` Nick Andrew 1 sibling, 1 reply; 109+ messages in thread From: david @ 2009-01-05 17:51 UTC (permalink / raw) To: Linas Vepstas Cc: Nick Andrew, David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Mon, 5 Jan 2009, Linas Vepstas wrote: >> Arguably the kernel's responsibility should be to keep track of the >> most fundamental representation of time possible for a machine (that's >> probably TAI) and it is a userspace responsibility to map from that >> value to other time standards including UTC, > > Yes, this really does seem like the right solution. > >> using control files >> which are updated as leap seconds are declared. > > Lets be clear on what "control files" means. This does > *NOT* mean some config file shipped by some distro > for some package. That would be a horrid solution. > People don't install updates, patches, etc. Distros > ship them late, or never, if the distro is old enough. > > A more appropriate solution would be to have > either the kernel or ntpd track the leap seconds > automatically. First, the ntp protocol already provides > the needed notification of a leap second to anyone > who cares about it (i.e. there is no point in getting a > Linux distro involved in this -- a distribution mechanism > already exists, and works *better* than having a distro > do it). I disagree with this. NTP will only know about leap seconds if it was running and connected to a server that advertised the leap seconds during that month. for example, if you installed a new server today, how would it ever know that there was a leap second a couple of days ago? 
David Lang > If the kernel needs to track leap seconds, it could do > so using a mechanism similar to the "random pool" > that is saved across reboots. Alternately, ntpd already > stores slew rates &etc. in files, and could track leap > seconds likewise. > >> Just so long as the >> existing behaviour of time() which doesn't recognise leap seconds >> is preserved. > > Well, 'man 2 time' is as clear as mud. It talks about leap seconds, > but I can't figure out what its saying. I rather > doubt that time() is doing what POSIX.1 seems to want > it to do (which is to ignore leap seconds?) > > The reason I'm guessing that time() is wrong, is because > it seems that POSIX wants time() to use TAI time, and > we don't have that handy anywhere (because we've lost > track of those leap seconds) > > --linas > ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 17:51 ` david @ 2009-01-05 17:42 ` Linas Vepstas 2009-01-06 2:27 ` john stultz-lkml ` (3 more replies) 0 siblings, 4 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-05 17:42 UTC (permalink / raw) To: david Cc: Nick Andrew, David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell, mills, Brian Haberman, Karen O'Donoghue, ntpwg 2009/1/5 <david@lang.hm>: > On Mon, 5 Jan 2009, Linas Vepstas wrote: > >>> Arguably the kernel's responsibility should be to keep track of the >>> most fundamental representation of time possible for a machine (that's >>> probably TAI) and it is a userspace responsibility to map from that >>> value to other time standards including UTC, >> >> Yes, this really does seem like the right solution. >> >>> using control files >>> which are updated as leap seconds are declared. >> >> Lets be clear on what "control files" means. This does >> *NOT* mean some config file shipped by some distro >> for some package. That would be a horrid solution. >> People don't install updates, patches, etc. Distros >> ship them late, or never, if the distro is old enough. >> >> A more appropriate solution would be to have >> either the kernel or ntpd track the leap seconds >> automatically. First, the ntp protocol already provides >> the needed notification of a leap second to anyone >> who cares about it (i.e. there is no point in getting a >> Linux distro involved in this -- a distribution mechanism >> already exists, and works *better* than having a distro >> do it). > > I disagree with this. NTP will only know about leap seconds if it was > running and connected to a server that advertised the leap seconds during > that month. > > for example, if you installed a new server today, how would it ever know > that there was a leap second a couple of days ago? OK, good point. 
Unless your distro was less than a few days old (unlikely), you are faced with the same problem. Sure, eventually, the distro will publish an update (which will add to the existing list of 36 leap seconds -- which is needed in any case, since no one has a server that's been up since 1958), but this is unlikely to happen during this install window. The long term solution would be to write an RFC to extend NTP to also provide TAI information -- e.g. to add a message that indicates the current leap-second offset between UTC and TAI. --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 17:42 ` Linas Vepstas @ 2009-01-06 2:27 ` john stultz-lkml 2009-01-06 4:53 ` Linas Vepstas 2009-01-06 19:50 ` M. Warner Losh ` (2 subsequent siblings) 3 siblings, 1 reply; 109+ messages in thread From: john stultz-lkml @ 2009-01-06 2:27 UTC (permalink / raw) To: linasvepstas Cc: david, Nick Andrew, David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell, mills, Brian Haberman, Karen O'Donoghue, ntpwg, zippel On Mon, Jan 5, 2009 at 9:42 AM, Linas Vepstas <linasvepstas@gmail.com> wrote: [snip] > The long term solution would be write an RFC to extend > NTP to also provide TAI information -- e.g. to add a > message that indicates the current leap-second offset > between UTC and TAI. I believe Roman has already added this ability: http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=153b5d054ac2d98ea0d86504884326b6777f683d;hp=9f14f669d18477fe3df071e2fa4da36c00acee8e thanks -john ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 2:27 ` john stultz-lkml @ 2009-01-06 4:53 ` Linas Vepstas 2009-01-06 5:00 ` Linas Vepstas 2009-01-06 19:40 ` [ntpwg] " M. Warner Losh 0 siblings, 2 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-06 4:53 UTC (permalink / raw) To: john stultz-lkml Cc: david, Nick Andrew, David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell, mills, Brian Haberman, Karen O'Donoghue, ntpwg, zippel 2009/1/5 john stultz-lkml <johnstul.lkml@gmail.com>: > On Mon, Jan 5, 2009 at 9:42 AM, Linas Vepstas <linasvepstas@gmail.com> wrote: > [snip] >> The long term solution would be write an RFC to extend >> NTP to also provide TAI information -- e.g. to add a >> message that indicates the current leap-second offset >> between UTC and TAI. > > I believe Roman has already added this ability: > http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=153b5d054ac2d98ea0d86504884326b6777f683d;hp=9f14f669d18477fe3df071e2fa4da36c00acee8e Well, you're answering a different statement than what I was talking about -- I wanted to make sure that TAI information was available via NTP -- this has nothing to do with the kernel, and would be something available to all operating systems. Anyway -- I'm looking at the patch you reference, and maybe I'm being dumb -- but -- I think I see a bug. case TIME_DEL decrements TAI, but TIME_INS does not increment it. Instead, there's a lonely increment in TIME_OOP which seems wrong. ?? --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 4:53 ` Linas Vepstas @ 2009-01-06 5:00 ` Linas Vepstas 2009-01-06 19:40 ` [ntpwg] " M. Warner Losh 1 sibling, 0 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-06 5:00 UTC (permalink / raw) To: john stultz-lkml Cc: david, Nick Andrew, David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell, mills, Brian Haberman, Karen O'Donoghue, ntpwg, zippel Oops. 2009/1/5 Linas Vepstas <linasvepstas@gmail.com>: > 2009/1/5 john stultz-lkml <johnstul.lkml@gmail.com>: >> On Mon, Jan 5, 2009 at 9:42 AM, Linas Vepstas <linasvepstas@gmail.com> wrote: >> [snip] >>> The long term solution would be write an RFC to extend >>> NTP to also provide TAI information -- e.g. to add a >>> message that indicates the current leap-second offset >>> between UTC and TAI. >> >> I believe Roman has already added this ability: >> http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=153b5d054ac2d98ea0d86504884326b6777f683d;hp=9f14f669d18477fe3df071e2fa4da36c00acee8e > > Well, you're answering a different statment than what > I was talking about -- I wanted to make sure that TAI > information was available via NTP -- this has nothing > to do with the kernel, and would be something available > to all operating systems. > > Anyway -- I'm looking at the patch you reference, and > maybe I'm being dumb -- but -- I think I see a bug. > > case TIME_DEL decrements TAI, but TIME_INS does > not increment it. Instead, there's a lonely increment in > TIME_OOP which seems wrong. ?? Never mind. Sorry, I'm wrong, the code looks right. Time to stop reading email, and go to bed. :-) --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 4:53 ` Linas Vepstas 2009-01-06 5:00 ` Linas Vepstas @ 2009-01-06 19:40 ` M. Warner Losh 1 sibling, 0 replies; 109+ messages in thread From: M. Warner Losh @ 2009-01-06 19:40 UTC (permalink / raw) To: linasvepstas Cc: johnstul.lkml, david, goodgerster, kyle, slashdot, zippel, davidn, linux-kernel, hancockr, ntpwg, pretzalz, burdell, nick, jeff In message: <3ae3aa420901052053m5a410671u13ecccfb7e29260c@mail.gmail.com> "Linas Vepstas" <linasvepstas@gmail.com> writes: : 2009/1/5 john stultz-lkml <johnstul.lkml@gmail.com>: : > On Mon, Jan 5, 2009 at 9:42 AM, Linas Vepstas <linasvepstas@gmail.com> wrote: : > [snip] : >> The long term solution would be write an RFC to extend : >> NTP to also provide TAI information -- e.g. to add a : >> message that indicates the current leap-second offset : >> between UTC and TAI. : > : > I believe Roman has already added this ability: : > http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=153b5d054ac2d98ea0d86504884326b6777f683d;hp=9f14f669d18477fe3df071e2fa4da36c00acee8e : : Well, you're answering a different statment than what : I was talking about -- I wanted to make sure that TAI : information was available via NTP -- this has nothing : to do with the kernel, and would be something available : to all operating systems. : : Anyway -- I'm looking at the patch you reference, and : maybe I'm being dumb -- but -- I think I see a bug. : : case TIME_DEL decrements TAI, but TIME_INS does : not increment it. Instead, there's a lonely increment in : TIME_OOP which seems wrong. ?? No. That's right. The increment doesn't happen until the leap second has happened. The TIME_OOP exists to increment the TAI offset at the right time. The decrement would happen right away, since the second is deleted at the end of :58. I had to draw lots of pictures when I was working this code out in FreeBSD. 
Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
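The ordering Warner describes (TIME_INS steps the clock back at midnight, TIME_OOP bumps the TAI offset one second later, TIME_DEL does both at once) can be traced with a small model. This is a simplified sketch, not the kernel code: the class LeapClock, the pending attribute and the run helper are hypothetical names, the per-second tick stands in for the kernel's once-a-second handling, and the deletion case is purely illustrative since no negative leap second has ever been scheduled.

```python
DAY = 86400  # seconds in a POSIX day

class LeapClock:
    """Toy model of the TIME_OK/INS/OOP/DEL/WAIT states discussed above."""

    def __init__(self, utc_sec, tai_offset, pending=None):
        self.utc_sec = utc_sec        # POSIX-style second count
        self.tai_offset = tai_offset  # TAI minus UTC, in seconds
        self.pending = pending        # "ins", "del" or None (hypothetical flag)
        self.state = "TIME_OK"

    def tick(self):
        """Advance one second, then apply the leap-second state logic."""
        self.utc_sec += 1
        if self.state == "TIME_OK":
            if self.pending == "ins":
                self.state = "TIME_INS"
            elif self.pending == "del":
                self.state = "TIME_DEL"
        elif self.state == "TIME_INS":
            if self.utc_sec % DAY == 0:
                self.utc_sec -= 1          # repeat 23:59:59; this is 23:59:60
                self.state = "TIME_OOP"
        elif self.state == "TIME_DEL":
            if (self.utc_sec + 1) % DAY == 0:
                self.utc_sec += 1          # skip 23:59:59 entirely
                self.tai_offset -= 1       # offset changes immediately
                self.state = "TIME_WAIT"
        elif self.state == "TIME_OOP":
            self.tai_offset += 1           # leap second is over: bump the offset
            self.state = "TIME_WAIT"
        elif self.state == "TIME_WAIT":
            if self.pending is None:
                self.state = "TIME_OK"
        return (self.utc_sec, self.state, self.tai_offset)

def run(clock, ticks):
    """Collect (utc_sec, state, tai_offset) after each tick."""
    return [clock.tick() for _ in range(ticks)]

MIDNIGHT = 1230768000  # 2009-01-01 00:00:00 UTC as a POSIX count

insert_trace = run(LeapClock(MIDNIGHT - 3, 33, pending="ins"), 4)
delete_trace = run(LeapClock(MIDNIGHT - 3, 34, pending="del"), 2)

for row in insert_trace:
    print(row)
```

In the insertion trace the second count shows MIDNIGHT - 1 twice, and the offset only moves from 33 to 34 on the tick after the repeated second, which is the "lonely increment in TIME_OOP" from the patch under discussion.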
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 17:42 ` Linas Vepstas 2009-01-06 2:27 ` john stultz-lkml @ 2009-01-06 19:50 ` M. Warner Losh 2009-01-07 3:50 ` Danny Mayer 2009-01-12 16:11 ` Pavel Machek 3 siblings, 0 replies; 109+ messages in thread From: M. Warner Losh @ 2009-01-06 19:50 UTC (permalink / raw) To: linasvepstas Cc: david, hancockr, goodgerster, kyle, slashdot, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <3ae3aa420901050942y56f0ecdei39c091a73e49c1fd@mail.gmail.com> "Linas Vepstas" <linasvepstas@gmail.com> writes: : 2009/1/5 <david@lang.hm>: : > On Mon, 5 Jan 2009, Linas Vepstas wrote: : > : >>> Arguably the kernel's responsibility should be to keep track of the : >>> most fundamental representation of time possible for a machine (that's : >>> probably TAI) and it is a userspace responsibility to map from that : >>> value to other time standards including UTC, : >> : >> Yes, this really does seem like the right solution. : >> : >>> using control files : >>> which are updated as leap seconds are declared. : >> : >> Lets be clear on what "control files" means. This does : >> *NOT* mean some config file shipped by some distro : >> for some package. That would be a horrid solution. : >> People don't install updates, patches, etc. Distros : >> ship them late, or never, if the distro is old enough. : >> : >> A more appropriate solution would be to have : >> either the kernel or ntpd track the leap seconds : >> automatically. First, the ntp protocol already provides : >> the needed notification of a leap second to anyone : >> who cares about it (i.e. there is no point in getting a : >> Linux distro involved in this -- a distribution mechanism : >> already exists, and works *better* than having a distro : >> do it). : > : > I disagree with this. 
NTP will only know about leap seconds if it was : > running and connected to a server that advertised the leap seconds during : > that month. : > : > for example, if you installed a new server today, how would it ever know : > that there was a leap second a couple of days ago? : : OK, good point. Unless your distro was less : than a few days old (unlikely), you are faced with the : same problem. Sure, eventually, the distro will publish : an update (which will add to the existing list of 36 leap List of 24 leap seconds. Although the delta is 34 right now, the first 10 leap seconds were done as tiny steps (~50-100ms) plus frequency offsets. Well, the first 'leap' was 1.4228180s on Jan 1, 1961. Everybody assumes that those seconds don't exist to simplify things (or that there was simply a 10s step between TAI and UTC on 1-Jan-1972). The leapsecond file from NIST doesn't even have them. : seconds -- which is needed in any case, since no one : has a server that's been up since 1958), but this is : unlikely to happen during this install window. : : The long term solution would be write an RFC to extend : NTP to also provide TAI information -- e.g. to add a : message that indicates the current leap-second offset : between UTC and TAI. I'd love that. There's likely going to be some resistance to that because the leapfile is available via the crypto-authenticated means. However, there's no real-time information available... Also, there are many reference clocks that would need this information plugged into it somehow (IRIG doesn't report leap seconds in any meaningful[*] way, let alone UTC-TAI offset). [*] Some IRIG extensions do support reporting leap seconds at the end of the hour, but that's too late... Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
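The arithmetic behind the 24-versus-34 correction above can be written out. The sketch below assumes the conventional 10-second step at 1972-01-01 that Warner mentions and lists the UTC days whose last minute was extended by a leap second through the end of 2008; the names LEAP_DAYS and tai_minus_utc are illustrative, and the table should be checked against an authoritative leap-second file before being relied on.

```python
from datetime import date

# UTC days that ended with an inserted leap second, 1972 through 2008.
LEAP_DAYS = [
    date(1972, 6, 30), date(1972, 12, 31), date(1973, 12, 31),
    date(1974, 12, 31), date(1975, 12, 31), date(1976, 12, 31),
    date(1977, 12, 31), date(1978, 12, 31), date(1979, 12, 31),
    date(1981, 6, 30), date(1982, 6, 30), date(1983, 6, 30),
    date(1985, 6, 30), date(1987, 12, 31), date(1989, 12, 31),
    date(1990, 12, 31), date(1992, 6, 30), date(1993, 6, 30),
    date(1994, 6, 30), date(1995, 12, 31), date(1997, 6, 30),
    date(1998, 12, 31), date(2005, 12, 31), date(2008, 12, 31),
]

INITIAL_OFFSET = 10  # TAI - UTC at 1972-01-01, treated as a single step

def tai_minus_utc(day):
    """TAI - UTC in seconds at 00:00 UTC on `day` (1972-01-01 or later)."""
    if day < date(1972, 1, 1):
        raise ValueError("this table only covers 1972 onward")
    return INITIAL_OFFSET + sum(1 for d in LEAP_DAYS if d < day)

print(len(LEAP_DAYS), tai_minus_utc(date(2009, 1, 1)))
```

A freshly installed machine needs either a table like this or a direct statement of the current offset; the table only stays correct if something keeps appending to it, which is the distribution problem the thread is arguing about.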
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 17:42 ` Linas Vepstas 2009-01-06 2:27 ` john stultz-lkml 2009-01-06 19:50 ` M. Warner Losh @ 2009-01-07 3:50 ` Danny Mayer 2009-01-07 4:52 ` Linas Vepstas 2009-01-07 17:39 ` M. Warner Losh 2009-01-12 16:11 ` Pavel Machek 3 siblings, 2 replies; 109+ messages in thread From: Danny Mayer @ 2009-01-07 3:50 UTC (permalink / raw) To: linasvepstas Cc: david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, David Newall, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky Linas Vepstas wrote: > 2009/1/5 <david@lang.hm>: >> On Mon, 5 Jan 2009, Linas Vepstas wrote: >> >>>> Arguably the kernel's responsibility should be to keep track of the >>>> most fundamental representation of time possible for a machine (that's >>>> probably TAI) and it is a userspace responsibility to map from that >>>> value to other time standards including UTC, >>> Yes, this really does seem like the right solution. >>> >>>> using control files >>>> which are updated as leap seconds are declared. >>> Lets be clear on what "control files" means. This does >>> *NOT* mean some config file shipped by some distro >>> for some package. That would be a horrid solution. >>> People don't install updates, patches, etc. Distros >>> ship them late, or never, if the distro is old enough. >>> >>> A more appropriate solution would be to have >>> either the kernel or ntpd track the leap seconds >>> automatically. First, the ntp protocol already provides >>> the needed notification of a leap second to anyone >>> who cares about it (i.e. there is no point in getting a >>> Linux distro involved in this -- a distribution mechanism >>> already exists, and works *better* than having a distro >>> do it). >> I disagree with this. NTP will only know about leap seconds if it was >> running and connected to a server that advertised the leap seconds during >> that month. 
>> >> for example, if you installed a new server today, how would it ever know >> that there was a leap second a couple of days ago? Because it gets its time from an upstream server that already has incorporated the leap second so it doesn't really need to know that the leap second happened a few days ago or even a few years ago. > OK, good point. Unless your distro was less > than a few days old (unlikely), you are faced with the > same problem. Sure, eventually, the distro will publish > an update (which will add to the existing list of 36 leap > seconds -- which is needed in any case, since no one > has a server that's been up since 1958), but this is > unlikely to happen during this install window. > This is nonsense. That's not how NTP works. > The long term solution would be write an RFC to extend > NTP to also provide TAI information -- e.g. to add a > message that indicates the current leap-second offset > between UTC and TAI. > > --linas I don't know what this discussion is really about and why this was sent to the working group in the middle of the discussion, but there is no need for NTP to provide TAI information since NTP only uses UTC. Leap Seconds are automatically signaled and incorporated when they become due. If you don't have NTP running for some reason when a leap second is signaled it doesn't matter since your server source will already have incorporated the leap second so the NTP packet includes the timestamps that include the leap second adjustment. Operating Systems use UTC and not TAI by universal agreement and the ones that don't are extremely rare. Why don't you tell us what the real problem is instead of telling us that you need TAI offset information? Danny ^ permalink raw reply [flat|nested] 109+ messages in thread
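The signalling Danny refers to lives in the first octet of the NTP packet header, which packs a 2-bit Leap Indicator, a 3-bit version number and a 3-bit mode. A minimal sketch of decoding that octet follows; the function name parse_first_octet and the sample byte values are made up for illustration, and nothing here talks to a real server.

```python
# Leap Indicator meanings as defined for the NTP packet header.
LEAP_INDICATOR = {
    0: "no warning",
    1: "last minute of the day has 61 seconds",
    2: "last minute of the day has 59 seconds",
    3: "unknown (clock unsynchronized)",
}

def parse_first_octet(octet):
    """Split the first header octet into (leap indicator, version, mode)."""
    li = (octet >> 6) & 0x3
    version = (octet >> 3) & 0x7
    mode = octet & 0x7
    return li, version, mode

# Hypothetical sample octets: a version 4 server reply with and without
# a pending leap-second insertion.
for sample in (0x24, 0x64):
    li, version, mode = parse_first_octet(sample)
    print(hex(sample), version, mode, LEAP_INDICATOR[li])
```

The field carries only a warning about the end of the current day. It holds neither a history of past leap seconds nor the TAI-UTC offset, so both sides of the exchange above are describing the same header: it is enough to keep UTC correct, and not enough to derive TAI.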
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 3:50 ` Danny Mayer @ 2009-01-07 4:52 ` Linas Vepstas 2009-01-07 10:03 ` David Newall ` (2 more replies) 2009-01-07 17:39 ` M. Warner Losh 1 sibling, 3 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-07 4:52 UTC (permalink / raw) To: mayer Cc: david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, David Newall, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky 2009/1/6 Danny Mayer <mayer@ntp.isc.org>: Hi, > I don't know what this discussion is really about and why this was sent > to the working group in the middle of the discussion, but there is no > need for NTP to provide TAI information since NTP only uses UTC. Leap > Seconds are automatically signaled and incorporated when they become > due. If you don't have NTP running for some reason when a leap second is > signaled it doesn't matter since your server source will already have > incorporated the leap second so the NTP packet includes the timestamps > that include the leap second adjustment. > > Operating Systems use UTC and not TAI by universal agreement and the > ones that don't are extremely rare. > > Why don't you tell us what the real problem is instead of telling us > that you need TAI offset information? Currently, the Linux kernel keeps time in UTC. This means that it must take special actions to tick twice when a leap second comes by. Due to a (stupid) bug, some fraction of linux systems crashed; this includes everything from laptops to servers, to DVR's, to cell phones and cell phone towers. There's now a fix for this. However, during the discussion, the idea came out that maybe keeping UTC time in the kernel is just plain stupid. So there's this idea floating around that maybe the kernel should keep TAI time instead. 
The hope is that this will reduce the complexity in the kernel, and push it out to user space, "where it belongs" (to repeat a well-worn mantra). However, *if* we were to kick UTC out of the kernel, and push it to user-land, then, of course, there's a different problem: how does the kernel know what the correct TAI time is? As your reply makes abundantly clear, NTP is not a good source for TAI information. The comments which you labelled as "nonsense" were a misunderstanding of a discussion of a particular issue that would arise if the kernel were to keep TAI -- if it did, then user-space systems would need to have a reliable source for leap-seconds. Since NTP does not provide this, there was discussion about how that could be worked around. This then led to the comment that, "gee, wouldn't the right long-term solution be that NTP provide TAI info?" Clearly, it would be a lot of work to get the kernel to keep TAI instead of UTC, so this is not, at this time, a "serious proposal". But if it were possible, and all the various little issues that result were solvable, then it does seem like a better long-term solution. --linas p.s. the opinions above are not my own; I'm just summarizing the points made by the most vocal posters to this list. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 4:52 ` Linas Vepstas @ 2009-01-07 10:03 ` David Newall 2009-01-07 17:24 ` M. Warner Losh 2009-01-07 14:34 ` Danny Mayer 2009-01-07 17:36 ` M. Warner Losh 2 siblings, 1 reply; 109+ messages in thread From: David Newall @ 2009-01-07 10:03 UTC (permalink / raw) To: linasvepstas Cc: mayer, david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky Linas Vepstas wrote: > Currently, the Linux kernel keeps time in UTC. This means > that it must take special actions to tick twice when a leap > second comes by. Except it doesn't have to tick twice. Refer to http://lkml.org/lkml/2009/1/7/78 in which I show that a time_t (what time() returns) counts leap seconds (according to Bernstein this is what UTC means), and using zoneinfo, the library processes leap seconds correctly. I just realised that the Notes in man 2 time are confusing and probably unnecessary. Suffice to say that (assuming correctly configured zoneinfo) time() returns the number of seconds elapsed since the start of 1970. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 10:03 ` David Newall @ 2009-01-07 17:24 ` M. Warner Losh 2009-01-08 16:51 ` Magnus Danielson 0 siblings, 1 reply; 109+ messages in thread From: M. Warner Losh @ 2009-01-07 17:24 UTC (permalink / raw) To: davidn Cc: linasvepstas, david, hancockr, kyle, slashdot, goodgerster, mayer, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <49647E0F.9030008@davidnewall.com> David Newall <davidn@davidnewall.com> writes: : Linas Vepstas wrote: : > Currently, the Linux kernel keeps time in UTC. This means : > that it must take special actions to tick twice when a leap : > second comes by. : : Except it doesn't have to tick twice. Refer to : http://lkml.org/lkml/2009/1/7/78 in which I show that a time_t (what : time() returns) counts leap seconds (According to Bernstein this is what : UTC means), and using zoneinfo, the library processes leap seconds : correctly. This is *NOT* POSIX time_t. In order to be POSIX compliant, you can't do what Bernstein suggests. You can be non-compliant and deal with it with zoneinfo. : I just realised that the Notes in man 2 time are confusing and probably : unnecessary. Suffice to say that (assuming correctly configured : zoneinfo) time() returns the number of seconds elapsed since start 1970. That's not POSIX compliant. Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 17:24 ` M. Warner Losh @ 2009-01-08 16:51 ` Magnus Danielson 0 siblings, 0 replies; 109+ messages in thread From: Magnus Danielson @ 2009-01-08 16:51 UTC (permalink / raw) To: M. Warner Losh Cc: davidn, david, hancockr, kyle, slashdot, goodgerster, mayer, linux-kernel, ntpwg, pretzalz, burdell, linasvepstas, nick, jeff, magnus M. Warner Losh skrev: > In message: <49647E0F.9030008@davidnewall.com> > David Newall <davidn@davidnewall.com> writes: > : Linas Vepstas wrote: > : > Currently, the Linux kernel keeps time in UTC. This means > : > that it must take special actions to tick twice when a leap > : > second comes by. > : > : Except it doesn't have to tick twice. Refer to > : http://lkml.org/lkml/2009/1/7/78 in which I show that a time_t (what > : time() returns) counts leap seconds (According to Bernstein this is what > : UTC means), and using zoneinfo, the library processes leap seconds > : correctly. > > This is *NOT* POSIX time_t. In order to be posix compliant, you can't > do what Bernstein suggests. You can be non-complaint and deal it with > zoneinfo. You are free to keep your core time in whatever form you wish, but if you want your time_t to be POSIX compatible when accessed over POSIX interfaces you would need to honour the POSIX time_t mapping. While POSIX tried to avoid the leapsecond issue, the mapping they do perform has a peculiar effect on what happens to time_t if you also want to honour the UTC to time_t mapping while accepting UTC from external sources. > : I just realised that the Notes in man 2 time are confusing and probably > : unnecessary. Suffice to say that (assuming correctly configured > : zoneinfo) time() returns the number of seconds elapsed since start 1970. > > That's not POSIX complaint. It just *appears* to be the number of "seconds" since 1970. This appearance is important to some and causes grief to others. 
Cheers, Magnus ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 4:52 ` Linas Vepstas 2009-01-07 10:03 ` David Newall @ 2009-01-07 14:34 ` Danny Mayer 2009-01-07 15:42 ` Linas Vepstas 2009-01-07 16:04 ` john stultz 2009-01-07 17:36 ` M. Warner Losh 2 siblings, 2 replies; 109+ messages in thread From: Danny Mayer @ 2009-01-07 14:34 UTC (permalink / raw) To: linasvepstas Cc: david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, David Newall, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky Linas Vepstas wrote: > 2009/1/6 Danny Mayer <mayer@ntp.isc.org>: > Hi, > >> I don't know what this discussion is really about and why this was sent >> to the working group in the middle of the discussion, but there is no >> need for NTP to provide TAI information since NTP only uses UTC. Leap >> Seconds are automatically signaled and incorporated when they become >> due. If you don't have NTP running for some reason when a leap second is >> signaled it doesn't matter since your server source will already have >> incorporated the leap second so the NTP packet includes the timestamps >> that include the leap second adjustment. >> >> Operating Systems use UTC and not TAI by universal agreement and the >> ones that don't are extremely rare. >> >> Why don't you tell us what the real problem is instead of telling us >> that you need TAI offset information? > > Currently, the Linux kernel keeps time in UTC. This means > that it must take special actions to tick twice when a leap > second comes by. Due to a (stupid) bug, some fraction > of linux systems crashed; this includes everything from > laptops to servers, to DVR's, to cell phones and cell > phone towers. There's now a fix for this. > > However, during the discussion, the idea came out that > maybe keeping UTC time in the kernel is just plain stupid. > So there's this idea floating around that maybe the kernel > should keep TAI time instead. 
The hope is that this will > reduce the complexity in the kernel, and push it out to > user space, "where it belongs" (to repeat a well-worn > mantra). > > However, *if* we were to kick UTC out of the kernel, > and push it to user-land, then, of course, there's a > different problem: how does the kernel know what the > correct TAI time is? As your reply makes abundantly > clear, NTP is not a good source for TAI information. > > The comments which you labelled as "non-sense" were > a mis-understanding of a discussion of a particular issue > that would arise if the kernel were to keep TAI -- if it did, > then user-space systems would need to have a reliable > source for leap-seconds. Since NTP does not > provide this, there was discussion about how that > could be worked-around. This then lead to the comment > that, "gee, wouldn't the right long-term solution be that > NTP provide TAI info?" It was nonsense because the summary didn't contain all of the information required to provide context and you copied the Working Group in the middle of all this. NTP can provide leap-second information via an autokey protocol request, see Section 10.6 Leapseconds Values Message (LEAP) http://www.ietf.org/internet-drafts/draft-ietf-ntp-autokey-04.txt but that means you need to have autokey set up with another NTP server and that means adding infrastructure that you probably don't want and are not prepared to handle. > > Clearly, it would be a lot of work to get the kernel to keep > TAI instead of UTC, so this is not, at this time, a "serious > proposal". But if it were possible, and all the various > little issues that result were solvable, then it does seem > like a better long-term solution. > This is a *lot* more complicated than you might think. 
If you are thinking of implementing this similarly to the way timezone information is added for display purposes, you need the whole list of leap seconds and when the change happened since you now have to look at a timestamp and see when it was and then apply all of the leapseconds up to that point in time and none of the leapseconds beyond that. In addition, you have legacy files that have UTC timestamps on them so you would need to distinguish between UTC (legacy) and TAI timestamps in the file system among other places (anywhere where a timestamp exists) and what would you do about database tables which contain timestamps? The list goes on. I'd much rather you spend the time tackling the clock interrupt losses that many of our Linux users complain about. See: https://support.ntp.org/bin/view/Support/KnownOsIssues#Section_9.2.4. for some of the gorier details. I'm sure you don't really want us recommending that they set HZ=100 in the kernel to alleviate the problem. Danny > --linas > > p.s. the opinions above are not my own; I'm just > summarizing the points made by the most vocal > posters to this list. > > ^ permalink raw reply [flat|nested] 109+ messages in thread
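The table lookup Danny describes above (apply every leap second up to a given timestamp and none after it) can be sketched in Python. This is only an illustration, not kernel or ntpd code. The table is a three-entry excerpt of the published TAI-UTC history, so it only covers timestamps from 1999 onward, and the names LEAP_TABLE and tai_minus_utc are made up for this example.

```python
import bisect
import calendar

# Excerpt of the TAI-UTC history: (UTC date the offset takes effect, TAI-UTC in seconds).
# Assumption: a real implementation would carry the full list and a way to update it.
LEAP_TABLE = [
    ((1999, 1, 1), 32),
    ((2006, 1, 1), 33),
    ((2009, 1, 1), 34),
]

# POSIX timestamps at which each offset becomes effective (already sorted).
_STARTS = [calendar.timegm((y, m, d, 0, 0, 0)) for (y, m, d), _ in LEAP_TABLE]


def tai_minus_utc(posix_ts):
    """Return the TAI-UTC offset (seconds) in force at a POSIX/UTC timestamp."""
    # Find the last table entry whose start time is <= posix_ts.
    i = bisect.bisect_right(_STARTS, posix_ts) - 1
    if i < 0:
        raise ValueError("timestamp predates this excerpt of the table")
    return LEAP_TABLE[i][1]


before = calendar.timegm((2008, 12, 31, 23, 59, 59))
after = calendar.timegm((2009, 1, 1, 0, 0, 0))
print(tai_minus_utc(before), tai_minus_utc(after))
```

The lookup itself is simple; the hard parts raised in the thread are keeping the table current on every machine and deciding which stored timestamps are UTC and which are TAI.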
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 14:34 ` Danny Mayer @ 2009-01-07 15:42 ` Linas Vepstas 2009-01-07 19:23 ` Danny Mayer 2009-01-07 16:04 ` john stultz 1 sibling, 1 reply; 109+ messages in thread From: Linas Vepstas @ 2009-01-07 15:42 UTC (permalink / raw) To: mayer Cc: david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, David Newall, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky Thanks for the reply. 2009/1/7 Danny Mayer <mayer@ntp.isc.org>: > Linas Vepstas wrote: >> 2009/1/6 Danny Mayer <mayer@ntp.isc.org>: >>> Why don't you tell us what the real problem is instead of telling us >>> that you need TAI offset information? >> >> Currently, the Linux kernel keeps time in UTC. This means >> that it must take special actions to tick twice when a leap >> second comes by. Due to a (stupid) bug, some fraction >> of linux systems crashed; this includes everything from >> laptops to servers, to DVR's, to cell phones and cell >> phone towers. There's now a fix for this. >> >> However, during the discussion, the idea came out that >> maybe keeping UTC time in the kernel is just plain stupid. >> So there's this idea floating around that maybe the kernel >> should keep TAI time instead. The hope is that this will >> reduce the complexity in the kernel, and push it out to >> user space, "where it belongs" (to repeat a well-worn >> mantra). >> >> However, *if* we were to kick UTC out of the kernel, >> and push it to user-land, then, of course, there's a >> different problem: how does the kernel know what the >> correct TAI time is? As your reply makes abundantly >> clear, NTP is not a good source for TAI information. [...] >> a discussion of a particular issue >> that would arise if the kernel were to keep TAI -- if it did, >> then user-space systems would need to have a reliable >> source for leap-seconds. 
Since NTP does not >> provide this, there was discussion about how that >> could be worked-around. This then lead to the comment >> that, "gee, wouldn't the right long-term solution be that >> NTP provide TAI info?" > > NTP can provide leap-second information via an autokey protocol request, > see Section 10.6 Leapseconds Values Message (LEAP) > http://www.ietf.org/internet-drafts/draft-ietf-ntp-autokey-04.txt but Yes, that looks like exactly what would be wanted. It would be nice if such a message were available in the regular, non-encrypted protocol. > that means you need to have autokey set up with another NTP server and > that means adding infrastructure that you probably don't want and are > not prepared to handle. Heh. Yes, well, I still haven't figured out how to secure DNS. Yet clearly this whole security mess must march on, and somehow the security infrastructure must eventually become easy to install. >> Clearly, it would be a lot of work to get the kernel to keep >> TAI instead of UTC, so this is not, at this time, a "serious >> proposal". But if it were possible, and all the various >> little issues that result were solvable, then it does seem >> like a better long-term solution. >> > > This is a *lot* more complicated than you might think. If you are > thinking of implementing this similarly to the way timezone information > is added for display purposes, you need the whole list of leap seconds > and when the change happened since you now have to look at a timestamp > and see when it was and then apply all of the leapseconds up to that > point in time and none of the leapseconds beyond that. In addition, you > have legacy files that have UTC timestamps on them so you would need to > distinguish between UTC (legacy) and TAI timestamps in the file system > among other places (anywhere where a timestamp exists) and what would > you do about database tables which contain timestamps? The list goes on. Yes. 
> I'd much rather you spend the time tackling the clock interrupt losses > that many of our Linux users complain about. See: > https://support.ntp.org/bin/view/Support/KnownOsIssues#Section_9.2.4. > for some of the gorier details. I'm sure you don't really want us > recommending that they set HZ=100 in the kernel to alleviate the problem. Actually, this is rather sorely lacking in 'gory details'; rather, it's a complaint that 'things don't work', with no discussion of the actual problem. It would be much better if there were a link to any previous discussions on LKML on this issue. My knee-jerk reaction on reading about the lost-interrupts issue is that, yes, setting HZ=100 and disabling ACPI is indeed a decent short-term work-around (APIC is something completely different and not something you can disable). The correct long-term solution would be to use real-time kernels, which are designed to make sure that things like lost interrupts never happen. I have no idea what the status of real-time Linux is, whether it would now have guarantees for timer ticks, and whether anything there would now be mergeable into the mainline kernel. --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 15:42 ` Linas Vepstas @ 2009-01-07 19:23 ` Danny Mayer 0 siblings, 0 replies; 109+ messages in thread From: Danny Mayer @ 2009-01-07 19:23 UTC (permalink / raw) To: linasvepstas Cc: david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, David Newall, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky Linas Vepstas wrote: > [...] > >>> a discussion of a particular issue >>> that would arise if the kernel were to keep TAI -- if it did, >>> then user-space systems would need to have a reliable >>> source for leap-seconds. Since NTP does not >>> provide this, there was discussion about how that >>> could be worked-around. This then lead to the comment >>> that, "gee, wouldn't the right long-term solution be that >>> NTP provide TAI info?" >> NTP can provide leap-second information via an autokey protocol request, >> see Section 10.6 Leapseconds Values Message (LEAP) >> http://www.ietf.org/internet-drafts/draft-ietf-ntp-autokey-04.txt but > > Yes, that look like exactly what would be wanted. It would be nice > if such a message was available in the regular, non-encrypted protocol. It's not encrypted, it's an authentication protocol. You really do need to know that you are receiving a reliable set of information otherwise anyone can spoof you with bad data and play havoc with your clock and timestamps. >> that means you need to have autokey set up with another NTP server and >> that means adding infrastructure that you probably don't want and are >> not prepared to handle. > > Heh. Yes, well, I still haven't figured out how to secure DNS. Yet clearly > this whole security mess must march on, and somehow the security > infrastructure must eventually become easy to install. > <DNS hat> That's pretty easy. Install BIND 9.6.0. Read the DNSSEC deployment instructions here: https://www.isc.org/files/DNSSEC_in_6_minutes.pdf and implement. 
You should be done in almost no time. </DNS hat> >>> Clearly, it would be a lot of work to get the kernel to keep >>> TAI instead of UTC, so this is not, at this time, a "serious >>> proposal". But if it were possible, and all the various >>> little issues that result were solvable, then it does seem >>> like a better long-term solution. >>> >> This is a *lot* more complicated than you might think. If you are >> thinking of implementing this similarly to the way timezone information >> is added for display purposes, you need the whole list of leap seconds >> and when the change happened since you now have to look at a timestamp >> and see when it was and then apply all of the leapseconds up to that >> point in time and none of the leapseconds beyond that. In addition, you >> have legacy files that have UTC timestamps on them so you would need to >> distinguish between UTC (legacy) and TAI timestamps in the file system >> among other places (anywhere where a timestamp exists) and what would >> you do about database tables which contain timestamps? The list goes on. > > Yes. > >> I'd much rather you spend the time tackling the clock interrupt losses >> that many of our Linux users complain about. See: >> https://support.ntp.org/bin/view/Support/KnownOsIssues#Section_9.2.4. >> for some of the gorier details. I'm sure you don't really want us >> recommending that they set HZ=100 in the kernel to alleviate the problem. > > Actually, this is rather sorely lacking in 'gory details', rather, its > a complaint > that 'things don't work' with no discussion of the actual problem. It would > be much better if there was a link to any previous discussions on LKML on > this issue. Sorry, but that's not my area of expertise. I just know we have many people running Linux and have these issues. 
> > My knee-jerk reaction on reading about the lost-interrupts issue is that, > yes, setting HZ=100 and disabling ACPI is indeed a decent short-term > work-around (APIC is something completely different and not something > you can disable). The correct long-term solution would be to use real-time > kernels, which are designed to make sure that things like lost interrupts > never happen. > I bow to your superior knowledge in this area. Danny ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 14:34 ` Danny Mayer 2009-01-07 15:42 ` Linas Vepstas @ 2009-01-07 16:04 ` john stultz 1 sibling, 0 replies; 109+ messages in thread From: john stultz @ 2009-01-07 16:04 UTC (permalink / raw) To: mayer Cc: linasvepstas, david, Robert Hancock, Ben Goodger, Kyle Moffett, MentalMooMan, David Newall, linux-kernel, ntpwg, Travis Crump, burdell, Nick Andrew, Jeffrey J. Kosowsky On Wed, Jan 7, 2009 at 6:34 AM, Danny Mayer <mayer@ntp.isc.org> wrote: > I'd much rather you spend the time tackling the clock interrupt losses > that many of our Linux users complain about. See: > https://support.ntp.org/bin/view/Support/KnownOsIssues#Section_9.2.4. > for some of the gorier details. I'm sure you don't really want us > recommending that they set HZ=100 in the kernel to alleviate the problem. I believe the lost tick issue as well as the HZ=100 suggestions at the page above are out of date for 2.6.21 and higher kernels as the generic timekeeping rework addressed these problems. Please let me know if you're still seeing any such issues with NTP. thanks -john ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 4:52 ` Linas Vepstas 2009-01-07 10:03 ` David Newall 2009-01-07 14:34 ` Danny Mayer @ 2009-01-07 17:36 ` M. Warner Losh 2 siblings, 0 replies; 109+ messages in thread From: M. Warner Losh @ 2009-01-07 17:36 UTC (permalink / raw) To: linasvepstas Cc: mayer, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <3ae3aa420901062052h75fcab11n8ce45c41ac0e4cd2@mail.gmail.com> "Linas Vepstas" <linasvepstas@gmail.com> writes: : However, during the discussion, the idea came out that : maybe keeping UTC time in the kernel is just plain stupid. : So there's this idea floating around that maybe the kernel : should keep TAI time instead. The hope is that this will : reduce the complexity in the kernel, and push it out to : user space, "where it belongs" (to repeat a well-worn : mantra). I agree that this is where it belongs, but it is hard to do that in a POSIX compliant way. It also becomes hard to timestamp things in filesystems using UTC rather than TAI. There are other protocols that deal with UTC times as well. : However, *if* we were to kick UTC out of the kernel, : and push it to user-land, then, of course, there's a : different problem: how does the kernel know what the : correct TAI time is? As your reply makes abundantly : clear, NTP is not a good source for TAI information. Agreed. That's the whole crux of the 'multiple time scales suck' threads that I've talked about in other forums. You have to know this information before you start, have to deal with 'dusty system' problem for systems that have been off for 6 months or not upgraded. You also have to cope with learning after the fact that your initial guess was wrong. I've had many systems that would get this information from GPS and stall the rest of the system until this data came in. 
I did this mostly because there were big issues with the software downstream if you changed the delta between your putative UTC and TAI after the fact. : The comments which you labelled as "non-sense" were : a mis-understanding of a discussion of a particular issue : that would arise if the kernel were to keep TAI -- if it did, : then user-space systems would need to have a reliable : source for leap-seconds. Since NTP does not : provide this, there was discussion about how that : could be worked-around. This then lead to the comment : that, "gee, wouldn't the right long-term solution be that : NTP provide TAI info?" I've wanted this for a long time... : Clearly, it would be a lot of work to get the kernel to keep : TAI instead of UTC, so this is not, at this time, a "serious : proposal". But if it were possible, and all the various : little issues that result were solvable, then it does seem : like a better long-term solution. Yes. The kernel would need to be able to return both UTC and TAI times to userland as well, since there are requirements for NFS to return timestamps in UTC, not in TAI. Many file systems specify UTC time, or have traditionally been implemented that way. Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 3:50 ` Danny Mayer 2009-01-07 4:52 ` Linas Vepstas @ 2009-01-07 17:39 ` M. Warner Losh 2009-01-07 19:31 ` Alan Cox 2009-01-08 20:09 ` Steve Allen 1 sibling, 2 replies; 109+ messages in thread From: M. Warner Losh @ 2009-01-07 17:39 UTC (permalink / raw) To: mayer Cc: linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <49642674.9080703@ntp.isc.org> Danny Mayer <mayer@ntp.isc.org> writes: : Why don't you tell us what the real problem is instead of telling us : that you need TAI offset information? The real problem is that POSIX time_t totally ignores leap seconds. This forces systems that are rolling through a leap second to repeat time, causing time to jump backwards by 1s (or violate POSIX time_t's invariant that midnight time_t is % 86400 == 0). This jump backwards is a PITA in the kernel, and violates the assumption that many programs have that time doesn't flow backwards. The suggested solution would be to tick in TAI time, and force userland to cope with the leapsecond issues. Of course, there are a number of problems with this solution as well, but it feels like it belongs there... Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
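The repeated second Warner describes follows directly from the POSIX "Seconds Since the Epoch" formula, which treats every day as exactly 86400 seconds. A small Python sketch of that formula shows 2008-12-31 23:59:60 and 2009-01-01 00:00:00 mapping to the same value. The function name posix_seconds is illustrative only, and Python's floor division matches C's integer division here only because every term is non-negative for years from 1970 on.

```python
def posix_seconds(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """POSIX 'Seconds Since the Epoch'.

    tm_year is years since 1900 and tm_yday is the 0-based day of the year,
    as in struct tm. Valid in this sketch for tm_year >= 70 only.
    """
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)


# 2008 is a leap year, so 31 December is tm_yday 365.
leap = posix_seconds(108, 365, 23, 59, 60)   # 2008-12-31 23:59:60 UTC
midnight = posix_seconds(109, 0, 0, 0, 0)    # 2009-01-01 00:00:00 UTC
print(leap, midnight, midnight % 86400)
```

Both calls give 1230768000, so a clock that follows this mapping has to show that value for two consecutive SI seconds, or step back to it, in order to keep midnight at a multiple of 86400.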
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 17:39 ` M. Warner Losh @ 2009-01-07 19:31 ` Alan Cox 2009-01-07 19:42 ` M. Warner Losh 2009-01-08 3:57 ` Danny Mayer 2009-01-08 20:09 ` Steve Allen 1 sibling, 2 replies; 109+ messages in thread From: Alan Cox @ 2009-01-07 19:31 UTC (permalink / raw) To: M. Warner Losh Cc: mayer, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff > time, causing time to jump backwards by 1s (or violate POSIX time_t's > invariant that midnight time_t is % 86400 == 0). This jump backwards > is a pita in the kernel, and violates the assumption that many > programs have that time doesn't flow backwards. They can slew the clock slowly as well. There is a wonderful quote from one of the summaries of the POSIX committee discussions on time that says quite simply "the posix clock is not guaranteed to be accurate" As it currently stands the kernel contains sufficient support that at the point you know a leap second is coming you can adjust the second length marginally over the entire period. The current behaviour is an implementation decision. Jumping on a second shouldn't be an issue to most people, jumping back is asking for badness but isn't in fact used in the world today. Slewing the entire day so that each second is 1/86400 of a second longer or shorter wouldn't be noticed by anyone. Alan ^ permalink raw reply [flat|nested] 109+ messages in thread
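Alan's 1/86400 figure can be checked with a little arithmetic. The sketch below assumes the simplest possible scheme, a linear slew that spreads one inserted leap second evenly over the 86400 clock seconds of the day. It is not a description of what the kernel or ntpd actually does, and the function names are invented for illustration.

```python
from fractions import Fraction

DAY = 86400  # clock seconds in a nominal day


def stretch_per_second(leap=1):
    """Extra SI time each clock second lasts when `leap` SI seconds are absorbed over a day."""
    return Fraction(leap, DAY)


def slewed_reading(si_elapsed, leap=1):
    """Clock reading (seconds into the day) after si_elapsed SI seconds of linear slew."""
    return Fraction(si_elapsed) * DAY / (DAY + leap)


# Stretch per clock second, in microseconds.
print(float(stretch_per_second()) * 1e6)
# How far the slewed clock lags a stepping clock at the moment 23:59:60 begins.
print(float(DAY - slewed_reading(DAY)))
```

Each clock second is stretched by about 11.6 microseconds, but the accumulated lag against a clock that steps at the leap grows to just under one second by the end of the day. That accumulated error, not the per-second stretch, is what matters to systems that need UTC to better than a second.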
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 19:31 ` Alan Cox @ 2009-01-07 19:42 ` M. Warner Losh 2009-01-08 3:57 ` Danny Mayer 1 sibling, 0 replies; 109+ messages in thread From: M. Warner Losh @ 2009-01-07 19:42 UTC (permalink / raw) To: alan Cc: mayer, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <20090107193127.0bec8ad8@lxorguk.ukuu.org.uk> Alan Cox <alan@lxorguk.ukuu.org.uk> writes: : > time, causing time to jump backwards by 1s (or violate POSIX time_t's : > invariant that midnight time_t is % 86400 == 0). This jump backwards : > is a pita in the kernel, and violates the assumption that many : > programs have that time doesn't flow backwards. : : They can slew the clock slowly as well. There is a wonderful quote from : one of the summaries of the POSIX committee discussions on time that says : quite simply "the posix clock is not guaranteed to be accurate" True, you can. However, anybody you peer with via ntpd will have issues unless things are coordinated with ntpd (and aren't a leaf node). There you have much higher tolerances for correctness. : As it currently stands the kernel contains sufficient support that at the : point you know a leap second is coming you can adjust the second length : marginally over the entire period. : : The current behaviour is an implementation decision. Jumping on a second : shouldn't be an issue to most people, jumping back is asking for badness : but isn't in fact used in the world today. Slewing the entire day so that : each second is 1/86400 of a second longer or shorter wouldn't be noticed : by anyone. If you are an ntp leaf node that doesn't care about UTC accurate to the second, this will work well. For most users, this effectively papers over the problem. If you do care about UTC time being more accurate than this, the slewing will be too large and introduce errors that are too big. 
Likewise for non-leaf ntp nodes. For these machines, having time be off by 1/2 second can be very bad. There are many real-time systems that fall into this category: trading systems on Wall Street, systems that control things based on doing things at certain points within a UTC second, etc. For those types of systems, changing the length of the second by this much isn't going to work at all. ntpd also lights the INS bit only on 'leap day' so depending on when you poll, you might not have a full day's notice of these changes, but that can be managed... Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 19:31 ` Alan Cox 2009-01-07 19:42 ` M. Warner Losh @ 2009-01-08 3:57 ` Danny Mayer 2009-01-08 4:42 ` M. Warner Losh 1 sibling, 1 reply; 109+ messages in thread From: Danny Mayer @ 2009-01-08 3:57 UTC (permalink / raw) To: Alan Cox Cc: M. Warner Losh, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff Alan Cox wrote: >> time, causing time to jump backwards by 1s (or violate POSIX time_t's >> invariant that midnight time_t is % 86400 == 0). This jump backwards >> is a pita in the kernel, and violates the assumption that many >> programs have that time doesn't flow backwards. > > They can slew the clock slowly as well. There is a wonderful quote from > one of the summaries of the POSIX committee discussions on time that says > quite simply "the posix clock is not guaranteed to be accurate" > > As it currently stands the kernel contains sufficient support that at the > point you know a leap second is coming you can adjust the second length > marginally over the entire period. > > The current behaviour is an implementation decision. Jumping on a second > shouldn't be an issue to most people, jumping back is asking for badness > but isn't in fact used in the world today. Slewing the entire day so that > each second is 1/86400 of a second longer or shorter wouldn't be noticed > by anyone. NTP handles most of this, but it needs the cooperation of the O/S kernel and most of the Unix kernels are able to provide the required API's. FreeBSD doesn't have any of these problems but Linux historically has. Most of that code was designed by Dave Mills but since each kernel is different we should not expect them all to behave the same way and generally requires an understanding of what NTP expects and that's not always clear to kernel developers who are not expected to know NTP. 
Danny ^ permalink raw reply [flat|nested] 109+ messages in thread
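To put a number on the slewing idea quoted above: spreading one leap second evenly over a day stretches each second by 1/86400 of a second. What follows is only a back-of-the-envelope sketch of that arithmetic. The helper name slew_rate is made up for illustration and is not a kernel or ntpd interface.

```python
from fractions import Fraction

SECONDS_PER_DAY = 86400


def slew_rate(leap_seconds=1, window_seconds=SECONDS_PER_DAY):
    """Return the extra duration added to each second, as an exact fraction.

    Hypothetical helper: absorbs `leap_seconds` evenly over `window_seconds`.
    """
    return Fraction(leap_seconds, window_seconds)


rate = slew_rate()
print(f"each second is longer by {float(rate) * 1e6:.3f} microseconds")
print(f"equivalent frequency offset: {float(rate) * 1e6:.3f} ppm")
# Halfway through the window the slewed clock differs from a stepped
# clock by half of the leap second.
print(f"offset at the midpoint: {float(rate * (SECONDS_PER_DAY // 2))} s")
```

The midpoint figure is the half-second error that the real-time and non-leaf NTP cases raised earlier in the thread would have to tolerate under this scheme.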
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-08 3:57 ` Danny Mayer @ 2009-01-08 4:42 ` M. Warner Losh 2009-01-08 10:48 ` Alan Cox 0 siblings, 1 reply; 109+ messages in thread From: M. Warner Losh @ 2009-01-08 4:42 UTC (permalink / raw) To: mayer Cc: alan, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <496579C2.5050800@ntp.isc.org> Danny Mayer <mayer@ntp.isc.org> writes: : Alan Cox wrote: : >> time, causing time to jump backwards by 1s (or violate POSIX time_t's : >> invariant that midnight time_t is % 86400 == 0). This jump backwards : >> is a pita in the kernel, and violates the assumption that many : >> programs have that time doesn't flow backwards. : > : > They can slew the clock slowly as well. There is a wonderful quote from : > one of the summaries of the POSIX committee discussions on time that says : > quite simply "the posix clock is not guaranteed to be accurate" : > : > As it currently stands the kernel contains sufficient support that at the : > point you know a leap second is coming you can adjust the second length : > marginally over the entire period. : > : > The current behaviour is an implementation decision. Jumping on a second : > shouldn't be an issue to most people, jumping back is asking for badness : > but isn't in fact used in the world today. Slewing the entire day so that : > each second is 1/86400 of a second longer or shorter wouldn't be noticed : > by anyone. : : NTP handles most of this, but it needs the cooperation of the O/S kernel : and most of the Unix kernels are able to provide the required API's. : FreeBSD doesn't have any of these problems but Linux historically has. 
: Most of that code was designed by Dave Mills but since each kernel is : different we should not expect them all to behave the same way and : generally requires an understanding of what NTP expects and that's not : always clear to kernel developers who are not expected to know NTP. On FreeBSD, Solaris and Digital Unix, I'll point out, that jumping backwards is used, and has been used since at least 1994. So saying it isn't used in the world today is flat out wrong. Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-08 4:42 ` M. Warner Losh @ 2009-01-08 10:48 ` Alan Cox 2009-01-08 10:56 ` Alan Cox 2009-01-08 15:02 ` M. Warner Losh 0 siblings, 2 replies; 109+ messages in thread From: Alan Cox @ 2009-01-08 10:48 UTC (permalink / raw) To: M. Warner Losh Cc: mayer, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff > On FreeBSD, Solaris and Digital Unix, I'll point out, that jumping > backwards is used, and has been used since at least 1994. So saying > it isn't used in the world today is flat out wrong. I stand by my comment - when was the last time the IERS used a leap second removal ? The code may exist but it doesn't happen. Alan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-08 10:48 ` Alan Cox @ 2009-01-08 10:56 ` Alan Cox 2009-01-08 22:22 ` David Mills 2009-01-08 15:02 ` M. Warner Losh 1 sibling, 1 reply; 109+ messages in thread From: Alan Cox @ 2009-01-08 10:56 UTC (permalink / raw) Cc: M. Warner Losh, mayer, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff On Thu, 8 Jan 2009 10:48:54 +0000 Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > > On FreeBSD, Solaris and Digital Unix, I'll point out, that jumping > > backwards is used, and has been used since at least 1994. So saying > > it isn't used in the world today is flat out wrong. [Ignore previous email, must remember not to post before waking up ;)] You are correct - and providing gettimeofday() is being used on Linux rather than time() which simply appears to stall due to resolution the same is true. Some users do run with the "right" timezone data in non posix mode because they want their seconds 'sane' but that isn't the default. Alan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-08 10:56 ` Alan Cox @ 2009-01-08 22:22 ` David Mills 0 siblings, 0 replies; 109+ messages in thread From: David Mills @ 2009-01-08 22:22 UTC (permalink / raw) Cc: linux-kernel, ntpwg Folks, You are not correct. The kernel software clock variable is in fact stepped back, but the routine that actually reads the clock does not step the clock back unless set back more than two seconds. Otherwise, the clock is strictly monotonic. That is the advice I gave in RFC 1589 and implemented in the Digital Unix kernel, because I wrote the code. Other kernelmongers might or might not have taken the advice. As for the TAI issue discussed earlier, note that the generic NTP kernel support from me since 1991 has included TAI. However, support to read it requires the ntp_gettime() syscall and not all kernels support it. The recent leap was observed to work correctly in Solaris and FreeBSD. It worked fine with the WWV driver and the Spectracom GPS driver, but not the NMEA, Arbiter, Meinberg nor any of the NIST or USNO primary servers. It probably did work with the Canadian servers, since the Ottawa primary server is synchronized via my CHU audio driver. It didn't work on my carefully contrived backroom servers, as they lost power during the event. See www.eecis.udel.edu/~mills/leap.html and/or the online NTP documentation and/or my book. Dave Alan Cox wrote: >On Thu, 8 Jan 2009 10:48:54 +0000 >Alan Cox <alan@lxorguk.ukuu.org.uk> wrote: > > > >>>On FreeBSD, Solaris and Digital Unix, I'll point out, that jumping >>>backwards is used, and has been used since at least 1994. So saying >>>it isn't used in the world today is flat out wrong. >>> >>> > >[Ignore previous email, must remember not to post before waking up ;)] > >You are correct - and providing gettimeofday() is being used on Linux >rather than time() which simply appears to stall due to resolution the >same is true.
> >Some users do run with the "right" timezone data in non posix mode >because they want their seconds 'sane' but that isn't the default. > >Alan > > >_______________________________________________ >ntpwg mailing list >ntpwg@lists.ntp.org >https://lists.ntp.org/mailman/listinfo/ntpwg > > ^ permalink raw reply [flat|nested] 109+ messages in thread
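The read-side behaviour Dave describes (the underlying variable may be stepped back, but readers never see time go backwards unless the step exceeds two seconds) can be sketched roughly as below. This is a toy model under stated assumptions, not the Digital Unix code: the class name, the integer-microsecond representation and the one-microsecond creep are all invented for illustration. Only the two-second threshold comes from the message.

```python
class MonotonicReader:
    """Hypothetical wrapper around a raw clock that may be stepped back."""

    MAX_HIDDEN_STEP_US = 2_000_000  # two seconds, per the message above
    MIN_TICK_US = 1                 # assumed smallest increment for callers

    def __init__(self):
        self._last = None

    def read(self, raw_us):
        """Return the value handed to callers for raw clock value raw_us."""
        if self._last is not None and raw_us <= self._last:
            if self._last - raw_us <= self.MAX_HIDDEN_STEP_US:
                # Small backward step (such as a leap second): hide it and
                # creep forward by the minimum tick instead.
                raw_us = self._last + self.MIN_TICK_US
            # A larger backward step is treated as a deliberate clock set
            # and is passed through unchanged.
        self._last = raw_us
        return raw_us


reader = MonotonicReader()
# The raw clock is stepped back by one second between the 2nd and 3rd reads.
raw_readings = [100_000_000, 100_500_000, 99_600_000, 99_900_000, 100_600_000]
print([reader.read(r) for r in raw_readings])
```

In this model the readings stall (advancing one tick per call) until the raw clock catches up with the last value handed out, after which they track the raw clock again.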
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-08 10:48 ` Alan Cox 2009-01-08 10:56 ` Alan Cox @ 2009-01-08 15:02 ` M. Warner Losh 2009-01-08 18:57 ` Marshall Eubanks 1 sibling, 1 reply; 109+ messages in thread From: M. Warner Losh @ 2009-01-08 15:02 UTC (permalink / raw) To: alan Cc: mayer, linasvepstas, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, nick, jeff In message: <20090108104854.2dbc41b1@lxorguk.ukuu.org.uk> Alan Cox <alan@lxorguk.ukuu.org.uk> writes: : > On FreeBSD, Solaris and Digital Unix, I'll point out, that jumping : > backwards is used, and has been used since at least 1994. So saying : > it isn't used in the world today is flat out wrong. : : I stand by my comment - when was the last time the IERS used a leap : second removal ? The code may exist but it doesn't happen. Jumping backwards is used for every leap second that IERS has ever done, which was your original comment. There has never been a case where a leap second caused a jump forward, though. The proper technical term for that is 'negative leap second'. All leap seconds up until now have been positive leap seconds, and it is unlikely there ever will be a negative one. Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-08 15:02 ` M. Warner Losh @ 2009-01-08 18:57 ` Marshall Eubanks 0 siblings, 0 replies; 109+ messages in thread From: Marshall Eubanks @ 2009-01-08 18:57 UTC (permalink / raw) To: M. Warner Losh Cc: alan, david, hancockr, kyle, slashdot, goodgerster, mayer, davidn, linux-kernel, ntpwg, pretzalz, burdell, linasvepstas, nick, jeff On Jan 8, 2009, at 10:02 AM, M. Warner Losh wrote: > In message: <20090108104854.2dbc41b1@lxorguk.ukuu.org.uk> > Alan Cox <alan@lxorguk.ukuu.org.uk> writes: > : > On FreeBSD, Solaris and Digital Unix, I'll point out, that jumping > : > backwards is used, and has been used since at least 1994. So > saying > : > it isn't used in the world today is flat out wrong. > : > : I stand by my comment - when was the last time the IERS used a leap > : second removal ? The code may exist but it doesn't happen. > > Jumping backwards is used for every leap second that IERS has ever > done, which was your original comment. There's has never been a case > where there was a leap second for jump forward though. The proper > technical term here is 'negative leap second'. All leap seconds up > until now have been positive leap seconds, and it is unlikely there > ever will be a negative one. I disagree. In the 1970's, the excess LOD was as much as 3 msec. After going down some, the mid 1990's it rose to around 2 msec. Now, it is around 1 msec. Here is a plot http://www.iers.org/MainDisp.csl?pid=95-100 Only the long period variations count for leap seconds - the seasonal and other high frequency oscillations tend to average out. In the early part of the last century (~1905), it decreased by ~ 5 msec in a year or so. If that happened right now, it would go to ~ -4 msec negative, and we would be seeing 2 negative leap seconds or more per year. 
Even if the decrease from 1975 to 1985 happened again, it would be at -1 msec, and we would have a negative leap second every two years or so. What is a reasonable assumption is that we would likely have a year or more warning of the likelihood of a negative leap second. Regards Marshall Eubanks > > > Warner > _______________________________________________ > ntpwg mailing list > ntpwg@lists.ntp.org > https://lists.ntp.org/mailman/listinfo/ntpwg ^ permalink raw reply [flat|nested] 109+ messages in thread
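The rate argument above can be roughed out with simple arithmetic: if every day is off by a constant excess length of day (LOD), the offset accumulates linearly and a leap second is needed each time it reaches a full second. The snippet below is only a crude linear estimate, assuming a constant excess LOD and ignoring the seasonal and high-frequency terms mentioned in the message. The function name is invented for illustration.

```python
DAYS_PER_YEAR = 365.25


def years_per_leap_second(excess_lod_ms):
    """Years for a constant excess LOD (ms per day) to accumulate one second.

    The sign of the input indicates the kind of leap second that results:
    a positive excess leads to positive leap seconds, a negative excess
    to negative ones. The returned interval is always non-negative.
    """
    if excess_lod_ms == 0:
        return float("inf")
    drift_s_per_year = abs(excess_lod_ms) / 1000.0 * DAYS_PER_YEAR
    return 1.0 / drift_s_per_year


for lod in (3.0, 2.0, 1.0, -1.0, -4.0):
    kind = "positive" if lod > 0 else "negative"
    print(f"{lod:+.1f} ms/day: one {kind} leap second every "
          f"{years_per_leap_second(lod):.2f} years")
```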
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 17:39 ` M. Warner Losh 2009-01-07 19:31 ` Alan Cox @ 2009-01-08 20:09 ` Steve Allen 1 sibling, 0 replies; 109+ messages in thread From: Steve Allen @ 2009-01-08 20:09 UTC (permalink / raw) To: M. Warner Losh Cc: mayer, david, hancockr, kyle, slashdot, goodgerster, davidn, linux-kernel, ntpwg, pretzalz, burdell, linasvepstas, nick, jeff On Wed 2009-01-07T10:39:47 -0700, M. Warner Losh hath writ: > The suggestion to solving this would be to tick in TAI time, and force > userland to cope with the leapsecond issues. Of course, there's a > number of problems with this solution as well, but it feels like it > belongs there... Agreed that the leap second belong in userland, but BIPM itself refuses to agree with the idea of the underlying time scale being TAI. TAI has no standing as an international recommendation, and it is not available via the established broadcast mechanisms, and BIPM does not want those things to happen. What would be needed is a leap-free time scale with an international recommendation standing behind it so as to legitimize its use. The most recent public insight to the ITU-R process of reconsidering leap seconds in UTC is from September, here http://www.navcen.uscg.gov/cgsic/meetings/48thmeeting/Reports/Timing%20Subcommittee/48-LS%2020080916.pdf In the schedule given on page 16 we see that even if the ITU-R process goes smoothly there will be leap seconds at least until 2017, so we have to live with them for at least that long. However, at the October meeting of ITU-R WP7A things did not go smoothly. There were two countries objecting to any change to UTC, so the process of considering any change to the broadcast time scale is stalled. Basically, any change to UTC is currently stalled by the international political/diplomatic process which controls it. 
Way back in 2003 the ITU-R asked for advice on the broadcast time scale, and the advice from the experts included changing the name if leaps are dropped. http://www.inrim.it/luc/cesio/itu/closure.pdf At that point nobody managed to point out that POSIX demands that the zoneinfo mechanisms allow for offsets of seconds as well as minutes, so there was no clear path for implementing that advice while preserving compliance with specifications that still demand UTC. Any epoch-based time scale has issues with UTC as it has been defined http://www.ucolick.org/~sla/leapsecs/epochtime.html and by ignoring leap seconds POSIX makes that even harder to implement. During the past century we have seen the creation of at least 4 different uniform time scales, two of which are widely available by broadcast (LORAN-C and GPS), but none of which has the backing of an international standard. http://www.ucolick.org/~sla/leapsecs/deltat.html All civil time scales are conventional constructs, and zoneinfo is designed to handle the arbitrary nature of changes to civil time. If the underlying time scale changes its name and stops having leaps, then leap seconds in UTC are just another form of conventional change to civil time. UTC could become a time zone. Processes which happen when POSIX time_t % 86400 == 0 would happen at "atomic midnight" instead of "civil" midnight, not a big difference. If the ITU-R were to take the advice of the colloquium it organized, if they were to abandon the name UTC, and establish a new international broadcast time scale with a new name, then the operational systems of the world receiving those broadcasts would not notice. There would be some rewriting of documents, specifications, and some extra work streamlining zoneinfo. It's not just an engineering tradeoff, it's a political tradeoff.
The question for the NTP implementors, kernel hackers, application writers is whether it's worth waiting to see if the current political impasse about UTC can be broken, or whether it seems better, easier, and quicker to lobby the ITU-R delegations to abandon the name UTC and give a new name to a broadcast time scale without leaps. Either way we will have to handle another decade of leap seconds before the broadcast time scale can change its characteristics. replies directed to the LEAPSECS list -- Steve Allen <sla@ucolick.org> WGS-84 (GPS) UCO/Lick Observatory Natural Sciences II, Room 165 Lat +36.99855 University of California Voice: +1 831 459 3046 Lng -122.06015 Santa Cruz, CA 95064 http://www.ucolick.org/~sla/ Hgt +250 m ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 17:42 ` Linas Vepstas ` (2 preceding siblings ...) 2009-01-07 3:50 ` Danny Mayer @ 2009-01-12 16:11 ` Pavel Machek 2009-01-12 17:07 ` [ntpwg] " M. Warner Losh 3 siblings, 1 reply; 109+ messages in thread From: Pavel Machek @ 2009-01-12 16:11 UTC (permalink / raw) To: Linas Vepstas Cc: david, Nick Andrew, David Newall, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell, mills, Brian Haberman, Karen O'Donoghue, ntpwg On Mon 2009-01-05 11:42:35, Linas Vepstas wrote: > 2009/1/5 <david@lang.hm>: > > On Mon, 5 Jan 2009, Linas Vepstas wrote: > > > >>> Arguably the kernel's responsibility should be to keep track of the > >>> most fundamental representation of time possible for a machine (that's > >>> probably TAI) and it is a userspace responsibility to map from that > >>> value to other time standards including UTC, > >> > >> Yes, this really does seem like the right solution. > >> > >>> using control files > >>> which are updated as leap seconds are declared. > >> > >> Lets be clear on what "control files" means. This does > >> *NOT* mean some config file shipped by some distro > >> for some package. That would be a horrid solution. > >> People don't install updates, patches, etc. Distros > >> ship them late, or never, if the distro is old enough. > >> > >> A more appropriate solution would be to have > >> either the kernel or ntpd track the leap seconds > >> automatically. First, the ntp protocol already provides > >> the needed notification of a leap second to anyone > >> who cares about it (i.e. there is no point in getting a > >> Linux distro involved in this -- a distribution mechanism > >> already exists, and works *better* than having a distro > >> do it). > > > > I disagree with this. 
NTP will only know about leap seconds if it was > > running and connected to a server that advertised the leap seconds during > > that month. > > > > for example, if you installed a new server today, how would it ever know > > that there was a leap second a couple of days ago? > > OK, good point. Unless your distro was less > than a few days old (unlikely), you are faced with the > same problem. Sure, eventually, the distro will publish > an update (which will add to the existing list of 36 leap > seconds -- which is needed in any case, since no one > has a server that's been up since 1958), but this is > unlikely to happen during this install window. > > The long term solution would be write an RFC to extend > NTP to also provide TAI information -- e.g. to add a > message that indicates the current leap-second offset > between UTC and TAI. Offset is not enough; you'd have to provide list of all previous leap seconds with 'when it happened' timestamps. -- (english) http://www.livejournal.com/~pavelmachek (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html ^ permalink raw reply [flat|nested] 109+ messages in thread
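Pavel's point, that a single current offset is not enough and a dated list is needed, can be illustrated with a small lookup. This is a sketch only: the table below is a deliberately short excerpt (the three most recent TAI-UTC steps as of this thread), not a complete leap-second list, and the names LEAP_TABLE and tai_minus_utc are made up for illustration.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Excerpt only: (UTC instant from which the offset applies, TAI-UTC seconds).
# A real table would go back to 1972 and be updated as leaps are declared.
LEAP_TABLE = [
    (datetime(1999, 1, 1, tzinfo=timezone.utc), 32),
    (datetime(2006, 1, 1, tzinfo=timezone.utc), 33),
    (datetime(2009, 1, 1, tzinfo=timezone.utc), 34),
]
_STARTS = [start for start, _ in LEAP_TABLE]


def tai_minus_utc(utc):
    """Return TAI-UTC in seconds for a timezone-aware UTC datetime."""
    i = bisect_right(_STARTS, utc)
    if i == 0:
        raise ValueError("instant predates the excerpted table")
    return LEAP_TABLE[i - 1][1]


before = datetime(2008, 12, 31, 12, 0, tzinfo=timezone.utc)
after = datetime(2009, 1, 2, 12, 0, tzinfo=timezone.utc)
print(tai_minus_utc(before), tai_minus_utc(after))
```

A host that only knows the current offset can answer the second query but not the first, which is why converting past timestamps needs the "when it happened" entries.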
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-12 16:11 ` Pavel Machek @ 2009-01-12 17:07 ` M. Warner Losh 2009-01-12 21:45 ` Valdis.Kletnieks 0 siblings, 1 reply; 109+ messages in thread From: M. Warner Losh @ 2009-01-12 17:07 UTC (permalink / raw) To: pavel Cc: linasvepstas, david, goodgerster, kyle, slashdot, davidn, linux-kernel, hancockr, ntpwg, pretzalz, burdell, nick, jeff In message: <20090112161115.GA1474@ucw.cz> Pavel Machek <pavel@suse.cz> writes: : On Mon 2009-01-05 11:42:35, Linas Vepstas wrote: : > 2009/1/5 <david@lang.hm>: : > > On Mon, 5 Jan 2009, Linas Vepstas wrote: : > > : > >>> Arguably the kernel's responsibility should be to keep track of the : > >>> most fundamental representation of time possible for a machine (that's : > >>> probably TAI) and it is a userspace responsibility to map from that : > >>> value to other time standards including UTC, : > >> : > >> Yes, this really does seem like the right solution. : > >> : > >>> using control files : > >>> which are updated as leap seconds are declared. : > >> : > >> Lets be clear on what "control files" means. This does : > >> *NOT* mean some config file shipped by some distro : > >> for some package. That would be a horrid solution. : > >> People don't install updates, patches, etc. Distros : > >> ship them late, or never, if the distro is old enough. : > >> : > >> A more appropriate solution would be to have : > >> either the kernel or ntpd track the leap seconds : > >> automatically. First, the ntp protocol already provides : > >> the needed notification of a leap second to anyone : > >> who cares about it (i.e. there is no point in getting a : > >> Linux distro involved in this -- a distribution mechanism : > >> already exists, and works *better* than having a distro : > >> do it). : > > : > > I disagree with this. 
NTP will only know about leap seconds if it was : > > running and connected to a server that advertised the leap seconds during : > > that month. : > > : > > for example, if you installed a new server today, how would it ever know : > > that there was a leap second a couple of days ago? : > : > OK, good point. Unless your distro was less : > than a few days old (unlikely), you are faced with the : > same problem. Sure, eventually, the distro will publish : > an update (which will add to the existing list of 36 leap : > seconds -- which is needed in any case, since no one : > has a server that's been up since 1958), but this is : > unlikely to happen during this install window. : > : > The long term solution would be write an RFC to extend : > NTP to also provide TAI information -- e.g. to add a : > message that indicates the current leap-second offset : > between UTC and TAI. : : Offset is not enough; you'd have to provide list of all previous leap : seconds with 'when it happened' timestamps. Well, today you can ftp the leapseconds.txt file from NIST. Of course, that assumes your machine is on the network, and not a dumb slave of a smart head-end that's off the net... Warner ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [ntpwg] Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-12 17:07 ` [ntpwg] " M. Warner Losh @ 2009-01-12 21:45 ` Valdis.Kletnieks 0 siblings, 0 replies; 109+ messages in thread From: Valdis.Kletnieks @ 2009-01-12 21:45 UTC (permalink / raw) To: M. Warner Losh Cc: pavel, linasvepstas, david, goodgerster, kyle, slashdot, davidn, linux-kernel, hancockr, ntpwg, pretzalz, burdell, nick, jeff [-- Attachment #1: Type: text/plain, Size: 599 bytes --] On Mon, 12 Jan 2009 10:07:12 MST, "M. Warner Losh" said: > Well, today you can ftp the leapseconds.txt file from NIST. Of > course, that assumes your machine is on the network, and not a dumb > slave of a smart head-end that's off the net... If you're a dumb slave off a smart head-end, the sysadmin has already solved the problem of getting files from the outside to the dumb slave, just for their own sanity in pushing patches and *other* config file updates. And if you're *not* getting updates pushed to you for all the *other* stuff, the leapseconds is probably the least of your worries. [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 16:08 ` Linas Vepstas 2009-01-05 17:51 ` david @ 2009-01-06 2:31 ` Nick Andrew 1 sibling, 0 replies; 109+ messages in thread From: Nick Andrew @ 2009-01-06 2:31 UTC (permalink / raw) To: Linas Vepstas Cc: David Newall, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Mon, Jan 05, 2009 at 10:08:50AM -0600, Linas Vepstas wrote: > 2009/1/5 Nick Andrew <nick@nick-andrew.net>: > > On Sun, Jan 04, 2009 at 11:48:31PM -0600, Linas Vepstas wrote: > > Arguably the kernel's responsibility should be to keep track of the > > most fundamental representation of time possible for a machine (that's > > probably TAI) and it is a userspace responsibility to map from that > > value to other time standards including UTC, > > Yes, this really does seem like the right solution. > > > using control files > > which are updated as leap seconds are declared. > > Lets be clear on what "control files" means. This does > *NOT* mean some config file shipped by some distro > for some package. That would be a horrid solution. > People don't install updates, patches, etc. Distros > ship them late, or never, if the distro is old enough. To clarify - as far as I know, TAI is a fundamental time scale because it's regular and monotonically increasing. Wikipedia talks about specifying TAI using both Julian Dates and the Gregorian Calendar - I don't know whether that means representations of TAI time may suffer gaps depending on declared (subtracted) leap seconds. In any case I was thinking of something like Bernsteins TAI64 (http://cr.yp.to/libtai/tai64.html) which is just a count of seconds (and nanoseconds using TAI64N). 
Considering TAI64 as a count of seconds, other time values (UTC, unix epoch time) can be derived from TAI64 by applying some mapping function which takes into account all the irregularities introduced by our complex time systems (including leap years, leap seconds, DST, pre-Gregorian calendars and so on). Unix epoch time (seconds since 1 Jan 1970 00:00:00 GMT) is also regular and monotonically increasing however it's no longer suitable as a fundamental timebase because it doesn't recognise the existence of leap seconds. In unix epoch time a day is always 86400 seconds long and when I said "preserve the existing behaviour of time()" I meant that this constant must be maintained. As Linas correctly noted, UTC allows a distinct representation of a leap second (xx:59:60). It follows from the previous paragraph that a mapping from time_t to UTC can never result in ":60". Mapping from UTC to time_t is lossy: if the input is a leap second then something must be done with it: mktime() for 09:59:60 returns the same time_t value as for 10:00:00. Mapping from TAI64 to UTC or time_t requires knowledge of what leap seconds were already applied, and when. Wikipedia says TAI is 34 seconds ahead of UTC right now, but I'm talking about converting any past TAI value, not just current time. So it's not really suitable for the kernel to just learn about leap seconds on the fly, there needs to be a persistent table of some kind which states what changes happened and when. This is analogous to the zoneinfo file, which states not just the current DST rules but also all past ones. There will certainly be hosts where this mapping file is out of date, however it is supplied. That's the case with zoneinfo too, and there's a general problem in that politicians keep mucking about with daylight saving time. We're experiencing that now in Australia, where the state of Western Australia which never had DST in the past, now has it as a "test". 
So WA has got it now, much to my displeasure, and may or may not have it in future. In general it's not possible to reliably convert future dates from time_t to local time, where future dates are anything more recent than your zoneinfo file. The same constraint applies to conversion from TAI64. There's a good argument for including up-to-date conversion information in the NTP protocol. I don't know enough about NTP to say whether it already has this capability. Hosts which don't have up-to-date zoneinfo files and don't sync time with NTP probably don't care about accurate time conversion anyway. > Well, 'man 2 time' is as clear as mud. It talks about leap seconds, > but I can't figure out what it's saying. I rather > doubt that time() is doing what POSIX.1 seems to want > it to do (which is to ignore leap seconds?) I think I read that linux "ticks the second twice" (I don't know whether that's the 59 second or the 00 second, it should be 00 for ctime(3) to make any sense) and I don't know whether gettimeofday(2) will show tv_usec returning to zero and re-counting the microseconds. I think POSIX.1 wants time_t to ignore leap seconds as if they didn't exist. That means that the :59:60 and :00:00 wall clock seconds share a single time_t value ... in other words, one time_t second in linux persists for two wall clock seconds during a leap second. Sane behaviour would be for tv_sec and tv_usec to be monotonically increasing while this is going on; the microseconds should pass at half the usual rate to preserve this. > The reason I'm guessing that time() is wrong, is because > it seems that POSIX wants time() to use TAI time, and > we don't have that handy anywhere (because we've lost > track of those leap seconds) I don't think POSIX wants TAI, but it makes sense for a kernel to provide an unambiguous time reference to userspace.
time_t is a convenient approximation but it is non-linear due to ignoring the leap seconds and it probably causes havoc for any precise measurements occurring during the leap second. Nick. -- PGP Key ID = 0x418487E7 http://www.nick-andrew.net/ PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7 ^ permalink raw reply [flat|nested] 109+ messages in thread
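The lossy UTC-to-time_t mapping described above can be checked directly against the POSIX "seconds since the Epoch" expression. The sketch below transcribes that formula (tm_year counts years since 1900 and tm_yday days since 1 January, as in the C struct tm). The function name is made up for illustration, and the example uses the UTC leap second of 2008-12-31 rather than a local-time mktime() call.

```python
def posix_seconds_since_epoch(tm_year, tm_yday, tm_hour, tm_min, tm_sec):
    """Seconds since the Epoch per the POSIX formula (leap seconds ignored)."""
    return (tm_sec + tm_min * 60 + tm_hour * 3600 + tm_yday * 86400
            + (tm_year - 70) * 31536000
            + ((tm_year - 69) // 4) * 86400
            - ((tm_year - 1) // 100) * 86400
            + ((tm_year + 299) // 400) * 86400)


# 2008-12-31 23:59:60 UTC, the leap second. 2008 is a leap year, so
# 31 December is day-of-year index 365.
leap = posix_seconds_since_epoch(108, 365, 23, 59, 60)
# 2009-01-01 00:00:00 UTC, the second that follows it.
midnight = posix_seconds_since_epoch(109, 0, 0, 0, 0)

print(leap, midnight)
print("same time_t value:", leap == midnight)
print("midnight % 86400 =", midnight % 86400)
```

Both instants land on the same value, which is the "one time_t second persists for two wall clock seconds" behaviour, and midnight stays a multiple of 86400.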
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 14:33 ` Nick Andrew 2009-01-05 16:08 ` Linas Vepstas @ 2009-01-06 1:59 ` David Newall 2009-01-06 2:18 ` Chris Adams 2009-01-06 2:51 ` Nick Andrew 1 sibling, 2 replies; 109+ messages in thread From: David Newall @ 2009-01-06 1:59 UTC (permalink / raw) To: Nick Andrew Cc: Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Nick Andrew wrote: > I can sympathise with the opinion that linux should be able to accurately > distinguish xx:59:60 when a leap second is added (or the missing :59 when > one is subtracted) but not at the expense of making a day which is not > 86400 seconds long. > Some days are not 86400 seconds long. That's a fact and regardless of how inconvenient it is, we have to live with it. Some years don't have 365 days; some months don't have 30 days; some Februaries don't have 28 days; and now, some days don't have 86400 seconds. What's the point in fighting this? If you want to know the days between two times, dividing by 86400 doesn't cut it. > Arguably the kernel's responsibility should be to keep track of the > most fundamental representation of time possible for a machine (that's > probably TAI) and it is a userspace responsibility to map from that > value to other time standards including UTC, using control files > which are updated as leap seconds are declared. We have this already; zoneinfo > Just so long as the > existing behaviour of time() which doesn't recognise leap seconds > is preserved. I haven't been able to find this Annex B that Alan talked of, so I can only go by the man page, which states, simply and explicitly, that time() returns seconds since Epoch, and also that Epoch is start of January 1 1970. To my mind, time *does* recognise leap seconds. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 1:59 ` David Newall @ 2009-01-06 2:18 ` Chris Adams 2009-01-06 2:51 ` Nick Andrew 1 sibling, 0 replies; 109+ messages in thread From: Chris Adams @ 2009-01-06 2:18 UTC (permalink / raw) To: David Newall; +Cc: linux-kernel Once upon a time, David Newall <davidn@davidnewall.com> said: > We have this already; zoneinfo How many times: zoneinfo is for offset from UTC, not changes in UTC. > I haven't been able to find this Annex B that Alan talked of, so I can > only go by the man page, which states, simply and explicitly, that > time() returns seconds since Epoch, and also that Epoch is start of > January 1 1970. To my mind, time *does* recognise leap seconds. Part of the rationale for SUSv3 (aka 1003.1-2001), xbd_chap04.html in my copy: The topic of whether seconds since the Epoch should account for leap seconds has been debated on a number of occasions, and each time consensus was reached (with acknowledged dissent each time) that the majority of users are best served by treating all days identically. (That is, the majority of applications were judged to assume a single length-as measured in seconds since the Epoch-for all days. Thus, leap seconds are not applied to seconds since the Epoch.) Those applications which do care about leap seconds can determine how to handle them in whatever way those applications feel is best. This was particularly emphasized because there was disagreement about what the best way of handling leap seconds might be. It is a practical impossibility to mandate that a conforming implementation must have a fixed relationship to any particular official clock (consider isolated systems, or systems performing "reruns" by setting the clock to some arbitrary time). Now, you are wrong, the standard says so, please take this somewhere else and stop CCing me. 
-- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 1:59 ` David Newall 2009-01-06 2:18 ` Chris Adams @ 2009-01-06 2:51 ` Nick Andrew 2009-01-06 9:40 ` Alan Cox 1 sibling, 1 reply; 109+ messages in thread From: Nick Andrew @ 2009-01-06 2:51 UTC (permalink / raw) To: David Newall Cc: Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Tue, Jan 06, 2009 at 12:29:47PM +1030, David Newall wrote: > Nick Andrew wrote: > > I can sympathise with the opinion that linux should be able to accurately > > distinguish xx:59:60 when a leap second is added (or the missing :59 when > > one is subtracted) but not at the expense of making a day which is not > > 86400 seconds long. > > > > Some days are not 86400 seconds long. That's a fact and regardless of > how inconvenient it is, we have to live with it. Sorry, but you're wrong - in the context of time_t, every day is 86400 seconds long. man 2 time says so clearly in the notes: NOTES POSIX.1 defines seconds since the Epoch as a value to be interpreted as the number of seconds between a specified time and the Epoch, according to a formula for conversion from UTC equivalent to conversion on the naive basis that leap seconds are ignored and all years divisible by 4 are leap years. This value is not the same as the actual number of seconds between the time and the Epoch, because of leap seconds and because clocks are not required to be synchronized to a standard reference. > Some years don't have > 365 days; some months don't have 30 days; some Februaries don't have 28 > days; and now, some days don't have 86400 seconds. What's the point in > fighting this? I'm not fighting this - the real world has all these issues but the world of time_t does not. You want to redefine time_t to include all the leap seconds that were already added (34) or perhaps only the future ones; either approach is a disaster. 
It's unreasonable to change the semantics of something as fundamental as time_t when so much code depends on those semantics. Instead, define a new timebase which counts time predictably and unambiguously then a set of mappings to derived time values like time_t, UTC and local time. > > Just so long as the > > existing behaviour of time() which doesn't recognise leap seconds > > is preserved. > > I haven't been able to find this Annex B that Alan talked of, so I can > only go by the man page, which states, simply and explicitly, that > time() returns seconds since Epoch, and also that Epoch is start of > January 1 1970. To my mind, time *does* recognise leap seconds. Please read the NOTES section, which clarifies what "seconds since the Epoch" means. Nick. -- PGP Key ID = 0x418487E7 http://www.nick-andrew.net/ PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7 ^ permalink raw reply [flat|nested] 109+ messages in thread
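The conversion that the NOTES text describes can be written out directly. What follows is a minimal sketch, assuming the "seconds since the Epoch" expression from the POSIX base definitions in its corrected form (with the century terms); the function name `posix_epoch_seconds` is made up for illustration and is not part of any library.

```c
#include <assert.h>

/*
 * Hypothetical helper (the name is illustrative only): POSIX "seconds
 * since the Epoch" computed from broken-down UTC fields.
 * tm_year is years since 1900; tm_yday is days since January 1 (0-based).
 * Every day contributes exactly 86400 seconds; leap seconds never appear.
 */
long long posix_epoch_seconds(int tm_year, int tm_yday,
                              int tm_hour, int tm_min, int tm_sec)
{
    assert(tm_year >= 70);  /* the expression is only defined from 1970 on */

    return tm_sec + tm_min * 60LL + tm_hour * 3600LL + tm_yday * 86400LL
         + (tm_year - 70) * 31536000LL
         + ((tm_year - 69) / 4) * 86400LL      /* one extra day per 4th year  */
         - ((tm_year - 1) / 100) * 86400LL     /* except century years        */
         + ((tm_year + 299) / 400) * 86400LL;  /* unless divisible by 400     */
}
```

Under these assumptions, 2008-12-31 23:59:59 (tm_year 108, tm_yday 365) comes out as 1230767999, and both 23:59:60 on that day and 2009-01-01 00:00:00 come out as 1230768000, which is the sense in which time_t has no room for the leap second.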
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 2:51 ` Nick Andrew @ 2009-01-06 9:40 ` Alan Cox 2009-01-07 1:17 ` Nick Andrew 2009-01-07 9:46 ` David Newall 0 siblings, 2 replies; 109+ messages in thread From: Alan Cox @ 2009-01-06 9:40 UTC (permalink / raw) To: Nick Andrew Cc: David Newall, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > UTC equivalent to conversion on the naive basis that leap seconds are ignored and all > years divisible by 4 are leap years. This value is not the same as the actual number of > seconds between the time and the Epoch, because of leap seconds and because clocks are not > required to be synchronized to a standard reference. I'm not sure what you are quoting from but it is out of date on the subject of leap years. The rest looks right. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 9:40 ` Alan Cox @ 2009-01-07 1:17 ` Nick Andrew 2009-01-07 9:37 ` Alan Cox 2009-01-07 9:46 ` David Newall 1 sibling, 1 reply; 109+ messages in thread From: Nick Andrew @ 2009-01-07 1:17 UTC (permalink / raw) To: Alan Cox Cc: David Newall, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Tue, Jan 06, 2009 at 09:40:58AM +0000, Alan Cox wrote: > > UTC equivalent to conversion on the naive basis that leap seconds are ignored and all > > years divisible by 4 are leap years. This value is not the same as the actual number of > > seconds between the time and the Epoch, because of leap seconds and because clocks are not > > required to be synchronized to a standard reference. > > I'm not sure what you are quoting from but it is out of date on the > subject of leap years. "man 2 time" on Debian Lenny. The treatment of leap years looks ridiculous, but within the context of a 32-bit time_t, all divisible-by-4 years between 1901 and 2038 are leap years. It's a bit of a problem for 64-bit time_t though. Nick. -- PGP Key ID = 0x418487E7 http://www.nick-andrew.net/ PGP Key fingerprint = B3ED 6894 8E49 1770 C24A 67E3 6266 6EB9 4184 87E7 ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 1:17 ` Nick Andrew @ 2009-01-07 9:37 ` Alan Cox 0 siblings, 0 replies; 109+ messages in thread From: Alan Cox @ 2009-01-07 9:37 UTC (permalink / raw) To: Nick Andrew Cc: David Newall, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > "man 2 time" on Debian Lenny. The treatment of leap years looks ridiculous, but > within the context of a 32-bit time_t, all divisible-by-4 years between 1901 and > 2038 are leap years. It's a bit of a problem for 64-bit time_t though. Then Debian documentation needs fixing. POSIX fixed their definition some years ago. ^ permalink raw reply [flat|nested] 109+ messages in thread
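The point about the divisible-by-4 wording can be checked mechanically. This is a small sketch, with function names invented for the purpose, that compares the naive rule against the Gregorian rule and reports the first year in a range where they disagree.

```c
#include <assert.h>
#include <stdbool.h>

/* Naive rule from the older man page wording: every fourth year is a leap year. */
bool is_leap_naive(int year)
{
    return year % 4 == 0;
}

/* Gregorian rule: century years are leap years only when divisible by 400. */
bool is_leap_gregorian(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Returns the first year in [first, last] where the two rules disagree, or 0. */
int first_disagreement(int first, int last)
{
    assert(first > 0 && first <= last);
    for (int y = first; y <= last; y++)
        if (is_leap_naive(y) != is_leap_gregorian(y))
            return y;
    return 0;
}
```

Calling `first_disagreement(1901, 2038)` finds no disagreement, because 2000 is a leap year under both rules; widening the range to 2200 turns up 2100, which is where a 64-bit time_t would start to notice the difference.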
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 9:40 ` Alan Cox 2009-01-07 1:17 ` Nick Andrew @ 2009-01-07 9:46 ` David Newall 2009-01-07 9:54 ` Alan Cox 1 sibling, 1 reply; 109+ messages in thread From: David Newall @ 2009-01-07 9:46 UTC (permalink / raw) To: Alan Cox Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell

Alan Cox wrote:
>> UTC equivalent to conversion on the naive basis that leap seconds are ignored and all
>> years divisible by 4 are leap years. This value is not the same as the actual number of
>> seconds between the time and the Epoch, because of leap seconds and because clocks are not
>> required to be synchronized to a standard reference.
>>
>
> I'm not sure what you are quoting from but it is out of date on the
> subject of leap years.
>

The range of signed 32-bit times is 1901 through 2038, which has only one century, 2000, which is a leap year. So the caveat for leap years is correct but unnecessary.

So I've discovered, at least on Ubuntu, something wonderful and reassuring. It already works exactly the way I think is correct. Look: I create a test timezone with no daylight saving and one leap second:

    davidn@takauji:~/timetest$ cat tz
    Zone testzone 0:00 0 XXX/YYY
    davidn@takauji:~/timetest$ cat leapseconds
    Leap 2008 Dec 31 23:59:59 + S
    davidn@takauji:~/timetest$ zic -d . -L leapseconds tz

Then the test program, which makes a time_t (what time() returns) for a few seconds before the leap second, then counts off seconds...

    davidn@takauji:~/timetest$ cat timetest.c
    #include <stdio.h>
    #include <stdlib.h>   /* setenv() */
    #include <time.h>

    int main(void)
    {
        /* Use the locally compiled test zone, which carries the leap second. */
        setenv("TZ", ":/home/davidn/timetest/testzone", 1);

        /* 2008-12-31 23:59:55: sec, min, hour, mday, mon (0-based), year - 1900 */
        struct tm tm1 = { 55, 59, 23, 31, 11, 108 };
        time_t t1 = mktime(&tm1);
        int i;

        /* Count off nine consecutive time_t values across the leap second. */
        for (i = 10; --i; t1++)
            printf("ctime(%ld) = %s", (long)t1, ctime(&t1));
        return 0;
    }

Observe two 23:59:59's. Apparently it could be better if the second 23:59:59 was 23:59:60, but I prefer it this way.

    davidn@takauji:~/timetest$ ./timetest
    ctime(1230767995) = Wed Dec 31 23:59:55 2008
    ctime(1230767996) = Wed Dec 31 23:59:56 2008
    ctime(1230767997) = Wed Dec 31 23:59:57 2008
    ctime(1230767998) = Wed Dec 31 23:59:58 2008
    ctime(1230767999) = Wed Dec 31 23:59:59 2008
    ctime(1230768000) = Wed Dec 31 23:59:59 2008
    ctime(1230768001) = Thu Jan 1 00:00:00 2009
    ctime(1230768002) = Thu Jan 1 00:00:01 2009
    ctime(1230768003) = Thu Jan 1 00:00:02 2009

Perhaps this is distribution-dependent, but even so, there's no need for the kernel to drop the second (and it's wrong if it does.)

^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 9:46 ` David Newall @ 2009-01-07 9:54 ` Alan Cox 2009-01-07 10:18 ` David Newall 0 siblings, 1 reply; 109+ messages in thread From: Alan Cox @ 2009-01-07 9:54 UTC (permalink / raw) To: David Newall Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > The range of signed 32-bit times is 1901 through 2038, which has only > one century, 2000, which is a leap year. So the caveat for leap years > is correct but unnecessary. The standard however (and library code) were updated many years ago, so the description is still wrong. > So I've discovered, at least on Ubuntu, something wonderful and > reassuring. It already works exactly the way I think is correct. Look: > I create a test timezone with no daylight saving and one leap second: This is entirely configurable - see my earlier post about the "right" and posix timezones. Really however that belongs on the glibc list. As far as the kernel and leapseconds go - remember the kernel RTC support does not know about leap seconds ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 9:54 ` Alan Cox @ 2009-01-07 10:18 ` David Newall 2009-01-07 10:52 ` Alan Cox 2009-01-07 13:33 ` Chris Adams 0 siblings, 2 replies; 109+ messages in thread From: David Newall @ 2009-01-07 10:18 UTC (permalink / raw) To: Alan Cox Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Alan Cox wrote: > As far as the kernel and leapseconds go - remember the kernel RTC support > does not know about leap seconds > True but irrelevant because the RTC returns a timestamp. And it's quietly understood that the RTC is only an approximation. The remaining fly in the ointment, if indeed the NTP client doesn't already do what I've outlined, is that leap seconds aren't reckoned into NTP broadcasts. As intimated, this is correctable using leap second information from zoneinfo. Even though this is manifestly not a kernel issue, I'll work up a patch for ntpdate (apparently what I use) and post here, which I'm sure will be useful for all other NTP clients. However it is now clear that no special kernel support is required for leap-seconds, and any such code that's been incorporated needs to be removed. Removed I say! ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 10:18 ` David Newall @ 2009-01-07 10:52 ` Alan Cox 2009-01-07 13:45 ` David Newall 2009-01-07 13:33 ` Chris Adams 1 sibling, 1 reply; 109+ messages in thread From: Alan Cox @ 2009-01-07 10:52 UTC (permalink / raw) To: David Newall Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > True but irrelevant because the RTC returns a timestamp. And it's > quietly understood that the RTC is only an approximation. You miss the point. The RTC stores the CMOS time in MM DD YY HH:MM:SS format. That conversion is done kernel side when reading/writing the RTC chip. Thus if you are using leap second timing your BIOS RTC values will not agree with the expected value. > However it is now clear that no special kernel support is required for > leap-seconds, and any such code that's been incorporated needs to be > removed. Removed I say! There never has been any. It's all handled (both posix and sane) by glibc. Alan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 10:52 ` Alan Cox @ 2009-01-07 13:45 ` David Newall 2009-01-07 14:10 ` Alan Cox 0 siblings, 1 reply; 109+ messages in thread From: David Newall @ 2009-01-07 13:45 UTC (permalink / raw) To: Alan Cox Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Alan Cox wrote: >> True but irrelevant because the RTC returns a timestamp. And it's >> quietly understood that the RTC is only an approximation. >> > > You miss the point. > No, I got the point. I see no problem. > The RTC stores the CMOS time in MM DD YY HH:MM:SS format. Yes, which is perfect for mktime(), which knows about leap seconds and so produces the correct time_t. >> However it is now clear that no special kernel support is required for >> leap-seconds, and any such code that's been incorporated needs to be >> removed. Removed I say! >> > > There never has been any. It's all handled (both posix and sane) by glibc. Which is what one would expect. It's reports of crashes and kernel bugs being found and fixed in code to handle leap seconds which led me to a different understanding. I thought it was said that there's kernel support to handle the leap second flag in NTP's broadcasts, and that that was where the bug was. So. What is the situation? ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 13:45 ` David Newall @ 2009-01-07 14:10 ` Alan Cox 2009-01-07 14:36 ` David Newall 0 siblings, 1 reply; 109+ messages in thread From: Alan Cox @ 2009-01-07 14:10 UTC (permalink / raw) To: David Newall Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > > The RTC stores the CMOS time in MM DD YY HH:MM:SS format. > > Yes, which is perfect for mktime(), which knows about leap seconds and > so produces the correct time_t. mktime in the kernel has no knowledge of leap seconds whatsoever. Go read kernel/time.c > different understanding. I thought it was said that there's kernel > support to handle the leap second flag in NTP's broadcasts, and that > that was where the bug was. All the kernel knows how to do is to slew time (in general) and to repeat or remove one second. It has no knowledge of leap seconds and it doesn't know how to convert between UTC/TAI/Unix Epoch etc ^ permalink raw reply [flat|nested] 109+ messages in thread
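To make concrete what "no knowledge of leap seconds" means for a calendar-to-seconds conversion of the kind done when reading the RTC, here is a user-space sketch modelled on the usual Gregorian day-count arithmetic. It is an illustration under that assumption, not a copy of any particular kernel version, and the name `civil_to_seconds` is invented for it.

```c
#include <assert.h>

/*
 * Illustrative helper (hypothetical name): Gregorian date plus time of day
 * to seconds since 1970-01-01 00:00:00, treating every day as exactly
 * 86400 seconds. There is no leap-second table anywhere in here.
 */
long long civil_to_seconds(unsigned int year, unsigned int mon,
                           unsigned int day, unsigned int hour,
                           unsigned int min, unsigned int sec)
{
    assert(year >= 1970 && mon >= 1 && mon <= 12 && day >= 1 && day <= 31);

    /* Start the year in March so that February, with its leap day, comes last. */
    if (mon <= 2) {
        mon += 10;
        year -= 1;
    } else {
        mon -= 2;
    }

    /* Days since 1970-01-01, counting leap days by the Gregorian rule. */
    long long days = (long long)(year / 4 - year / 100 + year / 400
                                 + 367 * mon / 12 + day)
                   + (long long)year * 365 - 719499;

    return ((days * 24 + hour) * 60 + min) * 60 + sec;
}
```

Feeding it 2008-12-31 23:59:60 gives the same result as 2009-01-01 00:00:00, so a conversion of this shape cannot tell the leap second apart from the second that follows it.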
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 14:10 ` Alan Cox @ 2009-01-07 14:36 ` David Newall 2009-01-07 15:40 ` Alan Cox 2009-01-07 22:13 ` Chris Adams 0 siblings, 2 replies; 109+ messages in thread From: David Newall @ 2009-01-07 14:36 UTC (permalink / raw) To: Alan Cox Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Alan Cox wrote: >>> The RTC stores the CMOS time in MM DD YY HH:MM:SS format. >>> >> Yes, which is perfect for mktime(), which knows about leap seconds and >> so produces the correct time_t. >> > > mktime in the kernel has no knowledge of leap seconds whatsoever. Go read > kernel/time.c > Is there a mktime() in the kernel? Isn't it pure user-space? Mktime does appear to know all about leap seconds (assuming they're in zoneinfo.) >> different understanding. I thought it was said that there's kernel >> support to handle the leap second flag in NTP's broadcasts, and that >> that was where the bug was. >> > > All the kernel knows how to do is to slew time (in general) and to repeat > or remove one second. It has no knowledge of leap seconds and it doesn't > know how to convert between UTC/TAI/Unix Epoch etc. I went back to the start of the thread. Chris posted a stack trace showing "#15 0xffffffff8104ec16 in ntp_leap_second (timer=<value optimized out>) at kernel/time/ntp.c:143". That would be kernel code to process leap seconds from NTP broadcasts, I think. That code needs to be removed. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 14:36 ` David Newall @ 2009-01-07 15:40 ` Alan Cox 2009-01-10 9:46 ` David Newall 2009-01-07 22:13 ` Chris Adams 1 sibling, 1 reply; 109+ messages in thread From: Alan Cox @ 2009-01-07 15:40 UTC (permalink / raw) To: David Newall Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell > Is there a mktime() in the kernel? Isn't it pure user-space? Mktime > does appear to know all about leap seconds (assuming they're in zoneinfo.) The GPL goes to great trouble to ensure you get the kernel source code. Why not use it. > showing "#15 0xffffffff8104ec16 in ntp_leap_second (timer=<value > optimized out>) at kernel/time/ntp.c:143". That would be kernel code to > process leap seconds from NTP broadcasts, I think. That code needs to > be removed. I suggest you read that code and understand it. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 15:40 ` Alan Cox @ 2009-01-10 9:46 ` David Newall 0 siblings, 0 replies; 109+ messages in thread From: David Newall @ 2009-01-10 9:46 UTC (permalink / raw) To: Alan Cox Cc: Nick Andrew, Linas Vepstas, david, Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Alan Cox wrote: >> Is there a mktime() in the kernel? Isn't it pure user-space? Mktime >> does appear to know all about leap seconds (assuming they're in zoneinfo.) >> > > The GPL goes to great trouble to ensure you get the kernel source code. > Why not use it. > Okay. I'm not sure how long you have realised that two, completely different mktimes have been confused with each other, but surely longer than me. The kernel mktime, as far as I can tell without spending a week on it, is used only on some platforms, during startup to read and set real time clocks and alarms. Where it matters is RTC ioctls which, tragically, I think, are passed a struct rtc_time. They should be passed a time_t or struct timeval because these are what's used everywhere else that I can think of, between kernel and user-space. Ideally, struct rtc_time should be deprecated in favour of time_t. Changes to user-space programs should be trivial; probably, they currently look like { struct rtc_time *rt = gmtime(&t); ioctl(fd, RTC_xxx, rt); } This is not going to happen without a huge song and dance, which I certainly don't have the energy for. I think it makes no practical difference, and only affects how tidy the kernel looks. Assuming leap-seconds are properly configured in zoneinfo, user-space programs which use gmtime() to set the RTC will run it fast by the current number of leap-seconds. However the RTC will continue to advance by one second per second, and being that fast, mktime will produce the correct time_t when RTC is read back. 
This will eventually cause a real problem with RTCs that handle leap years. When we have 4 years worth of leap-seconds, or maybe it's 96 years worth, the RTC will be set for a leap-year when it is not, or vice versa. That's a long time away. It's something of a farce that some systems crashed at the leap-second because no adjustment was needed. Trying to turn two seconds into one was a mistake. I gather the mistake is in the NTP client, which should just ignore the LEAP-SECOND bit, and use the leap-second information from zoneinfo to convert from the NTP timebase to Linux's. >> showing "#15 0xffffffff8104ec16 in ntp_leap_second (timer=<value >> optimized out>) at kernel/time/ntp.c:143". That would be kernel code to >> process leap seconds from NTP broadcasts, I think. That code needs to >> be removed. >> > > I suggest you read that code and understand it. > Well, there's rather a lot wrong with it, isn't there? All of the stuff that tries to handle leap seconds is wrong; that goes for timex.h, too. The kernel needs to do nothing special to handle leap-seconds; they're just seconds, like every other one. For the third time, this code has to come out. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 14:36 ` David Newall 2009-01-07 15:40 ` Alan Cox @ 2009-01-07 22:13 ` Chris Adams 1 sibling, 0 replies; 109+ messages in thread From: Chris Adams @ 2009-01-07 22:13 UTC (permalink / raw) To: David Newall; +Cc: linux-kernel Once upon a time, David Newall <davidn@davidnewall.com> said: > I went back to the start of the thread. Chris posted a stack trace > showing "#15 0xffffffff8104ec16 in ntp_leap_second (timer=<value > optimized out>) at kernel/time/ntp.c:143". That would be kernel code to > process leap seconds from NTP broadcasts, I think. That code needs to > be removed. Well, the code is to process when the kernel is told about leap seconds (it doesn't have to be NTP, you can do it with adjtimex, which is what I did to track down the problem). But why should it be removed? Why change Linux to be incompatible with POSIX and other Unix systems? This could create real problems with things like network file systems for example. Even trying to get interoperation between UTC-Linux and TAI-Linux would be a PITA. There was a bug, there is a patch, it should be fixed. There's no reason to reinvent the wheel just because there was a bug. Looking at comments, there was another bug related to the same xtime_lock/printk issue but not in leap second related code; it was trying to print a message about changing clock sources. Should we now re-architect all of that as well? -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
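For readers who have not used adjtimex for this, the sketch below shows one way a leap-second insertion might be requested without any NTP daemon. It assumes Linux with glibc's `<sys/timex.h>` and the `STA_INS`/`STA_DEL` status flags; the two function names are invented here, setting the status needs CAP_SYS_TIME (normally root), and whether the kernel acts on the flag may depend on the kernel version and the other status bits.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/timex.h>

/* Pure helper (hypothetical name): status word with "insert a leap second" set. */
int status_with_leap_insert(int status)
{
    int out = (status & ~STA_DEL) | STA_INS;

    assert(!(out & STA_DEL));  /* insert and delete must never both be requested */
    return out;
}

/*
 * Ask the kernel to insert a leap second at the end of the current UTC day.
 * Returns the clock state reported by adjtimex() (for example TIME_OK or
 * TIME_INS), or -1 on error.
 */
int request_leap_insert(void)
{
    struct timex tx = { 0 };

    /* With modes == 0 the call only reads the current state. */
    if (adjtimex(&tx) == -1) {
        perror("adjtimex (read)");
        return -1;
    }

    /* Write back the same status word with the insert flag added. */
    tx.modes = ADJ_STATUS;
    tx.status = status_with_leap_insert(tx.status);

    int state = adjtimex(&tx);
    if (state == -1)
        perror("adjtimex (set)");
    return state;
}
```

Only `status_with_leap_insert` is safe to call casually; `request_leap_insert` changes system clock state and is best tried on a scratch machine whose clock has been set to shortly before midnight UTC.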
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 10:18 ` David Newall 2009-01-07 10:52 ` Alan Cox @ 2009-01-07 13:33 ` Chris Adams 2009-01-07 13:37 ` Alan Cox 2009-01-07 14:09 ` David Newall 1 sibling, 2 replies; 109+ messages in thread From: Chris Adams @ 2009-01-07 13:33 UTC (permalink / raw) To: David Newall; +Cc: linux-kernel Once upon a time, David Newall <davidn@davidnewall.com> said: > The remaining fly in the ointment, if indeed the NTP client doesn't > already do what I've outlined, is that leap seconds aren't reckoned into > NTP broadcasts. As intimated, this is correctable using leap second > information from zoneinfo. No it isn't; you are still wrong. Yet again, you are ignoring the facts: - zoneinfo is for offset from UTC, leap seconds are changes in UTC - the standards say that time() returns seconds since the epoch in UTC _except_ explicitly NOT including leap seconds - NTP already has a way to distribute leap second information to trusted clients > Even though this is manifestly not a kernel issue, I'll work up a patch > for ntpdate (apparently what I use) and post her, which I'm sure will be > useful for all other NTP clients. ntpdate is obsolete. > However it is now clear that no special kernel support is required for > leap-seconds, and any such code that's been incorporated needs to be > removed. Removed I say! And you are wrong. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 13:33 ` Chris Adams @ 2009-01-07 13:37 ` Alan Cox 2009-01-07 14:12 ` David Newall 2009-01-07 14:09 ` David Newall 1 sibling, 1 reply; 109+ messages in thread From: Alan Cox @ 2009-01-07 13:37 UTC (permalink / raw) To: Chris Adams; +Cc: David Newall, linux-kernel > - zoneinfo is for offset from UTC, leap seconds are changes in UTC If you two would stop throwing toys at each other and read the glibc documentation and source you might get somewhere. > - the standards say that time() returns seconds since the epoch in UTC > _except_ explicity NOT including leap seconds Glibc has timezone support for both leap second inclusive ("right" as it calls them) and posix time offsets. Alan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 13:37 ` Alan Cox @ 2009-01-07 14:12 ` David Newall 0 siblings, 0 replies; 109+ messages in thread From: David Newall @ 2009-01-07 14:12 UTC (permalink / raw) To: Alan Cox; +Cc: linux-kernel Alan Cox wrote: > If you two would stop throwing toys at each other I object! I'd accept, if that were your claim, that I should have ignored Chris, but don't accept that I've been bickering or "throwing toys." ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 13:33 ` Chris Adams 2009-01-07 13:37 ` Alan Cox @ 2009-01-07 14:09 ` David Newall 2009-01-07 21:42 ` Chris Adams 1 sibling, 1 reply; 109+ messages in thread From: David Newall @ 2009-01-07 14:09 UTC (permalink / raw) To: Chris Adams; +Cc: linux-kernel Chris Adams wrote: > Once upon a time, David Newall <davidn@davidnewall.com> said: > >> The remaining fly in the ointment, if indeed the NTP client doesn't >> already do what I've outlined, is that leap seconds aren't reckoned into >> NTP broadcasts. As intimated, this is correctable using leap second >> information from zoneinfo. >> > > No it isn't; you are still wrong. Yet again, you are ignoring the > facts: > Curiously strong opinions when I've already demonstrated otherwise. On my system, and possibly also on yours, a time_t, which is what time() returns, is the number of seconds since epoch; which, in turn, is the start of 1970. And on my system, zoneinfo handles leap seconds. Just saying that I'm wrong is contrary and stubborn since evidence shows my understanding has been correct from the start. If you're sure I'm wrong, take my demonstration and find a flaw. Otherwise I see little value in your contribution to this discussion. > - the standards say that time() returns seconds since the epoch in UTC > _except_ explicitly NOT including leap seconds > Don't believe everything you read. For example, the time(2) man page says what POSIX does, but doesn't actually say that Linux also does the same. It also says what you paraphrased above, but demonstrably that's not the case. Man pages often are wrong in some details. Hence RTSL. > - NTP already has a way to distribute leap second information to trusted > clients > Yesterday, when I scanned the RFC, it was clear that NTP broadcasts do not factor leap seconds; that every day has 86400 seconds. 
However if NTP does have a way of distributing leap seconds, other than the almost pointless leap-second flag, then that's great. If it doesn't (which is what I understand), there's no problem anyway (as explained.) >> However it is now clear that no special kernel support is required for >> leap-seconds, and any such code that's been incorporated needs to be >> removed. Removed I say! >> > > And you are wrong. > So you say, but you have no code to back you up, whereas I do. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-07 14:09 ` David Newall @ 2009-01-07 21:42 ` Chris Adams 0 siblings, 0 replies; 109+ messages in thread From: Chris Adams @ 2009-01-07 21:42 UTC (permalink / raw) To: David Newall; +Cc: linux-kernel Once upon a time, David Newall <davidn@davidnewall.com> said: > > - the standards say that time() returns seconds since the epoch in UTC > > _except_ explicity NOT including leap seconds > > Don't believe everything you read. For example, the time(2) man page > says what POSIX does, but doesn't actually say that Linux also does the > same. It also says what you paraphrased above, but demonstrably that's > not the case. Man pages often are wrong in some details. Hence RTSL. I wasn't talking about man pages. I already quoted the section from the Single Unix Specification version 3 (which supersedes POSIX) that explicitly says leap seconds are ignored (despite sometimes heated disagreement, as seen repeated here). The standard "seconds since the epoch" is seconds since 1970-01-01 00:00:00 UTC but not including leap seconds (00:00:00 UTC is always 86400*n seconds since the epoch). As long as Linux wants to work like all the other POSIX systems (which it should unless there is huge advantage in doing otherwise), time(), gettimeofday(), etc., all must work without leap seconds. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-04 10:03 ` David Newall 2009-01-04 11:13 ` david @ 2009-01-04 11:35 ` Valdis.Kletnieks 2009-01-05 0:08 ` David Newall 2009-01-04 17:20 ` Kyle Moffett 2 siblings, 1 reply; 109+ messages in thread From: Valdis.Kletnieks @ 2009-01-04 11:35 UTC (permalink / raw) To: David Newall Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell [-- Attachment #1: Type: text/plain, Size: 709 bytes --] On Sun, 04 Jan 2009 20:33:41 +1030, David Newall said: > I don't understand why such a simple thing was unnecessarily > complicated. And causing crashes! Ha ha ha or what? A simple addition > to zoneinfo was (and still is) all that is required. Something to keep in mind is that the Posix standard does *NOT* say anything about leap seconds - poke around in a 'struct tm' sometime. That's why /usr/share/zoneinfo has separate 'posix' and 'right' subdirectories. The fun starts when software using the 'right' rules tries to interact with other software using the Posix rules (quite possibly running on a non-Unixy system that doesn't even *use* zoneinfo). Repeat after me: Not all the world is Linux. [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-04 11:35 ` Valdis.Kletnieks @ 2009-01-05 0:08 ` David Newall 2009-01-06 3:53 ` Valdis.Kletnieks 0 siblings, 1 reply; 109+ messages in thread From: David Newall @ 2009-01-05 0:08 UTC (permalink / raw) To: Valdis.Kletnieks Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell Valdis.Kletnieks@vt.edu wrote: > On Sun, 04 Jan 2009 20:33:41 +1030, David Newall said: > > >> I don't understand why such a simple thing was unnecessarily >> complicated. And causing crashes! Ha ha ha or what? A simple addition >> to zoneinfo was (and still is) all that is required. >> > > Something to keep in mind is that the Posix standard does *NOT* say anything > about leap seconds - poke around in a 'struct tm' sometime. > I have poked, decades ago. There's nothing in struct tm that's a problem. > That's why /usr/share/zoneinfo has separate 'posix' and 'right' subdirectories. > > The fun starts when software using the 'right' rules tries to interact with > other software using the Posix rules (quite possibly running on a non-Unixy > system that doesn't even *use* zoneinfo). > > Repeat after me: Not all the world is Linux. But Linux is; and that's what we're discussing. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-05 0:08 ` David Newall @ 2009-01-06 3:53 ` Valdis.Kletnieks 0 siblings, 0 replies; 109+ messages in thread From: Valdis.Kletnieks @ 2009-01-06 3:53 UTC (permalink / raw) To: David Newall Cc: Kyle Moffett, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell [-- Attachment #1: Type: text/plain, Size: 383 bytes --] On Mon, 05 Jan 2009 10:38:48 +1030, David Newall said: > Valdis.Kletnieks@vt.edu wrote: > > Something to keep in mind is that the Posix standard does *NOT* say anything > > about leap seconds - poke around in a 'struct tm' sometime. > I have poked, decades ago. There's nothing in struct tm that's a problem. More correctly: "There's nothing in struct tm - that's the problem." [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-04 10:03 ` David Newall 2009-01-04 11:13 ` david 2009-01-04 11:35 ` Valdis.Kletnieks @ 2009-01-04 17:20 ` Kyle Moffett 2 siblings, 0 replies; 109+ messages in thread From: Kyle Moffett @ 2009-01-04 17:20 UTC (permalink / raw) To: David Newall Cc: Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Sun, Jan 4, 2009 at 5:03 AM, David Newall <davidn@davidnewall.com> wrote: > You're confusing the system of keeping time with those characteristics > of the real-world which it represents. They are, in fact, two different > things, hence we regularly adjust the system. Now in the case of UNIX > and derivatives, the system records the number of seconds since an > arbitrary point-in-time, and presents a "wall time" (i.e. the time > displayed by the clock on the wall) using, amongst other things, a set > of adjustment rules codified by a zoneinfo file. The number of second > between 1 minute to- and midnight-ending 31 December is 61. If Linux > does not reflect that it is wrong and must be fixed. If it isn't fixed > we will increasingly discover a discrepancy between time-data that > originates on Linux versus other, correct systems. > > I don't understand why such a simple thing was unnecessarily > complicated. And causing crashes! Ha ha ha or what? A simple addition > to zoneinfo was (and still is) all that is required. Leap seconds are an integral part of the NTP standard for the reasons I described. You can't "update zoneinfo" because a leap second is applied to *all* timezones... not just a single one. Specifically, each NTP message includes some bits indicating what the next leap-second is going to be (at the end of the current month), whether +1, 0, or -1. 
I believe that under Linux if you request a monotonic clock then you won't "experience" leap-seconds at all; although such a clock will probably stop while your computer is suspended. On the other hand, if you explicitly ask for a wall-clock, it is the responsibility of NTP to keep the wall-clock accurate to the actual passage of days, even if that involves slight slewing adjustments. The UTC timezone is explicitly defined to include "leap seconds", and so we cannot honestly claim to implement the standard unless we provide a method for those leap seconds to be applied. If you don't want leap-seconds, submit a patch to the ntp daemon to allow it to run in "UT1" mode in which it will ignore leap second notifications over the NTP protocol, or just use a GPS clock. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 109+ messages in thread
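To illustrate the distinction Kyle draws between a monotonic clock and the wall clock, here is a minimal userspace sketch using the POSIX clock_gettime() call. The helper names monotonic_now() and realtime_now() are made up for this example and appear nowhere in the thread. It assumes a system that provides CLOCK_MONOTONIC, and it only shows how the two clocks are read; it does not demonstrate a leap second.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/* Hypothetical helper: convert a timespec to fractional seconds. */
static double ts_to_seconds(const struct timespec *ts)
{
    return (double)ts->tv_sec + (double)ts->tv_nsec / 1e9;
}

/*
 * Hypothetical helper: seconds on the monotonic clock, or -1.0 on error.
 * This clock only moves forward; it is not stepped when the wall clock
 * is set, so it is the one to use for measuring intervals.
 */
static double monotonic_now(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return ts_to_seconds(&ts);
}

/*
 * Hypothetical helper: seconds since the epoch on the wall clock, or -1.0
 * on error. This is the clock that NTP steers and that a leap second
 * insertion steps back by one second.
 */
static double realtime_now(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_REALTIME, &ts) != 0)
        return -1.0;
    return ts_to_seconds(&ts);
}
```

Code that measures elapsed time would subtract two monotonic_now() readings; code that needs a calendar timestamp would use realtime_now() and accept that the value can repeat or jump.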
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 6:32 ` David Newall 2009-01-03 6:37 ` Ben Goodger @ 2009-01-03 7:00 ` Chris Adams 2009-01-04 8:41 ` David Newall 1 sibling, 1 reply; 109+ messages in thread From: Chris Adams @ 2009-01-03 7:00 UTC (permalink / raw) To: David Newall Cc: Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster Once upon a time, David Newall <davidn@davidnewall.com> said: > I don't understand this idea, nor the patch for the problem. I don't > see why adding a leap second would impact the kernel in any way. > Shouldn't this be a simple zoneinfo change, whereby the last two seconds > of the year (in each timezone) both map to 31dec2008 23:59:59? That's > the way the change has worked in the real world. Why would ntp or the > kernel be involved? The leap second isn't a simple thing like a time zone. Zones account for an offset from UTC, but a leap second is an extra second inserted into (or possibly removed from) UTC itself. There was actually a 61 second minute on Dec. 31. The trouble comes in keeping the "seconds since the epoch" counter sane, meaning (seconds % 86400) == 0 at 00:00:00 UTC. Since there were 86401 seconds Dec. 31, the kernel had to tick the last second twice to keep correct UTC time. NTP is used to distribute and synchronize time information, including leap second info. See Wikipedia and Google for more information. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
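The arithmetic Chris describes can be sketched in a few lines of C. This is an illustration only, not kernel code: the function names are invented for the example, and the constant 1230768000 is the POSIX timestamp of 2009-01-01 00:00:00 UTC (exactly 14245 days of 86400 seconds after the epoch). The model assumes the counter repeats the value for 23:59:59 once, standing in for 23:59:60.

```c
#include <stdint.h>

#define SECS_PER_DAY 86400

/* POSIX timestamp of 2009-01-01 00:00:00 UTC: 14245 * 86400. */
#define NEW_YEAR_2009 INT64_C(1230768000)

/* Seconds since the most recent UTC midnight, under POSIX rules. */
static int64_t secs_into_day(int64_t posix_secs)
{
    return posix_secs % SECS_PER_DAY;
}

/*
 * Hypothetical model of the counter around the inserted leap second.
 * `elapsed` is the number of real seconds since 2008-12-31 23:59:59 UTC
 * first appeared on the clock:
 *   elapsed 0 -> 23:59:59
 *   elapsed 1 -> 23:59:60 (counter shows the 23:59:59 value again)
 *   elapsed 2 -> 00:00:00 on 2009-01-01
 */
static int64_t posix_secs_after(int64_t elapsed)
{
    int64_t last = NEW_YEAR_2009 - 1;   /* counter value for 23:59:59 */

    if (elapsed <= 0)
        return last + elapsed;          /* before the leap second */
    return last + elapsed - 1;          /* one value was used twice */
}
```

With this model, the day took 86401 real seconds, yet secs_into_day() is still 0 at midnight, which is the property the counter is being kept "sane" for.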
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 7:00 ` Chris Adams @ 2009-01-04 8:41 ` David Newall 0 siblings, 0 replies; 109+ messages in thread From: David Newall @ 2009-01-04 8:41 UTC (permalink / raw) To: Chris Adams Cc: Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster Chris Adams wrote: > The leap second isn't a simple thing like a time zone. Zones account > for an offset from UTC Time zones are described that way, but I was wondering why not use zoneinfo, which describes the local time after an arbitrary number of seconds since the epoch. The leap second is a textbook case for updating zoneinfo. > , but a leap second is an extra second inserted > into (or possibly removed from) UTC itself. There was actually a 61 > second minute on Dec. 31. > The trouble comes in keeping the "seconds > since the epoch" counter sane, meaning (seconds % 86400) == 0 at > 00:00:00 UTC. That sounds like an irrelevant quality, and as we've seen, striving for it has caused difficulties. Worse, we've now got the situation where the number of seconds between midnight starting December 31 and midnight starting January 1 is incorrect. The correct value is 86401, because that's how many seconds there were. > Since there were 86401 seconds Dec. 31, the kernel had to > tick the last second twice to keep correct UTC time. It didn't have to, but apparently, and regrettably, that's what was done; leaving an even bigger problem. How many seconds does the computer claim were in 2008? Probably not enough. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-02 20:04 ` Diego Calleja 2009-01-02 20:25 ` Robert Hancock @ 2009-01-02 20:29 ` Linas Vepstas 1 sibling, 0 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-02 20:29 UTC (permalink / raw) To: Diego Calleja Cc: linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster, burdell [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain; charset=UTF-8, Size: 955 bytes --] 2009/1/2 Diego Calleja <diegocg@gmail.com>: > On Fri, 2 Jan 2009 13:25:38 -0600, "Linas Vepstas" <linasvepstas@gmail.com> wrote: > >> Suspect its an kernel race condition triggered by ntp bumping the second. > > How could I create a test case that reproduces what ntp does? Just add > a second? It might be more subtle than that. One of these cases is discussed in a Debian mailing list thread, where one user claims his hardware clock runs so poorly, it loses a second every hour, and he doesn't have problems. ntp normally drifts to adjust time; for exceptional jumps in time, it won't drift, but just sets the clock. There's another thread of bug reports on Oracle servers (linux based) which apparently hit the same problem, although they think it has something to do with a backwards leap-second jump. --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
[parent not found: <8752a8760901021328t545a0327v58faebe1e921680a@mail.gmail.com>]
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 [not found] ` <8752a8760901021328t545a0327v58faebe1e921680a@mail.gmail.com> @ 2009-01-02 21:29 ` Ben Goodger 0 siblings, 0 replies; 109+ messages in thread From: Ben Goodger @ 2009-01-02 21:29 UTC (permalink / raw) To: linux-kernel 2009/1/2 Ben Goodger <goodgerster@gmail.com> > > 2009/1/2 Linas Vepstas <linasvepstas@gmail.com> >> >> Slashdot reported a story of Linux machines crashing on New years eve. >> >> So far, 31 users reported 53 hard crashes at/near midnight, new years. > > Further details about my crash (Goodgerster): > -- system works normally after reboot; > -- no messages were written to /var/log/kernel; > -- affected machine was running 2.6.26-1-amd64 from Debian testing; > -- the other machine on the network was unaffected (to the extent that it continues normal operation as an NFS server) and is running 2.6.18 from Debian Etch; > -- the affected machine was using NTP (not sure about the server machine.) > > I was unable to find any logs on the Etch machine that would tell us whether the affected machine continued writing to its NFS share after the crash. File corruption is evident, but this would have been caused by the hard reset or the crash in equal measure. Unfortunately, I was careless enough to just hit the reset button after hitting ctrl-alt-backspace a couple of times, but I know that either the X window system or the kernel hung entirely (I do not know whether the NumLock key was inoperable, but the cursor/system monitor/clock stopped moving. The clock displayed 23:59:59 when I returned to it at around 00:15. I am in the UTC+0 timezone; the system clock was therefore in UTC, but I had set it to "windows compatibility" mode (i.e. local timezone). > > Hope this helps (?)... > > -- > Benjamin Goodger > > -----BEGIN GEEK CODE BLOCK----- > Version: 3.1 > GCS/S/M/B d- s++:-- a18 c++$ UL>+++ P--- L++>+++ E- W+++$ N--- K? w--- O? M- V? PS+(++) PE-() Y+ PGP+ t 5? 
X-- R- !tv() b+++>++++ DI+++ D+ G e>++++ h! !r*(-) y > ------END GEEK CODE BLOCK------ -- Benjamin Goodger -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/S/M/B d- s++:-- a18 c++$ UL>+++ P--- L++>+++ E- W+++$ N--- K? w--- O? M- V? PS+(++) PE-() Y+ PGP+ t 5? X-- R- !tv() b+++>++++ DI+++ D+ G e>++++ h! !r*(-) y ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-02 19:25 Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 Linas Vepstas 2009-01-02 20:04 ` Diego Calleja [not found] ` <8752a8760901021328t545a0327v58faebe1e921680a@mail.gmail.com> @ 2009-01-03 0:21 ` Chris Adams 2009-01-03 2:23 ` Duane Griffin 2009-01-06 2:21 ` john stultz-lkml 2009-01-03 3:49 ` Linas Vepstas 3 siblings, 2 replies; 109+ messages in thread From: Chris Adams @ 2009-01-03 0:21 UTC (permalink / raw) To: Linas Vepstas; +Cc: linux-kernel Once upon a time, Linas Vepstas <linasvepstas@gmail.com> said: > Below follows a summary of the reported crashes. I'm ignoring the > zillions of "mine didn't crash" reports, or the "you're a paranoid > conspiracy theorist, its random chance" reports. I have reproduced this and got a stack trace (this is with Fedora 8 and kernel kernel-2.6.26.6-49.fc8.x86_64): #0 ktime_get_ts (ts=0xffffffff8158bb30) at include/asm/processor.h:691 #1 0xffffffff8104c09a in ktime_get () at kernel/hrtimer.c:59 #2 0xffffffff8102a39a in hrtick_start_fair (rq=0xffff810009013880, p=<value optimized out>) at kernel/sched.c:1064 #3 0xffffffff8102decc in enqueue_task_fair (rq=0xffff810009013880, p=0xffff81003fb02d40, wakeup=1) at kernel/sched_fair.c:863 #4 0xffffffff81029a08 in enqueue_task (rq=0xffffffff8158bb30, p=0xffff81003b8ac418, wakeup=-994836480) at kernel/sched.c:1550 #5 0xffffffff81029a39 in activate_task (rq=0xffff810009013880, p=0xffff81003b8ac418, wakeup=20045) at kernel/sched.c:1614 #6 0xffffffff8102be38 in try_to_wake_up (p=0xffff81003fb02d40, state=<value optimized out>, sync=0) at kernel/sched.c:2173 #7 0xffffffff8102be9c in default_wake_function (curr=<value optimized out>, mode=998949912, sync=20045, key=0x4c4b40000) at kernel/sched.c:4366 #8 0xffffffff810492ed in autoremove_wake_function (wait=0xffffffff8158bb30, mode=998949912, sync=20045, key=0x4c4b40000) at kernel/wait.c:132 #9 0xffffffff810296a2 in __wake_up_common 
(q=0xffffffff813d3180, mode=1, nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4387 #10 0xffffffff8102b97b in __wake_up (q=0xffffffff813d3180, mode=1, nr_exclusive=1, key=0x0) at kernel/sched.c:4406 #11 0xffffffff8103692f in wake_up_klogd () at kernel/printk.c:1005 #12 0xffffffff81036abb in release_console_sem () at kernel/printk.c:1051 #13 0xffffffff81036fd1 in vprintk (fmt=<value optimized out>, args=<value optimized out>) at kernel/printk.c:789 #14 0xffffffff81037081 in printk ( fmt=0xffffffff8158bb30 "yj$\201????\2008\001\t") at kernel/printk.c:613 #15 0xffffffff8104ec16 in ntp_leap_second (timer=<value optimized out>) at kernel/time/ntp.c:143 #16 0xffffffff8104b7a6 in run_hrtimer_pending (cpu_base=0xffff81000900f740) at kernel/hrtimer.c:1204 #17 0xffffffff8104b86a in run_hrtimer_softirq (h=<value optimized out>) at kernel/hrtimer.c:1355 #18 0xffffffff8103b31f in __do_softirq () at kernel/softirq.c:234 #19 0xffffffff8100d52c in call_softirq () at include/asm/current_64.h:10 #20 0xffffffff8100ed5e in do_softirq () at arch/x86/kernel/irq_64.c:262 #21 0xffffffff8103b280 in irq_exit () at kernel/softirq.c:310 #22 0xffffffff8101b0fe in smp_apic_timer_interrupt (regs=<value optimized out>) at arch/x86/kernel/apic_64.c:514 #23 0xffffffff8100cf52 in apic_timer_interrupt () at include/asm/current_64.h:10 #24 0xffff81003b9d5a90 in ?? () #25 0x0000000000000000 in ?? () Basically (to my untrained eye), the leap second code is called from the timer interrupt handler, which holds xtime_lock. The leap second code does a printk to notify about the leap second. The printk code tries to wake up klogd (I assume to prioritize kernel messages), and (under some conditions), the scheduler attempts to get the current time, which tries to get xtime_lock => deadlock. I can only reproduce this if the system is busy. If the system is otherwise idle at the timer interrupt, I guess the scheduler doesn't try to get the time. 
I can run a "find / | xargs cat > /dev/null" in one window and then trigger the leap second in another, and the system dies most of the time. I'm looking at the source for the RHEL 4 kernel 2.6.9-67.0.7.EL (which I had crash on a system), and the scheduler is different enough that I am not finding the path to the deadlock right off. In any case, the quick-n-dirty fix would be to not try to printk while holding xtime_lock (I think the NTP code is the only thing that does). However, it would be nice to still get the leap second notification, so some other fix would be better I guess. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
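The lock ordering problem Chris describes can be modelled in userspace with a toy non-recursive lock. Everything below is a hypothetical stand-in (toy_lock, toy_log and the two leap_* functions are invented for this sketch, not kernel APIs): toy_log() plays the role of printk waking klogd and the scheduler then reading the time, which needs the same lock. A failed try-lock here corresponds to the point where the real kernel would spin forever.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a non-recursive lock such as xtime_lock. */
struct toy_lock {
    int held;
};

/* Returns 1 if the lock was taken, 0 if it was already held. */
static int toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return 0;   /* a real spinning lock would hang at this point */
    l->held = 1;
    return 1;
}

static void toy_unlock(struct toy_lock *l)
{
    assert(l->held);    /* unlocking a lock that is not held is a bug */
    l->held = 0;
}

static struct toy_lock time_lock;

/*
 * Stand-in for printk -> wake klogd -> scheduler reads the current time.
 * Returns 1 if the time could be read, 0 if the lock was already held
 * (the deadlock case in the real kernel).
 */
static int toy_log(const char *msg)
{
    (void)msg;
    if (!toy_trylock(&time_lock))
        return 0;
    toy_unlock(&time_lock);
    return 1;
}

/* Problem ordering: the message is logged while the lock is held. */
static int leap_log_under_lock(void)
{
    int ok;

    toy_trylock(&time_lock);
    ok = toy_log("inserting leap second");
    toy_unlock(&time_lock);
    return ok;
}

/* One possible fix: remember the message, log it after unlocking. */
static int leap_log_after_unlock(void)
{
    const char *msg = NULL;

    toy_trylock(&time_lock);
    msg = "inserting leap second";
    toy_unlock(&time_lock);

    return msg ? toy_log(msg) : 1;
}
```

In this model leap_log_under_lock() reports the nested acquisition failing, while leap_log_after_unlock() completes, which matches the "don't printk while holding xtime_lock" idea without losing the notification.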
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 0:21 ` Chris Adams @ 2009-01-03 2:23 ` Duane Griffin 2009-01-03 3:45 ` Linas Vepstas 2009-01-03 4:41 ` [PATCH] " Chris Adams 2009-01-06 2:21 ` john stultz-lkml 1 sibling, 2 replies; 109+ messages in thread From: Duane Griffin @ 2009-01-03 2:23 UTC (permalink / raw) To: Chris Adams; +Cc: Linas Vepstas, linux-kernel On Fri, Jan 02, 2009 at 06:21:14PM -0600, Chris Adams wrote: > Once upon a time, Linas Vepstas <linasvepstas@gmail.com> said: > > Below follows a summary of the reported crashes. I'm ignoring the > > zillions of "mine didn't crash" reports, or the "you're a paranoid > > conspiracy theorist, its random chance" reports. > > I have reproduced this and got a stack trace (this is with Fedora 8 and > kernel kernel-2.6.26.6-49.fc8.x86_64): > > #0 ktime_get_ts (ts=0xffffffff8158bb30) at include/asm/processor.h:691 > #1 0xffffffff8104c09a in ktime_get () at kernel/hrtimer.c:59 > #2 0xffffffff8102a39a in hrtick_start_fair (rq=0xffff810009013880, > p=<value optimized out>) at kernel/sched.c:1064 > #3 0xffffffff8102decc in enqueue_task_fair (rq=0xffff810009013880, > p=0xffff81003fb02d40, wakeup=1) at kernel/sched_fair.c:863 > #4 0xffffffff81029a08 in enqueue_task (rq=0xffffffff8158bb30, > p=0xffff81003b8ac418, wakeup=-994836480) at kernel/sched.c:1550 > #5 0xffffffff81029a39 in activate_task (rq=0xffff810009013880, > p=0xffff81003b8ac418, wakeup=20045) at kernel/sched.c:1614 > #6 0xffffffff8102be38 in try_to_wake_up (p=0xffff81003fb02d40, > state=<value optimized out>, sync=0) at kernel/sched.c:2173 > #7 0xffffffff8102be9c in default_wake_function (curr=<value optimized out>, > mode=998949912, sync=20045, key=0x4c4b40000) at kernel/sched.c:4366 > #8 0xffffffff810492ed in autoremove_wake_function (wait=0xffffffff8158bb30, > mode=998949912, sync=20045, key=0x4c4b40000) at kernel/wait.c:132 > #9 0xffffffff810296a2 in __wake_up_common (q=0xffffffff813d3180, mode=1, > nr_exclusive=1, 
sync=0, key=0x0) at kernel/sched.c:4387 > #10 0xffffffff8102b97b in __wake_up (q=0xffffffff813d3180, mode=1, > nr_exclusive=1, key=0x0) at kernel/sched.c:4406 > #11 0xffffffff8103692f in wake_up_klogd () at kernel/printk.c:1005 > #12 0xffffffff81036abb in release_console_sem () at kernel/printk.c:1051 > #13 0xffffffff81036fd1 in vprintk (fmt=<value optimized out>, > args=<value optimized out>) at kernel/printk.c:789 > #14 0xffffffff81037081 in printk ( > fmt=0xffffffff8158bb30 "yj$\201????\2008\001\t") at kernel/printk.c:613 > #15 0xffffffff8104ec16 in ntp_leap_second (timer=<value optimized out>) > at kernel/time/ntp.c:143 > #16 0xffffffff8104b7a6 in run_hrtimer_pending (cpu_base=0xffff81000900f740) > at kernel/hrtimer.c:1204 > #17 0xffffffff8104b86a in run_hrtimer_softirq (h=<value optimized out>) > at kernel/hrtimer.c:1355 > #18 0xffffffff8103b31f in __do_softirq () at kernel/softirq.c:234 > #19 0xffffffff8100d52c in call_softirq () at include/asm/current_64.h:10 > #20 0xffffffff8100ed5e in do_softirq () at arch/x86/kernel/irq_64.c:262 > #21 0xffffffff8103b280 in irq_exit () at kernel/softirq.c:310 > #22 0xffffffff8101b0fe in smp_apic_timer_interrupt (regs=<value optimized out>) > at arch/x86/kernel/apic_64.c:514 > #23 0xffffffff8100cf52 in apic_timer_interrupt () > at include/asm/current_64.h:10 > #24 0xffff81003b9d5a90 in ?? () > #25 0x0000000000000000 in ?? () > > > Basically (to my untrained eye), the leap second code is called from the > timer interrupt handler, which holds xtime_lock. The leap second code > does a printk to notify about the leap second. The printk code tries to > wake up klogd (I assume to prioritize kernel messages), and (under some > conditions), the scheduler attempts to get the current time, which tries > to get xtime_lock => deadlock. > > I can only reproduce this if the system is busy. If the system is > otherwise idle at the timer interrupt, I guess the scheduler doesn't try > to get the time. 
I can run a "find / | xargs cat > /dev/nul" in one > window and then trigger the leap second in another, and the system dies > most of the time. > > I'm looking at the source for the RHEL 4 kernel 2.6.9-67.0.7.EL (which I > had crash on a system), and the scheduler is enough different that I am > not finding the path to the deadlock right off. > > In any case, the quick-n-dirty fix would be to not try to printk while > holding xtime_lock (I think the NTP code is the only thing that does). > However, it would be nice to still get the leap second notification, so > some other fix would be better I guess. How about just moving the printk out of the lock? I.e. something like this: diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index f5f793d..ad3e2b7 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -140,8 +140,6 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) xtime.tv_sec--; wall_to_monotonic.tv_sec++; time_state = TIME_OOP; - printk(KERN_NOTICE "Clock: " - "inserting leap second 23:59:60 UTC\n"); hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC); res = HRTIMER_RESTART; break; @@ -166,6 +164,10 @@ static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) write_sequnlock(&xtime_lock); + if (res == HRTIMER_RESTART) + printk(KERN_NOTICE "Clock: " + "inserting leap second 23:59:60 UTC\n"); + return res; } > -- > Chris Adams <cmadams@hiwaay.net> > Systems and Network Administrator - HiWAAY Internet Services > I don't speak for anybody but myself - that's enough trouble. Cheers, Duane. -- "I never could learn to drink that blood and call it wine" - Bob Dylan ^ permalink raw reply related [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 2:23 ` Duane Griffin @ 2009-01-03 3:45 ` Linas Vepstas 2009-01-03 4:41 ` [PATCH] " Chris Adams 1 sibling, 0 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-03 3:45 UTC (permalink / raw) To: Duane Griffin; +Cc: Chris Adams, linux-kernel 2009/1/2 Duane Griffin <duaneg@dghda.com>: > On Fri, Jan 02, 2009 at 06:21:14PM -0600, Chris Adams wrote: >> Once upon a time, Linas Vepstas <linasvepstas@gmail.com> said: >> > Below follows a summary of the reported crashes. I'm ignoring the >> > zillions of "mine didn't crash" reports, or the "you're a paranoid >> > conspiracy theorist, its random chance" reports. >> >> I have reproduced this and got a stack trace (this is with Fedora 8 and >> kernel kernel-2.6.26.6-49.fc8.x86_64): >> >> Basically (to my untrained eye), the leap second code is called from the >> timer interrupt handler, which holds xtime_lock. The leap second code >> does a printk to notify about the leap second. The printk code tries to >> wake up klogd (I assume to prioritize kernel messages), and (under some >> conditions), the scheduler attempts to get the current time, which tries >> to get xtime_lock => deadlock. > > How about just moving the printk out of the lock? I.e. something like > this: [...] Sure looks like the right fix to me. (Although there's more than one printk under that lock). Who's going to write the formal patch? --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* [PATCH] Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 2:23 ` Duane Griffin 2009-01-03 3:45 ` Linas Vepstas @ 2009-01-03 4:41 ` Chris Adams 2009-01-03 4:52 ` Duane Griffin 1 sibling, 1 reply; 109+ messages in thread From: Chris Adams @ 2009-01-03 4:41 UTC (permalink / raw) To: Duane Griffin; +Cc: Linas Vepstas, linux-kernel Once upon a time, Duane Griffin <duaneg@dghda.com> said: > On Fri, Jan 02, 2009 at 06:21:14PM -0600, Chris Adams wrote: > > In any case, the quick-n-dirty fix would be to not try to printk while > > holding xtime_lock (I think the NTP code is the only thing that does). > > However, it would be nice to still get the leap second notification, so > > some other fix would be better I guess. > > How about just moving the printk out of the lock? I.e. something like > this: Well, you've only fixed the inserting a leap second case, not the removing a leap second case. AFAIK we've never actually had a leap second removed, but it could happen (and the code is already there), so it should be fixed as well. Also, I didn't notice the locking was right there in the ntp_leap_second function in the 2.6.26.6 kernel I was looking at, because I've also been looking at the 2.6.9-based RHEL 4 kernel (which is a good bit different; the lock is held outside the function, so it wouldn't be easy to drop it for the printk). I guess that's Red Hat's (and other long-term support vendors') problem. The simplest thing for them is still probably to just remove the printks. Here's a patch that moves both printks outside the lock. I am unable to make a kernel with this patch crash on a leap second insertion or deletion. -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. From: Chris Adams <cmadams@hiwaay.net> The code to handle leap seconds printks an informational message when the second is inserted or deleted. 
It does this while holding xtime_lock. However, printk wakes up klogd, and in some cases, the scheduler tries to get the current kernel time, trying to get xtime_lock (which results in a deadlock). This moved the printks outside of the lock. Signed-off-by: Chris Adams <cmadams@hiwaay.net> --- diff -urpN linux-2.6.28-git5-vanilla/kernel/time/ntp.c linux-2.6.28-git5/kernel/time/ntp.c --- linux-2.6.28-git5-vanilla/kernel/time/ntp.c 2009-01-02 22:09:34.000000000 -0600 +++ linux-2.6.28-git5/kernel/time/ntp.c 2009-01-02 22:11:23.000000000 -0600 @@ -130,6 +130,7 @@ void ntp_clear(void) static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) { enum hrtimer_restart res = HRTIMER_NORESTART; + int msg = 0; write_seqlock(&xtime_lock); @@ -140,8 +141,7 @@ static enum hrtimer_restart ntp_leap_sec xtime.tv_sec--; wall_to_monotonic.tv_sec++; time_state = TIME_OOP; - printk(KERN_NOTICE "Clock: " - "inserting leap second 23:59:60 UTC\n"); + msg = 1; hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC); res = HRTIMER_RESTART; break; @@ -150,8 +150,7 @@ static enum hrtimer_restart ntp_leap_sec time_tai--; wall_to_monotonic.tv_sec--; time_state = TIME_WAIT; - printk(KERN_NOTICE "Clock: " - "deleting leap second 23:59:59 UTC\n"); + msg = 2; break; case TIME_OOP: time_tai++; @@ -166,6 +165,17 @@ static enum hrtimer_restart ntp_leap_sec write_sequnlock(&xtime_lock); + switch (msg) { + case 1: + printk(KERN_NOTICE "Clock: " + "inserting leap second 23:59:60 UTC\n"); + break; + case 2: + printk(KERN_NOTICE "Clock: " + "deleting leap second 23:59:59 UTC\n"); + break; + } + return res; } ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 4:41 ` [PATCH] " Chris Adams @ 2009-01-03 4:52 ` Duane Griffin 2009-01-03 18:01 ` [PATCH] v2 " Chris Adams 0 siblings, 1 reply; 109+ messages in thread From: Duane Griffin @ 2009-01-03 4:52 UTC (permalink / raw) To: Chris Adams; +Cc: Duane Griffin, Linas Vepstas, linux-kernel On Fri, Jan 02, 2009 at 10:41:43PM -0600, Chris Adams wrote: > Once upon a time, Duane Griffin <duaneg@dghda.com> said: > > On Fri, Jan 02, 2009 at 06:21:14PM -0600, Chris Adams wrote: > > > In any case, the quick-n-dirty fix would be to not try to printk while > > > holding xtime_lock (I think the NTP code is the only thing that does). > > > However, it would be nice to still get the leap second notification, so > > > some other fix would be better I guess. > > > > How about just moving the printk out of the lock? I.e. something like > > this: > > Well, you've only fixed the inserting a leap second case, not the > removing a leap second case. AFAIK we've never actually had a leap > second removed, but it could happen (and the code is already there), so > it should be fixed as well. Quite right... > Also, I didn't notice the locking was right there in the ntp_leap_second > function in the 2.6.26.6 kernel I was looking at, because I've also been > looking at the 2.6.9-based RHEL 4 kernel (which is a good bit different; > the lock is held outside the function, so it wouldn't be easy to drop it > for the printk). I guess that's Red Hat's (and other long-term support > vendors') problem. The simplest thing for them is still probably to > just remove the printks. > > Here's a patch that moves both prinkts outside the lock. I am unable to > make a kernel with this patch crash on a leap second insertion or > deletion. > -- > Chris Adams <cmadams@hiwaay.net> > Systems and Network Administrator - HiWAAY Internet Services > I don't speak for anybody but myself - that's enough trouble. 
> > > From: Chris Adams <cmadams@hiwaay.net> > > The code to handle leap seconds printks an information message when the > second is inserted or deleted. It does this while holding xtime_lock. > However, printk wakes up klogd, and in some cases, the scheduler tries > to get the current kernel time, trying to get xtime_lock (which results > in a deadlock). This moved the printks outside of the lock. > > Signed-off-by: Chris Adams <cmadams@hiwaay.net> > --- > diff -urpN linux-2.6.28-git5-vanilla/kernel/time/ntp.c linux-2.6.28-git5/kernel/time/ntp.c > --- linux-2.6.28-git5-vanilla/kernel/time/ntp.c 2009-01-02 22:09:34.000000000 -0600 > +++ linux-2.6.28-git5/kernel/time/ntp.c 2009-01-02 22:11:23.000000000 -0600 > @@ -130,6 +130,7 @@ void ntp_clear(void) > static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) > { > enum hrtimer_restart res = HRTIMER_NORESTART; > + int msg = 0; > > write_seqlock(&xtime_lock); > > @@ -140,8 +141,7 @@ static enum hrtimer_restart ntp_leap_sec > xtime.tv_sec--; > wall_to_monotonic.tv_sec++; > time_state = TIME_OOP; > - printk(KERN_NOTICE "Clock: " > - "inserting leap second 23:59:60 UTC\n"); > + msg = 1; > hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC); > res = HRTIMER_RESTART; > break; > @@ -150,8 +150,7 @@ static enum hrtimer_restart ntp_leap_sec > time_tai--; > wall_to_monotonic.tv_sec--; > time_state = TIME_WAIT; > - printk(KERN_NOTICE "Clock: " > - "deleting leap second 23:59:59 UTC\n"); > + msg = 2; > break; > case TIME_OOP: > time_tai++; > @@ -166,6 +165,17 @@ static enum hrtimer_restart ntp_leap_sec > > write_sequnlock(&xtime_lock); > > + switch (msg) { > + case 1: > + printk(KERN_NOTICE "Clock: " > + "inserting leap second 23:59:60 UTC\n"); > + break; > + case 2: > + printk(KERN_NOTICE "Clock: " > + "deleting leap second 23:59:59 UTC\n"); > + break; > + } > + > return res; > } > How about instead of a switch statement, assigning the message to a variable and printing that. I.e. 
something like: static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) { enum hrtimer_restart res = HRTIMER_NORESTART; const char *msg = NULL; ... msg = "Clock: inserting leap second 23:59:60 UTC"; ... msg = "Clock: deleting leap second 23:59:59 UTC"; ... if (msg) printk(KERN_NOTICE "%s\n", msg); Cheers, Duane. -- "I never could learn to drink that blood and call it wine" - Bob Dylan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
  2009-01-03  4:52 ` Duane Griffin
@ 2009-01-03 18:01   ` Chris Adams
  2009-01-03 19:04     ` Duane Griffin
                       ` (2 more replies)
  0 siblings, 3 replies; 109+ messages in thread
From: Chris Adams @ 2009-01-03 18:01 UTC (permalink / raw)
  To: Duane Griffin; +Cc: Linas Vepstas, linux-kernel

Once upon a time, Duane Griffin <duaneg@dghda.com> said:
> How about instead of a switch statement, assigning the message to a
> variable and printing that. I.e. something like:

Good point. Here's an updated version that also adds a comment to the
xtime_lock definition about not using printk.
-- 
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.


From: Chris Adams <cmadams@hiwaay.net>

The code to handle leap seconds printks an information message when the
second is inserted or deleted. It does this while holding xtime_lock.
However, printk wakes up klogd, and in some cases, the scheduler tries
to get the current kernel time, trying to get xtime_lock (which results
in a deadlock). This moved the printks outside of the lock. It also
adds a comment to not use printk while holding xtime_lock.

Signed-off-by: Chris Adams <cmadams@hiwaay.net>
---
diff -urpN linux-2.6.28-git5-vanilla/include/linux/time.h linux-2.6.28-git5/include/linux/time.h
--- linux-2.6.28-git5-vanilla/include/linux/time.h	2009-01-02 22:09:10.000000000 -0600
+++ linux-2.6.28-git5/include/linux/time.h	2009-01-03 11:57:27.000000000 -0600
@@ -99,6 +99,12 @@ static inline struct timespec timespec_s
 
 extern struct timespec xtime;
 extern struct timespec wall_to_monotonic;
+
+/*
+ * Do not call printk while holding this lock; it wakes klogd and the
+ * scheduler may try to get the current kernel time, which will try to get
+ * this lock.
+ */
 extern seqlock_t xtime_lock;
 
 extern unsigned long read_persistent_clock(void);
diff -urpN linux-2.6.28-git5-vanilla/kernel/time/ntp.c linux-2.6.28-git5/kernel/time/ntp.c
--- linux-2.6.28-git5-vanilla/kernel/time/ntp.c	2009-01-02 22:09:34.000000000 -0600
+++ linux-2.6.28-git5/kernel/time/ntp.c	2009-01-03 11:57:46.000000000 -0600
@@ -130,6 +130,7 @@ void ntp_clear(void)
 static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
 {
 	enum hrtimer_restart res = HRTIMER_NORESTART;
+	const char *msg = NULL;
 
 	write_seqlock(&xtime_lock);
 
@@ -140,8 +141,7 @@ static enum hrtimer_restart ntp_leap_sec
 		xtime.tv_sec--;
 		wall_to_monotonic.tv_sec++;
 		time_state = TIME_OOP;
-		printk(KERN_NOTICE "Clock: "
-		       "inserting leap second 23:59:60 UTC\n");
+		msg = "Clock: inserting leap second 23:59:60 UTC";
 		hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
 		res = HRTIMER_RESTART;
 		break;
@@ -150,8 +150,7 @@ static enum hrtimer_restart ntp_leap_sec
 		time_tai--;
 		wall_to_monotonic.tv_sec--;
 		time_state = TIME_WAIT;
-		printk(KERN_NOTICE "Clock: "
-		       "deleting leap second 23:59:59 UTC\n");
+		msg = "Clock: deleting leap second 23:59:59 UTC";
 		break;
 	case TIME_OOP:
 		time_tai++;
@@ -166,6 +165,9 @@ static enum hrtimer_restart ntp_leap_sec
 
 	write_sequnlock(&xtime_lock);
 
+	if (msg)
+		printk(KERN_NOTICE "%s\n", msg);
+
 	return res;
 }
 

^ permalink raw reply	[flat|nested] 109+ messages in thread
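The fix in the patch above is an ordering change: decide what to log
while the lock is held, but emit the message only after the lock has
been released. A minimal user-space sketch of that pattern follows. It
is not kernel code: toy_lock, clock_lock, read_wall_seconds(),
log_notice() and apply_leap_second() are hypothetical stand-ins, and the
toy lock asserts on re-entry where a real non-recursive lock would
simply hang.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical non-recursive lock; single-threaded illustration only. */
struct toy_lock { int held; };

static void toy_lock_acquire(struct toy_lock *l)
{
	/* A real lock would spin forever here; the sketch asserts instead. */
	assert(!l->held);
	l->held = 1;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = 0;
}

static struct toy_lock clock_lock;	/* stand-in for xtime_lock */
static long wall_seconds;		/* stand-in for xtime.tv_sec */

/* Reading the clock takes the lock, like the scheduler path in the report. */
static long read_wall_seconds(void)
{
	long now;

	toy_lock_acquire(&clock_lock);
	now = wall_seconds;
	toy_lock_release(&clock_lock);
	return now;
}

/* Stand-in for printk: logging has a side effect that reads the clock. */
static void log_notice(const char *msg)
{
	printf("[%ld] %s\n", read_wall_seconds(), msg);
}

/* Adjust the clock under the lock, remember the message, log it afterwards. */
static const char *apply_leap_second(int insert)
{
	const char *msg;

	toy_lock_acquire(&clock_lock);
	if (insert) {
		wall_seconds--;
		msg = "Clock: inserting leap second 23:59:60 UTC";
	} else {
		wall_seconds++;
		msg = "Clock: deleting leap second 23:59:59 UTC";
	}
	toy_lock_release(&clock_lock);

	/* Safe only because clock_lock is no longer held at this point. */
	log_notice(msg);
	return msg;
}
```

Moving the log_notice() call into the locked region makes the assertion
in toy_lock_acquire() fire, which is the user-space analogue of the hang
reported in this thread.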
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 18:01 ` [PATCH] v2 " Chris Adams @ 2009-01-03 19:04 ` Duane Griffin 2009-01-03 20:01 ` Linas Vepstas 2009-06-08 2:18 ` Ben Hutchings 2 siblings, 0 replies; 109+ messages in thread From: Duane Griffin @ 2009-01-03 19:04 UTC (permalink / raw) To: Chris Adams; +Cc: Linas Vepstas, linux-kernel 2009/1/3 Chris Adams <cmadams@hiwaay.net>: > Once upon a time, Duane Griffin <duaneg@dghda.com> said: >> How about instead of a switch statement, assigning the message to a >> variable and printing that. I.e. something like: > > Good point. Here's an updated version that also adds a comment to the > xtime_lock definition about not using printk. Good idea. > -- > Chris Adams <cmadams@hiwaay.net> > Systems and Network Administrator - HiWAAY Internet Services > I don't speak for anybody but myself - that's enough trouble. > > > From: Chris Adams <cmadams@hiwaay.net> > > The code to handle leap seconds printks an information message when the > second is inserted or deleted. It does this while holding xtime_lock. > However, printk wakes up klogd, and in some cases, the scheduler tries > to get the current kernel time, trying to get xtime_lock (which results > in a deadlock). This moved the printks outside of the lock. It also > adds a comment to not use printk while holding xtime_lock. 
> > Signed-off-by: Chris Adams <cmadams@hiwaay.net> > --- > diff -urpN linux-2.6.28-git5-vanilla/include/linux/time.h linux-2.6.28-git5/include/linux/time.h > --- linux-2.6.28-git5-vanilla/include/linux/time.h 2009-01-02 22:09:10.000000000 -0600 > +++ linux-2.6.28-git5/include/linux/time.h 2009-01-03 11:57:27.000000000 -0600 > @@ -99,6 +99,12 @@ static inline struct timespec timespec_s > > extern struct timespec xtime; > extern struct timespec wall_to_monotonic; > + > +/* > + * Do not call printk while holding this lock; it wakes klogd and the > + * scheduler may try to get the current kernel time, which will try to get > + * this lock. > + */ > extern seqlock_t xtime_lock; > > extern unsigned long read_persistent_clock(void); > diff -urpN linux-2.6.28-git5-vanilla/kernel/time/ntp.c linux-2.6.28-git5/kernel/time/ntp.c > --- linux-2.6.28-git5-vanilla/kernel/time/ntp.c 2009-01-02 22:09:34.000000000 -0600 > +++ linux-2.6.28-git5/kernel/time/ntp.c 2009-01-03 11:57:46.000000000 -0600 > @@ -130,6 +130,7 @@ void ntp_clear(void) > static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer) > { > enum hrtimer_restart res = HRTIMER_NORESTART; > + const char *msg = NULL; > > write_seqlock(&xtime_lock); > > @@ -140,8 +141,7 @@ static enum hrtimer_restart ntp_leap_sec > xtime.tv_sec--; > wall_to_monotonic.tv_sec++; > time_state = TIME_OOP; > - printk(KERN_NOTICE "Clock: " > - "inserting leap second 23:59:60 UTC\n"); > + msg = "Clock: inserting leap second 23:59:60 UTC"; > hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC); > res = HRTIMER_RESTART; > break; > @@ -150,8 +150,7 @@ static enum hrtimer_restart ntp_leap_sec > time_tai--; > wall_to_monotonic.tv_sec--; > time_state = TIME_WAIT; > - printk(KERN_NOTICE "Clock: " > - "deleting leap second 23:59:59 UTC\n"); > + msg = "Clock: deleting leap second 23:59:59 UTC"; > break; > case TIME_OOP: > time_tai++; > @@ -166,6 +165,9 @@ static enum hrtimer_restart ntp_leap_sec > > write_sequnlock(&xtime_lock); > > + if (msg) > + 
printk(KERN_NOTICE "%s\n", msg); > + > return res; > } Looks good to me! Cheers, Duane. -- "I never could learn to drink that blood and call it wine" - Bob Dylan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 18:01 ` [PATCH] v2 " Chris Adams 2009-01-03 19:04 ` Duane Griffin @ 2009-01-03 20:01 ` Linas Vepstas 2009-06-08 2:18 ` Ben Hutchings 2 siblings, 0 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-03 20:01 UTC (permalink / raw) To: Chris Adams Cc: Duane Griffin, linux-kernel, Thomas Gleixner, Paul Gortmaker, Alessandro Zummo, rtc-linux 2009/1/3 Chris Adams <cmadams@hiwaay.net>: > > From: Chris Adams <cmadams@hiwaay.net> > > The code to handle leap seconds printks an information message when the > second is inserted or deleted. It does this while holding xtime_lock. > However, printk wakes up klogd, and in some cases, the scheduler tries > to get the current kernel time, trying to get xtime_lock (which results > in a deadlock). This moved the printks outside of the lock. It also > adds a comment to not use printk while holding xtime_lock. > > Signed-off-by: Chris Adams <cmadams@hiwaay.net> Acked-by: Linas Vepstas <linasvepstas@gmail.com> BTW, I audited the other code in kernel/time/*.c and it looks like there are no other printk's under the lock. Not surprising -- if there were, they'd have been found by now. Indeed, in timekeeping.c line 198, it seems that someone else had indeed tripped over this :-P --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 18:01 ` [PATCH] v2 " Chris Adams 2009-01-03 19:04 ` Duane Griffin 2009-01-03 20:01 ` Linas Vepstas @ 2009-06-08 2:18 ` Ben Hutchings 2009-06-18 22:34 ` Chris Friesen 2 siblings, 1 reply; 109+ messages in thread From: Ben Hutchings @ 2009-06-08 2:18 UTC (permalink / raw) To: linux-kernel; +Cc: Chris Adams, 510478 [-- Attachment #1: Type: text/plain, Size: 1236 bytes --] On Sat, 2009-01-03 at 12:01 -0600, Chris Adams wrote: > Once upon a time, Duane Griffin <duaneg@dghda.com> said: > > How about instead of a switch statement, assigning the message to a > > variable and printing that. I.e. something like: > > Good point. Here's an updated version that also adds a comment to the > xtime_lock definition about not using printk. > -- > Chris Adams <cmadams@hiwaay.net> > Systems and Network Administrator - HiWAAY Internet Services > I don't speak for anybody but myself - that's enough trouble. > > > From: Chris Adams <cmadams@hiwaay.net> > > The code to handle leap seconds printks an information message when the > second is inserted or deleted. It does this while holding xtime_lock. > However, printk wakes up klogd, and in some cases, the scheduler tries > to get the current kernel time, trying to get xtime_lock (which results > in a deadlock). This moved the printks outside of the lock. It also > adds a comment to not use printk while holding xtime_lock. [...] This patch doesn't seem to have gone anywhere. Was this bug fixed in some other way or has it been forgotten? Ben. -- Ben Hutchings Logic doesn't apply to the real world. - Marvin Minsky [-- Attachment #2: This is a digitally signed message part --] [-- Type: application/pgp-signature, Size: 189 bytes --] ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-06-08 2:18 ` Ben Hutchings @ 2009-06-18 22:34 ` Chris Friesen 2009-06-18 22:58 ` Ben Hutchings 0 siblings, 1 reply; 109+ messages in thread From: Chris Friesen @ 2009-06-18 22:34 UTC (permalink / raw) To: Ben Hutchings; +Cc: linux-kernel, Chris Adams, 510478, Peter Zijlstra Ben Hutchings wrote: > On Sat, 2009-01-03 at 12:01 -0600, Chris Adams wrote: >> Once upon a time, Duane Griffin <duaneg@dghda.com> said: >>> How about instead of a switch statement, assigning the message to a >>> variable and printing that. I.e. something like: >> Good point. Here's an updated version that also adds a comment to the >> xtime_lock definition about not using printk. >> -- >> Chris Adams <cmadams@hiwaay.net> >> Systems and Network Administrator - HiWAAY Internet Services >> I don't speak for anybody but myself - that's enough trouble. >> >> >> From: Chris Adams <cmadams@hiwaay.net> >> >> The code to handle leap seconds printks an information message when the >> second is inserted or deleted. It does this while holding xtime_lock. >> However, printk wakes up klogd, and in some cases, the scheduler tries >> to get the current kernel time, trying to get xtime_lock (which results >> in a deadlock). This moved the printks outside of the lock. It also >> adds a comment to not use printk while holding xtime_lock. > [...] > > This patch doesn't seem to have gone anywhere. Was this bug fixed in > some other way or has it been forgotten? I'm interested in this as well...the current code still issues a printk() while holding the xtime_lock for writing. Is this allowed or not? In addition, is it allowed for older kernels also or is Chris Adams' patch something that should get picked up for the 2.6.27 stable series? Chris ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-06-18 22:34 ` Chris Friesen @ 2009-06-18 22:58 ` Ben Hutchings 2009-06-18 23:48 ` Chris Friesen 0 siblings, 1 reply; 109+ messages in thread From: Ben Hutchings @ 2009-06-18 22:58 UTC (permalink / raw) To: Chris Friesen; +Cc: linux-kernel, Chris Adams, 510478, Peter Zijlstra [-- Attachment #1: Type: text/plain, Size: 2641 bytes --] On Thu, 2009-06-18 at 16:34 -0600, Chris Friesen wrote: > Ben Hutchings wrote: > > On Sat, 2009-01-03 at 12:01 -0600, Chris Adams wrote: > >> Once upon a time, Duane Griffin <duaneg@dghda.com> said: > >>> How about instead of a switch statement, assigning the message to a > >>> variable and printing that. I.e. something like: > >> Good point. Here's an updated version that also adds a comment to the > >> xtime_lock definition about not using printk. > >> -- > >> Chris Adams <cmadams@hiwaay.net> > >> Systems and Network Administrator - HiWAAY Internet Services > >> I don't speak for anybody but myself - that's enough trouble. > >> > >> > >> From: Chris Adams <cmadams@hiwaay.net> > >> > >> The code to handle leap seconds printks an information message when the > >> second is inserted or deleted. It does this while holding xtime_lock. > >> However, printk wakes up klogd, and in some cases, the scheduler tries > >> to get the current kernel time, trying to get xtime_lock (which results > >> in a deadlock). This moved the printks outside of the lock. It also > >> adds a comment to not use printk while holding xtime_lock. > > [...] > > > > This patch doesn't seem to have gone anywhere. Was this bug fixed in > > some other way or has it been forgotten? > > I'm interested in this as well...the current code still issues a > printk() while holding the xtime_lock for writing. Is this allowed or not? 
Having investigated further, I believe it has been safe since this
change made in 2.6.27 (which cleverly preempted the new year):

commit b845b517b5e3706a3729f6ea83b88ab85f0725b0
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date:   Fri Aug 8 21:47:09 2008 +0200

    printk: robustify printk

    Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd
    wakeup by polling from the timer tick.

    Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Signed-off-by: Ingo Molnar <mingo@elte.hu>

> In addition, is it allowed for older kernels also or is Chris Adams'
> patch something that should get picked up for the 2.6.27 stable series?

Anything older than 2.6.27 appears to need a change along the lines of
the above-mentioned commit or Chris's patch. Note that this was not the
only case where printk() could be called under xtime_lock. For example,
in arch/alpha/kernel/time.c timer_interrupt() calls set_rtc_mmss() which
can call printk().

Ben.

-- 
Ben Hutchings
The generation of random numbers is too important to be left to chance.
                                                         - Robert Coveyou

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 109+ messages in thread
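The commit quoted above describes a different approach from Chris's
patch: printk stays callable under the lock, and only the klogd wakeup
is deferred to the timer tick. The single-threaded sketch below
illustrates that idea under stated assumptions. The names
wakeup_pending, tick_poll(), wake_log_daemon(), log_notice() and
leap_second_under_lock() are hypothetical and do not correspond to the
actual kernel implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static bool clock_lock_held;	/* toy stand-in for xtime_lock */
static bool wakeup_pending;	/* set by the logger, consumed by the tick */
static int wakeups_done;	/* how often the log daemon was woken */

/* Hypothetical stand-in for waking klogd; unsafe under the clock lock. */
static void wake_log_daemon(void)
{
	assert(!clock_lock_held);
	wakeups_done++;
}

/* Stand-in for printk: emit the message, but only request a wakeup. */
static void log_notice(const char *msg)
{
	printf("%s\n", msg);
	wakeup_pending = true;
}

/* Called from the periodic tick, outside the clock lock. */
static void tick_poll(void)
{
	if (wakeup_pending) {
		wakeup_pending = false;
		wake_log_daemon();
	}
}

/* Logging under the lock is harmless here: no wakeup happens yet. */
static void leap_second_under_lock(void)
{
	clock_lock_held = true;
	log_notice("Clock: inserting leap second 23:59:60 UTC");
	clock_lock_held = false;
}
```

The cost of this design is latency: the daemon is woken at the next tick
rather than immediately, which is the trade-off Chris Adams asks about
elsewhere in the thread ("so messages still get logged, just maybe not
quite as fast").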
* Re: [PATCH] v2 Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-06-18 22:58 ` Ben Hutchings @ 2009-06-18 23:48 ` Chris Friesen 0 siblings, 0 replies; 109+ messages in thread From: Chris Friesen @ 2009-06-18 23:48 UTC (permalink / raw) To: Ben Hutchings; +Cc: linux-kernel, Chris Adams, 510478, Peter Zijlstra Ben Hutchings wrote: > On Thu, 2009-06-18 at 16:34 -0600, Chris Friesen wrote: > Having investigated further, I believe it has been safe since this > change made in 2.6.27 (which cleverly preempted the new year): > > commit b845b517b5e3706a3729f6ea83b88ab85f0725b0 > Author: Peter Zijlstra <a.p.zijlstra@chello.nl> > Date: Fri Aug 8 21:47:09 2008 +0200 > > printk: robustify printk > > Avoid deadlocks against rq->lock and xtime_lock by deferring the klogd > wakeup by polling from the timer tick. > > Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> > Signed-off-by: Ingo Molnar <mingo@elte.hu> > >> In addition, is it allowed for older kernels also or is Chris Adams' >> patch something that should get picked up for the 2.6.27 stable series? > > Anything older than 2.6.27 appears to need a change along the lines of > the above-mentioned commit or Chris's patch. Note that this was not the > only case where printk() could be called under xtime_lock. For example, > in arch/alpha/kernel/time.c timer_interrupt() calls set_rtc_mmss() which > can call printk(). It appears that the patch in question went into mainline in 2.6.28-rc1 after being developed on the -tip tree. So it doesn't appear to be present in the mainline 2.6.27 kernel. Chris ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
  2009-01-03  0:21 ` Chris Adams
  2009-01-03  2:23   ` Duane Griffin
@ 2009-01-06  2:21   ` john stultz-lkml
  2009-01-06  2:25     ` Chris Adams
  2009-01-06  4:35     ` Linas Vepstas
  1 sibling, 2 replies; 109+ messages in thread
From: john stultz-lkml @ 2009-01-06  2:21 UTC (permalink / raw)
  To: Chris Adams; +Cc: Linas Vepstas, linux-kernel, Thomas Gleixner

On Fri, Jan 2, 2009 at 4:21 PM, Chris Adams <cmadams@hiwaay.net> wrote:
> Once upon a time, Linas Vepstas <linasvepstas@gmail.com> said:
>> Below follows a summary of the reported crashes. I'm ignoring the
>> zillions of "mine didn't crash" reports, or the "you're a paranoid
>> conspiracy theorist, its random chance" reports.
>
> I have reproduced this and got a stack trace (this is with Fedora 8 and
> kernel kernel-2.6.26.6-49.fc8.x86_64):
>
[snip]
> Basically (to my untrained eye), the leap second code is called from the
> timer interrupt handler, which holds xtime_lock. The leap second code
> does a printk to notify about the leap second. The printk code tries to
> wake up klogd (I assume to prioritize kernel messages), and (under some
> conditions), the scheduler attempts to get the current time, which tries
> to get xtime_lock => deadlock.

This analysis looks correct to me.

Grrrr. This has bit us a few times since the "no printk while holding
the xtime lock" restriction was added.

Thomas: Do you think this warrants adding a check to the printk path
to make sure the xtime lock isn't held? This way we can at least get a
warning when someone accidentally adds a printk or calls a function
that does while holding the xtime_lock.

thanks
-john

^ permalink raw reply	[flat|nested] 109+ messages in thread
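The check suggested above could look roughly like the sketch below. It
relies on one general property of seqlocks: the sequence counter is
incremented when a write begins and again when it ends, so an odd value
means a write is in progress. Everything else is an assumption made for
illustration: toy_seqlock, clock_seq, lock_warnings and checked_log()
are hypothetical names, and the sketch is single-threaded. On a real SMP
system an odd count only says that some CPU is writing, not that the
caller is, so a real check would also need to track the lock owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy sequence lock: the counter is odd while a write is in progress. */
struct toy_seqlock { unsigned sequence; };

static void toy_write_begin(struct toy_seqlock *sl)
{
	sl->sequence++;		/* even -> odd */
}

static void toy_write_end(struct toy_seqlock *sl)
{
	sl->sequence++;		/* odd -> even */
}

static bool toy_write_in_progress(const struct toy_seqlock *sl)
{
	return (sl->sequence & 1u) != 0;
}

static struct toy_seqlock clock_seq;	/* stand-in for xtime_lock */
static int lock_warnings;		/* number of flagged log calls */

/* Hypothetical logging wrapper that flags calls made under the lock. */
static void checked_log(const char *msg)
{
	if (toy_write_in_progress(&clock_seq)) {
		lock_warnings++;
		fprintf(stderr, "WARNING: log call while clock lock is held\n");
	}
	printf("%s\n", msg);
}
```

A call to checked_log() between toy_write_begin() and toy_write_end()
increments lock_warnings; a call outside that window does not.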
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 2:21 ` john stultz-lkml @ 2009-01-06 2:25 ` Chris Adams 2009-01-06 4:35 ` Linas Vepstas 1 sibling, 0 replies; 109+ messages in thread From: Chris Adams @ 2009-01-06 2:25 UTC (permalink / raw) To: john stultz-lkml; +Cc: Linas Vepstas, linux-kernel, Thomas Gleixner Once upon a time, john stultz-lkml <johnstul.lkml@gmail.com> said: > Grrrr. This has bit us a few times since the "no printk while holding > the xtime lock" restriction was added. I didn't see that documented anywhere, so my patch adds a comment to that effect. > Thomas: Do you think this warrents adding a check to the printk path > to make sure the xtime lock isn't held? This way we can at least get a > warning when someone accidentally adds a printk or calls a function > that does while holding the xtime_lock. I'm no kernel locking or scheduling (or anything else) expert, but if printk can check to see if xtime_lock is held, can it skip trying to wake klogd (so messages still get logged, just maybe not quite as fast)? Is there anything else that will wake klogd later? -- Chris Adams <cmadams@hiwaay.net> Systems and Network Administrator - HiWAAY Internet Services I don't speak for anybody but myself - that's enough trouble. ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-06 2:21 ` john stultz-lkml 2009-01-06 2:25 ` Chris Adams @ 2009-01-06 4:35 ` Linas Vepstas 1 sibling, 0 replies; 109+ messages in thread From: Linas Vepstas @ 2009-01-06 4:35 UTC (permalink / raw) To: john stultz-lkml; +Cc: Chris Adams, linux-kernel, Thomas Gleixner 2009/1/5 john stultz-lkml <johnstul.lkml@gmail.com>: > On Fri, Jan 2, 2009 at 4:21 PM, Chris Adams <cmadams@hiwaay.net> wrote: >> Basically (to my untrained eye), the leap second code is called from the >> timer interrupt handler, which holds xtime_lock. The leap second code >> does a printk to notify about the leap second. The printk code tries to >> wake up klogd (I assume to prioritize kernel messages), and (under some >> conditions), the scheduler attempts to get the current time, which tries >> to get xtime_lock => deadlock. > > This analysis looks correct to me. > > Grrrr. This has bit us a few times since the "no printk while holding > the xtime lock" restriction was added. > > Thomas: Do you think this warrents adding a check to the printk path > to make sure the xtime lock isn't held? No. > This way we can at least get a > warning when someone accidentally adds a printk or calls a function > that does while holding the xtime_lock. This seems like a basic mistake, that should be avoidable with code review. I'm sort-of surprised to even see it; anyone even vaguely familiar with that code would spot it quickly. Heh. Take that with a grain of salt -- not like I never make mistakes ;-/ I mean, how many more times can the mistake be made? I'm arguing its gonna be zero. --linas ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
  2009-01-02 19:25 Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 Linas Vepstas
                   ` (2 preceding siblings ...)
  2009-01-03  0:21 ` Chris Adams
@ 2009-01-03  3:49 ` Linas Vepstas
  2009-01-03  4:02   ` Ben Goodger
  2009-01-03 22:58   ` Jeffrey J. Kosowsky
  3 siblings, 2 replies; 109+ messages in thread
From: Linas Vepstas @ 2009-01-03  3:49 UTC (permalink / raw)
  To: linux-kernel
  Cc: Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, Goodgerster, burdell

2009/1/2 Linas Vepstas <linasvepstas@gmail.com>:
> Slashdot reported a story of Linux machines crashing on New years eve.

FYI, Looks like the bug has been found, and there's a patch!

http://lkml.org/lkml/2009/1/2/389

--linas

^ permalink raw reply	[flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 3:49 ` Linas Vepstas @ 2009-01-03 4:02 ` Ben Goodger 2009-01-03 4:46 ` Duane Griffin 2009-01-03 22:58 ` Jeffrey J. Kosowsky 1 sibling, 1 reply; 109+ messages in thread From: Ben Goodger @ 2009-01-03 4:02 UTC (permalink / raw) To: linasvepstas Cc: linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell 2009/1/3 Linas Vepstas <linasvepstas@gmail.com> > > Slashdot reported a story of Linux machines crashing on New years eve. > > FYI, Looks like the bug has been found, and theres a patch! > > http://lkml.org/lkml/2009/1/2/389 Great. I look forward to not crashing the next time it is 2008-31-31-23:59:59. Sarcasm aside, please pass on my thanks to Mr Griffin. -- Benjamin Goodger ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 4:02 ` Ben Goodger @ 2009-01-03 4:46 ` Duane Griffin 2009-01-03 4:50 ` Ben Goodger 0 siblings, 1 reply; 109+ messages in thread From: Duane Griffin @ 2009-01-03 4:46 UTC (permalink / raw) To: Ben Goodger Cc: linasvepstas, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell 2009/1/3 Ben Goodger <goodgerster@gmail.com>: > 2009/1/3 Linas Vepstas <linasvepstas@gmail.com> >> > Slashdot reported a story of Linux machines crashing on New years eve. >> >> FYI, Looks like the bug has been found, and theres a patch! >> >> http://lkml.org/lkml/2009/1/2/389 > > Great. I look forward to not crashing the next time it is 2008-31-31-23:59:59. > Sarcasm aside, please pass on my thanks to Mr Griffin. Thanks, but I'm not the one who deserves the thanks: Chris Adams did all the work in reproducing and diagnosing the problem. My patch was entirely trivial (and indeed, incomplete). Cheers, Duane. -- "I never could learn to drink that blood and call it wine" - Bob Dylan ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 4:46 ` Duane Griffin @ 2009-01-03 4:50 ` Ben Goodger 0 siblings, 0 replies; 109+ messages in thread From: Ben Goodger @ 2009-01-03 4:50 UTC (permalink / raw) To: Duane Griffin Cc: linasvepstas, linux-kernel, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell 2009/1/3 Duane Griffin <duaneg@dghda.com>: > Thanks, but I'm not the one who deserves the thanks: Chris Adams did > all the work in reproducing and diagnosing the problem. My patch was > entirely trivial (and indeed, incomplete). I mean 2009-_12_-31, of course. Thank you, Mr Adams... -- Benjamin Goodger -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/S/M/B d- s++:-- a18 c++$ UL>+++ P--- L++>+++ E- W+++$ N--- K? w--- O? M- V? PS+(++) PE-() Y+ PGP+ t 5? X-- R- !tv() b+++>++++ DI+++ D+ G e>++++ h! !r*(-) y ------END GEEK CODE BLOCK------ ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-03 3:49 ` Linas Vepstas 2009-01-03 4:02 ` Ben Goodger @ 2009-01-03 22:58 ` Jeffrey J. Kosowsky 1 sibling, 0 replies; 109+ messages in thread From: Jeffrey J. Kosowsky @ 2009-01-03 22:58 UTC (permalink / raw) To: linasvepstas Cc: linux-kernel, MentalMooMan, Travis Crump, Goodgerster, burdell Linas Vepstas wrote at about 21:49:53 -0600 on Friday, January 2, 2009: > 2009/1/2 Linas Vepstas <linasvepstas@gmail.com>: > > Slashdot reported a story of Linux machines crashing on New years eve. > > FYI, Looks like the bug has been found, and theres a patch! > > http://lkml.org/lkml/2009/1/2/389 > > --linas > As the OP, good to know that all those who said "it's just a coincidence that your machine that has been rock stable for 6 years just happened to crash at midnight GMT when the leap second was inserted..." were wrong :) Thanks for the good follow-up and detective work. Hopefully, it's not too late for Fedora to provide the patch before support for Fedora 8 expires on the 7th... ^ permalink raw reply [flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009
       [not found]           ` <fa.BrNhgY8S+TEOLMiPd27M7YHo9bI@ifi.uio.no>
@ 2009-01-04 16:15             ` Sitsofe Wheeler
  2009-01-04 17:26               ` Kyle Moffett
  0 siblings, 1 reply; 109+ messages in thread
From: Sitsofe Wheeler @ 2009-01-04 16:15 UTC (permalink / raw)
  To: david
  Cc: David Newall, Kyle Moffett, Ben Goodger, Robert Hancock,
	linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan,
	Travis Crump, burdell

david@lang.hm wrote:
> so are you saying that other 'correct' OS's have patches issued every
> time a leap second is declared so that they have an in-kernel table of
> them to use to calculate the correct time?

I think the number of other "correct" OSes that actually step the time
on leap seconds is not that large (at least doing the announcement via
NTP). According to
http://www.ntp.org/ntpfaq/NTP-s-algo-real.htm#AEN2499 leap seconds are
only changed via stepping if you have the right kernel discipline (notes
on how to check whether a given OS has the kernel discipline are
mentioned on http://www.ntp.org/ntpfaq/NTP-s-algo-kernel.htm#AEN2220 ).

I have a feeling that OSX doesn't do it (there's a mailing list post
from 2005 where someone was trying to add FreeBSD's ntp_adjtime to
Darwin
http://lists.apple.com/archives/Darwin-kernel/2005/Jan/msg00004.html ).
Additionally folks I know using ntpd synchronized OSX machines said
their machines were off by one second right after the new year.

Windows is also known not to do it without slewing:
http://www.meinberg.de/english/info/leap-second.htm#os .

^ permalink raw reply	[flat|nested] 109+ messages in thread
* Re: Bug: Status/Summary of slashdot leap-second crash on new years 2008-2009 2009-01-04 16:15 ` Sitsofe Wheeler @ 2009-01-04 17:26 ` Kyle Moffett 0 siblings, 0 replies; 109+ messages in thread From: Kyle Moffett @ 2009-01-04 17:26 UTC (permalink / raw) To: Sitsofe Wheeler Cc: david, David Newall, Ben Goodger, Robert Hancock, linux-kernel, linasvepstas, Jeffrey J. Kosowsky, MentalMooMan, Travis Crump, burdell On Sun, Jan 4, 2009 at 11:15 AM, Sitsofe Wheeler <sitsofe@yahoo.com> wrote: > Windows is also known not to do it without slewing: > http://www.meinberg.de/english/info/leap-second.htm#os . Well... Microsoft "[does] not guarantee and [does] not support the accuracy of the W32Time service between nodes on a network. The W32Time service is not a full-featured NTP solution that meets time-sensitive application needs." (See http://support.microsoft.com/kb/939322). The w32time daemon is not guaranteed to be within a few seconds of UTC at *any* time of the year, let alone immediately after a leap-second. In addition, windows does not have any built-in interpolation between timer ticks, so time increases in ~15ms steps regardless of how accurate your clock is. Cheers, Kyle Moffett ^ permalink raw reply [flat|nested] 109+ messages in thread