* Re: Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-10  5:12 Bill Fink
  2003-10-10  7:33 ` Gabriel Paubert
  2003-10-10  7:53 ` Ethan Benson
  0 siblings, 2 replies; 15+ messages in thread
From: Bill Fink @ 2003-10-10  5:12 UTC (permalink / raw)
  To: LinuxPPC Developers; +Cc: Bill Fink


On Wed, 08 Oct 2003, Benjamin Herrenschmidt wrote:

> > I repeat the question: what are the values of drift on the machines
> > that encounter the problem ? Is this drift stable or unstable?
>
> So far, there is no problem. The problem that was happening
> was a via_calibrate_decr() bug with HZ != 100, but when
> investigating, I figured out that we had a potential problem
> there, that's all and that's why I want people like you who
> know those problems well to state if it's worth bothering ;)
>
> > > On all cases, those will drift some way from what the NTP server
> > > will give, either a lot or not, it will. So we may end up adjusting
> > > our kernel rate and thus opening a window for the problem.
> >
> > The worst variations of drift I've seen are a few ppm for a given
> > machine, barring the occasional boot-time calibration problems that I
> > have encountered.
>
> OK.

This discussion prompted me to finally ask about another clock related
problem I see on the 867 MHz G4 systems at work.  The clocks on these
systems continuously run 0.2% slow (about 3 minutes per day).  Apparently
this is more than ntp can adjust for (using scaling), as I get many of
these error messages in the log:

Oct 10 00:11:29 clifford ntpd[425]: time reset 2.641342 s
Oct 10 00:11:29 clifford ntpd[425]: synchronisation lost
Oct 10 00:32:07 clifford ntpd[425]: time reset 2.671741 s
Oct 10 00:32:07 clifford ntpd[425]: synchronisation lost
Oct 10 00:52:46 clifford ntpd[425]: time reset 2.671729 s
Oct 10 00:52:46 clifford ntpd[425]: synchronisation lost

This causes problems when I take these systems off the network for a few
hours and forget to reset them to the correct time when I reconnect
them: we use Kerberos for security, and the time difference between
the system and the Kerberos KDC then prevents remote logins.
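
As a rough cross-check of the 0.2% figure, the ntpd log above can be turned
into a drift rate: each reset steps the clock by about 2.67 s, and the resets
are a little over 20 minutes apart. The sketch below is illustrative only;
the function names are made up here, and it ignores whatever slewing ntpd was
already applying between steps.

```c
/* Seconds between two same-day HH:MM:SS timestamps (no midnight wrap). */
long seconds_between(int h0, int m0, int s0, int h1, int m1, int s1)
{
    return (h1 * 3600L + m1 * 60L + s1) - (h0 * 3600L + m0 * 60L + s0);
}

/* Hypothetical helper: apparent drift in ppm, given the size of one
 * ntpd time step (seconds) and the time since the previous step. */
double drift_ppm(double step_s, double interval_s)
{
    return step_s / interval_s * 1e6;
}
```

Feeding in the second log entry, drift_ppm(2.671741, seconds_between(0, 11, 29,
0, 32, 7)), gives a bit under 2200 ppm, which is in line with "about 0.2% slow".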

These systems are using a 2.4.20-ben1 kernel.

						-Bill

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-10  5:12 Time precision, adjtime(x) vs. gettimeofday Bill Fink
@ 2003-10-10  7:33 ` Gabriel Paubert
  2003-10-10 16:39   ` Bill Fink
  2003-10-10  7:53 ` Ethan Benson
  1 sibling, 1 reply; 15+ messages in thread
From: Gabriel Paubert @ 2003-10-10  7:33 UTC (permalink / raw)
  To: Bill Fink; +Cc: LinuxPPC Developers


On Fri, Oct 10, 2003 at 01:12:54AM -0400, Bill Fink wrote:
>
> On Wed, 08 Oct 2003, Benjamin Herrenschmidt wrote:
>
> > > I repeat the question: what are the values of drift on the machines
> > > that encounter the problem ? Is this drift stable or unstable?
> >
> > So far, there is no problem. The problem that was happening
> > was a via_calibrate_decr() bug with HZ != 100, but when
> > investigating, I figured out that we had a potential problem
> > there, that's all and that's why I want people like you who
> > know those problems well to state if it's worth bothering ;)
> >
> > > > On all cases, those will drift some way from what the NTP server
> > > > will give, either a lot or not, it will. So we may end up adjusting
> > > > our kernel rate and thus opening a window for the problem.
> > >
> > > The worst variations of drift I've seen are a few ppm for a given
> > > machine, barring the occasional boot-time calibration problems that I
> > > have encountered.
> >
> > OK.
>
> This discussion prompted me to finally ask about another clock related
> problem I see on the 867 MHz G4 systems at work.  The clocks on these
> systems continuously run 0.2% slow (about 3 minutes per day).  Apparently
> this is more than ntp can adjust for (using scaling), as I get many of
> these error messages in the log:

Indeed, the limit of NTP is about 500ppm (0.05%) AFAIR. Anything higher
and you go into periodic time steps like the ones you report.
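
To put the two figures side by side, here is a small sketch that converts a
percentage error to ppm and checks it against the limit. The 500 ppm constant
is simply the "AFAIR" value quoted above, not something verified against any
particular ntpd release, and the function names are invented for the example.

```c
#include <stdbool.h>

/* Limit as quoted in the thread; treat it as an assumption. */
#define NTP_MAX_SLEW_PPM 500.0

/* 1% is 10,000 parts per million. */
double percent_to_ppm(double percent)
{
    return percent * 1e4;
}

/* Accumulated error over 24 hours for a given frequency error. */
double drift_seconds_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}

/* True if the error is small enough to be corrected by slewing alone. */
bool ntp_can_slew(double ppm)
{
    return ppm <= NTP_MAX_SLEW_PPM && ppm >= -NTP_MAX_SLEW_PPM;
}
```

A 0.2% error is 2000 ppm, four times the quoted limit, and amounts to roughly
173 seconds (just under 3 minutes) per day.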

2.4.20 is recent enough and should not have this kind of problem.

What is the initial decrementer frequency from boot messages log?

What is the timebase frequency from OF?
(od -td4 /proc/device-tree/cpus/PowerPC,G4/timebase-frequency)

	Gabriel


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-10  5:12 Time precision, adjtime(x) vs. gettimeofday Bill Fink
  2003-10-10  7:33 ` Gabriel Paubert
@ 2003-10-10  7:53 ` Ethan Benson
  1 sibling, 0 replies; 15+ messages in thread
From: Ethan Benson @ 2003-10-10  7:53 UTC (permalink / raw)
  To: LinuxPPC Developers


On Fri, Oct 10, 2003 at 01:12:54AM -0400, Bill Fink wrote:
>
> This discussion prompted me to finally ask about another clock related
> problem I see on the 867 MHz G4 systems at work. The clocks on
> these systems continuously run 0.2% slow (about 3 minutes per day).
> Apparently this is more than ntp can adjust for (using scaling), as I
> get many of these error messages in the log:

is it a quicksilver G4?  i maintain one of those and its time goes off
much faster than that (3 minutes within a couple hours).

the fix is rather simple:

--- linux.old/arch/ppc/platforms/pmac_time.c.orig	Sat Nov 30 02:33:49 2002
+++ linux/arch/ppc/platforms/pmac_time.c	Sat Nov 30 02:33:22 2002
@@ -262,7 +262,9 @@
 	 * calibration. That's better since the VIA itself seems
 	 * to be slightly off. --BenH
 	 */
+#if 0
 	if (!machine_is_compatible("MacRISC2"))
+#endif
 		if (via_calibrate_decr())
 			return;

in the case of the quicksilver VIA is FAR better than whatever it uses
instead.

--
Ethan Benson
http://www.alaska.net/~erbenson/


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-10  7:33 ` Gabriel Paubert
@ 2003-10-10 16:39   ` Bill Fink
  0 siblings, 0 replies; 15+ messages in thread
From: Bill Fink @ 2003-10-10 16:39 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev, Bill Fink


On Fri, 10 Oct 2003, Gabriel Paubert wrote:

> On Fri, Oct 10, 2003 at 01:12:54AM -0400, Bill Fink wrote:
> >
> > This discussion prompted me to finally ask about another clock related
> > problem I see on the 867 MHz G4 systems at work.  The clocks on these
> > systems continuously run 0.2% slow (about 3 minutes per day).  Apparently
> > this is more than ntp can adjust for (using scaling), as I get many of
> > these error messages in the log:
>
> Indeed, the limit of NTP is about 500ppm (0.05%) AFAIR. Anything higher
> and you go into period time steps like the one you report.
>
> 2.4.20 is recent enough and should not have this kind of problems.
>
> What is the initial decrementer frequency from boot messages log?

clifford% dmesg | grep -i decr
time_init: decrementer frequency = 33.290001 MHz

> What is the timebase frequency from OF?
> (od -td4 /proc/device-tree/cpus/PowerPC,G4/timebase-frequency)

clifford% od -td4 /proc/device-tree/cpus/PowerPC,G4/timebase-frequency
0000000    33290001
0000004
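
For reference, the property read with od above is a single 32-bit big-endian
cell, which is why od -td4 prints it directly on a (big-endian) PPC host. A
minimal sketch of decoding it portably and deriving the number of timebase
ticks per timer interrupt follows; the helper names are hypothetical and
HZ = 100 is an assumption about this kernel's configuration.

```c
#include <stdint.h>

/* Assemble a big-endian 32-bit device-tree cell from raw bytes. */
uint32_t be32_to_host(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Timebase ticks that elapse between two timer interrupts. */
uint32_t tb_ticks_per_jiffy(uint32_t tb_hz, uint32_t hz)
{
    return tb_hz / hz;
}
```

The bytes 01 fb f7 11 decode to 33290001, the same value the kernel prints as
the decrementer frequency, so at HZ = 100 one jiffy spans 332900 ticks.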

						-Bill


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-14  7:07     ` Gabriel Paubert
@ 2003-10-14 11:16       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-14 11:16 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: Ethan Benson, LinuxPPC Developers


> How does the Keylargo timer work? Any pointer?

Darwin code... But it's basically a 64 bits counter at
KL base +

#define   kKeyLargoCounterLoOffset      0x15038
#define   kKeyLargoCounterHiOffset      0x1503C

MacOS X calls an asm function "TimeSystemBusKeyLargo" which measures
the number of KeyLargo "ticks" per 1,048,575 PowerPC decrementer/tb
units.

(copied below)

And then uses that "tick" value this way:

        ticks = TimeSystemBusKeyLargo (keyLargoBaseAddress);
        if (intLock) {
                IOSimpleLockUnlockEnableInterrupt(intLock, is); // As you were
                IOSimpleLockFree (intLock);
        }

        systemBusHz = 4194300;
        systemBusHz *= 18432000;
        systemBusHz /= ticks;
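
A hedged reading of the arithmetic above: 4194300 is exactly 4 * 1048575, so
the formula looks like "decrementer ticks measured, times the KeyLargo timer
frequency, divided by KeyLargo ticks observed", scaled by a bus-to-timebase
ratio of 4. That 18432000 is the KeyLargo timer rate in Hz and that 4 is the
bus/timebase ratio are inferences from the constants, not something stated in
the snippet. The function names below are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DEC_TICKS_MEASURED 1048575ULL   /* 0x000FFFFF, loaded into the decrementer */
#define KEYLARGO_TIMER_HZ  18432000ULL  /* assumed meaning of the constant */
#define BUS_TO_TB_RATIO    4ULL         /* inferred: 4194300 == 4 * 1048575 */

/* Same multiply-then-divide order as the snippet, in 64-bit arithmetic. */
uint64_t system_bus_hz(uint32_t keylargo_ticks)
{
    assert(keylargo_ticks != 0);
    return BUS_TO_TB_RATIO * DEC_TICKS_MEASURED * KEYLARGO_TIMER_HZ
           / keylargo_ticks;
}

/* Timebase/decrementer frequency implied by the same measurement. */
uint64_t timebase_hz(uint32_t keylargo_ticks)
{
    return system_bus_hz(keylargo_ticks) / BUS_TO_TB_RATIO;
}
```

For example, a measurement of 1,048,575 KeyLargo ticks would mean the
decrementer runs at the same rate as the assumed 18.432 MHz reference, giving
a 73,728,000 Hz bus. The product needs 64 bits: it is about 7.7e13.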


;
; TimeSystemBusKeyLargo(inKeyLargoBaseAddress)
;
; TimeSystemBusKeyLargo - Times how long it takes the PowerPC decrementer to count down
; 1,048,575 ticks.
;
; returns, in r3, the number of KeyLargo timer ticks per 1,048,575 PowerPC decrementer ticks.
;
; trashes r3 - r10
;
; NOTE - interrupts should be disabled when calling this code
;



ENTRY(TimeSystemBusKeyLargo, TAG_NO_FRAME_USED)

                        lis             r4, 0x000F
                        ori             r4, r4, 0xFFFF          ; Load decrementer tick count (1,048,575)
                        lis             r6, kKeyLargoCounterLoOffset >> 16
                        ori             r6, r6, kKeyLargoCounterLoOffset & 0xFFFF ; Counter lo offset
                        lis             r7, kKeyLargoCounterHiOffset >> 16
                        ori             r7, r7, kKeyLargoCounterHiOffset & 0xFFFF ; Counter hi offset
                        lwbrx   r8, r6, r3                      ; Read low 32-bits of counter
                        lwbrx   r9, r7, r3                      ; Read hi 32-bits of counter

                        ; Set up decrementer and wait for it to tick down

                        mtdec   r4                                      ; Set decrementer to 1,048,575
                        isync

NewDecrementerLoop:
                        mfdec   r5                                      ; Read current decrementer value
                        cmpwi   r5, 0                           ; Check if decrementer is zero
                        bgt+    NewDecrementerLoop              ; If not yet to zero, keep looping
                        sync

                        ; Read current value of KeyLargo to get delta time

                        lwbrx   r4, r6, r3                      ; Load low 32-bits of timer (latches all 64 bits)
                        lwbrx   r5, r7, r3                      ; Load high 32-bits of timer (clear latch)

                        ; Calculate difference
                        subf    r3, r8, r4                      ; Subtract low bits (ignore wrap)
                        blr                                                     ; Return
(END)



> Also for these machines it seems that OF also returns wrong values.
> Maybe there is an OF update somewhere.
>
> Does anybody know what MacOS X (most MacOS X machines probably use
> ntp) do?
>
> Sorry, more questions than answers. It superficially looks
> like a HW screw-up in one specific series of machines.
>
> 	Gabriel
--
Benjamin Herrenschmidt <benh@kernel.crashing.org>



* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-11 14:58   ` Benjamin Herrenschmidt
@ 2003-10-14  7:07     ` Gabriel Paubert
  2003-10-14 11:16       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Paubert @ 2003-10-14  7:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Ethan Benson, LinuxPPC Developers


On Sat, Oct 11, 2003 at 04:58:14PM +0200, Benjamin Herrenschmidt wrote:
>
> > > Assuming the fix works, is there a simple way to test for the
> > > quickserver G4 model rather than doing the "#if 0", since I like to
> > > run a common kernel across a variety of different processor models.
> >
> > i don't know, ive discussed it with benh, but he won't accept that VIA
> > is a better choice here.
>
> Hrm... I do accept that it's a better choice, but I'm sure using
> the KeyLargo timer is even better :) Anyway, I'll switch to VIA by
> default on "PowerMac3,5" type machines for now.

How does the Keylargo timer work? Any pointer?

Also for these machines it seems that OF also returns wrong values.
Maybe there is an OF update somewhere.

Does anybody know what MacOS X (most MacOS X machines probably use
ntp) do?

Sorry, more questions than answers. It superficially looks
like a HW screw-up in one specific series of machines.

	Gabriel


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-11  5:27 ` Ethan Benson
@ 2003-10-11 14:58   ` Benjamin Herrenschmidt
  2003-10-14  7:07     ` Gabriel Paubert
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-11 14:58 UTC (permalink / raw)
  To: Ethan Benson; +Cc: LinuxPPC Developers


> > Assuming the fix works, is there a simple way to test for the
> > quickserver G4 model rather than doing the "#if 0", since I like to
> > run a common kernel across a variety of different processor models.
>
> i don't know, ive discussed it with benh, but he won't accept that VIA
> is a better choice here.

Hrm... I do accept that it's a better choice, but I'm sure using
the KeyLargo timer is even better :) Anyway, I'll switch to VIA by
default on "PowerMac3,5" type machines for now.

Ben.



* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-11  4:45 Bill Fink
@ 2003-10-11  5:27 ` Ethan Benson
  2003-10-11 14:58   ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Ethan Benson @ 2003-10-11  5:27 UTC (permalink / raw)
  To: LinuxPPC Developers


On Sat, Oct 11, 2003 at 12:45:41AM -0400, Bill Fink wrote:
>
> On Thu, 9 Oct 2003, Ethan Benson wrote:
>
> > On Fri, Oct 10, 2003 at 01:12:54AM -0400, Bill Fink wrote:
> > >
> > > This discussion prompted me to finally ask about another clock
> > > related problem I see on the 867 MHz G4 systems at work. The
> > > clocks on these systems continuously run 0.2% slow (about 3
> > > minutes per day). Apparently this is more than ntp can adjust for
> > > (using scaling), as I get many of these error messages in the log:
> >
> > is it a quicksilver G4? i maintain one of those and its time goes
> > off much faster then that (3 minutes within a couple hours).
>
> Yes I believe it's a quicksilver G4.
>
> clifford% cat /proc/cpuinfo
> cpu             : 7450, altivec supported
> clock           : 866MHz
> revision        : 2.1 (pvr 8000 0201)
> bogomips        : 865.07
> machine         : PowerMac3,5
> motherboard     : PowerMac3,5 MacRISC2 MacRISC Power Macintosh
> detected as     : 69 (PowerMac G4 Silver)
> pmac flags      : 00000000
> L2 cache        : 256K unified
> memory          : 640MB
> pmac-generation : NewWorld

thats a quicksilver alright.

> > the fix is rather simple:
> >
> > --- linux.old/arch/ppc/platforms/pmac_time.c.orig Sat Nov 30 02:33:49 2002
> > +++ linux/arch/ppc/platforms/pmac_time.c Sat Nov 30 02:33:22 2002
> > @@ -262,7 +262,9 @@
> > * calibration. That's better since the VIA itself seems
> > * to be slightly off. --BenH
> > */
> > +#if 0
> > if (!machine_is_compatible("MacRISC2"))
> > +#endif
> > if (via_calibrate_decr())
> > return;
>
> Thanks for the suggested fix. I'll give it a try when I get a chance.
>
> > in the case of the quicksilver VIA is FAR better than whatever it
> > uses instead.
>
> Assuming the fix works, is there a simple way to test for the
> quickserver G4 model rather than doing the "#if 0", since I like to
> run a common kernel across a variety of different processor models.

i don't know, ive discussed it with benh, but he won't accept that VIA
is a better choice here.

--
Ethan Benson
http://www.alaska.net/~erbenson/


* Re: Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-11  4:45 Bill Fink
  2003-10-11  5:27 ` Ethan Benson
  0 siblings, 1 reply; 15+ messages in thread
From: Bill Fink @ 2003-10-11  4:45 UTC (permalink / raw)
  To: LinuxPPC Developers; +Cc: Bill Fink


On Thu, 9 Oct 2003, Ethan Benson wrote:

> On Fri, Oct 10, 2003 at 01:12:54AM -0400, Bill Fink wrote:
> >
> > This discussion prompted me to finally ask about another clock related
> > problem I see on the 867 MHz G4 systems at work. The clocks on
> > these systems continuously run 0.2% slow (about 3 minutes per day).
> > Apparently this is more than ntp can adjust for (using scaling), as I
> > get many of these error messages in the log:
>
> is it a quicksilver G4? i maintain one of those and its time goes off
> much faster then that (3 minutes within a couple hours).

Yes I believe it's a quicksilver G4.

clifford% cat /proc/cpuinfo
cpu             : 7450, altivec supported
clock           : 866MHz
revision        : 2.1 (pvr 8000 0201)
bogomips        : 865.07
machine         : PowerMac3,5
motherboard     : PowerMac3,5 MacRISC2 MacRISC Power Macintosh
detected as     : 69 (PowerMac G4 Silver)
pmac flags      : 00000000
L2 cache        : 256K unified
memory          : 640MB
pmac-generation : NewWorld

> the fix is rather simple:
>
> --- linux.old/arch/ppc/platforms/pmac_time.c.orig Sat Nov 30 02:33:49 2002
> +++ linux/arch/ppc/platforms/pmac_time.c Sat Nov 30 02:33:22 2002
> @@ -262,7 +262,9 @@
> * calibration. That's better since the VIA itself seems
> * to be slightly off. --BenH
> */
> +#if 0
> if (!machine_is_compatible("MacRISC2"))
> +#endif
> if (via_calibrate_decr())
> return;

Thanks for the suggested fix.  I'll give it a try when I get a chance.

> in the case of the quicksilver VIA is FAR better then whatever it uses
> instead.

Assuming the fix works, is there a simple way to test for the
quicksilver G4 model rather than doing the "#if 0"?  I like
to run a common kernel across a variety of different processor
models.

						-Bill


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 13:32 Benjamin Herrenschmidt
  2003-10-08 15:48 ` Gabriel Paubert
@ 2003-10-08 22:17 ` Pavel Machek
  1 sibling, 0 replies; 15+ messages in thread
From: Pavel Machek @ 2003-10-08 22:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list

Hi!

> While fixing problems experienced by some scientific users who
> found out that gettimeofday() could sometimes run backward, I

Having time run backward is not really an option; screensavers start
kicking randomly, make has problems, etc, etc.
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 17:50     ` Gabriel Paubert
@ 2003-10-08 18:22       ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 18:22 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev list, Linux Kernel list


> I repeat the question: what are the values of drift on the machines
> that encounter the problem ? Is this drift stable or unstable?

So far, there is no problem. The problem that was happening
was a via_calibrate_decr() bug with HZ != 100, but when
investigating, I figured out that we had a potential problem
there, that's all and that's why I want people like you who
know those problems well to state if it's worth bothering ;)
> >
> > On all cases, those will drift some way from what the NTP server will
> > give, either a lot or not, it will. So we may end up adjusting our
> > kernel rate and thus opening a window for the problem.
> 
> The worst variations of drift I've seen are a few ppm for a given
> machine, barring the occasional boot-time calibration problems that
> I have encountered.

OK.





* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 16:22   ` Benjamin Herrenschmidt
@ 2003-10-08 17:50     ` Gabriel Paubert
  2003-10-08 18:22       ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 15+ messages in thread
From: Gabriel Paubert @ 2003-10-08 17:50 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list

On Wed, Oct 08, 2003 at 06:22:58PM +0200, Benjamin Herrenschmidt wrote:
> 
> > Well, it it affects gettimeofday which has a precision of 1 part in
> > 10000 (100 ppm), it means that our boot time timebase calibration was
> > not very good to start with, on my set of running VME machines I have
> > the following (values in ppm):
> >
> > ../..
> 
> Boot time calibration can't be perfect... 

No, indeed.

> I depends very much on the quality of what your are calibrating against, 
> and the bus path to it.

At the time it is performed, most devices should not be active
(no long DMA bursts) so the variations should be rather small.
Another solution is to increase the measurement period. I have to 
use one second on some machines because I don't have anything else
reliable (only the RTC which changes every second and its interrupt
pin is not routed), even a 1 to 2 second delay does not significantly
affect boot times.

> 
> On most pmacs, I'm calibrating either against a VIA timer which isn't
> _that_ good or on OF value (which are themselves calibrated, I think,
> against the KeyLargo timer).

On the Macs I have around here, the ntp drift values are:
- on a PB G3/400: +8ppm
- on a PM G4/466: -6ppm

that's not _that_ bad (I believe these come from OF). 

10 ppm of a 10ms jiffy is 0.1 microseconds. Increasing HZ can only 
improve this figure, although it is stupid to run the correction loop 
that often IMNSHO.
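
The arithmetic behind that figure, as a tiny sketch (the function name is
made up for illustration): the per-jiffy error is the jiffy length times the
frequency error.

```c
/* Per-jiffy timing error in nanoseconds for a frequency error given
 * in ppm and a timer interrupt rate of hz interrupts per second. */
double jiffy_error_ns(double ppm, unsigned hz)
{
    double jiffy_ns = 1e9 / hz;      /* length of one jiffy */
    return jiffy_ns * ppm * 1e-6;    /* fraction of it that is "wrong" */
}
```

At HZ = 100 and 10 ppm this gives 100 ns (0.1 microseconds); at HZ = 1000 the
same 10 ppm shrinks to 10 ns, which is the sense in which a higher HZ improves
the figure.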

I repeat the question: what are the values of drift on the machines
that encounter the problem ? Is this drift stable or unstable? 

> 
> On all cases, those will drift some way from what the NTP server will
> give, either a lot or not, it will. So we may end up adjusting our
> kernel rate and thus opening a window for the problem.

The worst variations of drift I've seen are a few ppm for a given
machine, barring the occasional boot-time calibration problems that
I have encountered.

	Gabriel


* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 15:48 ` Gabriel Paubert
@ 2003-10-08 16:22   ` Benjamin Herrenschmidt
  2003-10-08 17:50     ` Gabriel Paubert
  0 siblings, 1 reply; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 16:22 UTC (permalink / raw)
  To: Gabriel Paubert; +Cc: linuxppc-dev list, Linux Kernel list


> Well, it it affects gettimeofday which has a precision of 1 part in
> 10000 (100 ppm), it means that our boot time timebase calibration was
> not very good to start with, on my set of running VME machines I have
> the following (values in ppm):
>
> ../..

Boot time calibration can't be perfect... It depends very much on the
quality of what you are calibrating against, and the bus path to it.

On most pmacs, I'm calibrating either against a VIA timer which isn't
_that_ good or on OF value (which are themselves calibrated, I think,
against the KeyLargo timer).

In all cases, those will drift somewhat from what the NTP server
gives, whether by a lot or a little. So we may end up adjusting our
kernel rate and thus opening a window for the problem.

Ben.



* Re: Time precision, adjtime(x) vs. gettimeofday
  2003-10-08 13:32 Benjamin Herrenschmidt
@ 2003-10-08 15:48 ` Gabriel Paubert
  2003-10-08 16:22   ` Benjamin Herrenschmidt
  2003-10-08 22:17 ` Pavel Machek
  1 sibling, 1 reply; 15+ messages in thread
From: Gabriel Paubert @ 2003-10-08 15:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev list, Linux Kernel list


	Hi!,

On Wed, Oct 08, 2003 at 03:32:31PM +0200, Benjamin Herrenschmidt wrote:
> 
> Hi !
> 
> While fixing problems experienced by some scientific users who
> found out that gettimeofday() could sometimes run backward, I
> found a nasty issue I don't know if we can fix at all or if it's
> not worth bothering.
> 
> So the problem is with any arch (ppc, x86, ...) who uses a HW
> timer (like the CPU timebase on PPC) to provide better-than-jiffy
> precision in do_gettimeofday().
> 
> The problem is that the offset added to xtime value (typically
> the HW timer current value minus the HW timer value at the last
> timer interrupt scaled to usec) uses a scaling factor which has
> been calibrated once, and doesn't take into account the adjustements
> done to xtime increase by adjtime/adjtimex algorithm.

Well, if it affects gettimeofday, which has a precision of 1 part in
10000 (100 ppm), it means that our boot time timebase calibration was
not very good to start with. On my set of running VME machines I have
the following (values in ppm):

$cat  /nfsroots/v*/etc/ntp/drift
-10.191
-2.787
3.869
-5.645
-1.146
-7.383
4.400
5.824
4.640
0.014
-8.371
0.056
-2.324
-5.655
-5.828
-4.862
-3.380

I can understand that we'll certainly have more serious problems
of non-monotonicity for nanosecond precision timestamps.

I also have from time to time a bad timebase calibration at boot which
makes the drift go to about 400ppm. I just don't have this problem
on any machine right now. I believe I mentioned this issue once
on the list but never found time to track it down.

Maybe the boot-time timebase calibration could use a longer period.
However, I'd first like to know by how much the timebase calibration
of the user who has the problem varies between reboots.

> 
> That means that if, for example, adjtimex was called with a factor
> that is trying to slow you down a bit, and you call gettimeofday
> right before the end of a jiffy period, you may calculate an offset
> based on the HW timer that is actually higher than what will be
> really added to xtime on the next interrupt.
> 
> So you can end-up returning non-monotonic values from gettimeofday().

As I said, only if you have fairly large corrections. Anything below


> 
> I don't see a way to fix that that wouldn't bloat do_gettimeofday(),
> except if we can, at jiffy interrupt time, pre-calculate a scaling
> factor for the next jiffy and just apply it on the HW timer value
> on the next calls to do_gettimeofday(). But that option would need
> better understanding of the adjtime(x) algorithm that what I have
> at this point.
> 
> Storing the last value to make sure we don't return a value that is
> lower will defeat the read_lock/write_lock mecanism, forcing us to
> take the write_lock(), and thus screwing up scalability.
> 
> Any idea ?
> 
> Note: In addition to the above, there seem to be a race on x86 2.4
> (only, 2.6 doesn't have it) due to the fact that the actual xtime
> increase is done from a bottom half. The HW timer "last stamp" is
> stored from the HW interrupt, xtime is only updated on the BH, so
> if gettimeofday is called in between those 2, you'll end up using
> the "new" "last stamp" with the old xtime, thus returning an
> incorrect value. A fix we use on PPC is to use
> 
>  jiffies - wall_jiffies
> 
> As an additional correction.

AFAIR, this correction is also done on x86.


	Regards,
	Gabriel


* Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-08 13:32 Benjamin Herrenschmidt
  2003-10-08 15:48 ` Gabriel Paubert
  2003-10-08 22:17 ` Pavel Machek
  0 siblings, 2 replies; 15+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 13:32 UTC (permalink / raw)
  To: linuxppc-dev list, Linux Kernel list

Hi !

While fixing problems experienced by some scientific users who
found out that gettimeofday() could sometimes run backward, I
found a nasty issue I don't know if we can fix at all or if it's
not worth bothering.

So the problem is with any arch (ppc, x86, ...) that uses a HW
timer (like the CPU timebase on PPC) to provide better-than-jiffy
precision in do_gettimeofday().

The problem is that the offset added to the xtime value (typically
the HW timer current value minus the HW timer value at the last
timer interrupt, scaled to usec) uses a scaling factor which has
been calibrated once, and doesn't take into account the adjustments
made to the xtime increment by the adjtime/adjtimex algorithm.

That means that if, for example, adjtimex was called with a factor
that is trying to slow you down a bit, and you call gettimeofday
right before the end of a jiffy period, you may calculate an offset
based on the HW timer that is actually higher than what will be
really added to xtime on the next interrupt.

So you can end up returning non-monotonic values from gettimeofday().
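
To make the failure mode concrete, here is a minimal user-space sketch (not
kernel code). The timebase figure is borrowed from a 33.29 MHz machine at
HZ = 100; the 500 ppm slowdown and all function names are assumptions chosen
for illustration, and the real adjtimex bookkeeping is more involved.

```c
#include <stdint.h>

#define TB_TICKS_PER_JIFFY     332900  /* 33.29 MHz timebase, HZ = 100 */
#define NOMINAL_USEC_PER_JIFFY 10000

/* Time reported between interrupts: xtime plus the HW timer delta,
 * scaled with the factor calibrated once at boot. */
int64_t interpolated_usec(int64_t xtime_usec, int64_t tb_delta)
{
    return xtime_usec
           + tb_delta * NOMINAL_USEC_PER_JIFFY / TB_TICKS_PER_JIFFY;
}

/* xtime after one timer interrupt while the clock is being slowed
 * down by slow_ppm: slightly less than a full jiffy is added. */
int64_t xtime_after_tick(int64_t xtime_usec, int slow_ppm)
{
    return xtime_usec + NOMINAL_USEC_PER_JIFFY
           - (int64_t)NOMINAL_USEC_PER_JIFFY * slow_ppm / 1000000;
}
```

Starting from xtime = 0, a read one timebase tick before the interrupt
(tb_delta = 332899) returns 9999 us, while a read right after the interrupt
returns only 9995 us when the clock is being slowed by 500 ppm: time appears
to step back by 4 us. With a correction of 10 ppm the difference is below one
microsecond and does not show up at this resolution.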

I don't see a way to fix that that wouldn't bloat do_gettimeofday(),
except if we can, at jiffy interrupt time, pre-calculate a scaling
factor for the next jiffy and just apply it on the HW timer value
on the next calls to do_gettimeofday(). But that option would need
better understanding of the adjtime(x) algorithm than what I have
at this point.
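
A rough sketch of what such a pre-calculated factor could look like, again in
user space. It assumes the timer interrupt can be told in advance how many
microseconds the next tick will add to xtime, which is exactly the part that
needs a better understanding of adjtime(x); the struct, the function names
and the clamping policy are hypothetical.

```c
#include <stdint.h>

#define TB_TICKS_PER_JIFFY 332900  /* 33.29 MHz timebase, HZ = 100 */

struct tick_state {
    int64_t xtime_usec;      /* wall time at the last timer interrupt */
    int64_t next_tick_usec;  /* what the next interrupt will add */
};

/* Timer interrupt: apply the amount announced for this tick, then
 * record the (possibly adjusted) amount for the following one. */
void on_tick(struct tick_state *st, int64_t adjusted_usec_for_next_tick)
{
    st->xtime_usec += st->next_tick_usec;
    st->next_tick_usec = adjusted_usec_for_next_tick;
}

/* Reader: scale the HW timer delta by the per-jiffy amount instead of
 * the boot-time constant, so the offset never exceeds the next step. */
int64_t read_usec(const struct tick_state *st, int64_t tb_delta)
{
    if (tb_delta > TB_TICKS_PER_JIFFY)  /* late interrupt: hold, don't overshoot */
        tb_delta = TB_TICKS_PER_JIFFY;
    return st->xtime_usec
           + tb_delta * st->next_tick_usec / TB_TICKS_PER_JIFFY;
}
```

With next_tick_usec = 9995, a read just before the interrupt returns 9994 us
and the read right after it returns 9995 us, so the sequence stays monotonic.
The reader still only needs two loads, one multiply and one divide.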

Storing the last value to make sure we don't return a value that is
lower will defeat the read_lock/write_lock mechanism, forcing us to
take the write_lock(), and thus screwing up scalability.

Any idea ?

Note: In addition to the above, there seems to be a race on x86 2.4
(only, 2.6 doesn't have it) due to the fact that the actual xtime
increase is done from a bottom half. The HW timer "last stamp" is
stored from the HW interrupt, xtime is only updated on the BH, so
if gettimeofday is called in between those 2, you'll end up using
the "new" "last stamp" with the old xtime, thus returning an
incorrect value. A fix we use on PPC is to use

 jiffies - wall_jiffies

As an additional correction.
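
A sketch of that correction, written as a plain function so it can be tried
in user space. The variable names mirror the kernel ones mentioned above, but
HZ = 100 and the function name are assumptions for the example: jiffies is
taken to be bumped by the HW interrupt, wall_jiffies to be caught up when
xtime is actually updated.

```c
#define HZ 100

/* Microseconds to add to the computed offset for timer interrupts
 * that have been counted in jiffies but not yet folded into xtime. */
unsigned long lost_tick_usec(unsigned long jiffies,
                             unsigned long wall_jiffies)
{
    /* Unsigned subtraction stays correct across counter wraparound. */
    unsigned long lost = jiffies - wall_jiffies;

    return lost * (1000000UL / HZ);
}
```

If the interrupt has fired but xtime has not been updated yet, jiffies is one
ahead of wall_jiffies and 10000 us are added, which cancels the mismatch
between the new "last stamp" and the old xtime.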

Ben.




end of thread, other threads:[~2003-10-14 11:16 UTC | newest]

Thread overview: 15+ messages
2003-10-10  5:12 Time precision, adjtime(x) vs. gettimeofday Bill Fink
2003-10-10  7:33 ` Gabriel Paubert
2003-10-10 16:39   ` Bill Fink
2003-10-10  7:53 ` Ethan Benson
  -- strict thread matches above, loose matches on Subject: below --
2003-10-11  4:45 Bill Fink
2003-10-11  5:27 ` Ethan Benson
2003-10-11 14:58   ` Benjamin Herrenschmidt
2003-10-14  7:07     ` Gabriel Paubert
2003-10-14 11:16       ` Benjamin Herrenschmidt
2003-10-08 13:32 Benjamin Herrenschmidt
2003-10-08 15:48 ` Gabriel Paubert
2003-10-08 16:22   ` Benjamin Herrenschmidt
2003-10-08 17:50     ` Gabriel Paubert
2003-10-08 18:22       ` Benjamin Herrenschmidt
2003-10-08 22:17 ` Pavel Machek
