* Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-08 13:32 Benjamin Herrenschmidt
  2003-10-08 15:48 ` Gabriel Paubert
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Benjamin Herrenschmidt @ 2003-10-08 13:32 UTC (permalink / raw)
  To: linuxppc-dev list, Linux Kernel list

Hi !

While fixing problems experienced by some scientific users who
found out that gettimeofday() could sometimes run backward, I
found a nasty issue. I don't know whether we can fix it at all,
or whether it is even worth bothering with.

The problem affects any arch (ppc, x86, ...) that uses a HW
timer (like the CPU timebase on PPC) to provide better-than-jiffy
precision in do_gettimeofday().

The problem is that the offset added to the xtime value (typically
the current HW timer value minus the HW timer value at the last
timer interrupt, scaled to usec) uses a scaling factor which has
been calibrated once, and doesn't take into account the adjustments
that the adjtime/adjtimex algorithm makes to the xtime increment.

That means that if, for example, adjtimex was called with a factor
that is trying to slow you down a bit, and you call gettimeofday
right before the end of a jiffy period, you may calculate an offset
based on the HW timer that is actually higher than what will be
really added to xtime on the next interrupt.

So you can end up returning non-monotonic values from gettimeofday().
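
To make the failure mode concrete, here is a minimal user-space model of it. This is a sketch, not kernel code: the struct, the function names, the 250000 ticks-per-jiffy figure and the 9995 usec adjusted step are all assumptions chosen for illustration. The offset is scaled with the factor calibrated once, while the interrupt adds the adjusted amount.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; every name and constant here is an assumption. */
#define TB_TICKS_PER_JIFFY     250000u  /* assumed HW timer ticks per jiffy */
#define NOMINAL_USEC_PER_JIFFY 10000u   /* HZ = 100, calibrated once */

struct model_clock {
    uint64_t xtime_usec;          /* wall time as of the last timer interrupt */
    uint64_t last_stamp;          /* HW timer value at the last interrupt */
    uint32_t adj_usec_per_jiffy;  /* what the interrupt really adds to xtime */
};

/* Timer interrupt: xtime advances by the adjusted step, not the nominal one. */
static void model_timer_interrupt(struct model_clock *c, uint64_t tb_now)
{
    c->xtime_usec += c->adj_usec_per_jiffy;
    c->last_stamp = tb_now;
}

/* gettimeofday: the offset uses the unadjusted, calibrated-once scale. */
static uint64_t model_gettimeofday(const struct model_clock *c, uint64_t tb_now)
{
    assert(tb_now >= c->last_stamp);
    uint64_t delta = tb_now - c->last_stamp;
    uint64_t offset = delta * NOMINAL_USEC_PER_JIFFY / TB_TICKS_PER_JIFFY;
    return c->xtime_usec + offset;
}

/*
 * Read the clock one HW tick before the interrupt and again right after it.
 * A positive result is the number of usec by which time appears to go back.
 */
static int64_t model_backward_step(uint32_t adj_usec_per_jiffy)
{
    struct model_clock c = {0, 0, adj_usec_per_jiffy};
    uint64_t before = model_gettimeofday(&c, TB_TICKS_PER_JIFFY - 1);
    model_timer_interrupt(&c, TB_TICKS_PER_JIFFY);
    uint64_t after = model_gettimeofday(&c, TB_TICKS_PER_JIFFY);
    return (int64_t)before - (int64_t)after;
}
```

With the clock being slowed to 9995 usec per jiffy, model_backward_step(9995) reports a backward step of a few microseconds; with the nominal 10000 usec step the result is negative, meaning time moved forward as expected.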

I don't see a way to fix that without bloating do_gettimeofday(),
except if we can, at jiffy interrupt time, pre-calculate a scaling
factor for the next jiffy and just apply it to the HW timer value
on the next calls to do_gettimeofday(). But that option would need
a better understanding of the adjtime(x) algorithm than what I have
at this point.
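
The pre-calculation idea could look roughly like the sketch below. It assumes the interrupt handler already knows how many usec the next interrupt will add to xtime (next_step_usec), which is exactly the part that depends on the adjtime(x) internals. All names are hypothetical, and the 32.32 fixed-point scale is just one possible representation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch; none of these names exist in the kernel. */
struct scaled_clock {
    uint64_t xtime_usec;          /* wall time as of the last interrupt */
    uint64_t last_stamp;          /* HW timer value at the last interrupt */
    uint32_t tb_ticks_per_jiffy;  /* HW timer ticks per jiffy */
    uint32_t next_step_usec;      /* usec the NEXT interrupt will add */
    uint64_t scale;               /* usec per HW tick, 32.32 fixed point */
};

/* Recompute the per-jiffy scale so a full jiffy maps onto next_step_usec. */
static void scaled_set_step(struct scaled_clock *c, uint32_t next_step_usec)
{
    assert(c->tb_ticks_per_jiffy > 0);
    c->next_step_usec = next_step_usec;
    c->scale = ((uint64_t)next_step_usec << 32) / c->tb_ticks_per_jiffy;
}

static void scaled_clock_init(struct scaled_clock *c, uint64_t tb_now,
                              uint32_t tb_ticks_per_jiffy,
                              uint32_t next_step_usec)
{
    c->xtime_usec = 0;
    c->last_stamp = tb_now;
    c->tb_ticks_per_jiffy = tb_ticks_per_jiffy;
    scaled_set_step(c, next_step_usec);
}

/* Interrupt: apply the step promised last time, then plan the next one. */
static void scaled_timer_interrupt(struct scaled_clock *c, uint64_t tb_now,
                                   uint32_t next_step_usec)
{
    c->xtime_usec += c->next_step_usec;
    c->last_stamp = tb_now;
    scaled_set_step(c, next_step_usec);
}

/* Reader: one multiply and shift, clamped to at most one jiffy. */
static uint64_t scaled_gettimeofday(const struct scaled_clock *c,
                                    uint64_t tb_now)
{
    uint64_t delta = tb_now - c->last_stamp;
    if (delta > c->tb_ticks_per_jiffy)
        delta = c->tb_ticks_per_jiffy;  /* late interrupt: do not overshoot */
    return c->xtime_usec + ((delta * c->scale) >> 32);
}
```

Because the offset can never exceed next_step_usec, a reading taken just before the interrupt cannot be larger than one taken just after it, and the reader path stays a multiply, a shift and a compare.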

Storing the last value to make sure we don't return a value that is
lower will defeat the read_lock/write_lock mechanism, forcing us to
take the write_lock(), and thus screwing up scalability.

Any ideas?

Note: In addition to the above, there seems to be a race on x86 2.4
(only, 2.6 doesn't have it) due to the fact that the actual xtime
increase is done from a bottom half. The HW timer "last stamp" is
stored from the HW interrupt, but xtime is only updated in the BH, so
if gettimeofday is called in between those two, you'll end up using
the "new" "last stamp" with the old xtime, thus returning an
incorrect value. A fix we use on PPC is to use

 jiffies - wall_jiffies

as an additional correction.
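
A small model of that window, and of how the jiffies - wall_jiffies term closes it, might look like this. It is a user-space sketch under assumed names and constants (HZ = 100, 250000 HW ticks per jiffy), not the actual PPC code: the HW interrupt bumps jiffies and the stamp, and the bottom half later folds the pending jiffies into xtime.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model only; names and constants are assumptions. */
#define MODEL_HZ           100u
#define USEC_PER_JIFFY     (1000000u / MODEL_HZ)
#define TB_TICKS_PER_JIFFY 250000u

struct bh_clock {
    uint64_t xtime_usec;         /* updated only by the bottom half */
    unsigned long jiffies;       /* bumped in the HW interrupt */
    unsigned long wall_jiffies;  /* jiffies already folded into xtime */
    uint64_t last_stamp;         /* HW timer value at the last HW interrupt */
};

/* HW interrupt: new stamp and new jiffy, but xtime is left untouched. */
static void bh_hw_interrupt(struct bh_clock *c, uint64_t tb_now)
{
    c->jiffies++;
    c->last_stamp = tb_now;
}

/* Bottom half: fold every pending jiffy into xtime. */
static void bh_bottom_half(struct bh_clock *c)
{
    unsigned long ticks = c->jiffies - c->wall_jiffies;
    c->wall_jiffies += ticks;
    c->xtime_usec += (uint64_t)ticks * USEC_PER_JIFFY;
}

/* Reader, with or without the jiffies - wall_jiffies correction. */
static uint64_t bh_gettimeofday(const struct bh_clock *c, uint64_t tb_now,
                                int use_correction)
{
    assert(tb_now >= c->last_stamp);
    uint64_t usec = c->xtime_usec +
        (tb_now - c->last_stamp) * USEC_PER_JIFFY / TB_TICKS_PER_JIFFY;
    if (use_correction)
        usec += (uint64_t)(c->jiffies - c->wall_jiffies) * USEC_PER_JIFFY;
    return usec;
}
```

Between bh_hw_interrupt() and bh_bottom_half(), the uncorrected reader pairs the new stamp with the old xtime and jumps back by almost a full jiffy. The corrected reader adds the jiffies not yet accounted for and stays consistent, and once the bottom half has run the correction term is zero again.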

Ben.



* Re: Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-10  5:12 Bill Fink
  2003-10-10  7:33 ` Gabriel Paubert
  2003-10-10  7:53 ` Ethan Benson
  0 siblings, 2 replies; 18+ messages in thread
From: Bill Fink @ 2003-10-10  5:12 UTC (permalink / raw)
  To: LinuxPPC Developers; +Cc: Bill Fink


On Wed, 08 Oct 2003, Benjamin Herrenschmidt wrote:

> > I repeat the question: what are the values of drift on the machines
> > that encounter the problem ? Is this drift stable or unstable?
>
> So far, there is no problem. The problem that was happening
> was a via_calibrate_decr() bug with HZ != 100, but when
> investigating, I figured out that we had a potential problem
> there, that's all and that's why I want people like you who
> know those problems well to state if it's worth bothering ;)
>
> > > On all cases, those will drift some way from what the NTP server
> > > will give, either a lot or not, it will. So we may end up adjusting
> > > our kernel rate and thus opening a window for the problem.
> >
> > The worst variations of drift I've seen are a few ppm for a given
> > machine, barring the occasional boot-time calibration problems that I
> > have encountered.
>
> OK.

This discussion prompted me to finally ask about another clock related
problem I see on the 867 MHz G4 systems at work.  The clocks on these
systems continuously run 0.2% slow (about 3 minutes per day).  Apparently
this is more than ntp can adjust for (using scaling), as I get many of
these error messages in the log:

Oct 10 00:11:29 clifford ntpd[425]: time reset 2.641342 s
Oct 10 00:11:29 clifford ntpd[425]: synchronisation lost
Oct 10 00:32:07 clifford ntpd[425]: time reset 2.671741 s
Oct 10 00:32:07 clifford ntpd[425]: synchronisation lost
Oct 10 00:52:46 clifford ntpd[425]: time reset 2.671729 s
Oct 10 00:52:46 clifford ntpd[425]: synchronisation lost
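
The numbers in that log can be checked with a little arithmetic. The helpers below are a sketch with hypothetical names. The 500 ppm constant is the frequency-correction limit ntpd is commonly documented to have, stated here as an assumption. The stepped amount reflects whatever error remained after any slewing ntpd managed to apply, so it is only a rough indicator of the raw drift.

```c
#include <assert.h>

/* Assumed ntpd frequency-correction limit, in parts per million. */
#define ASSUMED_NTP_MAX_SLEW_PPM 500.0

/* Drift in ppm implied by an offset that accumulated over an interval. */
static double drift_ppm(double offset_s, double interval_s)
{
    assert(interval_s > 0.0);
    return offset_s / interval_s * 1e6;
}

/* Seconds gained or lost per day at a given drift rate. */
static double seconds_per_day(double ppm)
{
    return ppm * 1e-6 * 86400.0;
}

/* Nonzero when the drift is larger than frequency slewing alone can absorb. */
static int exceeds_slew_limit(double ppm)
{
    return ppm > ASSUMED_NTP_MAX_SLEW_PPM;
}
```

A drift of 0.2% is 2000 ppm, and seconds_per_day(2000.0) gives about 172.8 s, which matches the "about 3 minutes per day" estimate. The resets at 00:32:07 and 00:52:46 follow the previous ones by 1238 s and 1239 s, so drift_ppm(2.671741, 1238.0) comes out near 2158 ppm, several times the assumed slew limit.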

This causes problems if I take these systems off the network for a few
hours and forget to reset them to the correct time when I reconnect
them: we use Kerberos for security, and the time difference between
the system and the Kerberos KDC then prevents remote logins.

These systems are using a 2.4.20-ben1 kernel.

						-Bill

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/

* Re: Time precision, adjtime(x) vs. gettimeofday
@ 2003-10-11  4:45 Bill Fink
  2003-10-11  5:27 ` Ethan Benson
  0 siblings, 1 reply; 18+ messages in thread
From: Bill Fink @ 2003-10-11  4:45 UTC (permalink / raw)
  To: LinuxPPC Developers; +Cc: Bill Fink


On Thu, 9 Oct 2003, Ethan Benson wrote:

> On Fri, Oct 10, 2003 at 01:12:54AM -0400, Bill Fink wrote:
> >
> > This discussion prompted me to finally ask about another clock related
> > problem I see on the 867 MHz G4 systems at work. The clocks on
> > these systems continuously run 0.2% slow (about 3 minutes per day).
> > Apparently this is more than ntp can adjust for (using scaling), as I
> > get many of these error messages in the log:
>
> is it a quicksilver G4? i maintain one of those and its time goes off
> much faster than that (3 minutes within a couple hours).

Yes I believe it's a quicksilver G4.

clifford% cat /proc/cpuinfo
cpu             : 7450, altivec supported
clock           : 866MHz
revision        : 2.1 (pvr 8000 0201)
bogomips        : 865.07
machine         : PowerMac3,5
motherboard     : PowerMac3,5 MacRISC2 MacRISC Power Macintosh
detected as     : 69 (PowerMac G4 Silver)
pmac flags      : 00000000
L2 cache        : 256K unified
memory          : 640MB
pmac-generation : NewWorld

> the fix is rather simple:
>
> --- linux.old/arch/ppc/platforms/pmac_time.c.orig Sat Nov 30 02:33:49 2002
> +++ linux/arch/ppc/platforms/pmac_time.c Sat Nov 30 02:33:22 2002
> @@ -262,7 +262,9 @@
> * calibration. That's better since the VIA itself seems
> * to be slightly off. --BenH
> */
> +#if 0
> if (!machine_is_compatible("MacRISC2"))
> +#endif
> if (via_calibrate_decr())
> return;

Thanks for the suggested fix.  I'll give it a try when I get a chance.

> in the case of the quicksilver, the VIA is FAR better than whatever it
> uses instead.

Assuming the fix works, is there a simple way to test for the
quicksilver G4 model rather than doing the "#if 0"? I like to run
a common kernel across a variety of different processor models.
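
One possible shape for such a runtime test, offered only as an untested sketch: keep the MacRISC2 exclusion from the quoted patch but exempt the model string that /proc/cpuinfo reports above ("PowerMac3,5"). The helper names below are hypothetical stand-ins that model the decision in user space. In the kernel the lookup would go through machine_is_compatible(), and whether this model string is the right discriminator is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for machine_is_compatible(): search a list of compatible names. */
static int model_is_compatible(const char *const *compat, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(compat[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Decide whether to use VIA calibration:
 *  - always on the assumed quicksilver model string "PowerMac3,5",
 *  - otherwise only on machines that are not MacRISC2, as in the
 *    unpatched code.
 */
static int model_use_via_calibration(const char *const *compat, size_t n)
{
    assert(compat != NULL);
    if (model_is_compatible(compat, n, "PowerMac3,5"))
        return 1;
    return !model_is_compatible(compat, n, "MacRISC2");
}
```

With the compatible list shown in the motherboard line above, this model selects VIA calibration, while another MacRISC2 machine keeps the current behaviour, so a single kernel could serve both.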

						-Bill


Thread overview: 18+ messages
2003-10-08 13:32 Time precision, adjtime(x) vs. gettimeofday Benjamin Herrenschmidt
2003-10-08 15:48 ` Gabriel Paubert
2003-10-08 16:22   ` Benjamin Herrenschmidt
2003-10-08 17:50     ` Gabriel Paubert
2003-10-08 18:22       ` Benjamin Herrenschmidt
2003-10-08 18:25 ` [PATCH] " Stephen Hemminger
2003-10-08 18:43   ` Benjamin Herrenschmidt
2003-10-08 19:11   ` john stultz
2003-10-08 22:17 ` Pavel Machek
2003-10-10  5:12 Bill Fink
2003-10-10  7:33 ` Gabriel Paubert
2003-10-10 16:39   ` Bill Fink
2003-10-10  7:53 ` Ethan Benson
2003-10-11  4:45 Bill Fink
2003-10-11  5:27 ` Ethan Benson
2003-10-11 14:58   ` Benjamin Herrenschmidt
2003-10-14  7:07     ` Gabriel Paubert
2003-10-14 11:16       ` Benjamin Herrenschmidt
