* time jumps (again)
@ 2003-08-04 16:35 Patrick Moor
  2003-08-04 16:37 ` Alan Cox
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Patrick Moor @ 2003-08-04 16:35 UTC (permalink / raw)
  To: linux-kernel

Hi

Some days ago I started noticing strange time jumps on my Athlon system. 
(Asus board, VIA chipset, AMD Athlon 650MHz processor). I haven't 
noticed them before and I am pretty sure there weren't any for the last 
few years! Uptime of the machine is now 218 days, and problems began 
appearing after 215 days approximately.

What happens: when doing a
  $ while true; do date; done
I'm noticing time jumps _exactly_ at the beginning of a "new" second (or 
at the end of an "old" one). The jump is exactly 4294 (4295) seconds
into the future. Example:
...
Mon Aug  4 18:11:06 CEST 2003
Mon Aug  4 18:11:06 CEST 2003
Mon Aug  4 19:22:41 CEST 2003
Mon Aug  4 19:22:41 CEST 2003
Mon Aug  4 19:22:41 CEST 2003
Mon Aug  4 18:11:07 CEST 2003
Mon Aug  4 18:11:07 CEST 2003
...

I've found some previous discussions about this about a year ago:

   http://www.ussg.iu.edu/hypermail/linux/kernel/0203.3/0557.html
   http://www.ussg.iu.edu/hypermail/linux/kernel/0206.0/1505.html

What seems strange to me is that these jumps have never occurred before.
The machine is running a plain 2.4.20 kernel.

So my question is: will disabling the CONFIG_X86_TSC option and passing 
"notsc" as boot parameter fix the problem? Or did I get something wrong 
there?

thanks
  patrick
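
The `date` loop above relies on spotting the jump by eye. As a minimal sketch (the function name and the one-second threshold are illustrative assumptions, not something from this thread), the same check can be done on a series of sampled timestamps:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Count the places where consecutive timestamps (in seconds) either go
 * backwards or leap forward by more than max_step seconds.
 * If first_bad is non-NULL it receives the index of the first offending
 * sample, or n if every step looks sane.
 */
size_t count_time_jumps(const long *t, size_t n, long max_step,
                        size_t *first_bad)
{
    size_t jumps = 0;

    assert(t != NULL || n == 0);
    if (first_bad)
        *first_bad = n;

    for (size_t i = 1; i < n; i++) {
        long step = t[i] - t[i - 1];

        if (step < 0 || step > max_step) {
            if (jumps == 0 && first_bad)
                *first_bad = i;
            jumps++;
        }
    }
    return jumps;
}
```

Fed the seven timestamps from the example output (converted to seconds since midnight), it flags two steps: the forward leap of 4295 seconds and the backward step of 4294 seconds. A real test program would fill the array from repeated gettimeofday() calls.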




* Re: time jumps (again)
  2003-08-04 16:35 time jumps (again) Patrick Moor
@ 2003-08-04 16:37 ` Alan Cox
  2003-08-04 21:49 ` Tim Schmielau
  2003-08-05 10:32 ` Jan Niehusmann
  2 siblings, 0 replies; 9+ messages in thread
From: Alan Cox @ 2003-08-04 16:37 UTC (permalink / raw)
  To: Patrick Moor; +Cc: Linux Kernel Mailing List

On Mon, 2003-08-04 at 17:35, Patrick Moor wrote:
> few years! Uptime of the machine is now 218 days, and problems began 
> appearing after 215 days approximately.

Not sure why 215 days should be significant

> What happens: when doing a
>   $ while true; do date; done
> I'm noticing time jumps _exactly_ at the beginning of a "new" second (or 
> at the end of an "old" one). the jump is exactly 4294 (4295) seconds 
> into the future. Example:

4294.. the top of the 32-bit range, i.e. -1 read as unsigned
(2^32 usec is about 4294.97 seconds)

Smells of some kind of sign propagation bug
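
The arithmetic behind that hunch can be checked directly. This is a small sketch, assuming the microsecond value lives in a 32-bit unsigned variable (as unsigned long is on i386); the function name is made up for illustration:

```c
#include <stdint.h>

/*
 * Whole seconds represented by a microsecond count after it has been
 * stored in a 32-bit unsigned variable. A negative input wraps modulo
 * 2^32, so -1 becomes 4294967295 usec.
 */
uint32_t wrapped_usec_as_seconds(int32_t signed_usec)
{
    uint32_t usec = (uint32_t)signed_usec;

    return usec / 1000000u;
}
```

Any negative value from -1 down to -967296 usec comes out as 4294 whole seconds, which matches the size of the reported jump.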



* Re: time jumps (again)
  2003-08-04 16:35 time jumps (again) Patrick Moor
  2003-08-04 16:37 ` Alan Cox
@ 2003-08-04 21:49 ` Tim Schmielau
  2003-08-04 22:38   ` george anzinger
  2003-08-06 18:16   ` Timothy Miller
  2003-08-05 10:32 ` Jan Niehusmann
  2 siblings, 2 replies; 9+ messages in thread
From: Tim Schmielau @ 2003-08-04 21:49 UTC (permalink / raw)
  To: Patrick Moor; +Cc: lkml, george anzinger

> Some days ago I started noticing strange time jumps on my Athlon system.
> (Asus board, VIA chipset, AMD Athlon 650MHz processor). I haven't
> noticed them before and I am pretty sure there weren't any for the last
> few years! Uptime of the machine is now 218 days, and problems began
> appearing after 215 days approximately.
>
> What happens: when doing a
>   $ while true; do date; done
> I'm noticing time jumps _exactly_ at the beginning of a "new" second (or
> at the end of an "old" one). the jump is exactly 4294 (4295) seconds
> into the future. Example:
> ...
> Mon Aug  4 18:11:06 CEST 2003
> Mon Aug  4 18:11:06 CEST 2003
> Mon Aug  4 19:22:41 CEST 2003
> Mon Aug  4 19:22:41 CEST 2003
> Mon Aug  4 19:22:41 CEST 2003
> Mon Aug  4 18:11:07 CEST 2003
> Mon Aug  4 18:11:07 CEST 2003
> ...
>

Wild guess - does the following patch fix it?

Tim


--- linux-2.4.20/arch/i386/kernel/time.c.orig	Mon Aug  4 23:38:47 2003
+++ linux-2.4.20/arch/i386/kernel/time.c	Mon Aug  4 23:40:53 2003
@@ -274,8 +274,8 @@
 	read_lock_irqsave(&xtime_lock, flags);
 	usec = do_gettimeoffset();
 	{
-		unsigned long lost = jiffies - wall_jiffies;
-		if (lost)
+		long lost = jiffies - wall_jiffies;
+		if (lost>0)
 			usec += lost * (1000000 / HZ);
 	}
 	sec = xtime.tv_sec;
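
To see what the patch changes, here is a sketch of both variants of the lost-tick term, with unsigned long modelled as uint32_t (as on i386) and HZ assumed to be 100. The function names are illustrative only. It shows what would happen if wall_jiffies were ever ahead of jiffies; it does not explain how that state could arise.

```c
#include <stdint.h>

#define HZ 100u  /* assumed tick rate */

/* Original logic: unsigned difference, any non-zero value is used. */
uint32_t lost_usec_unsigned(uint32_t jiffies, uint32_t wall_jiffies)
{
    uint32_t lost = jiffies - wall_jiffies;   /* wraps if wall_jiffies is ahead */

    return lost ? lost * (1000000u / HZ) : 0;
}

/* Patched logic: signed difference, only positive values are used. */
uint32_t lost_usec_signed(uint32_t jiffies, uint32_t wall_jiffies)
{
    /* The conversion to int32_t assumes the usual two's complement wrap. */
    int32_t lost = (int32_t)(jiffies - wall_jiffies);

    return lost > 0 ? (uint32_t)lost * (1000000u / HZ) : 0;
}
```

With wall_jiffies one tick ahead, the unsigned version yields 4294957296 usec (about 4294.96 seconds) while the signed version yields 0. For ordinary positive differences both return the same value.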



* Re: time jumps (again)
  2003-08-04 21:49 ` Tim Schmielau
@ 2003-08-04 22:38   ` george anzinger
  2003-08-05  1:08     ` Andries Brouwer
  2003-08-06 18:16   ` Timothy Miller
  1 sibling, 1 reply; 9+ messages in thread
From: george anzinger @ 2003-08-04 22:38 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: Patrick Moor, lkml

Tim Schmielau wrote:
>>Some days ago I started noticing strange time jumps on my Athlon system.
>>(Asus board, VIA chipset, AMD Athlon 650MHz processor). I haven't
>>noticed them before and I am pretty sure there weren't any for the last
>>few years! Uptime of the machine is now 218 days, and problems began
>>appearing after 215 days approximately.
>>
>>What happens: when doing a
>>  $ while true; do date; done
>>I'm noticing time jumps _exactly_ at the beginning of a "new" second (or
>>at the end of an "old" one). the jump is exactly 4294 (4295) seconds
>>into the future. Example:
>>...
>>Mon Aug  4 18:11:06 CEST 2003
>>Mon Aug  4 18:11:06 CEST 2003
>>Mon Aug  4 19:22:41 CEST 2003
>>Mon Aug  4 19:22:41 CEST 2003
>>Mon Aug  4 19:22:41 CEST 2003
>>Mon Aug  4 18:11:07 CEST 2003
>>Mon Aug  4 18:11:07 CEST 2003
>>...
>>
> 
> 
> Wild guess - does the following patch fix it?

And your theory is that wall_jiffies > jiffies.  How does this happen?
Both of these are only changed under the write_irq lock....

I would feel better with a patch that made jiffies volatile, but it 
already is.

I agree that the jump implies overflow here, but just HOW is it happening?

Time for some diagnostic code...

> Tim
> 
> 
> --- linux-2.4.20/arch/i386/kernel/time.c.orig	Mon Aug  4 23:38:47 2003
> +++ linux-2.4.20/arch/i386/kernel/time.c	Mon Aug  4 23:40:53 2003
> @@ -274,8 +274,8 @@
>  	read_lock_irqsave(&xtime_lock, flags);
>  	usec = do_gettimeoffset();
>  	{
> -		unsigned long lost = jiffies - wall_jiffies;
> -		if (lost)
> +		long lost = jiffies - wall_jiffies;
> +		if (lost>0)
>  			usec += lost * (1000000 / HZ);
>  	}
>  	sec = xtime.tv_sec;
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



* Re: time jumps (again)
  2003-08-04 22:38   ` george anzinger
@ 2003-08-05  1:08     ` Andries Brouwer
  0 siblings, 0 replies; 9+ messages in thread
From: Andries Brouwer @ 2003-08-05  1:08 UTC (permalink / raw)
  To: george anzinger; +Cc: Tim Schmielau, Patrick Moor, lkml

> Tim Schmielau wrote:

> >>What happens: when doing a
> >> $ while true; do date; done
> >>I'm noticing time jumps _exactly_ at the beginning of a "new" second (or
> >>at the end of an "old" one). the jump is exactly 4294 (4295) seconds
> >>into the future. Example:
> >>...
> >>Mon Aug  4 18:11:06 CEST 2003
> >>Mon Aug  4 19:22:41 CEST 2003
> >>Mon Aug  4 18:11:07 CEST 2003
> >>...

> >--- linux-2.4.20/arch/i386/kernel/time.c.orig	Mon Aug  4 23:38:47 2003
> >+++ linux-2.4.20/arch/i386/kernel/time.c	Mon Aug  4 23:40:53 2003
> >@@ -274,8 +274,8 @@
> > 	read_lock_irqsave(&xtime_lock, flags);
> > 	usec = do_gettimeoffset();
> > 	{
> >-		unsigned long lost = jiffies - wall_jiffies;
> >-		if (lost)
> >+		long lost = jiffies - wall_jiffies;
> >+		if (lost>0)
> > 			usec += lost * (1000000 / HZ);
> > 	}
> > 	sec = xtime.tv_sec;

At first sight jiffies and wall_jiffies increase monotonically, and
wall_jiffies always has a value jiffies had a moment earlier, so the
difference jiffies - wall_jiffies ought to be nonnegative.

On the other hand, do_gettimeoffset() is a much more obscure function,
and the jumps are also explained if that can return a negative value.

Depending on CONFIG_X86_TSC it does do_slow_gettimeoffset or
do_fast_gettimeoffset. Both offer plenty of opportunities to
return a negative value. Things depend on hardware details.

So, instead of adding a test inside { } I would propose to catch
problems after the {}, e.g. by
	if (usec < 0)
		usec = 0;

There should be a clue in the fact that the jump happens at the
start of a new second. I don't know what it is.

Andries
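
One possible reading of the start-of-second clue, offered as an assumption rather than a diagnosis: suppose the code after the quoted hunk adds xtime.tv_usec to usec and then carries whole seconds into sec, all in 32-bit unsigned arithmetic. A slightly negative offset then wraps to nearly 2^32. Adding a large tv_usec wraps it back to a sane value, but adding a small tv_usec (just after a new second begins) does not. The model below uses hypothetical names and is not the kernel source.

```c
#include <stdint.h>

struct tv32 {
    uint32_t sec;
    uint32_t usec;
};

/*
 * Model of the tail of do_gettimeofday() with 32-bit unsigned arithmetic.
 * offset_usec stands in for do_gettimeoffset() plus the lost-tick term
 * and may be negative, in which case it wraps when stored unsigned.
 */
struct tv32 model_gettimeofday(uint32_t xtime_sec, uint32_t xtime_usec,
                               int32_t offset_usec)
{
    uint32_t usec = (uint32_t)offset_usec;  /* wraps if negative */
    uint32_t sec = xtime_sec;

    usec += xtime_usec;                     /* may wrap back past zero */
    while (usec >= 1000000u) {              /* carry whole seconds */
        usec -= 1000000u;
        sec++;
    }
    return (struct tv32){ sec, usec };
}
```

With xtime at 18:11:07 (65467 seconds since midnight) plus 1000 usec and an offset of -3000 usec, the model reports second 69761, which is 19:22:41. With the same offset and tv_usec at 500000, it reports 18:11:07.497000 and no jump.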



* Re: time jumps (again)
  2003-08-04 16:35 time jumps (again) Patrick Moor
  2003-08-04 16:37 ` Alan Cox
  2003-08-04 21:49 ` Tim Schmielau
@ 2003-08-05 10:32 ` Jan Niehusmann
  2 siblings, 0 replies; 9+ messages in thread
From: Jan Niehusmann @ 2003-08-05 10:32 UTC (permalink / raw)
  To: Patrick Moor; +Cc: linux-kernel

On Mon, Aug 04, 2003 at 06:35:07PM +0200, Patrick Moor wrote:
> I'm noticing time jumps _exactly_ at the beginning of a "new" second (or 
> at the end of an "old" one). the jump is exactly 4294 (4295) seconds 
> into the future. Example:

We had the same problem with a similar setup (ASUS board, VIA chipset,
AMD CPU). 

The solution is in the following thread, and AFAIK the patch went into
2.4.21:
http://www.ussg.iu.edu/hypermail/linux/kernel/0211.0/0330.html

Jan



* Re: time jumps (again)
  2003-08-04 21:49 ` Tim Schmielau
  2003-08-04 22:38   ` george anzinger
@ 2003-08-06 18:16   ` Timothy Miller
  2003-08-06 18:55     ` George Anzinger
  2003-08-07  0:29     ` Andries Brouwer
  1 sibling, 2 replies; 9+ messages in thread
From: Timothy Miller @ 2003-08-06 18:16 UTC (permalink / raw)
  To: Tim Schmielau; +Cc: Patrick Moor, lkml, george anzinger

Is there any way the kernel could detect clock problems like drift and 
jumps by comparing the effects of different timers?  And when a problem 
is detected, it can correct the situation automatically.

How many interrupt timers are there in various systems?  How much can we 
rely on the accuracy of each one?



* Re: time jumps (again)
  2003-08-06 18:16   ` Timothy Miller
@ 2003-08-06 18:55     ` George Anzinger
  2003-08-07  0:29     ` Andries Brouwer
  1 sibling, 0 replies; 9+ messages in thread
From: George Anzinger @ 2003-08-06 18:55 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Tim Schmielau, Patrick Moor, lkml

Timothy Miller wrote:
> Is there any way the kernel could detect clock problems like drift and 
> jumps by comparing the effects of different timers?  And when a problem 
> is detected, it can correct the situation automatically.
> 
> How many interrupt timers are there in various systems?  How much can we 
> rely on the accuracy of each one?
> 
In my high-res-timers model I don't rely on interrupts to "clock" 
time, but rather pick some stable time source such as the ACPI
pm_timer.  The interrupts are just used to remind the system to read 
the clock.

In this model, the gettimeofday() request just reads that clock. 
There is also code to keep the interrupts occurring on the proper 
"boundaries" as defined by that clock.

The problem is finding a stable fast (as in time to read) clock 
source.  The TSC is not stable in a fair number of machines.  The 
pm_timer is an I/O access which is sloooow and will only get slower 
WRT cpu cycle time as the boxes get faster.

Archs other than the x86 seem to do much better in this regard.

As for fixing what is in the x86 now,  I would suggest that, if we are 
using the TSC, we trust it with a bit of a longer time than the tick 
time.  It is relatively easy to detect drift WRT the PIT and correct 
the TSC base line, but this should be done over a second or so and not 
each tick as is done now.  This would eliminate the PIT as well as the 
TSC reference read at each interrupt and result in a more stable result.

To work correctly with NTP we would also need to adjust the TSC to 
useconds multiplier to match what NTP thinks the TSC rate should be at 
the moment.

I don't know if this work should be attempted at this point in the 
development cycle, however.  Possibly waiting for 2.7 is better.
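
The longer-window correction described above can be pictured roughly as follows. This is a sketch under stated assumptions, not code from the high-res-timers patch: the struct, the function names and the 32.32 fixed-point multiplier are all hypothetical. Time is derived from a free-running cycle counter, and the cycles-to-microseconds rate is re-derived from a reference clock (the PIT, or whatever NTP currently believes) about once per second rather than at every tick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical clock state built on a free-running cycle counter. */
struct cyc_clock {
    uint64_t base_cycles;  /* counter value at the last baseline */
    uint64_t base_usec;    /* clock reading at the last baseline */
    uint64_t mult;         /* microseconds per cycle, 32.32 fixed point */
};

/*
 * Current time in microseconds for a given counter value. The product
 * fits in 64 bits as long as the baseline is refreshed every second or so.
 */
uint64_t cyc_clock_read(const struct cyc_clock *c, uint64_t cycles)
{
    uint64_t delta = cycles - c->base_cycles;

    return c->base_usec + ((delta * c->mult) >> 32);
}

/*
 * Called about once per second with the microseconds a reference clock
 * says have elapsed since the last baseline. The reading stays continuous
 * at the switch; only the rate used from now on changes.
 */
void cyc_clock_recalibrate(struct cyc_clock *c, uint64_t cycles_now,
                           uint64_t ref_usec_elapsed)
{
    uint64_t delta = cycles_now - c->base_cycles;

    assert(delta > 0 && ref_usec_elapsed < (UINT64_C(1) << 32));
    c->base_usec = cyc_clock_read(c, cycles_now);
    c->base_cycles = cycles_now;
    c->mult = (ref_usec_elapsed << 32) / delta;
}
```

Keeping the reading continuous and changing only the rate avoids stepping the clock backwards when the old multiplier was too high; the error already accumulated would still need to be slewed out separately.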
> 

-- 
George Anzinger   george@mvista.com
High-res-timers:  http://sourceforge.net/projects/high-res-timers/
Preemption patch: http://www.kernel.org/pub/linux/kernel/people/rml



* Re: time jumps (again)
  2003-08-06 18:16   ` Timothy Miller
  2003-08-06 18:55     ` George Anzinger
@ 2003-08-07  0:29     ` Andries Brouwer
  1 sibling, 0 replies; 9+ messages in thread
From: Andries Brouwer @ 2003-08-07  0:29 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Tim Schmielau, Patrick Moor, lkml, george anzinger

On Wed, Aug 06, 2003 at 02:16:35PM -0400, Timothy Miller wrote:

> Is there any way the kernel could detect clock problems like drift and 
> jumps by comparing the effects of different timers?  And when a problem 
> is detected, it can correct the situation automatically.

In this particular case, I think my stopgap
	if ((long) usec < 0)
		usec = 0;
would suffice to eliminate the jumps.
Of course it would be better to understand the hardware details,
but perhaps the hardware is insufficiently documented.
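
The cast in that stopgap matters if usec is declared unsigned (an assumption here, since the quoted hunk does not show the declaration): a plain `usec < 0` test on an unsigned value can never be true. A minimal sketch, with unsigned long modelled as uint32_t and an illustrative function name:

```c
#include <stdint.h>

/*
 * Clamp a microsecond offset that has wrapped below zero. Any value with
 * the top bit set (2^31 usec or more, roughly 2147 seconds) is treated as
 * negative, far beyond any legitimate sub-tick offset.
 * The conversion to int32_t assumes the usual two's complement wrap.
 */
uint32_t clamp_negative_usec(uint32_t usec)
{
    if ((int32_t)usec < 0)
        usec = 0;
    return usec;
}
```

This hides the symptom by reporting an offset of zero; it does not identify which input went negative.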


