linux-kernel.vger.kernel.org archive mirror
* Back to the Future ? or some thing sinister ?
@ 2006-01-08 19:31 Chaitanya Hazarey
  2006-01-09  4:03 ` Nathan Lynch
  0 siblings, 1 reply; 7+ messages in thread
From: Chaitanya Hazarey @ 2006-01-08 19:31 UTC (permalink / raw)
  To: linux-kernel

I think this is a problem that does not come up very often.

We have a machine, let's call it X: the make is IBM and the CPU is an Intel
Pentium 4 2.60 GHz. It is running a 2.6.13.1 kernel and previously ran a
2.4.27-4 kernel; the distribution is Debian Sarge.

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.60GHz
stepping        : 9
cpu MHz         : 2591.888
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
xtpr
bogomips        : 5188.79




The problem is that after some time (fuzzy, but I think about 2
hours) of inactivity, or because of some esoteric factor, the machine
enters a state in which its time starts going around in a loop.
If I do cat /proc/uptime, it goes 4 ticks ahead and then rewinds
to the starting count (not zero, but the moment in time when the event
was triggered).

The problem seems to be specific to the 2.6 series of kernels, not the
2.4 series.

I would like to know how to go about debugging the problem, and
which specific part of the kernel interacts directly with
the RTC / system clock.

Thanks,

Chaitanya
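
A minimal sketch for confirming the symptom described above: it samples
/proc/uptime at a fixed interval and reports any sample that is lower than
the one before it. The function names, the sampling interval, and the sample
count are illustrative assumptions, not part of the original report.

```python
import time


def read_uptime(path="/proc/uptime"):
    """Return the first field of /proc/uptime (seconds since boot) as a float."""
    with open(path) as f:
        return float(f.read().split()[0])


def find_rewinds(samples):
    """Return (index, previous, current) for each sample lower than its predecessor."""
    return [
        (i, prev, cur)
        for i, (prev, cur) in enumerate(zip(samples, samples[1:]), start=1)
        if cur < prev
    ]


def watch(interval=0.5, count=20, reader=read_uptime, sleep=time.sleep):
    """Collect `count` samples, `interval` seconds apart, and return the rewinds."""
    samples = []
    for _ in range(count):
        samples.append(reader())
        sleep(interval)
    return find_rewinds(samples)


if __name__ == "__main__":
    for index, prev, cur in watch():
        print(f"sample {index}: uptime went backwards, {prev:.2f} -> {cur:.2f}")
```

On an affected machine the sleep call may itself be distorted by the
misbehaving clock, so treat the spacing between samples as approximate; only
the backward jumps matter.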


* Re: Back to the Future ? or some thing sinister ?
  2006-01-08 19:31 Back to the Future ? or some thing sinister ? Chaitanya Hazarey
@ 2006-01-09  4:03 ` Nathan Lynch
  2006-01-09 15:26   ` Ram Gupta
  2006-01-11 22:03   ` john stultz
  0 siblings, 2 replies; 7+ messages in thread
From: Nathan Lynch @ 2006-01-09  4:03 UTC (permalink / raw)
  To: Chaitanya Hazarey; +Cc: linux-kernel

Chaitanya Hazarey wrote:
>
> We have got a machine, lets say X , make is IBM and the CPU is Intel
> Pentium 4 2.60 GHz. Its running a 2.6.13.1 Kernel and previously,
> 2.4.27-4 Kernel the distribution is Debian Sarge.
> 
> processor       : 0
> vendor_id       : GenuineIntel
> cpu family      : 15
> model           : 2
> model name      : Intel(R) Pentium(R) 4 CPU 2.60GHz
> stepping        : 9
> cpu MHz         : 2591.888
> cache size      : 512 KB
> fdiv_bug        : no
> hlt_bug         : no
> f00f_bug        : no
> coma_bug        : no
> fpu             : yes
> fpu_exception   : yes
> cpuid level     : 2
> wp              : yes
> flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
> mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe cid
> xtpr
> bogomips        : 5188.79
> 
> 
> 
> 
> The problem is that, after a some time ( fuzzy , but I think like 2
> hours ) of inactivity or because of some esoteric factor which triggers
> a state in which the time on the machine starts going around in a loop.
> if I do cat /proc/uptime, it goes 4  ticks ahead and again rewinds back
> to the starting count ( not zero, but the moment in time when the event
> was triggred. )
> 
> The problem seems to be specific to the 2.6 series of kernel, not the
> 2.4 series.
> 
> I  would like to know how to go about the debugging of the problem, and
> that which specific part of the kernel will be directly interacting with
> the rtc / system clock.

Look into upgrading the BIOS on that machine; I've had similar
problems on an IBM P4 workstation that were fixed in this way.



* Re: Back to the Future ? or some thing sinister ?
  2006-01-09  4:03 ` Nathan Lynch
@ 2006-01-09 15:26   ` Ram Gupta
  2006-01-11 18:32     ` Chaitanya Vinay Hazarey
  2006-01-11 22:03   ` john stultz
  1 sibling, 1 reply; 7+ messages in thread
From: Ram Gupta @ 2006-01-09 15:26 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Chaitanya Hazarey, linux-kernel

On 1/8/06, Nathan Lynch <ntl@pobox.com> wrote:
> Chaitanya Hazarey wrote:
> >
> > We have got a machine, lets say X , make is IBM and the CPU is Intel
> > Pentium 4 2.60 GHz. Its running a 2.6.13.1 Kernel and previously,


Is this machine's time synchronized with some server using NTP? I
have seen a very similar issue when the clock deviation was more
than a second. If the clock is adjusted and the time difference becomes more
than 2 sec, the difference becomes negative because timeval has its
members as signed int. I think that issue might be playing a role
here.

Ram


* Re: Back to the Future ? or some thing sinister ?
  2006-01-09 15:26   ` Ram Gupta
@ 2006-01-11 18:32     ` Chaitanya Vinay Hazarey
  0 siblings, 0 replies; 7+ messages in thread
From: Chaitanya Vinay Hazarey @ 2006-01-11 18:32 UTC (permalink / raw)
  To: linux-kernel

Ram Gupta wrote:

>On 1/8/06, Nathan Lynch <ntl@pobox.com> wrote:
>  
>
>>Chaitanya Hazarey wrote:
>>    
>>
>>>We have got a machine, lets say X , make is IBM and the CPU is Intel
>>>Pentium 4 2.60 GHz. Its running a 2.6.13.1 Kernel and previously,
>>>      
>>>
>
>
>Is this machine's time is synchronized with some server using ntp. I
>had seen some very similar issue when the clock deviation was more
>than a second .If clock is adjusted and time difference becomes more
>than 2 sec the diffence becomes negative because timeval has its
>members as signed int.It think that issue might be playing a role
>here.
>
>  
>
Nope, I tried everything: shutting down the NTP server, changing the NTP
server. Whatever I do, it still hangs intermittently. And if the
problem is caused by NTP, why would it hang only on 2.6 and not on 2.4
kernels?

And the point is that when it reaches that state, all commands seem
to execute ultra slow.

Any help for diagnosing the problem is most welcome.

Thanks,

Chaitanya




* Re: Back to the Future ? or some thing sinister ?
  2006-01-09  4:03 ` Nathan Lynch
  2006-01-09 15:26   ` Ram Gupta
@ 2006-01-11 22:03   ` john stultz
  2006-01-12 14:33     ` Ram Gupta
  1 sibling, 1 reply; 7+ messages in thread
From: john stultz @ 2006-01-11 22:03 UTC (permalink / raw)
  To: Nathan Lynch; +Cc: Chaitanya Hazarey, linux-kernel

On Sun, 2006-01-08 at 22:03 -0600, Nathan Lynch wrote:
> Chaitanya Hazarey wrote:
> >
> > We have got a machine, lets say X , make is IBM and the CPU is Intel
> > Pentium 4 2.60 GHz. Its running a 2.6.13.1 Kernel and previously,
> > 2.4.27-4 Kernel the distribution is Debian Sarge.
> > 
[snip]
> > 
> > The problem is that, after a some time ( fuzzy , but I think like 2
> > hours ) of inactivity or because of some esoteric factor which triggers
> > a state in which the time on the machine starts going around in a loop.
> > if I do cat /proc/uptime, it goes 4  ticks ahead and again rewinds back
> > to the starting count ( not zero, but the moment in time when the event
> > was triggred. )
> > 
> > The problem seems to be specific to the 2.6 series of kernel, not the
> > 2.4 series.
> > 
> > I  would like to know how to go about the debugging of the problem, and
> > that which specific part of the kernel will be directly interacting with
> > the rtc / system clock.
> 
> Look into upgrading the BIOS on that machine; I've had similar
> problems on a IBM P4 workstation that were fixed in this way.

Yes, there was a problematic BIOS on some IBM P4 systems that, after a
few hours, messed up the APIC's timer interrupt frequency. I believe
booting w/ noapic will work around the issue, but the correct fix is to
update your BIOS.

Please file a bugzilla bug if upgrading your BIOS does not resolve the
issue.

thanks
-john
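
One way to check whether the timer interrupt rate has actually changed is to
compare the IRQ 0 count in /proc/interrupts across two samples. The sketch
below assumes the 2.6-era layout in which the line labelled "0:" is the
system timer with one count column per CPU; the function names and the
sample text are hypothetical.

```python
from itertools import takewhile


def timer_count(interrupts_text, irq="0"):
    """Sum the per-CPU counts on the given IRQ line of /proc/interrupts."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == f"{irq}:":
            # Count columns come first; stop at the controller/device description.
            return sum(int(f) for f in takewhile(str.isdigit, fields[1:]))
    raise ValueError(f"IRQ {irq} not found")


def interrupt_rate(first_text, second_text, seconds_between):
    """Interrupts per second between two snapshots of /proc/interrupts."""
    return (timer_count(second_text) - timer_count(first_text)) / seconds_between


# Hypothetical snapshots taken 10 seconds apart, timed with an external clock.
SAMPLE_A = """\
           CPU0       CPU1
  0:    1000000          0    IO-APIC-edge  timer
  1:         10          0    IO-APIC-edge  i8042
"""
SAMPLE_B = SAMPLE_A.replace("1000000", "1002500")

if __name__ == "__main__":
    print(interrupt_rate(SAMPLE_A, SAMPLE_B, 10.0))
```

A rate far from the kernel's configured HZ would be consistent with the BIOS
problem described above. Time the interval with an external clock, since the
machine's own clock is the thing under suspicion.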



* Re: Back to the Future ? or some thing sinister ?
  2006-01-11 22:03   ` john stultz
@ 2006-01-12 14:33     ` Ram Gupta
  2006-01-12 18:08       ` john stultz
  0 siblings, 1 reply; 7+ messages in thread
From: Ram Gupta @ 2006-01-12 14:33 UTC (permalink / raw)
  To: john stultz; +Cc: Nathan Lynch, Chaitanya Hazarey, linux-kernel

On 1/11/06, john stultz <johnstul@us.ibm.com> wrote:
> On Sun, 2006-01-08 at 22:03 -0600, Nathan Lynch wrote:
> > Chaitanya Hazarey wrote:
> > >
> > > We have got a machine, lets say X , make is IBM and the CPU is Intel
> > > Pentium 4 2.60 GHz. Its running a 2.6.13.1 Kernel and previously,
> > > 2.4.27-4 Kernel the distribution is Debian Sarge.

It may be BIOS related, but I feel it might be an overflow-related
issue. If the variable is a signed int, then there will be a transition
from 0x7fffffff ns to 0x80000000 ns, which is basically from +2 sec to
-2 sec, resulting in a 4 sec loss.

Ram
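
The arithmetic above can be checked directly. The sketch below reinterprets
the two bit patterns as signed 32-bit nanosecond counts; the helper name
as_signed32 is made up for illustration and is not a kernel function.

```python
NSEC_PER_SEC = 1_000_000_000


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value & 0x8000_0000 else value


before = as_signed32(0x7FFFFFFF)  # 2147483647 ns, a little over +2 s
after = as_signed32(0x80000000)   # -2147483648 ns, a little under -2 s

if __name__ == "__main__":
    print(f"before wrap: {before / NSEC_PER_SEC:+.3f} s")
    print(f"after wrap:  {after / NSEC_PER_SEC:+.3f} s")
    print(f"jump:        {(before - after) / NSEC_PER_SEC:.3f} s")
```

The jump is 2^32 - 1 ns, roughly 4.29 s, which is where the "4 sec" figure
comes from.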


* Re: Back to the Future ? or some thing sinister ?
  2006-01-12 14:33     ` Ram Gupta
@ 2006-01-12 18:08       ` john stultz
  0 siblings, 0 replies; 7+ messages in thread
From: john stultz @ 2006-01-12 18:08 UTC (permalink / raw)
  To: Ram Gupta; +Cc: Nathan Lynch, Chaitanya Hazarey, linux-kernel

On Thu, 2006-01-12 at 08:33 -0600, Ram Gupta wrote:
> On 1/11/06, john stultz <johnstul@us.ibm.com> wrote:
> > On Sun, 2006-01-08 at 22:03 -0600, Nathan Lynch wrote:
> > > Chaitanya Hazarey wrote:
> > > >
> > > > We have got a machine, lets say X , make is IBM and the CPU is Intel
> > > > Pentium 4 2.60 GHz. Its running a 2.6.13.1 Kernel and previously,
> > > > 2.4.27-4 Kernel the distribution is Debian Sarge.
> 
> It may be BIOS related. But I feel it might be an overflow related
> issue. If the variable is signed int then there will be a transition
> from 0x7fffffff ns  to 0x80000000 ns which is basically from +2 sec to
> -2 sec which will result in 4 sec loss.

I'm pretty sure this is the BIOS issue. If you're hesitant about updating
the BIOS, try booting w/ noapic, and see if that works around the issue.

The 4 second loss is the tv_nsec portion of the xtime timespec wrapping.
Since time is not accumulated (timer_interrupt isn't being called at the
normal HZ frequency), the TSC offset grows and grows (and finally will
wrap, repeating the process), causing the xtime.tv_nsec to wrap.

Thus you are correct that the symptom is overflow related, but the cause
is most likely the BIOS.

thanks
-john
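
A toy model of the behaviour described above, assuming that the interpolated
offset since the last timer tick is held in a 32-bit nanosecond quantity and
that ticks stop arriving entirely. This is an illustration of the wrap, not
the kernel's actual timekeeping code; reported_ns and the constants are
hypothetical.

```python
MASK32 = 0xFFFFFFFF
NSEC_PER_SEC = 1_000_000_000


def reported_ns(real_ns, last_tick_ns):
    """Time seen by a reader: accumulated time plus a 32-bit offset since the last tick."""
    offset = (real_ns - last_tick_ns) & MASK32  # assumed 32-bit wrap of the offset
    return last_tick_ns + offset


if __name__ == "__main__":
    last_tick = 10 * NSEC_PER_SEC  # accumulation stops at 10 s in this model
    for second in range(10, 20):
        shown = reported_ns(second * NSEC_PER_SEC, last_tick)
        print(f"real {second:2d} s -> reported {shown / NSEC_PER_SEC:.3f} s")
```

In this model the reported time advances for about 4.29 s (2^32 ns), then
falls back to the value at the last tick and repeats, which resembles the
loop seen in /proc/uptime.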



Thread overview: 7+ messages
2006-01-08 19:31 Back to the Future ? or some thing sinister ? Chaitanya Hazarey
2006-01-09  4:03 ` Nathan Lynch
2006-01-09 15:26   ` Ram Gupta
2006-01-11 18:32     ` Chaitanya Vinay Hazarey
2006-01-11 22:03   ` john stultz
2006-01-12 14:33     ` Ram Gupta
2006-01-12 18:08       ` john stultz
