linux-kernel.vger.kernel.org archive mirror
* timer interrupts on HP machines
@ 2003-05-19 13:00 Edwin Top
  0 siblings, 0 replies; 10+ messages in thread
From: Edwin Top @ 2003-05-19 13:00 UTC (permalink / raw)
  To: linux-kernel

I saw some discussion on this list about HP netserver hardware having
problems with time running forwards & backwards.

Some people tried some MP spec settings in the BIOS and it worked, some
people said it did not work for them.

We are seeing the same problem here, every now and then, on around 12 (!)
servers and are not getting any helpful support from HP.

Could someone who had the problem (discussed in February 2003 here)
contact me and tell me if they are still experiencing the problem or
tell me how they solved it?

I suspect that it is an HP firmware problem, but one specifically
triggered by the Linux kernel.

Cheers,
-- 
Edwin Top <e.top@uzorg.nl>
Uzorg BV

The person who says it cannot be done should not interrupt the person doing it.
--Chinese Proverb




* Re: timer interrupts on HP machines
  2003-02-04  4:59     ` Nohez
@ 2003-02-04 19:48       ` Matt C
  0 siblings, 0 replies; 10+ messages in thread
From: Matt C @ 2003-02-04 19:48 UTC (permalink / raw)
  To: Nohez; +Cc: linux-kernel

Yup, it's definitely the HP hardware, since we also only see this problem 
on the NetServers. I haven't worked with the LH series, though, just the 
LT series. We've brought up issues like this with their support 
organization, with the inevitable response "unable to reproduce problem" 
and a closed ticket. We've given up since they ditched the NetServer line 
in favor of the Proliants anyways.

Good Luck.

-Matt

On Tue, 4 Feb 2003, Nohez wrote:

> 
> Hi Matt,
> 
> We have the MP spec set to v1.4 for more than a year and the systems have
> been unplugged for more than 1 hr for system maintenance many times. The
> BIOS firmware is 4.06.43. We suspect the kernel triggering a hardware bug
> as we see this only on HP Netservers. We have other unbranded Intel
> SMP machines running the same kernel, distro & same services without this
> problem.
> 
> Nohez.
> 
> On Mon, 3 Feb 2003, Matt C wrote:
> 
> > Hi Nohez:
> >
> > That's interesting. We've traced almost all of the times when this happens
> > back to an incorrect MP spec. I know it sounds goofy, but have you tried
> > unplugging AC power from the machine for ~5 minutes or so? We've seen that
> > make a difference in the Netservers. Also make sure you're up-to-date with
> > the firmware (latest is 4.06.43 or so?). Outside of that, I don't have any
> > other suggestions besides calling HP and having them replace the system
> > board.
> >
> 
> 



* Re: timer interrupts on HP machines
  2003-02-04  2:54   ` Matt C
@ 2003-02-04  4:59     ` Nohez
  2003-02-04 19:48       ` Matt C
  0 siblings, 1 reply; 10+ messages in thread
From: Nohez @ 2003-02-04  4:59 UTC (permalink / raw)
  To: Matt C; +Cc: linux-kernel


Hi Matt,

We have had the MP spec set to v1.4 for more than a year and the systems have
been unplugged for more than 1 hr for system maintenance many times. The
BIOS firmware is 4.06.43. We suspect the kernel is triggering a hardware bug,
as we see this only on HP Netservers. We have other unbranded Intel
SMP machines running the same kernel, distro & same services without this
problem.

Nohez.

On Mon, 3 Feb 2003, Matt C wrote:

> Hi Nohez:
>
> That's interesting. We've traced almost all of the times when this happens
> back to an incorrect MP spec. I know it sounds goofy, but have you tried
> unplugging AC power from the machine for ~5 minutes or so? We've seen that
> make a difference in the Netservers. Also make sure you're up-to-date with
> the firmware (latest is 4.06.43 or so?). Outside of that, I don't have any
> other suggestions besides calling HP and having them replace the system
> board.
>




* Re: timer interrupts on HP machines
  2003-02-03 15:03   ` Valdis.Kletnieks
@ 2003-02-04  4:34     ` Nohez
  0 siblings, 0 replies; 10+ messages in thread
From: Nohez @ 2003-02-04  4:34 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: linux-kernel


On Mon, 3 Feb 2003 Valdis.Kletnieks@vt.edu wrote:

> On Mon, 03 Feb 2003 18:52:14 +0530, Nohez said:
>
> > server: # date
> > Mon Feb  3 17:38:30 IST 2003
> > server: # date
> > Mon Feb  3 17:38:20 IST 2003
>
> > We have xntpd daemon running on all our servers.
>
> Any xntpd messages in the syslog that correlate with these events? I've
> seen similar behavior on my laptop (although the clock ran very slow and
> was getting slammed 10-15 seconds forward by xntpd - was a missing interrupt
> problem).   I've seen oddness with corrupted /etc/ntp/drift files as well...

I have attached ntp log entries for the relevant time period.
The server was rebooted at approx. 10:15 and the server time had stopped
at 4:57. Before xntpd we used to sync time using "netdate" once
every hour. The problem occurred even while using netdate.

/var/log/ntp:
-------------

7 Jan 00:46:11 xntpd[477]: offset -0.000146 sec freq 22.645 ppm error 0.000059 poll 9
7 Jan 01:46:47 xntpd[477]: offset -0.000174 sec freq 22.636 ppm error 0.000059 poll 10
7 Jan 02:47:23 xntpd[477]: offset 0.001350 sec freq 22.634 ppm error 0.000566 poll 10
7 Jan 03:47:59 xntpd[477]: offset -0.000288 sec freq 22.631 ppm error 0.000368 poll 10
7 Jan 04:48:35 xntpd[477]: offset -0.000312 sec freq 22.627 ppm error 0.000208 poll 10
7 Jan 10:18:52 xntpd[476]: system event 'event_restart' (0x01) status \
                           'sync_alarm, sync_unspec, 1 event, event_unspec'
7 Jan 10:19:08 xntpd[476]: peer LOCAL(0) event 'event_reach' (0x84) \
                           status 'unreach, conf, 1 event, event_reach' \
			   (0x801
7 Jan 10:19:09 xntpd[476]: peer xxx.x.x.xx event 'event_reach' (0x84) \
                           status 'unreach, conf, 1 event, event_reach' (0x8
7 Jan 10:22:21 xntpd[476]: system event 'event_peer/strat_chg' (0x04) \
                           status 'sync_alarm, sync_ntp, 2 events, event_res
7 Jan 10:22:21 xntpd[476]: system event 'event_sync_chg' (0x03) \
                           status 'leap_none, sync_ntp, 3 events, \
			   event_peer/strat
7 Jan 10:22:21 xntpd[476]: system event 'event_peer/strat_chg' (0x04) \
                           status 'leap_none, sync_ntp, 4 events, event_sync
7 Jan 11:19:28 xntpd[476]: offset 0.000093 sec freq 22.940 ppm error 0.000051 poll 7
7 Jan 12:20:04 xntpd[476]: offset 0.000134 sec freq 23.146 ppm error 0.000123 poll 6
7 Jan 13:20:40 xntpd[476]: offset -0.000233 sec freq 23.147 ppm error 0.000111 poll 10
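
One way to spot the stall in a log like the one above is to look for unusually
long silences between xntpd entries. The following is a minimal sketch, not
something posted in the thread: the function name log_gaps, the two-hour
threshold and the year 2003 are assumptions (the log lines carry no year, and
the statistics entries above arrive roughly once an hour).

```python
import re
from datetime import datetime, timedelta

# Matches the start of an entry such as "7 Jan 04:48:35 xntpd[477]: ...".
# Continuation lines (the wrapped status text) do not match and are skipped.
STAMP = re.compile(r"^\s*(\d{1,2}) (\w{3}) (\d{2}:\d{2}:\d{2}) xntpd\[\d+\]:")


def log_gaps(lines, max_gap=timedelta(hours=2), year=2003):
    """Return (previous, current) timestamp pairs that are further apart
    than max_gap. The year is an assumption; the log does not record it."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if match is None:
            continue
        day, month, clock = match.groups()
        stamp = datetime.strptime(
            f"{day} {month} {clock} {year}", "%d %b %H:%M:%S %Y"
        )
        if previous is not None and stamp - previous > max_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Fed the entries above, this should flag the silence between the 04:48:35 entry
and the restart event at 10:18:52, which brackets the stall and the reboot
described in this message.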




* Re: timer interrupts on HP machines
  2003-02-03 13:22 ` Nohez
  2003-02-03 15:03   ` Valdis.Kletnieks
@ 2003-02-04  2:54   ` Matt C
  2003-02-04  4:59     ` Nohez
  1 sibling, 1 reply; 10+ messages in thread
From: Matt C @ 2003-02-04  2:54 UTC (permalink / raw)
  To: Nohez; +Cc: linux-kernel, bgana

Hi Nohez:

That's interesting. We've traced almost all of the times when this happens 
back to an incorrect MP spec. I know it sounds goofy, but have you tried 
unplugging AC power from the machine for ~5 minutes or so? We've seen that 
make a difference in the Netservers. Also make sure you're up-to-date with 
the firmware (latest is 4.06.43 or so?). Outside of that, I don't have any 
other suggestions besides calling HP and having them replace the system 
board.

-Matt

On Mon, 3 Feb 2003, Nohez wrote:

> 
> We have a similar problem with our HP servers. We are facing this problem
> for more than a year. We have reported this problem to HP support.
> 
> We have five HP Netserver LH6000 running k_smp-2.4.18-47 (SuSE7.1).
> We are sure that MP spec is v1.4 in the BIOS.  But we have not
> checked /proc/interrupts. Will check the next time this problem occurs.
> 
> Problem:
> --------
> 
> System Time behaved erratically but servers do not hang. We noticed that
> all time related apps (sendmail, ping, top, cron etc) stopped. We
> noticed that time goes forward & backward in seconds only.
> 
> server: # date
> Mon Feb  3 17:38:26 IST 2003
> server: # date
> Mon Feb  3 17:38:30 IST 2003
> server: # date
> Mon Feb  3 17:38:20 IST 2003
> server: # date
> Mon Feb  3 17:38:25 IST 2003
> server: # date
> Mon Feb  3 17:38:28 IST 2003
> server: # date
> Mon Feb  3 17:38:21 IST 2003
> 
> The above is just an example. We could not find any pattern.
> 
> We could not access the server remotely. But we could login from console.
> All programs using system time failed - like sendmail, top, cron etc.
> 
> We could umount filesystems. But the server had to be forcibly shut (power
> reset). After system reboot everything was ok.
> 
> We have xntpd daemon running on all our servers.
> 
> Four servers are file/print servers (samba/nfs/cups) and one is a database
> server. The above problem has NEVER occurred on the database server.
> The only differences between the file servers and the database server are:
>    1. DB server has an external HP Ultrium & HP DDS4 tape drive
>       connected to an Adaptec 29160N Ultra160 SCSI adapter.
>    2. DB server has an Intel PRO/1000 Network (gigabit ethernet card)
> 
> Hardware details :
> ----------------
> HP Netserver LH6000
> 6 * 550Mhz Xeon CPUs
> 1GB RAM
> Integrated Megaraid Ultra-2 SCSI Raid Controller
> BIOS MP spec is v1.4
> 
> Software:
> ---------
> SuSE Linux 7.1
> kernel 2.4.18 (k_smp-2.4.18-47)
> glibc-2.2-7
> samba-2.0.10-0
> xntp-4.0.99f-6
> "Unsynced TSC support" is enabled in default SuSE kernel k_smp-2.4.18-47
> Kernel debugging is set
> 
> 
> Nohez.
> 
> ------------------------------------------------------------------------
> List:     linux-kernel
> Subject:  Re: timer interrupts on HP machines
> From:     Matt C <wago () phlinux ! com>
> Date:     2003-01-30 17:01:50
> [Download message RAW]
> 
> Hi Praveen-
> 
> We have a few LT6000r servers as well, and have the same problem on all
> 2.4 kernels -- this happens when your MP spec is set to 1.1 in the BIOS.
> Change it to 1.4 and you should be okay.
> 
> The other common problem on these guys is the CPU speed misdetect, which
> causes the kernel to think your CPU is roughly 2x as fast as it really is.
> The solution to that one is to unplug and replug the power cords (even a
> power-off doesn't fix it, go figure).
> 
> Hope that helps.
> 
> -Matt
> 
> On Thu, 30 Jan 2003, Praveen Ray wrote:
> 
> > We have few HP (LPR NetServers and LT6000) which run 2.4.18  (from RedHat 8.0)
> > . The problem is that sometimes the time interrupts stop coming - i.e. the
> > (time) counts in /proc/interrupts stop getting incremented! When this
> > happens, the date on the system falls behind, 'sleep' calls stop working and
> > basically machine becomes unusable.Has anyone else encountered this problem?
> 
> > Is it an HP issue?
> 
> > Thanks.
> 
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: timer interrupts on HP machines
  2003-02-03 13:22 ` Nohez
@ 2003-02-03 15:03   ` Valdis.Kletnieks
  2003-02-04  4:34     ` Nohez
  2003-02-04  2:54   ` Matt C
  1 sibling, 1 reply; 10+ messages in thread
From: Valdis.Kletnieks @ 2003-02-03 15:03 UTC (permalink / raw)
  To: Nohez; +Cc: linux-kernel, bgana

[-- Attachment #1: Type: text/plain, Size: 578 bytes --]

On Mon, 03 Feb 2003 18:52:14 +0530, Nohez said:

> server: # date
> Mon Feb  3 17:38:30 IST 2003
> server: # date
> Mon Feb  3 17:38:20 IST 2003

> We have xntpd daemon running on all our servers.

Any xntpd messages in the syslog that correlate with these events? I've
seen similar behavior on my laptop (although the clock ran very slow and
was getting slammed 10-15 seconds forward by xntpd - was a missing interrupt
problem).   I've seen oddness with corrupted /etc/ntp/drift files as well...
-- 
				Valdis Kletnieks
				Computer Systems Senior Engineer
				Virginia Tech


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]


* Re: timer interrupts on HP machines
       [not found] <Pine.LNX.4.33.0301291003220.19934-100000@mars.cmie.ernet.in>
@ 2003-02-03 13:22 ` Nohez
  2003-02-03 15:03   ` Valdis.Kletnieks
  2003-02-04  2:54   ` Matt C
  0 siblings, 2 replies; 10+ messages in thread
From: Nohez @ 2003-02-03 13:22 UTC (permalink / raw)
  To: linux-kernel; +Cc: bgana


We have a similar problem with our HP servers. We have been facing this
problem for more than a year and have reported it to HP support.

We have five HP Netserver LH6000 running k_smp-2.4.18-47 (SuSE7.1).
We are sure that MP spec is v1.4 in the BIOS.  But we have not
checked /proc/interrupts. Will check the next time this problem occurs.

Problem:
--------

System Time behaved erratically but servers do not hang. We noticed that
all time related apps (sendmail, ping, top, cron etc) stopped. We
noticed that time goes forward & backward in seconds only.

server: # date
Mon Feb  3 17:38:26 IST 2003
server: # date
Mon Feb  3 17:38:30 IST 2003
server: # date
Mon Feb  3 17:38:20 IST 2003
server: # date
Mon Feb  3 17:38:25 IST 2003
server: # date
Mon Feb  3 17:38:28 IST 2003
server: # date
Mon Feb  3 17:38:21 IST 2003

The above is just an example. We could not find any pattern.
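
To make such a trace easier to scan, the backward steps can be pulled out of a
series of captured `date` outputs. This is a minimal sketch under stated
assumptions: the names parse_date_output and backward_steps are hypothetical,
the input is assumed to be in the exact format shown above, and the timezone
field is ignored because all samples come from the same machine.

```python
from datetime import datetime


def parse_date_output(line):
    """Parse a line like 'Mon Feb  3 17:38:26 IST 2003', ignoring the
    weekday and the timezone abbreviation."""
    fields = line.split()
    month, day, clock, year = fields[1], fields[2], fields[3], fields[5]
    return datetime.strptime(
        f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y"
    )


def backward_steps(lines):
    """Return (index, delta_seconds) for every sample that is earlier
    than the sample before it."""
    steps = []
    stamps = [parse_date_output(line) for line in lines]
    for index in range(1, len(stamps)):
        delta = int((stamps[index] - stamps[index - 1]).total_seconds())
        if delta < 0:
            steps.append((index, delta))
    return steps
```

For the six samples above this reports two backward steps, of 10 and 7
seconds.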

We could not access the server remotely, but we could log in from the console.
All programs using system time failed - like sendmail, top, cron etc.

We could umount filesystems, but the server had to be forcibly shut down (power
reset). After system reboot everything was ok.

We have xntpd daemon running on all our servers.

Four servers are file/print servers (samba/nfs/cups) and one is a database
server. The above problem has NEVER occurred on the database server.
The only differences between the file servers and the database server are:
   1. DB server has an external HP Ultrium & HP DDS4 tape drive
      connected to an Adaptec 29160N Ultra160 SCSI adapter.
   2. DB server has an Intel PRO/1000 Network (gigabit ethernet card)

Hardware details :
----------------
HP Netserver LH6000
6 * 550MHz Xeon CPUs
1GB RAM
Integrated Megaraid Ultra-2 SCSI Raid Controller
BIOS MP spec is v1.4

Software:
---------
SuSE Linux 7.1
kernel 2.4.18 (k_smp-2.4.18-47)
glibc-2.2-7
samba-2.0.10-0
xntp-4.0.99f-6
"Unsynced TSC support" is enabled in default SuSE kernel k_smp-2.4.18-47
Kernel debugging is set


Nohez.

------------------------------------------------------------------------
List:     linux-kernel
Subject:  Re: timer interrupts on HP machines
From:     Matt C <wago () phlinux ! com>
Date:     2003-01-30 17:01:50
[Download message RAW]

Hi Praveen-

We have a few LT6000r servers as well, and have the same problem on all
2.4 kernels -- this happens when your MP spec is set to 1.1 in the BIOS.
Change it to 1.4 and you should be okay.

The other common problem on these guys is the CPU speed misdetect, which
causes the kernel to think your CPU is roughly 2x as fast as it really is.
The solution to that one is to unplug and replug the power cords (even a
power-off doesn't fix it, go figure).

Hope that helps.

-Matt

On Thu, 30 Jan 2003, Praveen Ray wrote:

> We have few HP (LPR NetServers and LT6000) which run 2.4.18  (from RedHat 8.0)
> . The problem is that sometimes the time interrupts stop coming - i.e. the
> (time) counts in /proc/interrupts stop getting incremented! When this
> happens, the date on the system falls behind, 'sleep' calls stop working and
> basically machine becomes unusable.Has anyone else encountered this problem?
> Is it an HP issue?
> Thanks.




* Re: timer interrupts on HP machines
  2003-01-30 14:34 Praveen Ray
  2003-01-30 16:44 ` Alan Cox
@ 2003-01-30 17:01 ` Matt C
  1 sibling, 0 replies; 10+ messages in thread
From: Matt C @ 2003-01-30 17:01 UTC (permalink / raw)
  To: Praveen Ray; +Cc: linux-kernel

Hi Praveen-

We have a few LT6000r servers as well, and have the same problem on all 
2.4 kernels -- this happens when your MP spec is set to 1.1 in the BIOS. 
Change it to 1.4 and you should be okay.

The other common problem on these guys is the CPU speed misdetect, which 
causes the kernel to think your CPU is roughly 2x as fast as it really is. 
The solution to that one is to unplug and replug the power cords (even a 
power-off doesn't fix it, go figure).

Hope that helps.

-Matt

On Thu, 30 Jan 2003, Praveen Ray wrote:

> We have few HP (LPR NetServers and LT6000) which run 2.4.18  (from RedHat 8.0) 
> . The problem is that sometimes the time interrupts stop coming - i.e. the 
> (time) counts in /proc/interrupts stop getting incremented! When this 
> happens, the date on the system falls behind, 'sleep' calls stop working and 
> basically machine becomes unusable.Has anyone else encountered this problem? 
> Is it an HP issue?
> Thanks.
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 



* Re: timer interrupts on HP machines
  2003-01-30 14:34 Praveen Ray
@ 2003-01-30 16:44 ` Alan Cox
  2003-01-30 17:01 ` Matt C
  1 sibling, 0 replies; 10+ messages in thread
From: Alan Cox @ 2003-01-30 16:44 UTC (permalink / raw)
  To: praveen.ray; +Cc: Linux Kernel Mailing List

On Thu, 2003-01-30 at 14:34, Praveen Ray wrote:
> We have few HP (LPR NetServers and LT6000) which run 2.4.18  (from RedHat 8.0) 
> . The problem is that sometimes the time interrupts stop coming - i.e. the 
> (time) counts in /proc/interrupts stop getting incremented! When this 
> happens, the date on the system falls behind, 'sleep' calls stop working and 
> basically machine becomes unusable.Has anyone else encountered this problem? 
> Is it an HP issue?

That I don't know, but my first question, other than the usual "Have you applied
the errata kernels", is probably whether it's hitting some of the APIC funnies
older hw occasionally has. Are they stable running "noapic"?
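
When testing this, it helps to confirm that a machine really was booted with
the option. A minimal sketch, assuming the kernel command line is read from
/proc/cmdline; the helper name has_boot_flag is hypothetical:

```python
def has_boot_flag(cmdline, flag):
    """Return True if flag appears as a whole word on the kernel
    command line (so 'noapic' does not match 'noapictimer')."""
    return flag in cmdline.split()


# Typical use on a running system:
#   with open("/proc/cmdline") as handle:
#       print(has_boot_flag(handle.read(), "noapic"))
```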



* timer interrupts on HP machines
@ 2003-01-30 14:34 Praveen Ray
  2003-01-30 16:44 ` Alan Cox
  2003-01-30 17:01 ` Matt C
  0 siblings, 2 replies; 10+ messages in thread
From: Praveen Ray @ 2003-01-30 14:34 UTC (permalink / raw)
  To: linux-kernel

We have a few HP machines (LPR NetServers and LT6000) which run 2.4.18 (from
RedHat 8.0). The problem is that sometimes the timer interrupts stop coming,
i.e. the (timer) counts in /proc/interrupts stop getting incremented! When this
happens, the date on the system falls behind, 'sleep' calls stop working and
the machine basically becomes unusable. Has anyone else encountered this problem?
Is it an HP issue?
Thanks.
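
The symptom described above can be checked by comparing two snapshots of
/proc/interrupts. The following is a minimal sketch, assuming the timer is on
the IRQ 0 line and that the per-CPU counts are the leading numeric columns;
the names timer_irq_count and timer_is_ticking are hypothetical.

```python
def timer_irq_count(proc_interrupts_text):
    """Sum the per-CPU counts on the IRQ 0 line of /proc/interrupts.
    Returns None if no IRQ 0 line is present."""
    for line in proc_interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "0:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # reached the controller / device name columns
                total += int(field)
            return total
    return None


def timer_is_ticking(before_text, after_text):
    """Return True if the IRQ 0 count grew between the two snapshots."""
    before = timer_irq_count(before_text)
    after = timer_irq_count(after_text)
    if before is None or after is None:
        raise ValueError("no IRQ 0 line found in snapshot")
    return after > before
```

The two snapshots are passed in as text on purpose: since 'sleep' calls stop
working when the interrupts stall, a script that sleeps between reads might
never return, so it is probably safer to capture the file twice by hand from
the console.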


Thread overview: 10+ messages
2003-05-19 13:00 timer interrupts on HP machines Edwin Top
     [not found] <Pine.LNX.4.33.0301291003220.19934-100000@mars.cmie.ernet.in>
2003-02-03 13:22 ` Nohez
2003-02-03 15:03   ` Valdis.Kletnieks
2003-02-04  4:34     ` Nohez
2003-02-04  2:54   ` Matt C
2003-02-04  4:59     ` Nohez
2003-02-04 19:48       ` Matt C
  -- strict thread matches above, loose matches on Subject: below --
2003-01-30 14:34 Praveen Ray
2003-01-30 16:44 ` Alan Cox
2003-01-30 17:01 ` Matt C
