linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* BUG: soft lockup detected on CPU#1, ipmi_si
@ 2006-06-13 23:35 Peter Palfrader
  2006-06-19  3:59 ` [Openipmi-developer] " Corey Minyard
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Palfrader @ 2006-06-13 23:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: openipmi-developer

Hi,

On 2.6.17-rc6 I get these soft lockup warnings quite regularly on one of
my DL145-G2 systems (dual dual-core (opteron 275), x86_64).

Here's an example of one:

Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
Jun 14 00:57:55 laura kernel: [97133.247456] 
Jun 14 00:57:55 laura kernel: [97133.247457] Call Trace: <IRQ> <ffffffff802a6eba>{softlockup_tick+250}
Jun 14 00:57:55 laura kernel: [97133.247496]        <ffffffff80293bb7>{update_process_times+87} <ffffffff80279923>{smp_local_timer_interrupt+35}
Jun 14 00:57:55 laura kernel: [97133.247546]        <ffffffff80279e61>{smp_apic_timer_interrupt+65} <ffffffff8026712e>{apic_timer_interrupt+98} <EOI>
Jun 14 00:57:55 laura kernel: [97133.247599]        <ffffffff802526c3>{try_to_del_timer_sync+83} <ffffffff8026be38>{_spin_unlock_irqrestore+8}
Jun 14 00:57:55 laura kernel: [97133.247652]        <ffffffff880eab78>{:ipmi_si:ipmi_thread+72} <ffffffff880eab30>{:ipmi_si:ipmi_thread+0}
Jun 14 00:57:55 laura kernel: [97133.247707]        <ffffffff80237099>{kthread+217} <ffffffff8026743a>{child_rip+8}
Jun 14 00:57:55 laura kernel: [97133.247757]        <ffffffff8029d280>{keventd_create_kthread+0} <ffffffff80236fc0>{kthread+0}
Jun 14 00:57:55 laura kernel: [97133.247807]        <ffffffff80267432>{child_rip+0}

I have several more, all of which include ipmi_si in one way or
another.  Usually these lockups happened on CPUs 2 and 3, tho today I
got one of CPU 1.

It's interesting to note that these logs are happening every 30 minutes
(munin queries 'openipmi -I open sensors' about once every 5 minutes).

Jun 12 18:45:40 laura kernel: [ 2876.040688] BUG: soft lockup detected on CPU#2!
Jun 12 19:15:54 laura kernel: [ 4688.141966] BUG: soft lockup detected on CPU#2!
Jun 12 19:46:06 laura kernel: [ 6499.159958] BUG: soft lockup detected on CPU#3!
Jun 12 20:16:17 laura kernel: [ 8309.394462] BUG: soft lockup detected on CPU#2!
Jun 12 20:46:29 laura kernel: [10120.468415] BUG: soft lockup detected on CPU#3!
Jun 12 21:16:42 laura kernel: [11931.614321] BUG: soft lockup detected on CPU#2!
Jun 12 21:46:54 laura kernel: [13742.672285] BUG: soft lockup detected on CPU#2!
[rebooted in between]
Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
Jun 14 01:28:07 laura kernel: [98944.316467] BUG: soft lockup detected on CPU#3!

Not having the ipmi_si module loaded means these warnings don't happen.

I suspect this started happening when I upgraded the firmware of the
iLo 100i BMC from 1.00 to version 1.23 (the other system's firmware
wasn't upgraded yet).  I'm still reporting it here on the chance that it
actually is a kernel/ipmi_si module bug.

Cheers,
Peter
-- 
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-13 23:35 BUG: soft lockup detected on CPU#1, ipmi_si Peter Palfrader
@ 2006-06-19  3:59 ` Corey Minyard
  2006-06-19  9:38   ` Peter Palfrader
  0 siblings, 1 reply; 9+ messages in thread
From: Corey Minyard @ 2006-06-19  3:59 UTC (permalink / raw)
  To: Peter Palfrader, linux-kernel, openipmi-developer

Sorry, I've been on vacation for a week without internet, so I'm just
getting caught up.

The IPMI driver spawns a low-priority thread that will poll the driver
when it finds there is something to do.  It's possible that the hardware
is not setting things properly and is always telling the driver it has
to do something.  It's possible that the new version of the firmware
enabled interrupts; I think there's a problem with the driver here; it
should not really enable the kernel thread if interrupts are working. 
The driver should also probably call schedule() instead of udelay() in
the kernel thread when a short timeout is requested by the state machine.

In either situation, the kernel thread will sit there and spin, and if
nothing else is scheduled for 10 seconds on that CPU you will get that
warning.  Can you check a few things for me?

cat /proc/ipmi/0/si_stats and send me the output.

If you do "top", is the kipmi0 always running?

Is your IPMI interface KCS or SMIC?  The IPMI driver should report this
in the system log at startup.

Thanks,

-Corey

Peter Palfrader wrote:
> Hi,
>
> On 2.6.17-rc6 I get these soft lockup warnings quite regularly on one of
> my DL145-G2 systems (dual dual-core (opteron 275), x86_64).
>
> Here's an example of one:
>
> Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
> Jun 14 00:57:55 laura kernel: [97133.247456] 
> Jun 14 00:57:55 laura kernel: [97133.247457] Call Trace: <IRQ> <ffffffff802a6eba>{softlockup_tick+250}
> Jun 14 00:57:55 laura kernel: [97133.247496]        <ffffffff80293bb7>{update_process_times+87} <ffffffff80279923>{smp_local_timer_interrupt+35}
> Jun 14 00:57:55 laura kernel: [97133.247546]        <ffffffff80279e61>{smp_apic_timer_interrupt+65} <ffffffff8026712e>{apic_timer_interrupt+98} <EOI>
> Jun 14 00:57:55 laura kernel: [97133.247599]        <ffffffff802526c3>{try_to_del_timer_sync+83} <ffffffff8026be38>{_spin_unlock_irqrestore+8}
> Jun 14 00:57:55 laura kernel: [97133.247652]        <ffffffff880eab78>{:ipmi_si:ipmi_thread+72} <ffffffff880eab30>{:ipmi_si:ipmi_thread+0}
> Jun 14 00:57:55 laura kernel: [97133.247707]        <ffffffff80237099>{kthread+217} <ffffffff8026743a>{child_rip+8}
> Jun 14 00:57:55 laura kernel: [97133.247757]        <ffffffff8029d280>{keventd_create_kthread+0} <ffffffff80236fc0>{kthread+0}
> Jun 14 00:57:55 laura kernel: [97133.247807]        <ffffffff80267432>{child_rip+0}
>
> I have several more, all of which include ipmi_si in one way or
> another.  Usually these lockups happened on CPUs 2 and 3, tho today I
> got one of CPU 1.
>
> It's interesting to note that these logs are happening every 30 minutes
> (munin queries 'openipmi -I open sensors' about once every 5 minutes).
>
> Jun 12 18:45:40 laura kernel: [ 2876.040688] BUG: soft lockup detected on CPU#2!
> Jun 12 19:15:54 laura kernel: [ 4688.141966] BUG: soft lockup detected on CPU#2!
> Jun 12 19:46:06 laura kernel: [ 6499.159958] BUG: soft lockup detected on CPU#3!
> Jun 12 20:16:17 laura kernel: [ 8309.394462] BUG: soft lockup detected on CPU#2!
> Jun 12 20:46:29 laura kernel: [10120.468415] BUG: soft lockup detected on CPU#3!
> Jun 12 21:16:42 laura kernel: [11931.614321] BUG: soft lockup detected on CPU#2!
> Jun 12 21:46:54 laura kernel: [13742.672285] BUG: soft lockup detected on CPU#2!
> [rebooted in between]
> Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
> Jun 14 01:28:07 laura kernel: [98944.316467] BUG: soft lockup detected on CPU#3!
>
> Not having the ipmi_si module loaded means these warnings don't happen.
>
> I suspect this started happening when I upgraded the firmware of the
> iLo 100i BMC from 1.00 to version 1.23 (the other system's firmware
> wasn't upgraded yet).  I'm still reporting it here on the chance that it
> actually is a kernel/ipmi_si module bug.
>
> Cheers,
> Peter
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-19  3:59 ` [Openipmi-developer] " Corey Minyard
@ 2006-06-19  9:38   ` Peter Palfrader
  2006-06-22 14:03     ` Corey Minyard
  0 siblings, 1 reply; 9+ messages in thread
From: Peter Palfrader @ 2006-06-19  9:38 UTC (permalink / raw)
  To: linux-kernel, openipmi-developer

On Sun, 18 Jun 2006, Corey Minyard wrote:

> The IPMI driver spawns a low-priority thread that will poll the driver
> when it finds there is something to do.  It's possible that the hardware
> is not setting things properly and is always telling the driver it has
> to do something.  It's possible that the new version of the firmware
> enabled interrupts; I think there's a problem with the driver here; it
> should not really enable the kernel thread if interrupts are working. 
> The driver should also probably call schedule() instead of udelay() in
> the kernel thread when a short timeout is requested by the state machine.
> 
> In either situation, the kernel thread will sit there and spin, and if
> nothing else is scheduled for 10 seconds on that CPU you will get that
> warning.  Can you check a few things for me?
> 
> cat /proc/ipmi/0/si_stats and send me the output.

After running for about 35 minutes (and one instance of the soft lockup
warning):

| interrupts_enabled:    0
| short_timeouts:        8835
| long_timeouts:         263709
| timeout_restarts:      0
| idles:                 793108
| interrupts:            0
| attentions:            0
| flag_fetches:          2137
| hosed_count:           0
| complete_transactions: 3516
| events:                0
| watchdog_pretimeouts:  0
| incoming_messages:     0


> If you do "top", is the kipmi0 always running?

Yes, running since the system started around 11:00:

| root      1331  0.8  0.0     0    0 ?        SN   10:59   0:17  \_ [kipmi0]

> Is your IPMI interface KCS or SMIC?  The IPMI driver should report this
> in the system log at startup.

It's KCS:

| laura:~# dmesg  | grep -i ipmi
| [   85.110244] ipmi message handler version 39.0
| [   85.111491] ipmi device interface
| [   85.127866] IPMI System Interface driver.
| [   85.127929] ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca2, slave address 0x20, irq 0
| [   85.274699] ipmi: Found new BMC (man_id: 0x000f85,  prod_id: 0x0000, dev_id: 0x00)
| [   85.274852]  IPMI KCS interface initialized
| [   85.284710] IPMI Watchdog: driver initialized

Cheers,
Peter
-- 
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-19  9:38   ` Peter Palfrader
@ 2006-06-22 14:03     ` Corey Minyard
  2006-06-23  2:56       ` Matt Domsch
  2006-06-25  1:00       ` Peter Palfrader
  0 siblings, 2 replies; 9+ messages in thread
From: Corey Minyard @ 2006-06-22 14:03 UTC (permalink / raw)
  To: Peter Palfrader, linux-kernel, openipmi-developer

Peter, can you make a code change for me and try something out?

If possible, could you change the call to udelay(1) in the function
ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
instead?  I'm guessing that will fix this problem.

-Corey

Peter Palfrader wrote:
> On Sun, 18 Jun 2006, Corey Minyard wrote:
>
>   
>> The IPMI driver spawns a low-priority thread that will poll the driver
>> when it finds there is something to do.  It's possible that the hardware
>> is not setting things properly and is always telling the driver it has
>> to do something.  It's possible that the new version of the firmware
>> enabled interrupts; I think there's a problem with the driver here; it
>> should not really enable the kernel thread if interrupts are working. 
>> The driver should also probably call schedule() instead of udelay() in
>> the kernel thread when a short timeout is requested by the state machine.
>>
>> In either situation, the kernel thread will sit there and spin, and if
>> nothing else is scheduled for 10 seconds on that CPU you will get that
>> warning.  Can you check a few things for me?
>>
>> cat /proc/ipmi/0/si_stats and send me the output.
>>     
>
> After running for about 35 minutes (and one instance of the soft lockup
> warning):
>
> | interrupts_enabled:    0
> | short_timeouts:        8835
> | long_timeouts:         263709
> | timeout_restarts:      0
> | idles:                 793108
> | interrupts:            0
> | attentions:            0
> | flag_fetches:          2137
> | hosed_count:           0
> | complete_transactions: 3516
> | events:                0
> | watchdog_pretimeouts:  0
> | incoming_messages:     0
>
>
>   
>> If you do "top", is the kipmi0 always running?
>>     
>
> Yes, running since the system started around 11:00:
>
> | root      1331  0.8  0.0     0    0 ?        SN   10:59   0:17  \_ [kipmi0]
>
>   
>> Is your IPMI interface KCS or SMIC?  The IPMI driver should report this
>> in the system log at startup.
>>     
>
> It's KCS:
>
> | laura:~# dmesg  | grep -i ipmi
> | [   85.110244] ipmi message handler version 39.0
> | [   85.111491] ipmi device interface
> | [   85.127866] IPMI System Interface driver.
> | [   85.127929] ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca2, slave address 0x20, irq 0
> | [   85.274699] ipmi: Found new BMC (man_id: 0x000f85,  prod_id: 0x0000, dev_id: 0x00)
> | [   85.274852]  IPMI KCS interface initialized
> | [   85.284710] IPMI Watchdog: driver initialized
>
> Cheers,
> Peter
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-22 14:03     ` Corey Minyard
@ 2006-06-23  2:56       ` Matt Domsch
  2006-06-23 14:55         ` Corey Minyard
  2006-06-23 23:44         ` Corey Minyard
  2006-06-25  1:00       ` Peter Palfrader
  1 sibling, 2 replies; 9+ messages in thread
From: Matt Domsch @ 2006-06-23  2:56 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Peter Palfrader, linux-kernel, openipmi-developer

On Thu, Jun 22, 2006 at 09:03:12AM -0500, Corey Minyard wrote:
> Peter, can you make a code change for me and try something out?
> 
> If possible, could you change the call to udelay(1) in the function
> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
> instead?  I'm guessing that will fix this problem.

won't that cause the thread to be scheduled out for at least one timer
tick (1-10ms depending on HZ), especially as it's at nice 19?  This
will cause the commands to be quite slow, which was the primary reason
for the kthread here in the first place.

Thanks,
Matt

-- 
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-23  2:56       ` Matt Domsch
@ 2006-06-23 14:55         ` Corey Minyard
  2006-06-23 23:44         ` Corey Minyard
  1 sibling, 0 replies; 9+ messages in thread
From: Corey Minyard @ 2006-06-23 14:55 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Peter Palfrader, linux-kernel, openipmi-developer

Matt Domsch wrote:
> On Thu, Jun 22, 2006 at 09:03:12AM -0500, Corey Minyard wrote:
>   
>> Peter, can you make a code change for me and try something out?
>>
>> If possible, could you change the call to udelay(1) in the function
>> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
>> instead?  I'm guessing that will fix this problem.
>>     
>
> won't that cause the thread to be scheduled out for at least one timer
> tick (1-10ms depending on HZ), especially as it's at nice 19?  This
> will cause the commands to be quite slow, which was the primary reason
> for the kthread here in the first place.
>   
I think that will only happen if there are other things to run that are
higher priority, and you want those to run, anyway.  This doesn't affect
the processes priority like yield() would.

-Corey

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-23  2:56       ` Matt Domsch
  2006-06-23 14:55         ` Corey Minyard
@ 2006-06-23 23:44         ` Corey Minyard
  2006-06-24 19:37           ` Matt Domsch
  1 sibling, 1 reply; 9+ messages in thread
From: Corey Minyard @ 2006-06-23 23:44 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Peter Palfrader, openipmi-developer, linux-kernel

I also had a report from Matt that running the driver full-bore caused
the mouse to go haywire.  I did some testing and the mouse didn't go
haywire, but my keyboard did.  I changed the udelay(1) to schedule() and
kipmi0 is running at 100% as I type right now, and things seem to be
running smoothly.  Testing the performance, I got 4.5ms per message for
a get device id, which was the same as I saw before the change.  So I
think this change is a good idea.

I'll let Peter test it out when he gets his machines back, and if it all
looks good I'll do a patch.

Thanks,

-Corey

Matt Domsch wrote:
> On Thu, Jun 22, 2006 at 09:03:12AM -0500, Corey Minyard wrote:
>   
>> Peter, can you make a code change for me and try something out?
>>
>> If possible, could you change the call to udelay(1) in the function
>> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
>> instead?  I'm guessing that will fix this problem.
>>     
>
> won't that cause the thread to be scheduled out for at least one timer
> tick (1-10ms depending on HZ), especially as it's at nice 19?  This
> will cause the commands to be quite slow, which was the primary reason
> for the kthread here in the first place.
>
> Thanks,
> Matt
>
>   


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-23 23:44         ` Corey Minyard
@ 2006-06-24 19:37           ` Matt Domsch
  0 siblings, 0 replies; 9+ messages in thread
From: Matt Domsch @ 2006-06-24 19:37 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Peter Palfrader, openipmi-developer, linux-kernel

On Fri, Jun 23, 2006 at 06:44:48PM -0500, Corey Minyard wrote:
> I also had a report from Matt that running the driver full-bore caused
> the mouse to go haywire.  I did some testing and the mouse didn't go
> haywire, but my keyboard did.  I changed the udelay(1) to schedule() and
> kipmi0 is running at 100% as I type right now, and things seem to be
> running smoothly.  Testing the performance, I got 4.5ms per message for
> a get device id, which was the same as I saw before the change.  So I
> think this change is a good idea.

I tried this here too, and while I'm not at the console to try the
mouse, the times for 'ipmitool sensor list' are still within the same
range (5-10 seconds, average is about 6) as with udelay(1) instead.
Running some load generators to keep the CPUs pegged didn't change
this result either.  So I'm inclined to agree changing to schedule()
is the right approach.

Thanks,
Matt

-- 
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si
  2006-06-22 14:03     ` Corey Minyard
  2006-06-23  2:56       ` Matt Domsch
@ 2006-06-25  1:00       ` Peter Palfrader
  1 sibling, 0 replies; 9+ messages in thread
From: Peter Palfrader @ 2006-06-25  1:00 UTC (permalink / raw)
  To: linux-kernel, openipmi-developer

On Thu, 22 Jun 2006, Corey Minyard wrote:

> Peter, can you make a code change for me and try something out?
> 
> If possible, could you change the call to udelay(1) in the function
> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
> instead?  I'm guessing that will fix this problem.

That appears to work.  I can still get sensor data using ipmitool, and I
haven't gotten any of the soft lockup warnings in 4 hours.

Thanks!
-- 
                           |  .''`.  ** Debian GNU/Linux **
      Peter Palfrader      | : :' :      The  universal
 http://www.palfrader.org/ | `. `'      Operating System
                           |   `-    http://www.debian.org/

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2006-06-25  1:00 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-06-13 23:35 BUG: soft lockup detected on CPU#1, ipmi_si Peter Palfrader
2006-06-19  3:59 ` [Openipmi-developer] " Corey Minyard
2006-06-19  9:38   ` Peter Palfrader
2006-06-22 14:03     ` Corey Minyard
2006-06-23  2:56       ` Matt Domsch
2006-06-23 14:55         ` Corey Minyard
2006-06-23 23:44         ` Corey Minyard
2006-06-24 19:37           ` Matt Domsch
2006-06-25  1:00       ` Peter Palfrader

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).