* [BUG] hpsa: Controller lockup detected: 0x00150028
@ 2015-05-18 12:40 Peter Zijlstra
2015-05-18 13:57 ` Oelke, Mark
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 12:40 UTC (permalink / raw)
To: don.brace; +Cc: iss_storagedev, storagedev, linux-scsi
Hi,
On my HP-DL180-G6 with an HP Smart Array P212, I can reliably trigger a
controller lockup by running smartctl.
I'm trying to monitor my HDD temps using:
for ((i=0; i<8; i++)) ; do
	smartctl -d cciss,$i -a /dev/sg0 | grep ^194 ;
done | awk '{t=$10; if (t > T) T = t;} END {print T}'
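A minimal sketch of the same max-temperature extraction, written as a function so it can be tried on captured output without touching the controller. The function name max_temp and the sample lines are hypothetical; the sample only assumes the usual smartctl attribute-table layout, in which the raw value is field 10.

```shell
#!/usr/bin/env bash
# Print the highest raw value (field 10) of SMART attribute 194 on stdin.
# "+ 0" forces a numeric comparison even if the field has odd formatting.
max_temp() {
    awk '$1 == 194 { t = $10 + 0; if (t > max) max = t } END { print max + 0 }'
}

# Hypothetical sample lines in the smartctl attribute-table layout.
sample='194 Temperature_Celsius 0x0022 036 045 000 Old_age Always - 36 (Min/Max 20/45)
194 Temperature_Celsius 0x0022 041 050 000 Old_age Always - 41
  9 Power_On_Hours 0x0032 090 090 000 Old_age Always - 7300'

printf '%s\n' "$sample" | max_temp
```

On the real machine the output of the smartctl loop would be piped into max_temp in place of the sample text. This only tidies the parsing; it does not change what is sent to the controller.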
After a few of those runs, I get:
[ 1540.277776] hpsa 0000:06:00.0: Controller lockup detected: 0x00150028
And my disks are gone.
With linux 3.16 the whole kernel came down with NMI watchdog timeouts /
RCU stalls in the detect_lockup() worklet.
On linux 4.0 those appear to be gone, but the controller isn't coming
back either.
Is this a known 'feature'; is there anything I can do to help
diagnose/fix this issue?
~ Peter
* RE: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-18 12:40 [BUG] hpsa: Controller lockup detected: 0x00150028 Peter Zijlstra
@ 2015-05-18 13:57 ` Oelke, Mark
2015-05-18 15:20 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Oelke, Mark @ 2015-05-18 13:57 UTC (permalink / raw)
To: Peter Zijlstra, don.brace; +Cc: ISS StorageDev, storagedev, linux-scsi
The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
Which version of controller firmware are you using?
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-18 13:57 ` Oelke, Mark
@ 2015-05-18 15:20 ` Peter Zijlstra
2015-05-18 16:03 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 15:20 UTC (permalink / raw)
To: Oelke, Mark; +Cc: don.brace, ISS StorageDev, storagedev, linux-scsi
On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
> The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
> Which version of controller firmware are you using?
Smart Array P212 in Slot 1
Hardware Revision: C
Firmware Version: 6.60
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-18 15:20 ` Peter Zijlstra
@ 2015-05-18 16:03 ` Peter Zijlstra
2015-05-18 16:11 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 16:03 UTC (permalink / raw)
To: Oelke, Mark; +Cc: don.brace, ISS StorageDev, storagedev, linux-scsi
On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
> > The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
> > Which version of controller firmware are you using?
>
> Smart Array P212 in Slot 1
>
> Hardware Revision: C
> Firmware Version: 6.60
I've updated to 6.62 and it appears to be working now; or rather, it has
not locked up yet where I think it would've locked up by now earlier.
I'll let it run for a few more hours before calling it fixed, I'll let
you know.
Thanks!
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-18 16:03 ` Peter Zijlstra
@ 2015-05-18 16:11 ` Peter Zijlstra
2015-05-22 15:10 ` Tomas Henzl
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 16:11 UTC (permalink / raw)
To: Oelke, Mark; +Cc: don.brace, ISS StorageDev, storagedev, linux-scsi
On Mon, May 18, 2015 at 06:03:45PM +0200, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
> > On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
> > > The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
> > > Which version of controller firmware are you using?
> >
> > Smart Array P212 in Slot 1
> >
> > Hardware Revision: C
> > Firmware Version: 6.60
>
> I've updated to 6.62 and it appears to be working now; or rather, it has
> not locked up yet where I think it would've locked up by now earlier.
>
> I'll let it run for a few more hours before calling it fixed, I'll let
> you know.
And right after sending this email it went...
[ 1119.052144] hpsa 0000:06:00.0: Controller lockup detected: 0x00150029
So sadly no dice.
Anything else I can do?
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-18 16:11 ` Peter Zijlstra
@ 2015-05-22 15:10 ` Tomas Henzl
2015-05-22 16:40 ` Peter Zijlstra
0 siblings, 1 reply; 11+ messages in thread
From: Tomas Henzl @ 2015-05-22 15:10 UTC (permalink / raw)
To: Peter Zijlstra, Oelke, Mark
Cc: don.brace, ISS StorageDev, storagedev, linux-scsi
On 05/18/2015 06:11 PM, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 06:03:45PM +0200, Peter Zijlstra wrote:
>> On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
>>> On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
>>>> The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
>>>> Which version of controller firmware are you using?
>>>
>>> Smart Array P212 in Slot 1
>>>
>>> Hardware Revision: C
>>> Firmware Version: 6.60
>>
>> I've updated to 6.62 and it appears to be working now; or rather, it has
>> not locked up yet where I think it would've locked up by now earlier.
>>
>> I'll let it run for a few more hours before calling it fixed, I'll let
>> you know.
>
> And right after sending this email it went...
>
> [ 1119.052144] hpsa 0000:06:00.0: Controller lockup detected: 0x00150029
>
> So sadly no dice.
>
> Anything else I can do?
An older issue for mptsas seems to handle a similar case
2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
that might be for hpsa -
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1067,6 +1067,8 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
if (sd != NULL)
sdev->hostdata = sd;
spin_unlock_irqrestore(&h->devlock, flags);
+
+ blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
return 0;
}
-tm
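For readers unfamiliar with the call in the patch: blk_queue_dma_alignment() takes a mask rather than a size, so 512 - 1 (0x1ff) asks for 512-byte alignment. The sketch below only illustrates that mask arithmetic in user space; the function name is_aligned and the addresses are hypothetical and it says nothing about whether alignment is the root cause here.

```shell
#!/usr/bin/env bash
# Mask as passed in the proposed patch: 512 - 1 = 0x1ff.
mask=$(( 512 - 1 ))

# An address is 512-byte aligned when none of the mask bits are set.
is_aligned() {
    (( ($1 & mask) == 0 ))
}

# Hypothetical buffer addresses, for illustration only.
for addr in 0x1000 0x1200 0x1234; do
    if is_aligned "$addr"; then
        printf '%s aligned\n' "$addr"
    else
        printf '%s unaligned (offset %d)\n' "$addr" $(( addr & mask ))
    fi
done
```

With such a mask set on the request queue, a pass-through buffer like the last one would not be handed to the controller directly.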
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-22 15:10 ` Tomas Henzl
@ 2015-05-22 16:40 ` Peter Zijlstra
2015-05-22 16:48 ` Handzik, Joe
0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-22 16:40 UTC (permalink / raw)
To: Tomas Henzl
Cc: Oelke, Mark, don.brace, ISS StorageDev, storagedev, linux-scsi
On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> >> I've updated to 6.62 and it appears to be working now; or rather, it has
I've since gotten 6.64 from HP to test; which does not seem public yet.
6.64 actually fixes the issue for me.
> An older issue for mptsas seems to handle a similar case
> 2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
> that might be for hpsa -
> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1067,6 +1067,8 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
> if (sd != NULL)
> sdev->hostdata = sd;
> spin_unlock_irqrestore(&h->devlock, flags);
> +
> + blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
> return 0;
> }
That does indeed seem _very_ similar; I'll have to defer to Mark Oelke
and/or Don Brace to say whether the above is a useful alternative, since
they now seem to know what the root cause was.
* RE: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-22 16:40 ` Peter Zijlstra
@ 2015-05-22 16:48 ` Handzik, Joe
2015-08-24 9:43 ` Wouter Depuydt
0 siblings, 1 reply; 11+ messages in thread
From: Handzik, Joe @ 2015-05-22 16:48 UTC (permalink / raw)
To: Peter Zijlstra, Tomas Henzl
Cc: Oelke, Mark, don.brace, ISS StorageDev, storagedev, linux-scsi
No, the problem here (iirc) actually dealt with buffers in the firmware.
Don or Mark, agree?
Joe
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-05-22 16:48 ` Handzik, Joe
@ 2015-08-24 9:43 ` Wouter Depuydt
2015-08-24 10:02 ` Wouter Depuydt
0 siblings, 1 reply; 11+ messages in thread
From: Wouter Depuydt @ 2015-08-24 9:43 UTC (permalink / raw)
To: linux-scsi
> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz <at> infradead.org]
> Sent: Friday, May 22, 2015 11:40 AM
> To: Tomas Henzl
> Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at>
pmcs.com; linux-scsi <at> vger.kernel.org
> Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
>
> On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> > >> I've updated to 6.62 and it appears to be working now; or rather, it has
>
> I've since gotten 6.64 from HP to test; which does not seem public yet.
>
> 6.64 actually fixes the issue for me.
>
Hi everyone,
I've experienced a similar problem with a P441 controller in HBA mode.
Serial Number PDNMH0ARH8P04A
Model HP Smart Array P441 Controller
Firmware Version 2.52
This seems to be the latest firmware:
http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=7274903&swItemId=MTX_a476e21cd5e142608ff8d6aed5&swEnvOid=4176
I'm running smartd for monitoring, with daily short tests and weekly long tests.
Aug 23 13:57:11 smartd[3344]: Device: /dev/sdv [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 42
Aug 23 13:57:42 kernel: [349157.766608] hpsa 0000:05:00.0: Abort request on C6:B2:T18:L0
Aug 23 13:57:42 kernel: [349157.830545] hpsa 0000:05:00.0: Abort request on C6:B2:T19:L0
Aug 23 14:00:10 kernel: [349305.554986] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:00:10 kernel: [349305.555000] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:00:10 kernel: [349305.575005] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:00:10 kernel: [349305.575019] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:02:10 kernel: [349425.572350] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:02:10 kernel: [349425.572364] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:02:10 kernel: [349425.682988] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:02:10 kernel: [349425.683001] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:03:10 kernel: [349457.674593] hpsa 0000:05:00.0: Controller lockup detected: 0x00130001
Aug 23 14:03:38 kernel: [349485.657605] Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
Aug 23 14:03:38 kernel: [349485.657736] <<EOE>> [<ffffffffc02a5384>] fail_all_cmds_on_list+0x74/0x6c0 [hpsa]
Aug 23 14:03:38 kernel: [349485.657765] [<ffffffffc02aa9bb>] hpsa_monitor_ctlr_worker+0x40b/0x4e0 [hpsa]
Aug 23 14:03:43 kernel: [349518.024245] Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
Aug 23 14:04:10 kernel: [349545.689592] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:04:10 kernel: [349545.689605] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:04:10 kernel: [349545.805674] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:04:10 kernel: [349545.805693] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:06:11 kernel: [349665.810469] [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:06:11 kernel: [349665.810483] [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
W.
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-08-24 9:43 ` Wouter Depuydt
@ 2015-08-24 10:02 ` Wouter Depuydt
2015-08-24 14:11 ` Don Brace
0 siblings, 1 reply; 11+ messages in thread
From: Wouter Depuydt @ 2015-08-24 10:02 UTC (permalink / raw)
To: linux-scsi
Wouter Depuydt <wouter.depuydt <at> gmail.com> writes:
>
> > -----Original Message-----
> > From: Peter Zijlstra [mailto:peterz <at> infradead.org]
> > Sent: Friday, May 22, 2015 11:40 AM
> > To: Tomas Henzl
> > Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at>
> pmcs.com; linux-scsi <at> vger.kernel.org
> > Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
> >
> > On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> > > >> I've updated to 6.62 and it appears to be working now; or rather,
it has
> >
> > I've since gotten 6.64 from HP to test; which does not seem public yet.
> >
> > 6.64 actually fixes the issue for me.
> >
>
> Hi everyone,
>
> I've experienced a similar problem with a P441 controller in HBA mode.
> Serial Number PDNMH0ARH8P04A
> Model HP Smart Array P441 Controller
> Firmware Version 2.52
>
Other System info:
HP Proliant D380p Gen9
Ubuntu LTS 14.04
ii linux-image-3.19.0-26-generic
ii linux-image-extra-3.19.0-26-generic
ii linux-image-generic-lts-vivid
* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
2015-08-24 10:02 ` Wouter Depuydt
@ 2015-08-24 14:11 ` Don Brace
0 siblings, 0 replies; 11+ messages in thread
From: Don Brace @ 2015-08-24 14:11 UTC (permalink / raw)
To: Wouter Depuydt, linux-scsi
On 08/24/2015 05:02 AM, Wouter Depuydt wrote:
> Wouter Depuydt <wouter.depuydt <at> gmail.com> writes:
>
>>> -----Original Message-----
>>> From: Peter Zijlstra [mailto:peterz <at> infradead.org]
>>> Sent: Friday, May 22, 2015 11:40 AM
>>> To: Tomas Henzl
>>> Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at>
>> pmcs.com; linux-scsi <at> vger.kernel.org
>>> Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
>>>
>>> On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
>>>>>> I've updated to 6.62 and it appears to be working now; or rather,
> it has
>>> I've since gotten 6.64 from HP to test; which does not seem public yet.
>>>
>>> 6.64 actually fixes the issue for me.
>>>
>> Hi everyone,
>>
>> I've experienced a similar problem with a P441 controller in HBA mode.
>> Serial Number PDNMH0ARH8P04A
>> Model HP Smart Array P441 Controller
>> Firmware Version 2.52
>>
> Other System info:
>
> HP Proliant D380p Gen9
> Ubuntu LTS 14.04
> ii linux-image-3.19.0-26-generic
> ii linux-image-extra-3.19.0-26-generic
> ii linux-image-generic-lts-vivid
>
I see this issue addressed in the Firmware update page under the "Fixes"
tab.
http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=3984645&swItemId=MTX_55b304486f544f148de6c5cc6e&swEnvOid=4103#tab4
Problems Fixed:
Running SMARTCTL (smartmontools) on HP Proliant G6/G7 (Px1x) Smart
Array controllers that have firmware version 5.70 to 6.62 installed with
SATA drives attached may result in system not responding or reboot. When
reboot occurred, a reboot 1719 POST error message with lockup 0x15
displayed.
Hope this helps you.
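The affected range quoted above can be turned into a quick check. This is only a sketch: the function name fw_affected is hypothetical, the range 5.70 to 6.62 is treated as inclusive, and versions are assumed to have the form major.minor with a two-digit minor, as in this thread.

```shell
#!/usr/bin/env bash
# Succeed when a Px1x firmware version falls in the range the release
# notes list as affected (5.70 to 6.62, assumed inclusive).
fw_affected() {
    local major=${1%%.*} minor=${1#*.}
    # 10# forces base 10 so a minor such as "08" is not read as octal.
    local v=$(( 10#$major * 100 + 10#$minor ))
    (( v >= 570 && v <= 662 ))
}

# Versions mentioned earlier in this thread for the P212.
for fw in 6.60 6.62 6.64; do
    if fw_affected "$fw"; then
        printf '%s: in affected range\n' "$fw"
    else
        printf '%s: outside affected range\n' "$fw"
    fi
done
```

This agrees with what was reported earlier in the thread: 6.60 and 6.62 still locked up, while 6.64 did not. The range applies to the G6/G7 Px1x controllers named in the release notes, not to other controller families with their own firmware numbering.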