* [BUG] hpsa: Controller lockup detected: 0x00150028
@ 2015-05-18 12:40 Peter Zijlstra
  2015-05-18 13:57 ` Oelke, Mark
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 12:40 UTC (permalink / raw)
  To: don.brace; +Cc: iss_storagedev, storagedev, linux-scsi

Hi,

On my HP-DL180-G6 with an HP Smart Array P212, I can reliably trigger a
controller lockup by running smartctl.

I'm trying to monitor my HDD temps using:

  for ((i=0; i<8; i++)) ; do
	smartctl -d cciss,$i -a /dev/sg0 | grep ^194 ;
  done | awk '{t=$10; if (t > T) T = t;} END {print T}'
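
For readers following along: the awk stage above keeps the largest value of field 10 (the raw value column) across the attribute 194 rows. A minimal sketch of that filtering step as a standalone function is below. The name max_temp is hypothetical, and the sketch assumes the usual ten-column smartctl attribute table; it only reads text on stdin and does not talk to the controller itself.

```sh
#!/bin/sh
# max_temp (hypothetical helper): read `smartctl -a` style text on stdin
# and print the largest raw value (field 10) among the attribute 194 rows.
# Prints nothing when no such row is present.
max_temp() {
    awk '$1 == 194 {
             t = $10 + 0                      # force numeric comparison
             if (!seen || t > max) { max = t; seen = 1 }
         }
         END { if (seen) print max }'
}

# Usage mirroring the loop from the report (not executed here):
#   for i in 0 1 2 3 4 5 6 7; do
#       smartctl -d cciss,$i -a /dev/sg0
#   done | max_temp
```

Matching on the first field instead of grepping for ^194 also tolerates leading whitespace in the attribute table, and the explicit "seen" flag avoids printing an empty line when no drive reports attribute 194.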

After a few of those runs, I get:

[ 1540.277776] hpsa 0000:06:00.0: Controller lockup detected: 0x00150028

And my disks are gone.

With linux 3.16 the whole kernel came down with NMI watchdog timeouts /
RCU stalls in the detect_lockup() worklet.

On linux 4.0 those appear to be gone, but the controller isn't coming
back either.

Is this a known 'feature'; is there anything I can do to help
diagnose/fix this issue?

 ~ Peter


* RE: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-18 12:40 [BUG] hpsa: Controller lockup detected: 0x00150028 Peter Zijlstra
@ 2015-05-18 13:57 ` Oelke, Mark
  2015-05-18 15:20   ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Oelke, Mark @ 2015-05-18 13:57 UTC (permalink / raw)
  To: Peter Zijlstra, don.brace; +Cc: ISS StorageDev, storagedev, linux-scsi

The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
Which version of controller firmware are you using?



* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-18 13:57 ` Oelke, Mark
@ 2015-05-18 15:20   ` Peter Zijlstra
  2015-05-18 16:03     ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 15:20 UTC (permalink / raw)
  To: Oelke, Mark; +Cc: don.brace, ISS StorageDev, storagedev, linux-scsi

On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
> The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
> Which version of controller firmware are you using?

Smart Array P212 in Slot 1

   Hardware Revision: C
   Firmware Version: 6.60


* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-18 15:20   ` Peter Zijlstra
@ 2015-05-18 16:03     ` Peter Zijlstra
  2015-05-18 16:11       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 16:03 UTC (permalink / raw)
  To: Oelke, Mark; +Cc: don.brace, ISS StorageDev, storagedev, linux-scsi

On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
> > The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
> > Which version of controller firmware are you using?
> 
> Smart Array P212 in Slot 1
> 
>    Hardware Revision: C
>    Firmware Version: 6.60

I've updated to 6.62 and it appears to be working now; or rather, it has
not locked up yet, whereas earlier I think it would have by now.

I'll let it run for a few more hours before calling it fixed, and I'll let
you know.

Thanks!


* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-18 16:03     ` Peter Zijlstra
@ 2015-05-18 16:11       ` Peter Zijlstra
  2015-05-22 15:10         ` Tomas Henzl
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-18 16:11 UTC (permalink / raw)
  To: Oelke, Mark; +Cc: don.brace, ISS StorageDev, storagedev, linux-scsi

On Mon, May 18, 2015 at 06:03:45PM +0200, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
> > On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
> > > The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
> > > Which version of controller firmware are you using?
> > 
> > Smart Array P212 in Slot 1
> > 
> >    Hardware Revision: C
> >    Firmware Version: 6.60
> 
> I've updated to 6.62 and it appears to be working now; or rather, it has
> not locked up yet where I think it would've locked up by now earlier.
> 
> I'll let it run for a few more hours before calling it fixed, I'll let
> you know.

And right after sending this email it went...

[ 1119.052144] hpsa 0000:06:00.0: Controller lockup detected: 0x00150029

So sadly no dice.

Anything else I can do?


* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-18 16:11       ` Peter Zijlstra
@ 2015-05-22 15:10         ` Tomas Henzl
  2015-05-22 16:40           ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Tomas Henzl @ 2015-05-22 15:10 UTC (permalink / raw)
  To: Peter Zijlstra, Oelke, Mark
  Cc: don.brace, ISS StorageDev, storagedev, linux-scsi

On 05/18/2015 06:11 PM, Peter Zijlstra wrote:
> On Mon, May 18, 2015 at 06:03:45PM +0200, Peter Zijlstra wrote:
>> On Mon, May 18, 2015 at 05:20:34PM +0200, Peter Zijlstra wrote:
>>> On Mon, May 18, 2015 at 01:57:39PM +0000, Oelke, Mark wrote:
>>>> The P212/P410/P411 firmware was recently spun to address an issue that sounds exactly like this problem.
>>>> Which version of controller firmware are you using?
>>>
>>> Smart Array P212 in Slot 1
>>>
>>>    Hardware Revision: C
>>>    Firmware Version: 6.60
>>
>> I've updated to 6.62 and it appears to be working now; or rather, it has
>> not locked up yet where I think it would've locked up by now earlier.
>>
>> I'll let it run for a few more hours before calling it fixed, I'll let
>> you know.
> 
> And right after sending this email it went...
> 
> [ 1119.052144] hpsa 0000:06:00.0: Controller lockup detected: 0x00150029
> 
> So sadly no dice.
> 
> Anything else I can do?

An older fix for mptsas seems to handle a similar case:
2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
The equivalent for hpsa might be:
--- a/drivers/scsi/hpsa.c
+++ b/drivers/scsi/hpsa.c
@@ -1067,6 +1067,8 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
        if (sd != NULL)
                sdev->hostdata = sd;
        spin_unlock_irqrestore(&h->devlock, flags);
+
+       blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
        return 0;
 }
-tm


* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-22 15:10         ` Tomas Henzl
@ 2015-05-22 16:40           ` Peter Zijlstra
  2015-05-22 16:48             ` Handzik, Joe
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2015-05-22 16:40 UTC (permalink / raw)
  To: Tomas Henzl
  Cc: Oelke, Mark, don.brace, ISS StorageDev, storagedev, linux-scsi

On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> >> I've updated to 6.62 and it appears to be working now; or rather, it has

I've since gotten 6.64 from HP to test; which does not seem public yet.

6.64 actually fixes the issue for me.

> An older issue for mptsas seems to handle a similar case
> 2a1b7e575b [SCSI] mptsas: fix hangs caused by ATA pass-through
> that might be for hpsa -

> --- a/drivers/scsi/hpsa.c
> +++ b/drivers/scsi/hpsa.c
> @@ -1067,6 +1067,8 @@ static int hpsa_slave_alloc(struct scsi_device *sdev)
>         if (sd != NULL)
>                 sdev->hostdata = sd;
>         spin_unlock_irqrestore(&h->devlock, flags);
> +
> +       blk_queue_dma_alignment (sdev->request_queue, 512 - 1);
>         return 0;
>  }

That does indeed seem _very_ similar; I'll have to defer to Mark Oelke
and/or Don Brace to say if the above is a useful alternative, since they
now seem to know what the root cause was.


* RE: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-22 16:40           ` Peter Zijlstra
@ 2015-05-22 16:48             ` Handzik, Joe
  2015-08-24  9:43               ` Wouter Depuydt
  0 siblings, 1 reply; 11+ messages in thread
From: Handzik, Joe @ 2015-05-22 16:48 UTC (permalink / raw)
  To: Peter Zijlstra, Tomas Henzl
  Cc: Oelke, Mark, don.brace, ISS StorageDev, storagedev, linux-scsi

No, the problem here (iirc) actually dealt with buffers in the firmware.

Don or Mark, agree?

Joe



* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-05-22 16:48             ` Handzik, Joe
@ 2015-08-24  9:43               ` Wouter Depuydt
  2015-08-24 10:02                 ` Wouter Depuydt
  0 siblings, 1 reply; 11+ messages in thread
From: Wouter Depuydt @ 2015-08-24  9:43 UTC (permalink / raw)
  To: linux-scsi

> -----Original Message-----
> From: Peter Zijlstra [mailto:peterz <at> infradead.org] 
> Sent: Friday, May 22, 2015 11:40 AM
> To: Tomas Henzl
> Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at> pmcs.com; linux-scsi <at> vger.kernel.org
> Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
> 
> On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> > >> I've updated to 6.62 and it appears to be working now; or rather, it has
> 
> I've since gotten 6.64 from HP to test; which does not seem public yet.
> 
> 6.64 actually fixes the issue for me.
> 

Hi everyone,

I've experienced a similar problem with a P411 controller in HBA mode.
Serial Number	PDNMH0ARH8P04A
Model	HP Smart Array P441 Controller
Firmware Version	2.52

This seems to be the latest firmware:
http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=7274903&swItemId=MTX_a476e21cd5e142608ff8d6aed5&swEnvOid=4176

I'm running smartd for monitoring, with daily short tests and weekly long tests.

Aug 23 13:57:11 smartd[3344]: Device: /dev/sdv [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 41 to 42
Aug 23 13:57:42 kernel: [349157.766608] hpsa 0000:05:00.0: Abort request on C6:B2:T18:L0
Aug 23 13:57:42 kernel: [349157.830545] hpsa 0000:05:00.0: Abort request on C6:B2:T19:L0
Aug 23 14:00:10 kernel: [349305.554986]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:00:10 kernel: [349305.555000]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:00:10 kernel: [349305.575005]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:00:10 kernel: [349305.575019]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:02:10 kernel: [349425.572350]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:02:10 kernel: [349425.572364]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:02:10 kernel: [349425.682988]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:02:10 kernel: [349425.683001]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:03:10 kernel: [349457.674593] hpsa 0000:05:00.0: Controller lockup detected: 0x00130001
Aug 23 14:03:38 kernel: [349485.657605] Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
Aug 23 14:03:38 kernel: [349485.657736]  <<EOE>>  [<ffffffffc02a5384>] fail_all_cmds_on_list+0x74/0x6c0 [hpsa]
Aug 23 14:03:38 kernel: [349485.657765]  [<ffffffffc02aa9bb>] hpsa_monitor_ctlr_worker+0x40b/0x4e0 [hpsa]
Aug 23 14:03:43 kernel: [349518.024245] Workqueue: events hpsa_monitor_ctlr_worker [hpsa]
Aug 23 14:04:10 kernel: [349545.689592]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:04:10 kernel: [349545.689605]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:04:10 kernel: [349545.805674]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:04:10 kernel: [349545.805693]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]
Aug 23 14:06:11 kernel: [349665.810469]  [<ffffffffc02a3d20>] hpsa_send_abort+0xa0/0x250 [hpsa]
Aug 23 14:06:11 kernel: [349665.810483]  [<ffffffffc02a7ff3>] hpsa_eh_abort_handler+0x683/0x1290 [hpsa]

W.




* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-08-24  9:43               ` Wouter Depuydt
@ 2015-08-24 10:02                 ` Wouter Depuydt
  2015-08-24 14:11                   ` Don Brace
  0 siblings, 1 reply; 11+ messages in thread
From: Wouter Depuydt @ 2015-08-24 10:02 UTC (permalink / raw)
  To: linux-scsi

Wouter Depuydt <wouter.depuydt <at> gmail.com> writes:

> 
> > -----Original Message-----
> > From: Peter Zijlstra [mailto:peterz <at> infradead.org] 
> > Sent: Friday, May 22, 2015 11:40 AM
> > To: Tomas Henzl
> > Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at> pmcs.com; linux-scsi <at> vger.kernel.org
> > Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
> > 
> > On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
> > > >> I've updated to 6.62 and it appears to be working now; or rather, it has
> > 
> > I've since gotten 6.64 from HP to test; which does not seem public yet.
> > 
> > 6.64 actually fixes the issue for me.
> > 
> 
> Hi everone,
> 
> I've experienced a similar problem with a P411 controller in HBA mode.
> Serial Number	PDNMH0ARH8P04A
> Model	HP Smart Array P441 Controller
> Firmware Version	2.52
> 

Other System info:

HP Proliant D380p Gen9
Ubuntu LTS 14.04
ii  linux-image-3.19.0-26-generic
ii  linux-image-extra-3.19.0-26-generic
ii  linux-image-generic-lts-vivid       



* Re: [BUG] hpsa: Controller lockup detected: 0x00150028
  2015-08-24 10:02                 ` Wouter Depuydt
@ 2015-08-24 14:11                   ` Don Brace
  0 siblings, 0 replies; 11+ messages in thread
From: Don Brace @ 2015-08-24 14:11 UTC (permalink / raw)
  To: Wouter Depuydt, linux-scsi


On 08/24/2015 05:02 AM, Wouter Depuydt wrote:
> Wouter Depuydt <wouter.depuydt <at> gmail.com> writes:
>
>>> -----Original Message-----
>>> From: Peter Zijlstra [mailto:peterz <at> infradead.org]
>>> Sent: Friday, May 22, 2015 11:40 AM
>>> To: Tomas Henzl
>>> Cc: Oelke, Mark; don.brace <at> pmcs.com; ISS StorageDev; storagedev <at> pmcs.com; linux-scsi <at> vger.kernel.org
>>> Subject: Re: [BUG] hpsa: Controller lockup detected: 0x00150028
>>>
>>> On Fri, May 22, 2015 at 05:10:44PM +0200, Tomas Henzl wrote:
>>>>>> I've updated to 6.62 and it appears to be working now; or rather, it has
>>> I've since gotten 6.64 from HP to test; which does not seem public yet.
>>>
>>> 6.64 actually fixes the issue for me.
>>>
>> Hi everone,
>>
>> I've experienced a similar problem with a P411 controller in HBA mode.
>> Serial Number	PDNMH0ARH8P04A
>> Model	HP Smart Array P441 Controller
>> Firmware Version	2.52
>>
> Other System info:
>
> HP Proliant D380p Gen9
> Ubuntu LTS 14.04
> ii  linux-image-3.19.0-26-generic
> ii  linux-image-extra-3.19.0-26-generic
> ii  linux-image-generic-lts-vivid
>
I see this issue addressed in the Firmware update page under the "Fixes" tab.
http://h20564.www2.hpe.com/hpsc/swd/public/detail?sp4ts.oid=3984645&swItemId=MTX_55b304486f544f148de6c5cc6e&swEnvOid=4103#tab4


Problems Fixed:

     Running SMARTCTL (smartmontools) on HP Proliant G6/G7 (Px1x) Smart
Array controllers that have firmware version 5.70 to 6.62 installed with
SATA drives attached may result in system not responding or reboot. When
reboot occurred, a reboot 1719 POST error message with lockup 0x15
displayed.

Hope this helps you.
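
For anyone wanting to check a machine against the range quoted above, here is a minimal sketch of a version test. The function name fw_in_affected_range is hypothetical, and it assumes the firmware version is reported as "major.minor" with a two-digit minor, as in the 6.60, 6.62 and 6.64 strings seen in this thread. It covers only the version range; controller family (G6/G7 Px1x) and SATA drives still have to be checked separately.

```sh
#!/bin/sh
# fw_in_affected_range (hypothetical helper): exit status 0 when the given
# "major.minor" firmware version lies in the quoted range 5.70 to 6.62,
# 1 when it is outside the range, 2 when the string is not "major.minor".
fw_in_affected_range() {
    echo "$1" | awk -F. '
        NF != 2 { exit 2 }
        {
            # Compare as integers so that 6.62 < 6.64 holds as intended.
            v = $1 * 1000 + $2
            exit !(v >= 5 * 1000 + 70 && v <= 6 * 1000 + 62)
        }'
}

# Usage:
#   if fw_in_affected_range 6.60; then
#       echo "firmware is in the range listed under Problems Fixed"
#   fi
```

Splitting on the dot and comparing integers avoids relying on floating-point or string ordering for the version number.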



