* Adaptec ASR-51245 and aacraid driver timeouts
@ 2020-10-21 12:02 David C. Partridge
  2020-10-24 12:33 ` David C. Partridge
  2020-10-26 22:15 ` Martin K. Petersen
  0 siblings, 2 replies; 6+ messages in thread
From: David C. Partridge @ 2020-10-21 12:02 UTC (permalink / raw)
  To: linux-scsi

I'm running Lubuntu x64 20.04.1 (kernel 5.4.0-52-generic) with an Adaptec
ASR-51245 hosting a RAID-5 array.

If I configure the card to power down the drives in the raid array after a
period of idleness, the next time my server attempts to access the logical
device I get:

Oct 19 04:03:03 charon kernel: aacraid: Host adapter abort request.
                               aacraid: Outstanding commands on (0,0,0,0):
Oct 19 04:03:03 charon kernel: aacraid: Host adapter reset request. SCSI hang ?
Oct 19 04:03:18 charon kernel: aacraid: Host adapter reset request. SCSI hang ?
Oct 19 04:03:18 charon kernel: aacraid 0000:01:00.0: outstanding cmd: midlevel-0
Oct 19 04:03:18 charon kernel: aacraid 0000:01:00.0: outstanding cmd: lowlevel-0
Oct 19 04:03:18 charon kernel: aacraid 0000:01:00.0: outstanding cmd: error handler-0
Oct 19 04:03:18 charon kernel: aacraid 0000:01:00.0: outstanding cmd: firmware-1
Oct 19 04:03:18 charon kernel: aacraid 0000:01:00.0: outstanding cmd: kernel-0
Oct 19 04:03:48 charon kernel: sd 0:0:0:0: Device offlined - not ready after error recovery
Oct 19 04:03:48 charon kernel: sd 0:0:0:0: [sda] tag#215 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Oct 19 04:03:48 charon kernel: sd 0:0:0:0: [sda] tag#215 CDB: Read(16) 88 00 00 00 00 00 00 05 27 48 00 00 00 08 00 00
Oct 19 04:03:48 charon kernel: blk_update_request: I/O error, dev sda, sector 337736 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Oct 19 04:03:48 charon kernel: BTRFS error (device sda1): bdev /dev/sda1 errs: wr 1, rd 1, flush 0, corrupt 3, gen 0

at which point the drive is now effectively offline :/
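
For anyone reading the log: the Read(16) CDB and the blk_update_request line appear to describe the same request. A minimal Python sketch of decoding the CDB, assuming the usual SBC READ(16) layout (opcode 0x88, 8-byte big-endian LBA in bytes 2-9, 4-byte transfer length in bytes 10-13). The function name decode_read16 is made up for illustration:

```python
def decode_read16(cdb_hex: str) -> dict:
    """Decode a READ(16) CDB given as space-separated hex bytes.

    Assumes the SBC layout: byte 0 opcode (0x88), bytes 2-9 LBA,
    bytes 10-13 transfer length in logical blocks.
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 16 or cdb[0] != 0x88:
        raise ValueError("not a 16-byte READ(16) CDB")
    return {
        "lba": int.from_bytes(cdb[2:10], "big"),
        "blocks": int.from_bytes(cdb[10:14], "big"),
    }


# CDB bytes copied from the tag#215 line in the log above.
info = decode_read16("88 00 00 00 00 00 00 05 27 48 00 00 00 08 00 00")
print(info)  # {'lba': 337736, 'blocks': 8}
```

The decoded LBA matches the sector number in the I/O error line, so the command that timed out is an ordinary 8-block read rather than anything unusual.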

I tried upping the timeout:

root@charon:/etc/udev/rules.d# cat 99-aacraid.rules 
SUBSYSTEM=="block", ACTION=="add", ENV{ID_VENDOR}=="Adaptec", ENV{ID_MODEL}=="Shared", RUN+="/bin/sh -c 'echo 135 > /sys/block/%k/device/timeout'"

but that didn't appear to stop the problem occurring (and the kernel wasn't
over happy about a >120s timeout).
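
The rule above boils down to writing a number of seconds into the device's sysfs timeout attribute. A small Python sketch for checking or setting that value by hand, assuming the /sys/block/<dev>/device/timeout path used in the rule. The helper names and the sysfs_root parameter (there only so the code can be tried against a scratch directory) are hypothetical:

```python
from pathlib import Path


def timeout_path(dev: str, sysfs_root: str = "/sys") -> Path:
    """Path of the per-device SCSI command timeout attribute."""
    return Path(sysfs_root) / "block" / dev / "device" / "timeout"


def get_timeout(dev: str, sysfs_root: str = "/sys") -> int:
    """Read the current command timeout in seconds."""
    return int(timeout_path(dev, sysfs_root).read_text().strip())


def set_timeout(dev: str, seconds: int, sysfs_root: str = "/sys") -> int:
    """Write a new timeout and return the value read back afterwards."""
    if seconds <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    timeout_path(dev, sysfs_root).write_text(f"{seconds}\n")
    return get_timeout(dev, sysfs_root)


# Example (needs root on a real system):
#   set_timeout("sda", 135)
```

Reading the value back after the write is a quick way to confirm whether the udev rule actually took effect on the device in question.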

Any help much appreciated.
David







* RE: Adaptec ASR-51245 and aacraid driver timeouts
  2020-10-21 12:02 Adaptec ASR-51245 and aacraid driver timeouts David C. Partridge
@ 2020-10-24 12:33 ` David C. Partridge
  2020-10-25 10:34   ` David C. Partridge
  2020-11-06  9:02   ` Hannes Reinecke
  2020-10-26 22:15 ` Martin K. Petersen
  1 sibling, 2 replies; 6+ messages in thread
From: David C. Partridge @ 2020-10-24 12:33 UTC (permalink / raw)
  To: linux-scsi

Hi again,

I know there's been a lot of activity lately, so this could well have been
missed.

I'd dearly like to be able to power the drives down when the array is idle,
but this problem seems to make that impossible.

Are any of the folks that know the Adaptec raid cards and the aacraid driver
here?

Thanks
David 


* RE: Adaptec ASR-51245 and aacraid driver timeouts
  2020-10-24 12:33 ` David C. Partridge
@ 2020-10-25 10:34   ` David C. Partridge
  2020-11-06  9:02   ` Hannes Reinecke
  1 sibling, 0 replies; 6+ messages in thread
From: David C. Partridge @ 2020-10-25 10:34 UTC (permalink / raw)
  To: linux-scsi

If this is not the right mailing list, please point me to the correct one
...


* Re: Adaptec ASR-51245 and aacraid driver timeouts
  2020-10-21 12:02 Adaptec ASR-51245 and aacraid driver timeouts David C. Partridge
  2020-10-24 12:33 ` David C. Partridge
@ 2020-10-26 22:15 ` Martin K. Petersen
  2020-10-27 21:18   ` Sagar.Biradar
  1 sibling, 1 reply; 6+ messages in thread
From: Martin K. Petersen @ 2020-10-26 22:15 UTC (permalink / raw)
  To: David C. Partridge; +Cc: linux-scsi


David,

> I'm running Lubuntu x64 20.04.1 (kernel 5.4.0-52-generic) with an
> Adaptec ASR-51245 hosting a RAID-5 array.
>
> If I configure the card to power down the drives in the raid array
> after a period of idleness, the next time my server attempts to access
> the logical device I get:

If card firmware decides to power down the drives, why doesn't it spin
them back up on access?

-- 
Martin K. Petersen	Oracle Linux Engineering


* RE: Adaptec ASR-51245 and aacraid driver timeouts
  2020-10-26 22:15 ` Martin K. Petersen
@ 2020-10-27 21:18   ` Sagar.Biradar
  0 siblings, 0 replies; 6+ messages in thread
From: Sagar.Biradar @ 2020-10-27 21:18 UTC (permalink / raw)
  To: martin.petersen, david.partridge; +Cc: linux-scsi

Hi David,
Apologies for the delay in response.

It looks like the firmware failed to spin up the drives.
You can try "arcconf RESCAN <Controller#>" and see if that spins up the drives. (arcconf can be downloaded from https://storage.microsemi.com/en-us/support/raid/)

From the description, I see you are using a Series 5 (ASR-51245) controller, which was marked EOL in 2012. The last version of Ubuntu supported for this product was 11.04.
Since there have been so many submissions since then (to both the kernel and the driver), there could also be some compatibility issues between 11.04 and 20.04.
Microchip no longer supports this product.

Thanks
Sagar


* Re: Adaptec ASR-51245 and aacraid driver timeouts
  2020-10-24 12:33 ` David C. Partridge
  2020-10-25 10:34   ` David C. Partridge
@ 2020-11-06  9:02   ` Hannes Reinecke
  1 sibling, 0 replies; 6+ messages in thread
From: Hannes Reinecke @ 2020-11-06  9:02 UTC (permalink / raw)
  To: David C. Partridge, linux-scsi

On 10/24/20 2:33 PM, David C. Partridge wrote:
> I'd dearly like to be able to power the drives down when the array is idle,
> but this problem seems to make that impossible.
> [...]
>
Can you send a 'START/STOP UNIT' command to the device,
e.g. via sg_start /dev/sda?

It looks to me as if the devices are simply spun down, and for some 
reason the driver doesn't report this correctly.
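
For reference, START STOP UNIT is a 6-byte command (opcode 0x1B) in which bit 0 of byte 4 is the START bit. A sketch of the CDB a plain spin-up request would carry, assuming the SBC layout with the LOEJ and power-condition fields left at zero. build_start_stop_cdb is an illustrative name, not something from sg3_utils:

```python
def build_start_stop_cdb(start: bool = True, immed: bool = False) -> bytes:
    """Build a 6-byte START STOP UNIT CDB.

    Assumes the SBC layout: byte 0 opcode 0x1B, byte 1 bit 0 IMMED,
    byte 4 bit 0 START. All other fields are left at zero.
    """
    cdb = bytearray(6)
    cdb[0] = 0x1B          # START STOP UNIT opcode
    if immed:
        cdb[1] |= 0x01     # return status before the operation completes
    if start:
        cdb[4] |= 0x01     # START=1 asks the unit to spin up
    return bytes(cdb)


print(build_start_stop_cdb().hex(" "))  # 1b 00 00 00 01 00
```

With sg3_utils installed, "sg_start --start /dev/sda" should issue an equivalent request. Whether the controller firmware passes it through to the member drives of the array is exactly what this test would show.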

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

Thread overview: 6+ messages
2020-10-21 12:02 Adaptec ASR-51245 and aacraid driver timeouts David C. Partridge
2020-10-24 12:33 ` David C. Partridge
2020-10-25 10:34   ` David C. Partridge
2020-11-06  9:02   ` Hannes Reinecke
2020-10-26 22:15 ` Martin K. Petersen
2020-10-27 21:18   ` Sagar.Biradar
