* 'Device not ready' issue on mpt2sas since 3.1.10
@ 2012-06-22 11:19 Matthias Prager
  2012-07-09 14:40 ` Matthias Prager
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-06-22 11:19 UTC (permalink / raw)
  To: linux-scsi

Hello linux-scsi,

I'm reporting a problem which I have been experiencing since kernel
version 3.1.10.

Background:
----
OS:     Gentoo (as Guest-OS running on ESXi 5)
Kernel: 3.0.33-gentoo x86_64 (latest kernel version without the issue)
MB:     Intel S3210SH (latest FW/BIOS)
HBA:    LSI 9211-8i (in IR mode)
mpt2sas 08.100.00.02 (kernel driver of 3.0.33-gentoo)
FW Ver  13.00.57.00 (lsi-hba)
BIOS    07.25.00.00 (lsi-hba)
DISK    Seagate Barracuda ES.2 ST3750330NS Firmware: SN06 (and others)
        Layout: ext4 on-top of raid1 software-md
ESXi uses an LSI 9240-8i HBA as datastore. Two LSI 9211-8i HBAs, the
onboard Intel ICH9R and an Intel network card are passed through to the
guest OS. HW-Raid is only used for the datastore (on 9240-8i).
----

Since kernel 3.1.10 I've been experiencing issues with disks not waking up
from spindown. All I need to do to trigger it is wait until the disks
time out and spin down, and then try to access their content. The issue is
most prominent with one disk, but not limited to it (what makes this disk
so special? I don't have a clue).

I've tried every kernel version from 3.1.10 to 3.4.2 (vanilla as well as
gentoo-patched-sources).
I've upgraded the controller firmware to the latest version available
from LSI.
I've patched the ESXi 5 host with the latest upgrades.
I've tried booting with 'pci=noioapicquirk' (thinking there may be a
link to bug 43074 on the kernel bug tracker).
I'm booting with 'scsi_mod.scan=sync' to avoid any async scanning issues.
But nothing fixed the issue except going back to kernel 3.0.33.

I would greatly appreciate any suggestions or help in the matter. Please
do tell me what else you need from me to narrow down the issue.
Or should I rather file a bug in the kernel bug tracker?

Thank you

Matthias Prager

Kernel messages when the issue occurs:
---------------------------------------------------
...
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Read(10): 28 00 2e 41 c0
3f 00 00 08 00
Apr 04 22:55:10 [kernel] end_request: I/O error, dev sdj, sector 776060991
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54
52 3f 00 00 08 00
Apr 04 22:55:10 [kernel] end_request: I/O error, dev sdj, sector 1465143871
                - Last output repeated twice -
Apr 04 22:55:10 [kernel] md: super_written gets error=-5, uptodate=0
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 00 00
00 3f 00 00 08 00
Apr 04 22:55:10 [kernel] end_request: I/O error, dev sdj, sector 63
Apr 04 22:55:10 [kernel] Buffer I/O error on device md4, logical block 0
Apr 04 22:55:10 [kernel] lost page write due to I/O error on md4
Apr 04 22:55:10 [kernel] EXT4-fs error (device md4):
ext4_find_entry:935: inode #24248321: comm smbd: reading directory lblock 0
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54
52 3f 00 00 08 00
Apr 04 22:55:10 [kernel] end_request: I/O error, dev sdj, sector 1465143871
                - Last output repeated twice -
Apr 04 22:55:10 [kernel] md: super_written gets error=-5, uptodate=0
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj] CDB: Read(10): 28 00 2e 41 c0
3f 00 00 08 00
Apr 04 22:58:50 [kernel] end_request: I/O error, dev sdj, sector 776060991
Apr 04 22:58:50 [kernel] EXT4-fs (md4): previous I/O error to superblock
detected
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54
52 3f 00 00 08 00
Apr 04 22:58:50 [kernel] end_request: I/O error, dev sdj, sector 1465143871
                - Last output repeated twice -
Apr 04 22:58:50 [kernel] md: super_written gets error=-5, uptodate=0
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:58:50 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 00 00
00 3f 00 00 08 00
Apr 04 22:58:50 [kernel] end_request: I/O error, dev sdj, sector 63
Apr 04 22:58:50 [kernel] Buffer I/O error on device md4, logical block 0
Apr 04 22:58:50 [kernel] lost page write due to I/O error on md4
Apr 04 22:58:50 [kernel] EXT4-fs error (device md4):
ext4_find_entry:935: inode #24248321: comm smbd: reading directory lblock 0
Apr 04 22:58:51 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:58:51 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
Apr 04 22:58:51 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:58:51 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not
ready, initializing command required
Apr 04 22:58:51 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54
52 3f 00 00 08 00
Apr 04 22:58:51 [kernel] end_request: I/O error, dev sdj, sector 1465143871
                - Last output repeated twice -
Apr 04 22:58:51 [kernel] md: super_written gets error=-5, uptodate=0
...
---------------------------------------------------
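
The sector reported in each end_request line can be cross-checked against the
CDB bytes printed just before it. A minimal sketch in bash, assuming the usual
10-byte READ(10)/WRITE(10) layout (opcode, flags, 4-byte big-endian LBA, group
number, 2-byte transfer length, control); the function name decode_cdb10 is
made up for illustration and is not part of any tool mentioned here:

```bash
#!/usr/bin/env bash
# decode_cdb10: hypothetical helper, not part of any tool in this thread.
# Takes the ten hex bytes of a READ(10)/WRITE(10) CDB as arguments and
# prints the opcode, starting LBA, and transfer length in blocks.
decode_cdb10() {
    if [ "$#" -ne 10 ]; then
        echo "usage: decode_cdb10 b0 b1 ... b9" >&2
        return 1
    fi
    local op=$1
    # CDB bytes 2..5 (arguments 3..6) hold the LBA, most significant first.
    local lba=$(( 0x$3 << 24 | 0x$4 << 16 | 0x$5 << 8 | 0x$6 ))
    # CDB bytes 7..8 (arguments 8..9) hold the transfer length in blocks.
    local len=$(( 0x$8 << 8 | 0x$9 ))
    echo "opcode=0x$op lba=$lba blocks=$len"
}

# First failing read from the log above:
decode_cdb10 28 00 2e 41 c0 3f 00 00 08 00
# prints: opcode=0x28 lba=776060991 blocks=8
```

The decoded LBA matches "sector 776060991" in the corresponding end_request
line, so the failing commands are ordinary 8-block reads and writes rather
than anything exotic.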




* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-06-22 11:19 'Device not ready' issue on mpt2sas since 3.1.10 Matthias Prager
@ 2012-07-09 14:40 ` Matthias Prager
  2012-07-09 19:37   ` Robert Trace
  2012-07-09 22:08   ` NeilBrown
  0 siblings, 2 replies; 35+ messages in thread
From: Matthias Prager @ 2012-07-09 14:40 UTC (permalink / raw)
  To: linux-scsi, linux-raid; +Cc: Matthias Prager

Hello linux-scsi and linux-raid,

I did some further research regarding my problem.
It appears to me the fault does not lie with the mpt2sas driver (not
that I can definitely exclude it), but with the md implementation.

I reproduced what I think is the same issue on a different machine (also
running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different
set of hard-drives of the same model. Using systemrescuecd
(2.8.1-beta003) and booting the 64bit 3.4.4 kernel, I issued the
following commands:

1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror
--raid-devices=2 --name=test1 /dev/sda missing'
3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md
is being created as md127)
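
The three steps above can be collected into a small script. This is only a
sketch: the DISK variable and the RUN switch are illustrative assumptions, and
by default the commands are printed rather than executed, because step 2
writes md metadata to the disk.

```bash
#!/usr/bin/env bash
# Sketch of the three reproduction steps. DISK and the RUN switch are
# illustrative assumptions; by default the commands are only printed.
# WARNING: with RUN=1, step 2 writes md metadata to $DISK.
DISK=${DISK:-/dev/sda}

step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

reproduce() {
    # 1) put the drive into standby
    step hdparm -y "$DISK"
    # 2) create a degraded two-disk mirror on the sleeping drive
    step mdadm --create /dev/md1 --metadata 1.2 --level=mirror \
         --raid-devices=2 --name=test1 "$DISK" missing
    # 3) first read from the array that actually appeared (md127 here)
    step fdisk -l /dev/md127
}

reproduce
```

Running it with RUN=1 on a scratch machine should replay the same sequence;
the md127 name in step 3 is taken from this report and may differ elsewhere.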

2) gave me this feedback:
------
mdadm: super1.x cannot open /dev/sda: Device or resource busy
mdadm: /dev/sda is not suitable for this array.
mdadm: create aborted
-------
Even though it says creating aborted it still created md127.

And 3) led to these lines in dmesg:
-------
[  604.838640] sd 2:0:0:0: [sda] Device not ready
[  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
20 00
[  604.838680] end_request: I/O error, dev sda, sector 2048
[  604.838688] Buffer I/O error on device md127, logical block 0
[  604.838695] Buffer I/O error on device md127, logical block 1
[  604.838699] Buffer I/O error on device md127, logical block 2
[  604.838702] Buffer I/O error on device md127, logical block 3
[  604.838783] sd 2:0:0:0: [sda] Device not ready
[  604.838785] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838789] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838793] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
08 00
[  604.838805] end_request: I/O error, dev sda, sector 2048
[  604.838808] Buffer I/O error on device md127, logical block 0
[  604.838983] sd 2:0:0:0: [sda] Device not ready
[  604.838986] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.838989] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.838993] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
08 00
[  604.839006] end_request: I/O error, dev sda, sector 1465148888
[  604.839009] Buffer I/O error on device md127, logical block 183143355
[  604.839087] sd 2:0:0:0: [sda] Device not ready
[  604.839090] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.839093] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.839097] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
08 00
[  604.839110] end_request: I/O error, dev sda, sector 1465148888
[  604.839113] Buffer I/O error on device md127, logical block 183143355
[  604.839271] sd 2:0:0:0: [sda] Device not ready
[  604.839274] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.839278] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.839282] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.839286] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
20 00
[  604.839321] end_request: I/O error, dev sda, sector 2048
[  604.839324] Buffer I/O error on device md127, logical block 0
[  604.839330] Buffer I/O error on device md127, logical block 1
[  604.840494] sd 2:0:0:0: [sda] Device not ready
[  604.840497] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
driverbyte=DRIVER_SENSE
[  604.840504] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
[  604.840512] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
initializing command required
[  604.840516] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
08 00
[  604.840526] end_request: I/O error, dev sda, sector 2048
------
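
The md127 logical block numbers in these lines follow from the sda sector
numbers by simple arithmetic. A minimal sketch, assuming a data offset of 2048
sectors and 4096-byte logical blocks (8 sectors each); both values are
inferred from the log itself (sector 2048 maps to block 0, and a 0x20-sector
read touches blocks 0 to 3), not read from the array metadata. The function
name is hypothetical.

```bash
#!/usr/bin/env bash
# md_block_for_sector: hypothetical helper. Assumes a data offset of 2048
# sectors and 4096-byte logical blocks (8 sectors each); both values are
# inferred from the log lines above, not read from the array metadata.
md_block_for_sector() {
    local sector=$1 data_offset=${2:-2048} sectors_per_block=${3:-8}
    echo $(( (sector - data_offset) / sectors_per_block ))
}

md_block_for_sector 2048          # prints 0
md_block_for_sector 1465148888    # prints 183143355
```

Both results agree with the "Buffer I/O error on device md127" lines, which
suggests the errors on md127 are just the sda read failures passed upwards.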

This excludes hardware errors (different physical machine and devices)
as the cause, and also ext4, which the other system was using as its
filesystem. Maybe Neil Brown (whom scripts/get_maintainer.pl identified as
the maintainer of the md code) can make sense of this. It may well
be that this is the same problem on a different error path - I don't know.

I will try to make the scenario more generic, but I don't have a
non-virtual machine to spare atm. Also please do let me know if I'm
posting this to the wrong lists (linux-scsi and linux-raid) or if there
is anything which might not be helpful with the way I'm reporting this.

Regards,
Matthias Prager


* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 14:40 ` Matthias Prager
@ 2012-07-09 19:37   ` Robert Trace
  2012-07-09 20:45     ` Darrick J. Wong
  2012-07-10  0:12     ` Matthias Prager
  2012-07-09 22:08   ` NeilBrown
  1 sibling, 2 replies; 35+ messages in thread
From: Robert Trace @ 2012-07-09 19:37 UTC (permalink / raw)
  To: Matthias Prager; +Cc: linux-scsi, linux-raid

> I did some further research regarding my problem.
> It appears to me the fault does not lie with the mpt2sas driver (not
> that I can definitely exclude it), but with the md implementation.

I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
disks), but I've come to a slightly different conclusion.

I noticed that when my SATA disks are on a SATA controller and they spin
down (or are spun down via hdparm -y), then they respond to TUR (TEST
UNIT READY) commands with an OK.  Any I/O sent to these disks simply
waits while the disks spin up and then completes as usual.

However, my SATA disks on the SAS controller respond to TUR with the
sense error "Not Ready/Initializing command required".  Any I/O sent to
these disks immediately fails.  You saw this in your logging:

> [  604.838640] sd 2:0:0:0: [sda] Device not ready
> [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 20 00
> [  604.838680] end_request: I/O error, dev sda, sector 2048
> [  604.838688] Buffer I/O error on device md127, logical block 0
> [  604.838695] Buffer I/O error on device md127, logical block 1
> [  604.838699] Buffer I/O error on device md127, logical block 2
> [  604.838702] Buffer I/O error on device md127, logical block 3

Sending an explicit START UNIT command to these sleeping disks will wake
them up and then they behave normally.  (BTW, you can issue TURs and
START UNITs via the sg_turs and sg_start commands).
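
As a rough illustration of that workaround, the sketch below sends a TEST UNIT
READY and, if it fails, follows up with a START UNIT. It assumes sg_turs and
sg_start from sg3_utils are installed; the function name wake_if_not_ready is
made up for this example.

```bash
#!/usr/bin/env bash
# wake_if_not_ready: hypothetical helper name. Relies on sg_turs and
# sg_start from sg3_utils being installed.
wake_if_not_ready() {
    local dev=$1
    if sg_turs "$dev" >/dev/null 2>&1; then
        echo "$dev: ready"
    else
        # TEST UNIT READY failed; send START STOP UNIT with the start bit set.
        echo "$dev: not ready, sending START UNIT"
        sg_start --start "$dev"
    fi
}

# Example: wake_if_not_ready /dev/sda
```

Note that any sg_turs failure is treated as "not ready" here, including a
missing device node, so this is a convenience for experiments rather than a
robust check.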

I've reproduced this behavior on the raw disks themselves, no MD layer
involved (although the freak-out by my MD layer is what alerted me to
this issue too... Having your entire array punted the first time you
access it is a little scary :-).  I'm also on raw hardware and I've seen
this behavior on kernels 3.0.33 through 3.4.4.

So, SATA disks respond differently depending on the controller they're
on.  I don't know if this is a SCSI thing, a SAS thing or a
firmware/driver thing for the 9211.

Now, whether or not the MD layer should be assembling arrays from
"failed" disks is, I think, a separate issue.

-- Rob


* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 19:37   ` Robert Trace
@ 2012-07-09 20:45     ` Darrick J. Wong
  2012-07-09 22:24       ` Robert Trace
  2012-07-10  0:12     ` Matthias Prager
  1 sibling, 1 reply; 35+ messages in thread
From: Darrick J. Wong @ 2012-07-09 20:45 UTC (permalink / raw)
  To: Robert Trace; +Cc: Matthias Prager, linux-scsi, linux-raid

On Mon, Jul 09, 2012 at 03:37:09PM -0400, Robert Trace wrote:
> > I did some further research regarding my problem.
> > It appears to me the fault does not lie with the mpt2sas driver (not
> > that I can definitely exclude it), but with the md implementation.
> 
> I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
> disks), but I've come to a slightly different conclusion.
> 
> I noticed that when my SATA disks are on a SATA controller and they spin
> down (or are spun down via hdparm -y), then they respond to TUR (TEST
> UNIT READY) commands with an OK.  Any I/O sent to these disks simply
> waits while the disks spin up and then completes as usual.
> 
> However, my SATA disks on the SAS controller respond to TUR with the
> sense error "Not Ready/Initializing command required".  Any I/O sent to
> these disks immediately fails.  You saw this in your logging:
> 
> > [  604.838640] sd 2:0:0:0: [sda] Device not ready
> > [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> > driverbyte=DRIVER_SENSE
> > [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> > [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> > initializing command required
> > [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> > 20 00
> > [  604.838680] end_request: I/O error, dev sda, sector 2048
> > [  604.838688] Buffer I/O error on device md127, logical block 0
> > [  604.838695] Buffer I/O error on device md127, logical block 1
> > [  604.838699] Buffer I/O error on device md127, logical block 2
> > [  604.838702] Buffer I/O error on device md127, logical block 3
> 
> Sending an explicit START UNIT command to these sleeping disks will wake
> them up and then they behave normally.  (BTW, you can issue TURs and
> START UNITs via the sg_turs and sg_start commands).
> 
> I've reproduced this behavior on the raw disks themselves, no MD layer
> involved (although the freak-out by my MD layer is what alerted me to
> this issue too... Having your entire array punted the first time you
> access it is a little scary :-).  I'm also on raw hardware and I've seen
> this behavior on kernels 3.0.33 through 3.4.4.
> 
> So, SATA disks respond differently depending on the controller they're
> on.  I don't know if this is a SCSI thing, a SAS thing or a
> firmware/driver thing for the 9211.

I suspect that /sys/devices/<bunch of sas topology here>/manage_start_stop = 0
for the SATA devices hanging off the SAS controller.  Setting that sysfs
attribute to 1 is supposed to enable the SCSI layer to send TUR when it sees
"LU not ready", as well as spin down the drives at suspend/poweroff time.

--D
> 
> Now, whether or not the MD layer should be assembling arrays from
> "failed" disks is, I think, a separate issue.
> 
> -- Rob



* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 14:40 ` Matthias Prager
  2012-07-09 19:37   ` Robert Trace
@ 2012-07-09 22:08   ` NeilBrown
  2012-07-10  0:03     ` Matthias Prager
  1 sibling, 1 reply; 35+ messages in thread
From: NeilBrown @ 2012-07-09 22:08 UTC (permalink / raw)
  To: Matthias Prager; +Cc: linux-scsi, linux-raid


On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager <linux@matthiasprager.de>
wrote:

> Hello linux-scsi and linux-raid,
> 
> I did some further research regarding my problem.
> It appears to me the fault does not lie with the mpt2sas driver (not
> that I can definitely exclude it), but with the md implementation.
> 
> I reproduced what I think is the same issue on a different machine (also
> running VMware ESXi 5 and an LSI 9211-8i in IR mode) with a different
> set of hard-drives of the same model. Using systemrescuecd
> (2.8.1-beta003) and booting the 64bit 3.4.4 kernel, I issued the
> following commands:
> 
> 1) 'hdparm -y /dev/sda' (to put the hard-drive to sleep)
> 2) 'mdadm --create /dev/md1 --metadata 1.2 --level=mirror
> --raid-devices=2 --name=test1 /dev/sda missing'
> 3) 'fdisk -l /dev/md127' (for some reason /proc/mdstat indicates the md
> is being created as md127)
> 
> 2) gave me this feedback:
> ------
> mdadm: super1.x cannot open /dev/sda: Device or resource busy
> mdadm: /dev/sda is not suitable for this array.
> mdadm: create aborted
> -------
> Even though it says creating aborted it still created md127.

One of my pet peeves is when people interpret the observations wrongly and
then report their interpretation instead of their observation.  However
sometimes it is very hard to separate the two.  Your comment above looks
perfectly reasonable and looks like a clean observation and not an
interpretation.  Yet it is an interpretation :-)

The observation would be
   "Even though it says creating abort, md127 was still created".

You see, it wasn't this mdadm that created md127 - it certainly shouldn't
have as you asked it to create md1.

I don't know the exact sequence of events, but something - possibly relating
to the error messages reported below - caused udev to notice /dev/sda.
udev then ran "mdadm -I /dev/sda" and as it had some metadata on it, it
created an array with it.  As the name information in that metadata was
probably "test1" or similar, rather than "1", mdadm didn't know what number
was wanted for the array, so it chose a free high number - 127.

This metadata must have been left over from an earlier experiment.

So it might have been something like this:

- you run mdadm (call this mdadm-1).
- mdadm tries to open sda
- driver notices that device is asleep, and wakes it up
- the waking up of the device causes a CHANGE uevent to udev
- this causes udev to run a new mdadm - mdadm-2
- mdadm-2 reads the metadata, sees old metadata, and assembles sda into a
  new md127
- mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails,
  because sda is now part of md127

Clearly undesirable behaviour.  I'm not sure which bit is "wrong".
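
One way to check for this situation before running --create is to look in
/proc/mdstat for an array that already lists the disk as a member. The sketch
below is an illustration only; the function name array_holding is made up,
and it assumes the usual "mdN : state member[0] ..." line format of
/proc/mdstat. Leftover metadata on the disk itself can be inspected with
"mdadm --examine /dev/sda".

```bash
#!/usr/bin/env bash
# array_holding: hypothetical helper. Prints the md array (if any) whose
# /proc/mdstat line lists the given device as a member. The second
# argument overrides the file to read, for trying it on saved output.
array_holding() {
    local dev=$1 mdstat=${2:-/proc/mdstat}
    awk -v dev="$dev" '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)    # "sda[0](S)" -> "sda"
                if (member == dev) { print $1; exit }
            }
        }
    ' "$mdstat"
}

# Example: array_holding sda
```

If this prints an array name, an exclusive open of the disk by a second mdadm
would be expected to fail with "Device or resource busy", as seen above.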

NeilBrown


> 
> And 3) led to these lines in dmesg:
> -------
> [  604.838640] sd 2:0:0:0: [sda] Device not ready
> [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 20 00
> [  604.838680] end_request: I/O error, dev sda, sector 2048
> [  604.838688] Buffer I/O error on device md127, logical block 0
> [  604.838695] Buffer I/O error on device md127, logical block 1
> [  604.838699] Buffer I/O error on device md127, logical block 2
> [  604.838702] Buffer I/O error on device md127, logical block 3
> [  604.838783] sd 2:0:0:0: [sda] Device not ready
> [  604.838785] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838789] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838793] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838797] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 08 00
> [  604.838805] end_request: I/O error, dev sda, sector 2048
> [  604.838808] Buffer I/O error on device md127, logical block 0
> [  604.838983] sd 2:0:0:0: [sda] Device not ready
> [  604.838986] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.838989] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.838993] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.838998] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
> 08 00
> [  604.839006] end_request: I/O error, dev sda, sector 1465148888
> [  604.839009] Buffer I/O error on device md127, logical block 183143355
> [  604.839087] sd 2:0:0:0: [sda] Device not ready
> [  604.839090] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.839093] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.839097] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.839102] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 57 54 65 d8 00 00
> 08 00
> [  604.839110] end_request: I/O error, dev sda, sector 1465148888
> [  604.839113] Buffer I/O error on device md127, logical block 183143355
> [  604.839271] sd 2:0:0:0: [sda] Device not ready
> [  604.839274] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.839278] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.839282] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.839286] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 20 00
> [  604.839321] end_request: I/O error, dev sda, sector 2048
> [  604.839324] Buffer I/O error on device md127, logical block 0
> [  604.839330] Buffer I/O error on device md127, logical block 1
> [  604.840494] sd 2:0:0:0: [sda] Device not ready
> [  604.840497] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
> driverbyte=DRIVER_SENSE
> [  604.840504] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
> [  604.840512] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
> initializing command required
> [  604.840516] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
> 08 00
> [  604.840526] end_request: I/O error, dev sda, sector 2048
> ------
> 
> This excludes hardware errors (different physical machine and devices)
> as the cause, and also ext4, which the other system was using as its
> filesystem. Maybe Neil Brown (whom scripts/get_maintainer.pl identified as
> the maintainer of the md code) can make sense of this. It may well
> be that this is the same problem on a different error path - I don't know.
> 
> I will try to make the scenario more generic, but I don't have a
> non-virtual machine to spare atm. Also please do let me know if I'm
> posting this to the wrong lists (linux-scsi and linux-raid) or if there
> is anything which might not be helpful with the way I'm reporting this.
> 
> Regards,
> Matthias Prager




* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 20:45     ` Darrick J. Wong
@ 2012-07-09 22:24       ` Robert Trace
  2012-07-10  0:21         ` Matthias Prager
  2012-07-10 16:54         ` Darrick J. Wong
  0 siblings, 2 replies; 35+ messages in thread
From: Robert Trace @ 2012-07-09 22:24 UTC (permalink / raw)
  To: djwong; +Cc: Matthias Prager, linux-scsi, linux-raid

On 07/09/2012 04:45 PM, Darrick J. Wong wrote:
>
> I suspect that /sys/devices/<bunch of sas topology here>/manage_start_stop = 0
> for the SATA devices hanging off the SAS controller.

Yep, looks like you're right.  For my system:

# cat /sys/block/sd?/device/scsi_disk/*/manage_start_stop
1
1
1
1
1
0
0
0
0
0
0
0
0

Those first 5 disks are SATA disks on SATA controllers.  The last 8
disks are SATA disks on the SAS controller.
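
Since the plain cat above prints the values without device names, a small loop
makes the mapping explicit. This is a sketch only; the function name
list_manage_start_stop is made up, and the optional argument (the sysfs mount
point, default /sys) exists mainly so the loop can be tried against a scratch
directory.

```bash
#!/usr/bin/env bash
# list_manage_start_stop: hypothetical helper. The optional argument is the
# sysfs mount point (default /sys), which also makes the loop easy to try
# against a scratch directory.
list_manage_start_stop() {
    local sysfs=${1:-/sys} f disk
    for f in "$sysfs"/block/sd*/device/scsi_disk/*/manage_start_stop; do
        [ -r "$f" ] || continue            # glob matched nothing
        disk=${f#"$sysfs"/block/}          # strip the leading path
        disk=${disk%%/*}                   # keep only the sdX component
        echo "$disk $(cat "$f")"
    done
}

# Example: list_manage_start_stop
```

Each output line is "sdX value", so the disks behind the SAS controller can be
picked out directly.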

> Setting that sysfs
> attribute to 1 is supposed to enable the SCSI layer to send TUR when it sees
> "LU not ready", as well as spin down the drives at suspend/poweroff time.

Setting it to 1 doesn't seem to have made any difference, however.

# cat /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
0
# echo 1 > /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
# cat /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
1
# hdparm -y /dev/sdm

/dev/sdm:
 issuing standby command
# hdparm -C /dev/sdm

/dev/sdm:
 drive state is:  standby
# dd if=/dev/sdm of=/dev/null bs=512 count=1
dd: reading `/dev/sdm': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 0.00117802 s, 0.0 kB/s

... and on the scsi logging side, I see the read(10) to the disk which
immediately returns "Not Ready" and the I/O failure bubbles up the
chain.  And afterwards, the disk is still asleep.

# hdparm -C /dev/sdm

/dev/sdm:
 drive state is:  standby

Also, TURs don't appear to actually wake the disk up (should they?).
The only thing I've found that'll wake the disk up is an explicit START
UNIT command.

-- Rob


* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 22:08   ` NeilBrown
@ 2012-07-10  0:03     ` Matthias Prager
  0 siblings, 0 replies; 35+ messages in thread
From: Matthias Prager @ 2012-07-10  0:03 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-scsi, linux-raid, Matthias Prager

On 10.07.2012 00:08, NeilBrown wrote:
> On Mon, 09 Jul 2012 16:40:15 +0200 Matthias Prager <linux@matthiasprager.de>
> wrote:
> 
>> Even though it says creating aborted it still created md127.
> 
> One of my pet peeves is when people interpret the observations wrongly and
> then report their interpretation instead of their observation.  However
> sometimes it is very hard to separate the two.  Your comment above looks
> perfectly reasonable and looks like a clean observation and not an
> interpretation.  Yet it is an interpretation :-)
> 
> The observation would be
>    "Even though it says creating abort, md127 was still created".
> 
> You see, it wasn't this mdadm that created md127 - it certainly shouldn't
> have as you asked it to create md1.
Sorry - I jumped to conclusions without knowing what was actually going on.

> 
> I don't know the exact sequence of events, but something - possibly relating
> to the error messages reported below - caused udev to notice /dev/sda.
> udev then ran "mdadm -I /dev/sda" and as it had some metadata on it, it
> created an array with it.  As the name information in that metadata was
> probably "test1" or similar, rather than "1", mdadm didn't know what number
> was wanted for the array, so it chose a free high number - 127.
> 
> This metadata must have been left over from an earlier experiment.
That is correct (as I am just realizing now). There is metadata of a
raid1 array left on the disk even though it was used (for a short time)
with zfs on freebsd before doing these experiments.

> 
> So it might have been something like this:
> 
> - you run mdadm (call this mdadm-1).
> - mdadm tries to open sda
> - driver notices that device is asleep, and wakes it up
> - the waking up of the device causes a CHANGE uevent to udev
> - this causes udev to run a new mdadm - mdadm-2
> - mdadm-2 reads the metadata, sees old metadata, and assembles sda into a
>   new md127
> - mdadm-1 gets scheduled again, tries to get O_EXCL access to sda and fails,
>   because sda is now part of md127
> 
> Clearly undesirable behaviour.  I'm not sure which bit is "wrong".
As it turns out, mdadm is doing everything right. md127 is actually
already present (though inactive) at boot time. So mdadm is absolutely
correct in saying sda is busy and refusing to do anything further.

> 
> NeilBrown
> 

The real problem seems to be located in some layer below md, which is
not waking up the disk for any i/o (at all - not even for fdisk -l).

Matthias


* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 19:37   ` Robert Trace
  2012-07-09 20:45     ` Darrick J. Wong
@ 2012-07-10  0:12     ` Matthias Prager
  2012-07-10  1:51       ` Robert Trace
  1 sibling, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-10  0:12 UTC (permalink / raw)
  To: Robert Trace; +Cc: linux-scsi, linux-raid, Matthias Prager

On 09.07.2012 21:37, Robert Trace wrote:
>> I did some further research regarding my problem.
>> It appears to me the fault does not lie with the mpt2sas driver (not
>> that I can definitely exclude it), but with the md implementation.
> 
> I'm actually discovering some of the same issues (LSI 9211-8i w/ SATA
> disks), but I've come to a slightly different conclusion.
> 
> I noticed that when my SATA disks are on a SATA controller and they spin
> down (or are spun down via hdparm -y), then they respond to TUR (TEST
> UNIT READY) commands with an OK.  Any I/O sent to these disks simply
> wait while the disks spin up and then complete as usual.
> 
> However, my SATA disks on the SAS controller respond to TUR with the
> sense error "Not Ready/Initializing command required".  Any I/O sent to
> these disks immediately fails.  You saw this in your logging:
> 
>> [  604.838640] sd 2:0:0:0: [sda] Device not ready
>> [  604.838645] sd 2:0:0:0: [sda]  Result: hostbyte=DID_OK
>> driverbyte=DRIVER_SENSE
>> [  604.838655] sd 2:0:0:0: [sda]  Sense Key : Not Ready [current]
>> [  604.838663] sd 2:0:0:0: [sda]  Add. Sense: Logical unit not ready,
>> initializing command required
>> [  604.838668] sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 00 08 00 00 00
>> 20 00
>> [  604.838680] end_request: I/O error, dev sda, sector 2048
>> [  604.838688] Buffer I/O error on device md127, logical block 0
>> [  604.838695] Buffer I/O error on device md127, logical block 1
>> [  604.838699] Buffer I/O error on device md127, logical block 2
>> [  604.838702] Buffer I/O error on device md127, logical block 3
> 
> Sending an explicit START UNIT command to these sleeping disks will wake
> them up and then they behave normally.  (BTW, you can issue TURs and
> START UNITs via the sg_turs and sg_start commands).
Thanks for these pointers.
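
The TUR-then-START-UNIT sequence described above can be sketched as a
small shell helper. This is only an illustration, assuming sg3_utils is
installed; the SG_TURS and SG_START variables are hypothetical hooks
added here so the logic can be tried without real hardware, they are not
part of sg3_utils.

```shell
#!/bin/sh
# Sketch: probe a disk with TEST UNIT READY and, if it is not ready,
# send an explicit START UNIT.
# SG_TURS / SG_START are hypothetical overrides (assumption of this
# example); by default the real sg_turs and sg_start tools are used.
wake_if_not_ready() {
    dev=$1
    if "${SG_TURS:-sg_turs}" "$dev" >/dev/null 2>&1; then
        # sg_turs exits 0 when the device reports ready
        echo "$dev: ready"
    elif "${SG_START:-sg_start}" --start "$dev" >/dev/null 2>&1; then
        echo "$dev: was not ready, START UNIT sent"
    else
        echo "$dev: START UNIT failed" >&2
        return 1
    fi
}
```

Something like "wake_if_not_ready /dev/sda" before the first access would
then behave like the manual sg_start workaround.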

> 
> I've reproduced this behavior on the raw disks themselves, no MD layer
> involved (although the freak-out by my MD layer is what alerted me to
> this issue too... Having your entire array punted the first time you
> access it is a little scary :-).  I'm also on raw hardware and I've seen
> this behavior on kernels 3.0.33 through 3.4.4.
This is interesting - are you sure about 3.0.33? I'm running this kernel
atm for it gives me no trouble (as opposed to >=3.1.10). The SATA disks
are spun up when I access data on them.

> 
> So, SATA disks respond differently depending on the controller they're
> on.  I don't know if this is a SCSI thing, a SAS thing or a
> firmware/driver thing for the 9211.
> 
> Now, whether or not the MD layer should be assembling arrays from
> "failed" disks is, I think, a separate issue.
I realize now in my cases the MD layer behaved correctly.

> 
> -- Rob
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 22:24       ` Robert Trace
@ 2012-07-10  0:21         ` Matthias Prager
  2012-07-10  1:56           ` Robert Trace
  2012-07-10 16:54         ` Darrick J. Wong
  1 sibling, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-10  0:21 UTC (permalink / raw)
  To: Robert Trace; +Cc: djwong, linux-scsi, linux-raid, Matthias Prager

On 10.07.2012 00:24, Robert Trace wrote:
> 
> Also, TURs don't appear to actually wake the disk up (should they?).
> The only thing I've found that'll wake the disk up is an explicit START
> UNIT command.

I haven't checked the scsi logging side, but about the only commands
that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start'
(smartctl may be issuing a START UNIT command on its own).

Matthias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-10  0:12     ` Matthias Prager
@ 2012-07-10  1:51       ` Robert Trace
  2012-07-10 23:27         ` Robert Trace
  0 siblings, 1 reply; 35+ messages in thread
From: Robert Trace @ 2012-07-10  1:51 UTC (permalink / raw)
  To: Matthias Prager; +Cc: linux-scsi

[removed linux-raid since the md layer seems unrelated]

On 07/09/2012 08:12 PM, Matthias Prager wrote:
>>
>> I've reproduced this behavior on the raw disks themselves, no MD layer
>> involved (although the freak-out by my MD layer is what alerted me to
>> this issue too... Having your entire array punted the first time you
>> access it is a little scary :-).  I'm also on raw hardware and I've seen
>> this behavior on kernels 3.0.33 through 3.4.4.
> This is interesting - are you sure about 3.0.33? I'm running this kernel
> atm for it gives me no trouble (as opposed to >=3.1.10). The SATA disks
> are spun up when I access data on them.

Huh..  I just retested this and I'm seeing really random behavior.

I tried 3.0.33 a few days ago after I saw your initial e-mail to this
list.  At that time, the one disk I tried didn't wake up when I sent I/O
to it.

My first retest (just now), on 3.0.33 with four disks, showed the
behavior you initially reported.  Two of the disks woke up from the I/O,
but not all of them.

Repeating the test without rebooting made two disks wake up, but only
one of the same disks from the first test.  The second disk that woke up
was different.

After rebooting and running the test again, none of the disks woke up.

Rebooting again and all of the disks are waking up.

(FYI, here's the test I ran:

1.  hdparm -y /dev/sd[lmjk]
2.  hdparm -C /dev/sd[lmjk] (to verify disks in standby)
3.  for i in l m j k; do sg_turs -v /dev/sd${i}; done
(All disks reported "Not Ready")
4.  echo 3 > /proc/sys/vm/drop_caches
5.  for i in l m j k; do dd if=/dev/sd${i} of=/dev/null bs=512 count=1
skip=<number>; done

I've been manually changing the skip=<number> because I've seen the dd
command complete successfully without the disk waking up.  I think this
is because the disk is satisfying the read from its own cache.  Changing
where on the disk I'm reading should thwart this.
)
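
The test steps above could be wrapped into one script roughly like this.
It is a sketch, not the script that was actually used: the function
names, the argument order and the DRY_RUN switch are assumptions made
for this example. With DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1 (hypothetical switch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# usage: standby_read_test SKIP DEVICE...
# SKIP is the 512-byte block offset to read. Vary it between runs so the
# read is less likely to be satisfied from the drive's own cache.
standby_read_test() {
    skip=$1
    shift
    for dev in "$@"; do
        run hdparm -y "$dev"     # put the drive into standby
        run hdparm -C "$dev"     # check the reported power state
        run sg_turs -v "$dev"    # expected to report "Not Ready"
    done
    # drop the page cache so the reads below have to reach the device
    run sh -c 'echo 3 > /proc/sys/vm/drop_caches'
    for dev in "$@"; do
        run dd if="$dev" of=/dev/null bs=512 count=1 skip="$skip"
    done
}
```

For example "DRY_RUN=1 standby_read_test 4096 /dev/sdl /dev/sdm" prints
the command sequence without touching the disks.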

I'm confused.  I'll try more recent kernels again and see if the
behavior becomes predictable.

-- Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-10  0:21         ` Matthias Prager
@ 2012-07-10  1:56           ` Robert Trace
  0 siblings, 0 replies; 35+ messages in thread
From: Robert Trace @ 2012-07-10  1:56 UTC (permalink / raw)
  To: Matthias Prager; +Cc: djwong, linux-scsi

On 07/09/2012 08:21 PM, Matthias Prager wrote:
>
> I haven't checked the scsi logging side, but about the only commands
> that wake up the disks are 'smartctl -a /dev/sda' and 'sg_start'
> (smartctl may be issuing a START UNIT command on its own).

smartctl -a does appear to wake the disks.  The scsi log shows an
IDENTIFY and then several ATA passthrough commands (one of which takes
~10 seconds to complete).  So, I don't see an explicit START UNIT, but
one of those ATA commands which I didn't decode could certainly trigger
the wakeup.

-- Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-09 22:24       ` Robert Trace
  2012-07-10  0:21         ` Matthias Prager
@ 2012-07-10 16:54         ` Darrick J. Wong
  1 sibling, 0 replies; 35+ messages in thread
From: Darrick J. Wong @ 2012-07-10 16:54 UTC (permalink / raw)
  To: Robert Trace; +Cc: Matthias Prager, linux-scsi, linux-raid

On Mon, Jul 09, 2012 at 06:24:15PM -0400, Robert Trace wrote:
> On 07/09/2012 04:45 PM, Darrick J. Wong wrote:
> >
> > I suspect that /sys/devices/<bunch of sas topology here>/manage_start_stop = 0
> > for the SATA devices hanging off the SAS controller.
> 
> Yep, looks like you're right.  For my system:
> 
> # cat /sys/block/sd?/device/scsi_disk/*/manage_start_stop
> 1
> 1
> 1
> 1
> 1
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 0
> 
> Those first 5 disks are SATA disks on SATA controllers.  The last 8
> disks are SATA disks on the SAS controller.
> 
> > Setting that sysfs
> > attribute to 1 is supposed to enable the SCSI layer to send TUR when it sees
> > "LU not ready", as well as spin down the drives at suspend/poweroff time.
> 
> Setting it to 1 doesn't seem to have made any difference, however.
> 
> # cat /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
> 0
> # echo 1 > /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
> # cat /sys/block/sdm/device/scsi_disk/14\:0\:7\:0/manage_start_stop
> 1
> # hdparm -y /dev/sdm
> 
> /dev/sdm:
>  issuing standby command
> # hdparm -C /dev/sdm
> 
> /dev/sdm:
>  drive state is:  standby
> # dd if=/dev/sdm of=/dev/null bs=512 count=1
> dd: reading `/dev/sdm': Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 0.00117802 s, 0.0 kB/s
> 
> ... and on the scsi logging side, I see the read(10) to the disk which
> immediately returns "Not Ready" and the I/O failure bubbles up the
> chain.  And afterwards, the disk is still asleep.
> 
> # hdparm -C /dev/sdm
> 
> /dev/sdm:
>  drive state is:  standby
> 
> Also, TURs don't appear to actually wake the disk up (should they?).
> The only thing I've found that'll wake the disk up is an explicit START
> UNIT command.

Sorry, I misspoke, manage_start_stop=1 sends START UNIT, not TUR.  Also, it
only manages spindown/up at suspend/resume time, hence the behavior you see.
The relevant source code is sd_start_stop_device() in drivers/scsi/sd.c.
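
To see the attribute per disk name rather than as a bare column of
values, a loop along these lines may help. It is a sketch only;
SYSFS_ROOT is a hypothetical override introduced here so the function
can be pointed at a fake tree, it defaults to /sys.

```shell
#!/bin/sh
# Sketch: print manage_start_stop for every sd disk found in sysfs.
# SYSFS_ROOT is an assumption of this example, not a kernel or udev knob.
show_start_stop() {
    root=${SYSFS_ROOT:-/sys}
    for attr in "$root"/block/sd*/device/scsi_disk/*/manage_start_stop; do
        # skip the unexpanded glob when no disk matches
        [ -r "$attr" ] || continue
        # reduce ".../block/sdm/device/..." to "sdm"
        disk=${attr#"$root"/block/}
        disk=${disk%%/*}
        echo "$disk manage_start_stop=$(cat "$attr")"
    done
}
```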

--D
> 
> -- Rob
> 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-10  1:51       ` Robert Trace
@ 2012-07-10 23:27         ` Robert Trace
  2012-07-11 12:19           ` Matthias Prager
  0 siblings, 1 reply; 35+ messages in thread
From: Robert Trace @ 2012-07-10 23:27 UTC (permalink / raw)
  To: Matthias Prager; +Cc: linux-scsi

On 07/09/2012 09:51 PM, Robert Trace wrote:
> 
> Huh..  I just retested this and I'm seeing really random behavior.

Ok, with a refined test I've been able to reliably reproduce this and I
bisected it back to commit 85ef06d1d252f6a2e73b678591ab71caad4667bb in
Linus' tree (introduced between 3.0 and 3.1):

commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
Author: Tejun Heo <tj@kernel.org>
Date:   Fri Jul 1 16:17:47 2011 +0200

    block: flush MEDIA_CHANGE from drivers on close(2)

Prior to the above commit, sleeping disks will spin up as a result of
I/O sent to them.  With the above commit, they don't spin up and
immediately return an I/O failure.

That's all the further I've gotten so far.  I'll be happy to test any
patches or suggestions.

-- Rob

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-10 23:27         ` Robert Trace
@ 2012-07-11 12:19           ` Matthias Prager
  2012-07-11 13:48             ` Matthias Prager
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-11 12:19 UTC (permalink / raw)
  To: Robert Trace; +Cc: linux-scsi, Matthias Prager

On 11.07.2012 01:27, Robert Trace wrote:
> On 07/09/2012 09:51 PM, Robert Trace wrote:
>>
>> Huh..  I just retested this and I'm seeing really random behavior.
> 
> Ok, with a refined test I've been able to reliably reproduce this and I
> bisected it back to commit 85ef06d1d252f6a2e73b678591ab71caad4667bb in
> Linus' tree (introduced between 3.0 and 3.1):
> 
> commit 85ef06d1d252f6a2e73b678591ab71caad4667bb
> Author: Tejun Heo <tj@kernel.org>
> Date:   Fri Jul 1 16:17:47 2011 +0200
> 
>     block: flush MEDIA_CHANGE from drivers on close(2)
> 
> Prior to the above commit, sleeping disks will spin up as a result of
> I/O sent to them.  With the above commit, they don't spin up and
> immediately return an I/O failure.
This is good news, thank you. I can confirm your findings - omitting
commit 85ef06d1d252f6a2e73b678591ab71caad4667bb solves my initial issue
here (with 3.1.10).

> 
> That's all the further I've gotten so far.  I'll be happy to test any
> patches or suggestions.
> 
> -- Rob
> 



^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-11 12:19           ` Matthias Prager
@ 2012-07-11 13:48             ` Matthias Prager
  2012-07-17 18:09               ` Tejun Heo
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-11 13:48 UTC (permalink / raw)
  To: Robert Trace; +Cc: Matthias Prager, linux-scsi, Tejun Heo, Jens Axboe

I just tested kernel version 3.4.4 without commit
85ef06d1d252f6a2e73b678591ab71caad4667bb and it also works fine (beware
of commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 as it conflicts with
reverting 85ef06d1d252f6a2e73b678591ab71caad4667bb).

I'm trying to understand why this commit leads to the issue of i/o
failing on spun down drives, in hopes of being able to fix it. Meanwhile
maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are
able to shed some light on this (I've included them in the CC list).

Matthias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-11 13:48             ` Matthias Prager
@ 2012-07-17 18:09               ` Tejun Heo
  2012-07-17 19:39                 ` Matthias Prager
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2012-07-17 18:09 UTC (permalink / raw)
  To: Matthias Prager; +Cc: Robert Trace, linux-scsi, Jens Axboe

Hello,

On Wed, Jul 11, 2012 at 03:48:00PM +0200, Matthias Prager wrote:
> I just tested kernel version 3.4.4 without commit
> 85ef06d1d252f6a2e73b678591ab71caad4667bb and it also works fine (beware
> of commit 62d3c5439c534b0e6c653fc63e6d8c67be3a57b1 as it conflicts with
> reverting 85ef06d1d252f6a2e73b678591ab71caad4667bb).
> 
> I'm trying to understand why this commit leads to the issue of i/o
> failing on spun down drives, in hopes of being able to fix it. Meanwhile
> maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are
> able to shed some light on this (I've included them in the CC list).

Nothing rings a bell for me.  How does it fail?  The only thing it
changes is when and which media check commands are issued.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-17 18:09               ` Tejun Heo
@ 2012-07-17 19:39                 ` Matthias Prager
  2012-07-17 20:01                   ` Tejun Heo
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-17 19:39 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Robert Trace, linux-scsi, Jens Axboe

Hello Tejun,

On 17.07.2012 20:09, Tejun Heo wrote:
> Hello,
> 
> On Wed, Jul 11, 2012 at 03:48:00PM +0200, Matthias Prager wrote:
>> I'm trying to understand why this commit leads to the issue of i/o
>> failing on spun down drives, in hopes of being able to fix it. Meanwhile
>> maybe Tejun Heo (author of the commit) or Jens Axboe (the committer) are
>> able to shed some light on this (I've included them in the CC list).
> 
> Nothing rings a bell for me.  How does it fail?  The only thing it
> changes is when and which media check commands are issued.

I will try to describe the issue as best as I can (please feel free to
point me to more helpful debugging steps or guides):
Whenever I put a drive to sleep (either via 'hdparm -y ...' or by
letting it run into standby timeout) and issue i/o's afterwards (like
with the help of 'fdisk -l') I get back i/o errors (along the lines of
'end_request: I/O error, ...' - see previous posts in this thread) and
the drive remains in standby (instead of waking up).

Robert (who also saw these errors) bisected the issue down to your
patch. And without it kernels 3.1.10 + 3.4.4 run smoothly for him and me.

I could not however reproduce the issue on any other device than a LSI
SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a
SATA drive I don't see these i/o errors. But since I'm experiencing
these issues on two different systems (both with lsi controllers while
running vmware-guests on them) and Robert sees them on his
(non-virtualized) system with the same lsi controller (9211-8i), I'm
inclined to make the following assumptions:
Either it is an issue which is limited to this controller and possibly
sata disks hanging off it or it is a more general issue with sas
controllers and sata disks (again it could well affect sas disks too).
Lacking other controllers or sas disks I can't be sure.

Thank you for taking the time to look into this - it's much appreciated
Matthias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-17 19:39                 ` Matthias Prager
@ 2012-07-17 20:01                   ` Tejun Heo
  2012-07-21 12:15                     ` Matthias Prager
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2012-07-17 20:01 UTC (permalink / raw)
  To: Matthias Prager
  Cc: Robert Trace, linux-scsi, Jens Axboe, Eric Moore, James E.J. Bottomley

Hello,

On Tue, Jul 17, 2012 at 09:39:41PM +0200, Matthias Prager wrote:
> I could not however reproduce the issue on any other device than a LSI
> SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a
> SATA drive I don't see these i/o errors. But since I'm experiencing
> these issues on two different systems (both with lsi controllers while
> running vmware-guests on them) and Robert sees them on his
> (non-virtualized) system with the same lsi controller (9211-8i), I'm
> inclined to make the following assumptions:
> Either it is an issue which is limited to this controller and possibly
> sata disks hanging off it or it is a more general issue with sas
> controllers and sata disks (again it could well affect sas disks too).
> Lacking other controllers or sas disks I can't be sure.

So, nothing in the libata stack generates NOT_READY - "initializing
command required".  I suppose it's LSI firmware / driver translating
TUR to CHECK_POWER_MODE and generating NOT_READY.  I don't know what
SAT says about this but this can't be correct.  An ATA device in
standby mode is ready to process any commands.  It should be able to
come back to full operation on demand as necessary and that's why it
can be transparently enabled from device side.  Eric?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-17 20:01                   ` Tejun Heo
@ 2012-07-21 12:15                     ` Matthias Prager
  2012-07-22 17:31                       ` Tejun Heo
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-21 12:15 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
	James E.J. Bottomley, Alan, Darrick J. Wong

On 17.07.2012 22:01, Tejun Heo wrote:
> On Tue, Jul 17, 2012 at 09:39:41PM +0200, Matthias Prager wrote:
>> I could not however reproduce the issue on any other device than a LSI
>> SAS controller (using SATA disks) - on a regular ICH10 using AHCI and a
>> SATA drive I don't see these i/o errors. But since I'm experiencing
>> these issues on two different systems (both with lsi controllers while
>> running vmware-guests on them) and Robert sees them on his
>> (non-virtualized) system with the same lsi controller (9211-8i), I'm
>> inclined to make the following assumptions:
>> Either it is an issue which is limited to this controller and possibly
>> sata disks hanging off it or it is a more general issue with sas
>> controllers and sata disks (again it could well affect sas disks too).
>> Lacking other controllers or sas disks I can't be sure.
> 
> So, nothing in the libata stack generates NOT_READY - "initializing
> command required".  I suppose it's LSI firmware / driver translating
> TUR to CHECK_POWER_MODE and generating NOT_READY.  I don't know what
> SAT says about this but this can't be correct.  An ATA device in
> standby mode is ready to process any commands.  It should be able to
> come back to full operation on demand as necessary and that's why it
> can be transparently enabled from device side.  Eric?
> 

While reading the linux-scsi mailing list I stumbled upon

'[Bug 16070] Fail to issue Start/Stop Unit'
<http://marc.info/?l=linux-scsi&m=134278835822649&w=2>
(bugtracker: <https://bugzilla.kernel.org/show_bug.cgi?id=16070>)

which led me to try enabling the 'allow_restart' flag for my disks.
With this workaround a vanilla kernel 3.4.5 does not exhibit the i/o
errors on sleeping sata disks hanging off sas controllers.


I'm currently running one of my systems with a

'echo 1 | tee /sys/block/sd?/device/scsi_disk/*/allow_restart >/dev/null'

line added to the init scripts. This way I can use the untouched kernel
sources and still get around the i/o errors. But I reckon this is no
solution.


I'm no expert on scsi/sas/ata internals, so please take the following
thoughts with a grain of salt:

As far as I can see (and Tejun confirmed that - I think) Tejun's commit
85ef06d1d252f6a2e73b678591ab71caad4667bb somehow exposes a bug, which
lies deeper in the sas/ata code. The 'sas_slave_configure()' function in
'drivers/scsi/libsas/sas_scsi_host.c' sets the 'allow_restart' flag for
sas disks hanging off sas controllers. But if it encounters a sata disk
it calls 'ata_sas_slave_configure()' in 'drivers/ata/libata-scsi.c'
instead and returns without enabling the 'allow_restart' flag. A simple
fix would be to set allow_restart=1 after having called
'ata_sas_slave_configure()' but before returning (in
'sas_slave_configure()').

Now I'm not sure this isn't papering over another bug. Which leads me to
my question: What is the correct behavior?

#1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o
by setting allow_restart=1 for sata disks on sas controllers

or

#2 Teaching the sas drivers they do not need spin-up commands and can
simply start issuing i/o to sata disks

--
Matthias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-21 12:15                     ` Matthias Prager
@ 2012-07-22 17:31                       ` Tejun Heo
  2012-07-22 23:14                         ` Matthias Prager
                                           ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Tejun Heo @ 2012-07-22 17:31 UTC (permalink / raw)
  To: Matthias Prager
  Cc: Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
	James E.J. Bottomley, Alan, Darrick J. Wong

Hello,

On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> Now I'm not sure this isn't papering over another bug. Which leads me to
> my question: What is the correct behavior?
> 
> #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o
> by setting allow_restart=1 for sata disks on sas controllers
> 
> or
> 
> #2 Teaching the sas drivers they do not need spin-up commands and can
> simply start issuing i/o to sata disks

I haven't consulted SAT but it seems like a bug in SAS driver or
firmware.  If it's a driver bug, we better fix it there.  If a
firmware bug, working around those is one of major roles of drivers,
so I think setting allow_restart is fine.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-22 17:31                       ` Tejun Heo
@ 2012-07-22 23:14                         ` Matthias Prager
  2012-07-23 15:26                           ` Tejun Heo
  2012-07-25 14:19                         ` James Bottomley
  2012-07-25 22:35                         ` tomm
  2 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-22 23:14 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
	James E.J. Bottomley, Alan, Darrick J. Wong, Matthias Prager

Hello Tejun,

On 22.07.2012 19:31, Tejun Heo wrote:
> I haven't consulted SAT but it seems like a bug in SAS driver or
> firmware.  If it's a driver bug, we better fix it there.  If a
> firmware bug, working around those is one of major roles of drivers,
> so I think setting allow_restart is fine.

as it turns out my workaround (setting allow_restart=1) isn't all that
useful after all. There are no more i/o errors because the drive just
never goes to standby mode anymore (at least 'hdparm -y /dev/sda' does
not seem to have any effect anymore). I don't really understand why - do
sas drives ever get to standby mode? (they have allow_restart=1 set by
default) And is this desired or expected behavior for sata disks on sas
controllers?

For the moment the only way for me to have my sata drives sleeping
without i/o errors is to revert your original commit
(85ef06d1d252f6a2e73b678591ab71caad4667bb - tested with kernels 3.1.10,
3.4.4, 3.4.5, 3.4.6 and 3.5.0)

--
Matthias

P.S. I hope I'm not getting on everybody's nerves here (especially yours
Tejun)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-22 23:14                         ` Matthias Prager
@ 2012-07-23 15:26                           ` Tejun Heo
  2012-07-24 22:04                             ` Matthias Prager
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2012-07-23 15:26 UTC (permalink / raw)
  To: Matthias Prager
  Cc: Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
	James E.J. Bottomley, Alan, Darrick J. Wong, Matthias Prager

Hello,

On Mon, Jul 23, 2012 at 01:14:00AM +0200, Matthias Prager wrote:
> as it turns out my workaround (setting allow_restart=1) isn't all that
> useful after all. There are no more i/o errors because the drive just
> never goes to standby mode anymore (at least 'hdparm -y /dev/sda' does
> not seem to have any effect anymore). I don't really understand why - do
> sas drives ever get to standby mode? (they have allow_restart=1 set by
> default) And is this desired or expected behavior for sata disk on sas
> controllers?
> 
> For the moment the only way for me to have my sata drives sleeping
> without i/o errors is to revert your original commit
> (85ef06d1d252f6a2e73b678591ab71caad4667bb - tested with kernels 3.1.10,
> 3.4.4, 3.4.5, 3.4.6 and 3.5.0)

Hmmm... I think we definitely need help from mpt people.  Ping, guys.

> P.S. I hope I'm not getting on everybody's nerves here (especially yours
> Tejun)

Not at all. :)

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-23 15:26                           ` Tejun Heo
@ 2012-07-24 22:04                             ` Matthias Prager
  2012-07-25 10:26                               ` Reddy, Sreekanth
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-24 22:04 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
	James E.J. Bottomley, Alan, Darrick J. Wong, Matthias Prager

Hello everyone,

I retested with a new firmware (P14 - released today), since it contains
a bunch of sata and SATL fixes (according to the changelog).
Unfortunately the observed behavior is unchanged (tested on a 3.4.5 kernel).

Just wanted to let everyone know.

Cheers
Matthias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-24 22:04                             ` Matthias Prager
@ 2012-07-25 10:26                               ` Reddy, Sreekanth
  0 siblings, 0 replies; 35+ messages in thread
From: Reddy, Sreekanth @ 2012-07-25 10:26 UTC (permalink / raw)
  To: Matthias Prager, Tejun Heo
  Cc: Robert Trace, linux-scsi, Jens Axboe, Moore, Eric,
	James E.J. Bottomley, Alan, Darrick J. Wong

Hi,

We have done some analysis on this issue. We observed that the issue is
reproducible on kernel 3.1.10 onwards, but not on 3.0.36. So we took the
mpt2sas code from the 3.1.10 kernel, compiled it, and ran it on the
3.0.36 kernel. There the issue is not reproducible (i.e. it works fine).

Since the 3.0.36 kernel we have not added any patches that would cause
this issue. In other words, this issue is not caused by the mpt2sas
driver.

Regards,
Sreekanth.  

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Matthias Prager
> Sent: Wednesday, July 25, 2012 3:34 AM
> To: Tejun Heo
> Cc: Robert Trace; linux-scsi@vger.kernel.org; Jens Axboe; Moore, Eric;
> James E.J. Bottomley; Alan; Darrick J. Wong; Matthias Prager
> Subject: Re: 'Device not ready' issue on mpt2sas since 3.1.10
> 
> Hello everyone,
> 
> I retested with a new firmware (P14 - released today), since it
> contains
> a bunch of sata and SATL fixes (according to the changelog).
> Unfortunately the observed behavior is unchanged (tested on a 3.4.5
> kernel).
> 
> Just wanted to let everyone know.
> 
> Cheers
> Matthias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-22 17:31                       ` Tejun Heo
  2012-07-22 23:14                         ` Matthias Prager
@ 2012-07-25 14:19                         ` James Bottomley
  2012-07-25 17:17                           ` Tejun Heo
  2012-07-25 22:35                         ` tomm
  2 siblings, 1 reply; 35+ messages in thread
From: James Bottomley @ 2012-07-25 14:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Matthias Prager, Robert Trace, linux-scsi, Jens Axboe,
	Eric Moore, Alan, Darrick J. Wong, linux-ide

On Sun, 2012-07-22 at 10:31 -0700, Tejun Heo wrote:
> Hello,
> 
> On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> > Now I'm not sure this isn't papering over another bug. Which leads me to
> > my question: What is the correct behavior?
> > 
> > #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o
> > by setting allow_restart=1 for sata disks on sas controllers
> > 
> > or
> > 
> > #2 Teaching the sas drivers they do not need spin-up commands and can
> > simply start issuing i/o to sata disks
> 
> I haven't consulted SAT but it seems like a bug in SAS driver or
> firmware.  If it's a driver bug, we better fix it there.  If a
> firmware bug, working around those is one of major roles of drivers,
> so I think setting allow_restart is fine.

Actually, I don't think so.  SAT-2 section 8.12.2 does say 

        if the device is in the stopped state as the result of
        processing a START STOP UNIT command (see 9.11), then the SATL
        shall terminate the TEST UNIT READY command with CHECK CONDITION
        status with the sense key set to NOT READY and the additional
        sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
        REQUIRED;

START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
that's what hdparm -y issues.  We don't see this in /drivers/ata because
TEST UNIT READY always returns success.

So it looks like the mpt2sas SAT is doing the correct thing and we only
don't see this problem in normal SATA devices because of a bug in the
libata-scsi SAT.
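
For reference, the sense data in question is sense key 02h (NOT READY)
with ASC/ASCQ 04h/02h. A tiny lookup like the following shows how the
neighbouring 04h codes differ. It is a sketch that covers only a handful
of codes; the function name and output wording are made up for this
example.

```shell
#!/bin/sh
# Sketch: describe a (sense key, ASC, ASCQ) triple given as two-digit
# hex values without a 0x prefix. Only the NOT READY / ASC 04h codes
# relevant to this thread are listed; anything else is "unknown".
describe_sense() {
    key=$1
    asc=$2
    ascq=$3
    case "$key/$asc/$ascq" in
        02/04/00) echo "Not Ready: logical unit not ready, cause not reportable" ;;
        02/04/01) echo "Not Ready: logical unit is in process of becoming ready" ;;
        02/04/02) echo "Not Ready: logical unit not ready, initializing command required" ;;
        02/04/03) echo "Not Ready: logical unit not ready, manual intervention required" ;;
        *) echo "unknown sense $key/$asc/$ascq" ;;
    esac
}
```

"describe_sense 02 04 02" gives the condition seen in the logs, the one
a START UNIT is meant to clear.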

However, the kernel log

Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] Device not ready
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Sense Key : Not Ready [current]
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj]  Add. Sense: Logical unit not ready, initializing command required
Apr 04 22:55:10 [kernel] sd 1:0:1:0: [sdj] CDB: Write(10): 2a 00 57 54 52 3f 00 00 08 00

indicates we got the NOT READY to a non-TUR command, so I suspect what's
happening is that sending the TUR causes the SAT to remember the standby
state and respond NOT READY to all subsequent commands.  However, if we
just send an ordinary command, not a TUR, it quietly wakes the drive and
we don't see any problems.

There is support in SAT for this behaviour because there's a note on the
START STOP UNIT command saying

        After returning GOOD status for a START STOP UNIT command with
        the START bit set to zero, the SATL shall consider the ATA
        device to be in the Stopped power state (see SBC-2)

which in SCSI terms would mean returning NOT READY to any subsequent
commands.
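
The suspected behaviour can be sketched as a toy state machine. This is
only an illustration of the hypothesis above, not a description of what
the mpt2sas firmware actually does; toy_satl, toy_result and
toy_satl_submit() are hypothetical names. The opcodes, the START bit
(byte 4, bit 0) and the 04h/02h additional sense code are the standard
SCSI values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI opcodes and sense key used in this sketch. */
#define TEST_UNIT_READY 0x00
#define START_STOP      0x1b
#define WRITE_10        0x2a
#define SENSE_NONE      0x00
#define SENSE_NOT_READY 0x02

/* Hypothetical model of a SATL's power-state bookkeeping. */
struct toy_satl {
	bool stopped;	/* set after START STOP UNIT with START=0 */
};

struct toy_result {
	bool check_condition;
	uint8_t sense_key, asc, ascq;
};

/*
 * Process one CDB.  Only the opcode and, for START STOP UNIT, the START
 * bit are examined.  While the model is "stopped", every other command
 * gets NOT READY / LOGICAL UNIT NOT READY, INITIALIZING COMMAND REQUIRED.
 */
static struct toy_result toy_satl_submit(struct toy_satl *satl,
					 const uint8_t *cdb)
{
	struct toy_result good = { false, SENSE_NONE, 0x00, 0x00 };
	struct toy_result not_ready = { true, SENSE_NOT_READY, 0x04, 0x02 };

	assert(satl != NULL && cdb != NULL);

	if (cdb[0] == START_STOP) {
		satl->stopped = !(cdb[4] & 0x01);
		return good;
	}
	return satl->stopped ? not_ready : good;
}
```

In this model a Write(10) issued after a START STOP UNIT with START=0
fails with exactly the sense data seen in the log, and succeeds again
only once a START STOP UNIT with START=1 has been sent.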

Can someone verify this is indeed what the mpt2sas HBA is doing?

James



* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-25 14:19                         ` James Bottomley
@ 2012-07-25 17:17                           ` Tejun Heo
  2012-07-25 19:55                             ` James Bottomley
  0 siblings, 1 reply; 35+ messages in thread
From: Tejun Heo @ 2012-07-25 17:17 UTC (permalink / raw)
  To: James Bottomley
  Cc: Matthias Prager, Robert Trace, linux-scsi, Jens Axboe,
	Eric Moore, Alan, Darrick J. Wong, linux-ide

Hello, James.

On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > I haven't consulted SAT but it seems like a bug in SAS driver or
> > firmware.  If it's a driver bug, we better fix it there.  If a
> > firmware bug, working around those is one of major roles of drivers,
> > so I think setting allow_restart is fine.
> 
> Actually, I don't think so.  SAT-2 section 8.12.2 does say 
> 
>         if the device is in the stopped state as the result of
>         processing a START STOP UNIT command (see 9.11), then the SATL
>         shall terminate the TEST UNIT READY command with CHECK CONDITION
>         status with the sense key set to NOT READY and the additional
>         sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
>         REQUIRED;
> 
> START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
> that's what hdparm -y issues.  We don't see this in /drivers/ata because
> TEST UNIT READY always returns success.

Urgh... ATA device in standby mode is ready for any command and
definitely doesn't need an "initializing command".  Oh, well...

> So it looks like the mpt2sas SAT is doing the correct thing and we only
> don't see this problem in normal SATA devices because of a bug in the
> libata-scsi SAT.

libata is inconsistent with the standard but I think the standard is
wrong here. :(

Thanks.

-- 
tejun

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-25 17:17                           ` Tejun Heo
@ 2012-07-25 19:55                             ` James Bottomley
  2012-07-25 23:56                               ` Matthias Prager
  2012-08-16 18:26                               ` Robert Trace
  0 siblings, 2 replies; 35+ messages in thread
From: James Bottomley @ 2012-07-25 19:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Matthias Prager, Robert Trace, linux-scsi, Jens Axboe,
	Eric Moore, Alan, Darrick J. Wong, linux-ide

On Wed, 2012-07-25 at 10:17 -0700, Tejun Heo wrote:
> Hello, James.
> 
> On Wed, Jul 25, 2012 at 06:19:13PM +0400, James Bottomley wrote:
> > > I haven't consulted SAT but it seems like a bug in SAS driver or
> > > firmware.  If it's a driver bug, we better fix it there.  If a
> > > firmware bug, working around those is one of major roles of drivers,
> > > so I think setting allow_restart is fine.
> > 
> > Actually, I don't think so.  SAT-2 section 8.12.2 does say 
> > 
> >         if the device is in the stopped state as the result of
> >         processing a START STOP UNIT command (see 9.11), then the SATL
> >         shall terminate the TEST UNIT READY command with CHECK CONDITION
> >         status with the sense key set to NOT READY and the additional
> >         sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
> >         REQUIRED;
> > 
> > START STOP UNIT (with START=0) translates to STANDBY IMMEDIATE, and
> > that's what hdparm -y issues.  We don't see this in /drivers/ata because
> > TEST UNIT READY always returns success.
> 
> Urgh... ATA device in standby mode is ready for any command and
> definitely doesn't need an "initializing command".  Oh, well...

Well, it does in sleep mode ... which seems to most closely map to what
SCSI thinks of as a stopped unit. I checked the specs just in case there
was an error ... they all say STANDBY not SLEEP.

> > So it looks like the mpt2sas SAT is doing the correct thing and we only
> > don't see this problem in normal SATA devices because of a bug in the
> > libata-scsi SAT.
> 
> libata is inconsistent with the standard but I think the standard is
> wrong here. :(

Well, reading it, so do I.  Unfortunately, we get to deal with the world
as it is rather than as we would wish it to be.  We likely have this
problem with a lot of USB SATLs as well ...

It looks like a hack like this might be needed.

James

---

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4a6381c..7e59a7f 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -42,6 +42,8 @@
 
 #include <trace/events/scsi.h>
 
+static void scsi_eh_done(struct scsi_cmnd *scmd);
+
 #define SENSE_TIMEOUT		(10*HZ)
 
 /*
@@ -241,6 +243,14 @@ static int scsi_check_sense(struct scsi_cmnd *scmd)
 	if (! scsi_command_normalize_sense(scmd, &sshdr))
 		return FAILED;	/* no valid sense data */
 
+	if (scmd->cmnd[0] == TEST_UNIT_READY && scmd->scsi_done != scsi_eh_done)
+		/* 
+		 * nasty: for mid-layer issued TURs, we need to return the
+		 * actual sense data without any recovery attempt.  For eh
+		 * issued ones, we need to try to recover and interpret
+		 */
+		return SUCCESS;
+
 	if (scsi_sense_is_deferred(&sshdr))
 		return NEEDS_RETRY;
 
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 56a9379..91d3366 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -764,6 +764,16 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	sdev->model = (char *) (sdev->inquiry + 16);
 	sdev->rev = (char *) (sdev->inquiry + 32);
 
+	if (strncmp(sdev->vendor, "ATA     ", 8) == 0) {
+		/* 
+		 * sata emulation layer device.  This is a hack to work around
+		 * the SATL power management specifications which state that
+		 * when the SATL detects the device has gone into standby
+		 * mode, it shall respond with NOT READY.
+		 */
+		sdev->allow_restart = 1;
+	}
+
 	if (*bflags & BLIST_ISROM) {
 		sdev->type = TYPE_ROM;
 		sdev->removable = 1;



* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-22 17:31                       ` Tejun Heo
  2012-07-22 23:14                         ` Matthias Prager
  2012-07-25 14:19                         ` James Bottomley
@ 2012-07-25 22:35                         ` tomm
  2012-07-26 19:20                           ` Robert Trace
  2 siblings, 1 reply; 35+ messages in thread
From: tomm @ 2012-07-25 22:35 UTC (permalink / raw)
  To: linux-scsi

Tejun Heo <tj <at> kernel.org> writes:

> 
> Hello,
> 
> On Sat, Jul 21, 2012 at 02:15:56PM +0200, Matthias Prager wrote:
> > Now I'm not sure this isn't taping over another bug. Which leads me to
> > my question: What is the correct behavior?
> > 
> > #1 Issuing a separate spin-up command (START UNIT?) prior to sending i/o
> > by setting allow_restart=1 for sata disks on sas controllers
> > 
> > or
> > 
> > #2 Teaching the sas drivers they do not need spin-up commands and can
> > simply start issuing i/o to sata disks
> 
> I haven't consulted SAT but it seems like a bug in SAS driver or
> firmware.  If it's a driver bug, we better fix it there.  If a
> firmware bug, working around those is one of major roles of drivers,
> so I think setting allow_restart is fine.
> 
> Thanks.
> 
If this is a driver or firmware bug, then why would commit
85ef06d1d252f6a2e73b678591ab71caad4667bb 
cause this to happen?  What is the interaction between this issue
and this commit which just flushes events?

Also this issue does not happen with mvsas, only with mpt2sas.





* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-25 19:55                             ` James Bottomley
@ 2012-07-25 23:56                               ` Matthias Prager
  2012-07-26 19:16                                 ` Robert Trace
  2012-08-16 18:26                               ` Robert Trace
  1 sibling, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-07-25 23:56 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Robert Trace, linux-scsi, Jens Axboe, Eric Moore,
	Alan, Darrick J. Wong, linux-ide, Matthias Prager

Hello James,

On 25.07.2012 21:55, James Bottomley wrote:
>
> It looks like a hack like this might be needed.
>
> James
>

<SNIP>

I don't yet understand all the code but I'm following your discussion
with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
driven controller in passthrough mode. I've applied your proposed patch
against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
I'm happy to report the problem does seem to get fixed by it.
At least putting the SATA drive into standby with 'hdparm -y' now works
(according to 'hdparm -C') without the nasty I/O errors on subsequent
I/O. That is to say, the drive wakes up again (e.g. on a 'fdisk -l
/dev/sda' command) and returns data.

--
Matthias

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-25 23:56                               ` Matthias Prager
@ 2012-07-26 19:16                                 ` Robert Trace
  0 siblings, 0 replies; 35+ messages in thread
From: Robert Trace @ 2012-07-26 19:16 UTC (permalink / raw)
  To: Matthias Prager
  Cc: James Bottomley, Tejun Heo, linux-scsi, Jens Axboe, Eric Moore,
	Alan, Darrick J. Wong, linux-ide

On 07/25/2012 07:56 PM, Matthias Prager wrote:
> 
> I don't yet understand all the code but I'm following your discussion
> with Tejun: I've set up a minimal vm running gentoo with a mpt2sas
> driven controller in passthrough mode. I've applied your proposed patch
> against the vanilla 3.5.0 kernel (which includes Tejun's commit), and
> I'm happy to report the problem does seem to get fixed by it.

I can confirm this on my hardware as well with both 3.4.4 and 3.5.0.
Without James' patch the kernels will immediately drop the I/O and with
the patch both kernels will wake the SATA disks and then complete the
I/O successfully.
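
The difference in behaviour can be sketched roughly as follows. This is
a simplified model, assuming the mid-layer only restarts a unit that
reports 04h/02h when allow_restart is set; it is not the actual
scsi_check_sense() code, and the toy_* names are hypothetical. The
vendor test is the same strncmp() that James' patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum toy_disposition {
	TOY_PASS_UP,	/* hand the error to the upper layer: the I/O fails */
	TOY_RETRY,	/* unit is becoming ready: retry the command */
	TOY_START_UNIT,	/* wake the error handler so it can start the unit */
};

/* True if the 8-byte INQUIRY vendor field reads "ATA     ". */
static bool toy_is_satl_device(const char *vendor)
{
	assert(vendor != NULL);
	return strncmp(vendor, "ATA     ", 8) == 0;
}

/* Decide what to do with a NOT READY sense key, given ASC/ASCQ. */
static enum toy_disposition toy_not_ready(uint8_t asc, uint8_t ascq,
					  bool allow_restart)
{
	/* 04h/01h: logical unit is in process of becoming ready */
	if (asc == 0x04 && ascq == 0x01)
		return TOY_RETRY;
	/* 04h/02h: logical unit not ready, initializing command required */
	if (asc == 0x04 && ascq == 0x02 && allow_restart)
		return TOY_START_UNIT;
	return TOY_PASS_UP;
}
```

With allow_restart clear, the 04h/02h case falls through to TOY_PASS_UP,
which corresponds to the I/O being dropped; with it set for "ATA     "
devices, the unit is started first and the I/O can then complete.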

-- Robert

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-25 22:35                         ` tomm
@ 2012-07-26 19:20                           ` Robert Trace
  0 siblings, 0 replies; 35+ messages in thread
From: Robert Trace @ 2012-07-26 19:20 UTC (permalink / raw)
  To: tomm; +Cc: linux-scsi

On 07/25/2012 06:35 PM, tomm wrote:
>
> If this is a driver or firmware bug, then why would commit
> 85ef06d1d252f6a2e73b678591ab71caad4667bb 
> cause this to happen?  What is the interaction between this issue
> and this commit which just flushes events?

That's confusing to me as well.  Tejun's patch seems entirely unrelated
to the power state of non-removable disks.

> Also this issue does not happen with mvsas, only with mpt2sas.

Now _that_ is a useful data point.  Is that with SATA disks attached?
Why is it limited (so far) to just the mpt2sas controller?

-- Robert


* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-07-25 19:55                             ` James Bottomley
  2012-07-25 23:56                               ` Matthias Prager
@ 2012-08-16 18:26                               ` Robert Trace
  2012-08-16 20:24                                 ` Matthias Prager
  1 sibling, 1 reply; 35+ messages in thread
From: Robert Trace @ 2012-08-16 18:26 UTC (permalink / raw)
  To: James Bottomley
  Cc: Tejun Heo, Matthias Prager, linux-scsi, Jens Axboe, Eric Moore,
	Alan, Darrick J. Wong, linux-ide

On 07/25/2012 03:55 PM, James Bottomley wrote:
> 
> Well, reading it, so do I.  Unfortunately, we get to deal with the world
> as it is rather than as we would wish it to be.  We likely have this
> problem with a lot of USB SATLs as well ...

Has this patch made it into the main git trees yet?

I haven't seen anything about it in nearly a month, but I've been using
James' patch since he posted it and the sleep/wakeup behavior seems
improved/correct.

-- Robert

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-08-16 18:26                               ` Robert Trace
@ 2012-08-16 20:24                                 ` Matthias Prager
  2012-08-16 20:33                                   ` Robert Trace
  0 siblings, 1 reply; 35+ messages in thread
From: Matthias Prager @ 2012-08-16 20:24 UTC (permalink / raw)
  To: Robert Trace
  Cc: James Bottomley, Tejun Heo, linux-scsi, Jens Axboe, Eric Moore,
	Alan, Darrick J. Wong, linux-ide, Matthias Prager

On 16.08.2012 20:26, Robert Trace wrote:
> On 07/25/2012 03:55 PM, James Bottomley wrote:
>>
>> Well, reading it, so do I.  Unfortunately, we get to deal with the world
>> as it is rather than as we would wish it to be.  We likely have this
>> problem with a lot of USB SATLs as well ...
> 
> Has this patch made it into the main git trees yet?

Not yet, but it is in James' scsi misc tree and, last I heard, was
scheduled for inclusion in the 3.6 kernel.

Anyway, here is his commit:
<http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb>

> 
> I haven't seen anything about it in nearly a month, but I've been using
> James' patch since he posted it and the sleep/wakeup behavior seems
> improved/correct.

I have been running smoothly with the patch too - problem solved I'd say :-)

> 
> -- Robert
> 


* Re: 'Device not ready' issue on mpt2sas since 3.1.10
  2012-08-16 20:24                                 ` Matthias Prager
@ 2012-08-16 20:33                                   ` Robert Trace
  0 siblings, 0 replies; 35+ messages in thread
From: Robert Trace @ 2012-08-16 20:33 UTC (permalink / raw)
  To: Matthias Prager
  Cc: James Bottomley, Tejun Heo, linux-scsi, Jens Axboe, Eric Moore,
	Alan, Darrick J. Wong, linux-ide

On 08/16/2012 04:24 PM, Matthias Prager wrote:
> 
> Not yet, but it is in James' scsi misc tree and, last I heard, was
> scheduled for inclusion in the 3.6 kernel.

Close enough. :-)  I didn't track the changes on the SCSI tree and I
just wanted to make sure that it didn't slip through the cracks.

Thanks to all involved for all of the help and a speedy fix!

-- Robert

* Re: 'Device not ready' issue on mpt2sas since 3.1.10
@ 2015-11-27 10:28 Felix Matouschek
  0 siblings, 0 replies; 35+ messages in thread
From: Felix Matouschek @ 2015-11-27 10:28 UTC (permalink / raw)
  To: linux-scsi

Hello,

I've encountered an error similar to the one Matthias Prager described
in his first mail in this thread in 2012.

I use Debian 8 (kernel 3.16) and also own an LSI 2008 card flashed to IT
mode (firmware P20), and I have problems with disks that were spun down.
Writing to them when they are spun down usually results in the following
errors:

[59526.359997] sd 0:0:1:0: [sdc] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59526.360003] sd 0:0:1:0: [sdc] CDB:
[59526.360006] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59526.360022] blk_update_request: I/O error, dev sdc, sector 824769880
[59544.111090] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.111097] sd 0:0:0:0: [sdb] CDB:
[59544.111100] Read(16): 88 00 00 00 00 00 31 28 fd 50 00 00 00 08 00 00
[59544.111115] blk_update_request: I/O error, dev sdb, sector 824769872
[59544.114465] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59544.114468] sd 0:0:4:0: [sdf] CDB:
[59544.114469] Read(16): 88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00 00
[59544.114483] blk_update_request: I/O error, dev sdf, sector 824769880
[59552.117436] sd 0:0:3:0: [sde] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59552.117443] sd 0:0:3:0: [sde] CDB:
[59552.117446] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59552.117462] blk_update_request: I/O error, dev sde, sector 824769968
[59572.951158] sd 0:0:2:0: [sdd] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.951167] sd 0:0:2:0: [sdd] CDB:
[59572.951170] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.951192] blk_update_request: I/O error, dev sdd, sector 824769968
[59572.955679] sd 0:0:5:0: [sdg] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[59572.955695] sd 0:0:5:0: [sdg] CDB:
[59572.955701] Read(16): 88 00 00 00 00 00 31 28 fd b0 00 00 00 08 00 00
[59572.955720] blk_update_request: I/O error, dev sdg, sector 824769968
[70357.782677] sd 0:0:4:0: [sdf] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70357.782686] sd 0:0:4:0: [sdf] CDB:
[70357.782690] Read(16): 88 00 00 00 00 00 85 c1 c9 08 00 00 00 08 00 00
[70357.782712] blk_update_request: I/O error, dev sdf, sector 2244069640
[70368.087947] sd 0:0:0:0: [sdb] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[70368.087953] sd 0:0:0:0: [sdb] CDB:
[70368.087955] Read(16): 88 00 00 00 00 00 85 c1 c9 00 00 00 00 08 00 00
[70368.087969] blk_update_request: I/O error, dev sdb, sector 2244069632

Notice the lack of the "Device not ready" message; otherwise these
errors look very similar to Matthias' errors.
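
For anyone comparing such logs, the CDB bytes can be decoded by hand. A
minimal sketch, assuming the standard READ(16) layout (bytes 2-9 hold
the big-endian LBA, bytes 10-13 the transfer length in blocks) and
512-byte logical blocks so that the LBA equals the sector number the
block layer prints; struct rw16 and decode_rw16() are hypothetical
names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define READ_16 0x88	/* standard opcode, first byte of the CDBs above */

struct rw16 {
	uint64_t lba;		/* starting logical block address */
	uint32_t blocks;	/* transfer length in logical blocks */
};

/* Decode the LBA and transfer length from a 16-byte READ/WRITE CDB. */
static struct rw16 decode_rw16(const uint8_t cdb[16])
{
	struct rw16 out = { 0, 0 };
	size_t i;

	assert(cdb != NULL);

	for (i = 2; i < 10; i++)	/* bytes 2..9: big-endian LBA */
		out.lba = (out.lba << 8) | cdb[i];
	for (i = 10; i < 14; i++)	/* bytes 10..13: big-endian length */
		out.blocks = (out.blocks << 8) | cdb[i];
	return out;
}
```

Fed the first CDB above (88 00 00 00 00 00 31 28 fd 58 00 00 00 08 00
00), this gives LBA 824769880 and a length of 8 blocks, which matches
the sector in the following blk_update_request line. Every failed
command shown here is an 8-block Read(16).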

I have no clue what to do to fix this problem. Any suggestions?

Greetings,
Felix



Thread overview: 35+ messages
2012-06-22 11:19 'Device not ready' issue on mpt2sas since 3.1.10 Matthias Prager
2012-07-09 14:40 ` Matthias Prager
2012-07-09 19:37   ` Robert Trace
2012-07-09 20:45     ` Darrick J. Wong
2012-07-09 22:24       ` Robert Trace
2012-07-10  0:21         ` Matthias Prager
2012-07-10  1:56           ` Robert Trace
2012-07-10 16:54         ` Darrick J. Wong
2012-07-10  0:12     ` Matthias Prager
2012-07-10  1:51       ` Robert Trace
2012-07-10 23:27         ` Robert Trace
2012-07-11 12:19           ` Matthias Prager
2012-07-11 13:48             ` Matthias Prager
2012-07-17 18:09               ` Tejun Heo
2012-07-17 19:39                 ` Matthias Prager
2012-07-17 20:01                   ` Tejun Heo
2012-07-21 12:15                     ` Matthias Prager
2012-07-22 17:31                       ` Tejun Heo
2012-07-22 23:14                         ` Matthias Prager
2012-07-23 15:26                           ` Tejun Heo
2012-07-24 22:04                             ` Matthias Prager
2012-07-25 10:26                               ` Reddy, Sreekanth
2012-07-25 14:19                         ` James Bottomley
2012-07-25 17:17                           ` Tejun Heo
2012-07-25 19:55                             ` James Bottomley
2012-07-25 23:56                               ` Matthias Prager
2012-07-26 19:16                                 ` Robert Trace
2012-08-16 18:26                               ` Robert Trace
2012-08-16 20:24                                 ` Matthias Prager
2012-08-16 20:33                                   ` Robert Trace
2012-07-25 22:35                         ` tomm
2012-07-26 19:20                           ` Robert Trace
2012-07-09 22:08   ` NeilBrown
2012-07-10  0:03     ` Matthias Prager
2015-11-27 10:28 Felix Matouschek
