* [PATCH RESEND] sd: disk offlined prematurely from media access timeout
From: David Jeffery @ 2013-09-24 19:42 UTC
  To: linux-scsi

There is an error with the medium access timeout feature of the sd driver. The
sdkp->medium_access_timed_out value is set to zero in sd_done() in the wrong
place.  It is set to zero only if a command returns sense data.  If an I/O
command times out, error handling succeeds, and the I/O commands complete, the
value won't be reset if nothing responds with a sense buffer.  Then, another
timeout (no matter how far in the future) can increment it again, causing the
device to be errantly set offline.

The resetting of sdkp->medium_access_timed_out should occur before the check for
sense data.

Signed-off-by: David Jeffery <djeffery@redhat.com>

---

The problem can be reproduced with scsi_debug by using
SCSI_DEBUG_OPT_MAC_TIMEOUT to force some I/O to time out once. The small
script below assumes /dev/sdb is scsi_debug's disk. Each loop causes a
timeout, then completes 2MB of I/O successfully (including the I/O command
that timed out), and the loop repeats.  Without the patch, the device is
offlined on the second loop.  With the patch, all loops complete their I/O
successfully.

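echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
for i in $(seq 1 4); do
        echo "starting loop $i"
        # 128 = SCSI_DEBUG_OPT_MAC_TIMEOUT: make the next read time out
        echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
        dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=1 &
        sleep 5
        # stop injecting timeouts so the timed out read can complete
        echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
        wait
        # second 1MB read, completes without any timeout
        dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=1
        echo "ending loop $i"
done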


diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 86fcf2c..2779e6b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1669,12 +1669,12 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 						   sshdr.ascq));
 	}
 #endif
+	sdkp->medium_access_timed_out = 0;
+
 	if (driver_byte(result) != DRIVER_SENSE &&
 	    (!sense_valid || sense_deferred))
 		goto out;
 
-	sdkp->medium_access_timed_out = 0;
-
 	switch (sshdr.sense_key) {
 	case HARDWARE_ERROR:
 	case MEDIUM_ERROR:


* Re: [PATCH RESEND] sd: disk offlined prematurely from media access timeout
From: Martin K. Petersen @ 2013-10-23  8:36 UTC
  To: David Jeffery; +Cc: linux-scsi

>>>>> "David" == David Jeffery <djeffery@redhat.com> writes:

David> There is an error with the medium access timeout feature of the
David> sd driver. The sdkp->medium_access_timed_out value is set to zero
David> in sd_done() in the wrong place.  It is set to zero only if a
David> command returns sense data.  If an I/O command times out, error
David> handling succeeds, and the I/O commands complete, the value won't
David> be reset if nothing responds with a sense buffer.  Then, another
David> timeout (no matter how far in the future) can increment it again,
David> causing the device to be errantly set offline.

David> The resetting of sdkp->medium_access_timed_out should occur
David> before the check for sense data.

David> Signed-off-by: David Jeffery <djeffery@redhat.com>

Acked-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering
