* [PATCH RESEND] sd: disk offlined prematurely from media access timeout
@ 2013-09-24 19:42 David Jeffery
From: David Jeffery @ 2013-09-24 19:42 UTC (permalink / raw)
To: linux-scsi
The medium access timeout feature of the sd driver resets
sdkp->medium_access_timed_out to zero in the wrong place in sd_done(): the
reset happens only if a command returns sense data. If an I/O command times
out, error handling succeeds, and the I/O commands then complete, the value is
not reset unless something responds with a sense buffer. Another timeout, no
matter how far in the future, can then increment it again and cause the device
to be set offline erroneously.

Reset sdkp->medium_access_timed_out before the check for sense data.
Signed-off-by: David Jeffery <djeffery@redhat.com>
---
The bug can be reproduced with scsi_debug, using SCSI_DEBUG_OPT_MAC_TIMEOUT to
force some I/O to time out once. This small script assumes /dev/sdb is
scsi_debug's disk. Each loop causes a timeout, then completes 2MB of I/O
successfully, including the timed-out I/O command. Without the patch, the
device is offlined on the second loop. With the patch, all loops complete
their I/O successfully.
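echo "-1" >/sys/bus/pseudo/drivers/scsi_debug/every_nth
for i in $(seq 1 4); do
	echo "starting loop $i"
	# 128 = SCSI_DEBUG_OPT_MAC_TIMEOUT: medium access commands time out
	echo "128" >/sys/bus/pseudo/drivers/scsi_debug/opts
	# this read hits the timeout; run it in the background
	dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=1 &
	sleep 5
	# clear the option so the timed-out command can complete
	echo "0" >/sys/bus/pseudo/drivers/scsi_debug/opts
	wait
	# a second read that completes without any timeout
	dd if=/dev/sdb of=/dev/null bs=1M iflag=direct count=1
	echo "ending loop $i"
done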
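
The counter behaviour described in the commit message can also be followed in
a small userspace model. This is only a sketch, not the kernel code: the
struct, the function names, and the threshold of 2 are assumptions chosen for
illustration. It runs the same sequence as the script above (one timeout, then
a clean completion with no sense data) and reports the loop in which the
modelled disk goes offline.

```c
#include <stdbool.h>

/* Hypothetical userspace model of the counter; not the kernel's code. */
struct disk_model {
	unsigned int medium_access_timed_out;    /* timeout counter */
	unsigned int max_medium_access_timeouts; /* offline threshold (assumed 2) */
	bool offline;
};

/* Models error handling after a timed-out medium access command. */
static void model_timeout(struct disk_model *d)
{
	d->medium_access_timed_out++;
	if (d->medium_access_timed_out >= d->max_medium_access_timeouts)
		d->offline = true;
}

/* Models command completion; 'patched' selects where the reset happens. */
static void model_done(struct disk_model *d, bool has_sense, bool patched)
{
	if (patched)
		d->medium_access_timed_out = 0; /* reset on every completion */
	if (!has_sense)
		return;                         /* the "goto out" path */
	if (!patched)
		d->medium_access_timed_out = 0; /* old placement: sense data only */
}

/*
 * Runs 'loops' rounds of one timeout followed by one clean completion
 * without sense data. Returns the 1-based round in which the disk went
 * offline, or 0 if it stayed online.
 */
static int first_offline_loop(bool patched, int loops)
{
	struct disk_model d = { 0, 2, false };

	for (int i = 1; i <= loops; i++) {
		model_timeout(&d);
		if (d.offline)
			return i;
		model_done(&d, false, patched);
	}
	return 0;
}
```

Under these assumptions, first_offline_loop(false, 4) returns 2, matching the
offlining on the second loop seen without the patch, and
first_offline_loop(true, 4) returns 0.
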
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 86fcf2c..2779e6b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1669,12 +1669,12 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 						   sshdr.ascq));
 	}
 #endif
+	sdkp->medium_access_timed_out = 0;
+
 	if (driver_byte(result) != DRIVER_SENSE &&
 	    (!sense_valid || sense_deferred))
 		goto out;
 
-	sdkp->medium_access_timed_out = 0;
-
 	switch (sshdr.sense_key) {
 	case HARDWARE_ERROR:
 	case MEDIUM_ERROR:
* Re: [PATCH RESEND] sd: disk offlined prematurely from media access timeout
From: Martin K. Petersen @ 2013-10-23 8:36 UTC (permalink / raw)
To: David Jeffery; +Cc: linux-scsi
>>>>> "David" == David Jeffery <djeffery@redhat.com> writes:
David> There is an error with the medium access timeout feature of the
David> sd driver. The sdkp->medium_access_timed_out value is set to zero
David> in sd_done() in the wrong place. It is set to zero only if a
David> command returns sense data. If an I/O command times out, error
David> handling succeeds, and the I/O commands complete, the value won't
David> be reset if nothing responds with a sense buffer. Then, another
David> timeout (no matter how far in the future) can increment it again,
David> causing the device to be errantly set offline.
David> The resetting of sdkp->medium_access_timed_out should occur
David> before the check for sense data.
David> Signed-off-by: David Jeffery <djeffery@redhat.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
--
Martin K. Petersen Oracle Linux Engineering