* [PATCH] sd: retry read_capacity on UNIT_ATTENTION
@ 2010-04-01 13:44 Hannes Reinecke
2010-04-01 14:30 ` James Bottomley
0 siblings, 1 reply; 4+ messages in thread
From: Hannes Reinecke @ 2010-04-01 13:44 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi
Hazard testing uncovered yet another bug in sd. Under heavy
reset activity the retry counter might be exhausted and
the command will be returned with sense UNIT_ATTENTION/0x29/00
(POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
cases we should just increase the retry counter again,
retrying one more to clear up this Unit Attention state.
Signed-off-by: Hannes Reinecke <hare@suse.de>
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 1962bea..7d75a21 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
if (media_not_present(sdkp, &sshdr))
return -ENODEV;
- if (the_result)
+ if (the_result) {
sense_valid = scsi_sense_valid(&sshdr);
+ if (sense_valid &&
+ sshdr.sense_key == UNIT_ATTENTION &&
+ sshdr.asc = 0x29 && sshdr.asq == 0x00)
+ /* Device reset might occur several times,
+ * give it one more chance */
+ retries++;
+ }
retries--;
} while (the_result && retries);
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] sd: retry read_capacity on UNIT_ATTENTION
2010-04-01 13:44 [PATCH] sd: retry read_capacity on UNIT_ATTENTION Hannes Reinecke
@ 2010-04-01 14:30 ` James Bottomley
2010-04-08 7:36 ` Hannes Reinecke
0 siblings, 1 reply; 4+ messages in thread
From: James Bottomley @ 2010-04-01 14:30 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi
On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
> Hazard testing uncovered yet another bug in sd. Under heavy
> reset activity the retry counter might be exhausted and
> the command will be returned with sense UNIT_ATTENTION/0x29/00
> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
> cases we should just increase the retry counter again,
> retrying one more to clear up this Unit Attention state.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
>
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index 1962bea..7d75a21 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> if (media_not_present(sdkp, &sshdr))
> return -ENODEV;
>
> - if (the_result)
> + if (the_result) {
> sense_valid = scsi_sense_valid(&sshdr);
> + if (sense_valid &&
> + sshdr.sense_key == UNIT_ATTENTION &&
> + sshdr.asc = 0x29 && sshdr.asq == 0x00)
^^^^
should be ==
> + /* Device reset might occur several times,
> + * give it one more chance */
> + retries++;
> + }
Firstly, not even compile checked:
drivers/scsi/sd.c: In function ‘read_capacity_10’:
drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’
Secondly, we can't quite do this. Some devices (only broken ones in my
experience) will reply UNIT_ATTENTION I was RESET forever, leading to a
loop here. Additionally, a massive reset storm on a shared bus would
DoS the code here, so there must be a give up point after a reasonable
number of retries.
The third problem is that if this is happening to a large device, we
only catch it in RC10 ... so we'll report undersize if the device is >
SPC2
How about this instead?
James
---
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 7b75c8a..cdb8ed6 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1432,6 +1432,8 @@ static void read_capacity_error(struct scsi_disk *sdkp, struct scsi_device *sdp,
#error RC16_LEN must not be more than SD_BUF_SIZE
#endif
+#define READ_CAPACITY_RETRIES_ON_RESET 10
+
static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
unsigned char *buffer)
{
@@ -1439,7 +1441,7 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
struct scsi_sense_hdr sshdr;
int sense_valid = 0;
int the_result;
- int retries = 3;
+ int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
unsigned int alignment;
unsigned long long lba;
unsigned sector_size;
@@ -1468,6 +1470,13 @@ static int read_capacity_16(struct scsi_disk *sdkp, struct scsi_device *sdp,
* Invalid Field in CDB, just retry
* silently with RC10 */
return -EINVAL;
+ if (sense_valid &&
+ sshdr.sense_key == UNIT_ATTENTION &&
+ sshdr.asc == 0x29 && sshdr.ascq == 0x00)
+ /* Device reset might occur several times,
+ * give it one more chance */
+ if (--reset_retries > 0)
+ continue;
}
retries--;
@@ -1526,7 +1535,7 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
struct scsi_sense_hdr sshdr;
int sense_valid = 0;
int the_result;
- int retries = 3;
+ int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
sector_t lba;
unsigned sector_size;
@@ -1542,8 +1551,16 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
if (media_not_present(sdkp, &sshdr))
return -ENODEV;
- if (the_result)
+ if (the_result) {
sense_valid = scsi_sense_valid(&sshdr);
+ if (sense_valid &&
+ sshdr.sense_key == UNIT_ATTENTION &&
+ sshdr.asc == 0x29 && sshdr.ascq == 0x00)
+ /* Device reset might occur several times,
+ * give it one more chance */
+ if (--reset_retries > 0)
+ continue;
+ }
retries--;
} while (the_result && retries);
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH] sd: retry read_capacity on UNIT_ATTENTION
2010-04-01 14:30 ` James Bottomley
@ 2010-04-08 7:36 ` Hannes Reinecke
2010-04-08 13:48 ` James Bottomley
0 siblings, 1 reply; 4+ messages in thread
From: Hannes Reinecke @ 2010-04-08 7:36 UTC (permalink / raw)
To: James Bottomley; +Cc: linux-scsi
James Bottomley wrote:
> On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
>> Hazard testing uncovered yet another bug in sd. Under heavy
>> reset activity the retry counter might be exhausted and
>> the command will be returned with sense UNIT_ATTENTION/0x29/00
>> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
>> cases we should just increase the retry counter again,
>> retrying one more to clear up this Unit Attention state.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index 1962bea..7d75a21 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
>> if (media_not_present(sdkp, &sshdr))
>> return -ENODEV;
>>
>> - if (the_result)
>> + if (the_result) {
>> sense_valid = scsi_sense_valid(&sshdr);
>> + if (sense_valid &&
>> + sshdr.sense_key == UNIT_ATTENTION &&
>> + sshdr.asc = 0x29 && sshdr.asq == 0x00)
> ^^^^
> should be ==
>
>> + /* Device reset might occur several times,
>> + * give it one more chance */
>> + retries++;
>> + }
>
> Firstly, not even compile checked:
>
> drivers/scsi/sd.c: In function ‘read_capacity_10’:
> drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’
>
D'oh.
> Secondly, we can't quite do this. Some devices (only broken ones in my
> experience) will reply UNIT_ATTENTION I was RESET forever, leading to a
> loop here. Additionally, a massive reset storm on a shared bus would
> DoS the code here, so there must be a give up point after a reasonable
> number of retries.
>
Hmm. yes.
> The third problem is that if this is happening to a large device, we
> only catch it in RC10 ... so we'll report undersize if the device is >
> SPC2
>
Okay. In the best of all worlds we would have a module parameter which
would us to adjust this parameter, as I fear the actual number of retries
will depend on the number of devices connected.
But if you fell that's overkill it's fine by me, too.
> How about this instead?
>
Yes, that's better. Thanks.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] sd: retry read_capacity on UNIT_ATTENTION
2010-04-08 7:36 ` Hannes Reinecke
@ 2010-04-08 13:48 ` James Bottomley
0 siblings, 0 replies; 4+ messages in thread
From: James Bottomley @ 2010-04-08 13:48 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: linux-scsi
On Thu, 2010-04-08 at 09:36 +0200, Hannes Reinecke wrote:
> James Bottomley wrote:
> > On Thu, 2010-04-01 at 15:44 +0200, Hannes Reinecke wrote:
> >> Hazard testing uncovered yet another bug in sd. Under heavy
> >> reset activity the retry counter might be exhausted and
> >> the command will be returned with sense UNIT_ATTENTION/0x29/00
> >> (POWER ON, RESET, OR BUS DEVICE RESET OCCURRED). In those
> >> cases we should just increase the retry counter again,
> >> retrying one more to clear up this Unit Attention state.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >>
> >> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >> index 1962bea..7d75a21 100644
> >> --- a/drivers/scsi/sd.c
> >> +++ b/drivers/scsi/sd.c
> >> @@ -1454,8 +1454,15 @@ static int read_capacity_10(struct scsi_disk *sdkp, struct scsi_device *sdp,
> >> if (media_not_present(sdkp, &sshdr))
> >> return -ENODEV;
> >>
> >> - if (the_result)
> >> + if (the_result) {
> >> sense_valid = scsi_sense_valid(&sshdr);
> >> + if (sense_valid &&
> >> + sshdr.sense_key == UNIT_ATTENTION &&
> >> + sshdr.asc = 0x29 && sshdr.asq == 0x00)
> > ^^^^
> > should be ==
> >
> >> + /* Device reset might occur several times,
> >> + * give it one more chance */
> >> + retries++;
> >> + }
> >
> > Firstly, not even compile checked:
> >
> > drivers/scsi/sd.c: In function ‘read_capacity_10’:
> > drivers/scsi/sd.c:1558: error: ‘struct scsi_sense_hdr’ has no member named ‘asq’
> >
> D'oh.
>
> > Secondly, we can't quite do this. Some devices (only broken ones in my
> > experience) will reply UNIT_ATTENTION I was RESET forever, leading to a
> > loop here. Additionally, a massive reset storm on a shared bus would
> > DoS the code here, so there must be a give up point after a reasonable
> > number of retries.
> >
> Hmm. yes.
>
> > The third problem is that if this is happening to a large device, we
> > only catch it in RC10 ... so we'll report undersize if the device is >
> > SPC2
> >
> Okay. In the best of all worlds we would have a module parameter which
> would us to adjust this parameter, as I fear the actual number of retries
> will depend on the number of devices connected.
>
> But if you fell that's overkill it's fine by me, too.
A module parameter probably is overkill. Once we get beyond 5, that's
the usual retry limit for ordinary commands, so even if we survive
beyond READ_CAPACITY, we'll begin failing in the actual operational
commands like reads and writes (beginning with partition size).
> > How about this instead?
> >
> Yes, that's better. Thanks.
OK, will put it in with your ack then.
James
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2010-04-08 13:48 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-01 13:44 [PATCH] sd: retry read_capacity on UNIT_ATTENTION Hannes Reinecke
2010-04-01 14:30 ` James Bottomley
2010-04-08 7:36 ` Hannes Reinecke
2010-04-08 13:48 ` James Bottomley
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.