* [PATCH] scsi_dh_alua: Requeue another not ready check condition at ML
From: Stewart, Sean @ 2014-02-28 0:14 UTC
To: linux-scsi, dm-devel
This allows the sd driver to retry commands such as READ CAPACITY until a
LUN is ready, rather than giving up after three retries.
On NetApp E-Series arrays, a controller can return this Not Ready
condition while it quiesces I/O on the controller that has just come
back onto the network during a firmware upgrade. Retrying the command at
the midlayer allows discovery to complete successfully.
Signed-off-by: Sean Stewart <sean.stewart@netapp.com>
---
drivers/scsi/device_handler/scsi_dh_alua.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 5248c88..95d87fe 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -454,6 +454,11 @@ static int alua_check_sense(struct scsi_device *sdev,
 {
 	switch (sense_hdr->sense_key) {
 	case NOT_READY:
+		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x01)
+			/*
+			 * LUN Not Ready -- In process of becoming ready
+			 */
+			return ADD_TO_MLQUEUE;
 		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)
 			/*
 			 * LUN Not Accessible - ALUA state transition
--
1.8.3.1
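To make the effect of the hunk concrete, here is a minimal user-space sketch of the decision it adds. It is not kernel code: the toy_* names are hypothetical stand-ins for struct scsi_sense_hdr, NOT_READY and ADD_TO_MLQUEUE, and only the new 04/01 branch is modeled; the other branches of alua_check_sense() are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's NOT_READY and dispositions. */
#define TOY_NOT_READY 0x02

enum toy_disposition {
	TOY_NOT_HANDLED,	/* leave the decision to the midlayer defaults */
	TOY_ADD_TO_MLQUEUE	/* ask the midlayer to requeue the command */
};

/* Hypothetical stand-in for struct scsi_sense_hdr (only the fields used). */
struct toy_sense_hdr {
	unsigned char sense_key;
	unsigned char asc;
	unsigned char ascq;
};

/* Models only the branch added by the patch. */
static enum toy_disposition toy_alua_check_sense(const struct toy_sense_hdr *s)
{
	assert(s != NULL);

	switch (s->sense_key) {
	case TOY_NOT_READY:
		/* 04/01: LUN Not Ready, in process of becoming ready */
		if (s->asc == 0x04 && s->ascq == 0x01)
			return TOY_ADD_TO_MLQUEUE;
		break;
	default:
		break;
	}
	/* Everything else is outside this sketch. */
	return TOY_NOT_HANDLED;
}
```

For example, a NOT READY sense with ASC/ASCQ 04/01 maps to TOY_ADD_TO_MLQUEUE, while 04/02 falls through to TOY_NOT_HANDLED.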
* Re: [PATCH] scsi_dh_alua: Requeue another not ready check condition at ML
From: Mike Christie @ 2014-02-28 1:58 UTC
To: Stewart, Sean; +Cc: linux-scsi, dm-devel, James.Bottomley, hare
On 02/27/2014 06:14 PM, Stewart, Sean wrote:
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 5248c88..95d87fe 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -454,6 +454,11 @@ static int alua_check_sense(struct scsi_device *sdev,
>  {
>  	switch (sense_hdr->sense_key) {
>  	case NOT_READY:
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x01)
> +			/*
> +			 * LUN Not Ready -- In process of becoming ready
> +			 */
> +			return ADD_TO_MLQUEUE;
It seems like the check_sense callout is being used to work around
scsi-ml in a lot of the additions that are not ALUA-specific. If this is
meant for a specific target, then it should not be here. If this is
non-ALUA-specific behavior, then it should not be here either.
If the IO was not a REQ_TYPE_BLOCK_PC request, then it would be retried by
scsi_io_completion. Same with the other ones like inquiry data changed,
report luns data changed, etc.
Are we sure we don't want to fix the REQ_TYPE_BLOCK_PC/scsi_execute*
users to retry, or to add some new flag that those users can use that
tells scsi-ml to retry like it normally would so callers do not have to
check for all these errors, or just add these to scsi_decide_disposition?
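One way to read the last suggestion (adding these conditions to scsi_decide_disposition) is a single midlayer-wide table of transient conditions, so device handlers stop matching them one by one. The sketch below assumes that shape; transient_table and hyp_is_transient() are hypothetical names, and which entries belong in such a table is itself an open question. The sense key/ASC/ASCQ values are the SPC codes for the conditions named in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry in a midlayer-wide table of transient conditions. */
struct transient_sense {
	unsigned char key, asc, ascq;
	const char *desc;
};

static const struct transient_sense transient_table[] = {
	{ 0x02, 0x04, 0x01, "LUN not ready, in process of becoming ready" },
	{ 0x02, 0x04, 0x0a, "LUN not accessible, ALUA state transition" },
	{ 0x06, 0x3f, 0x03, "Inquiry data has changed" },
	{ 0x06, 0x3f, 0x0e, "Reported LUNs data has changed" },
};

/*
 * Returns true if the condition is in the table, i.e. one the midlayer
 * would retry for every caller instead of each handler matching it.
 */
static bool hyp_is_transient(unsigned char key, unsigned char asc,
			     unsigned char ascq)
{
	size_t i;

	/* The sense key is a 4-bit field. */
	assert(key <= 0x0f);

	for (i = 0; i < sizeof(transient_table) / sizeof(transient_table[0]); i++) {
		const struct transient_sense *t = &transient_table[i];

		if (t->key == key && t->asc == asc && t->ascq == ascq)
			return true;
	}
	return false;
}
```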
* Re: [PATCH] scsi_dh_alua: Requeue another not ready check condition at ML
From: Hannes Reinecke @ 2014-02-28 15:14 UTC
To: Mike Christie, Stewart, Sean; +Cc: linux-scsi, dm-devel, James.Bottomley
On 02/28/2014 02:58 AM, Mike Christie wrote:
> Are we sure we don't want to fix the REQ_TYPE_BLOCK_PC/scsi_execute*
> users to retry, or to add some new flag that those users can use that
> tells scsi-ml to retry like it normally would so callers do not have to
> check for all these errors, or just add these to scsi_decide_disposition?
>
Yes, that's definitely a better idea. I've stumbled across this
issue several times now.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
* Re: [PATCH] scsi_dh_alua: Requeue another not ready check condition at ML
From: Stewart, Sean @ 2014-02-28 20:20 UTC
To: Hannes Reinecke
Cc: Mike Christie, Stewart, Sean, linux-scsi, dm-devel, James.Bottomley
On Fri, 2014-02-28 at 16:14 +0100, Hannes Reinecke wrote:
> On 02/28/2014 02:58 AM, Mike Christie wrote:
> > It seems like the check_sense callout is being used to work around
> > scsi-ml in a lot of the additions that are not ALUA-specific. If this is
> > meant for a specific target, then it should not be here. If this is
> > non-ALUA-specific behavior, then it should not be here either.
This sounds reasonable to me. Originally, our target returned a
vendor-specific check condition, and I knew we wouldn't be able to get
the ALUA handler to retry that. I also looked into having the target
return 02/04/0A for this condition so we'd be covered, but that wouldn't
accurately describe what's going on, so we set the target to return
02/04/01 instead.
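The 02/04/0A and 02/04/01 notation is sense key / ASC / ASCQ. As a rough sketch of where those three bytes come from, the function below decodes them from fixed-format (response code 0x70/0x71) or descriptor-format (0x72/0x73) sense data, following the SPC layouts; in the kernel this normalization is done by scsi_normalize_sense(). decode_sense() and format_triplet() are illustrative names, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct sense_triplet {
	unsigned char key, asc, ascq;
};

/* Decode key/ASC/ASCQ from fixed or descriptor sense data (SPC layouts). */
static bool decode_sense(const unsigned char *buf, size_t len,
			 struct sense_triplet *out)
{
	assert(buf != NULL && out != NULL);
	memset(out, 0, sizeof(*out));
	if (len < 1)
		return false;

	switch (buf[0] & 0x7f) {	/* response code */
	case 0x70:
	case 0x71:
		/* Fixed format: key in byte 2 (low nibble), ASC/ASCQ in 12/13. */
		if (len < 14)
			return false;
		out->key = buf[2] & 0x0f;
		out->asc = buf[12];
		out->ascq = buf[13];
		return true;
	case 0x72:
	case 0x73:
		/* Descriptor format: key in byte 1, ASC/ASCQ in bytes 2/3. */
		if (len < 4)
			return false;
		out->key = buf[1] & 0x0f;
		out->asc = buf[2];
		out->ascq = buf[3];
		return true;
	default:
		return false;
	}
}

/* Format the triplet the way this thread writes it, e.g. "02/04/01". */
static void format_triplet(const struct sense_triplet *t, char *dst,
			   size_t dstlen)
{
	snprintf(dst, dstlen, "%02X/%02X/%02X",
		 (unsigned)t->key, (unsigned)t->asc, (unsigned)t->ascq);
}
```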
In any case, without having the device handler do ADD_TO_MLQUEUE, I see
the command come back with the check condition, return SUCCESS, then the
read_capacity_10() function burns through its three retries:
int retries = 3, reset_retries = READ_CAPACITY_RETRIES_ON_RESET;
I captured this with scsi midlayer debugging to show what's going on.
Feb 28 13:51:44 wica-fo-stone kernel: sd 2:0:2:0: Send: 0xffff880420259cc0
Feb 28 13:51:44 wica-fo-stone kernel: sd 2:0:2:0: CDB: Read Capacity(10): 25 00 00 00 00 00 00 00 00 00
Feb 28 13:51:45 wica-fo-stone kernel: sd 2:0:2:0: Done: 0xffff880420259cc0 SUCCESS
The same scsi_cmnd comes back with SUCCESS twice more, then:
Feb 28 13:51:46 wica-fo-stone kernel: sd 2:0:2:0: [sdd] READ CAPACITY failed
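The log can be reproduced in miniature with a toy retry model. It assumes a target that answers 02/04/01 for the first few submissions and compares a caller limited to its own three attempts with one whose 04/01 completions are requeued by the midlayer. max_requeues is a hypothetical stand-in for whatever limit the real midlayer applies to ADD_TO_MLQUEUE requeues; the model does not try to reproduce that limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy target: reports 02/04/01 for the first not_ready_cmds submissions. */
struct toy_target {
	int not_ready_cmds;
	int submissions;
};

enum toy_result { TOY_GOOD, TOY_NOT_READY_0401 };

static enum toy_result toy_submit(struct toy_target *t)
{
	t->submissions++;
	return t->submissions <= t->not_ready_cmds ? TOY_NOT_READY_0401 : TOY_GOOD;
}

/*
 * Without ADD_TO_MLQUEUE: every 04/01 completes back to the caller,
 * which spends one of its own attempts on it.
 */
static bool caller_retries_only(struct toy_target *t, int attempts)
{
	while (attempts-- > 0)
		if (toy_submit(t) == TOY_GOOD)
			return true;
	return false;
}

/*
 * With ADD_TO_MLQUEUE: the midlayer requeues 04/01 itself, so the
 * caller's attempt is not consumed until max_requeues is exhausted.
 */
static bool with_ml_requeue(struct toy_target *t, int attempts,
			    int max_requeues)
{
	assert(attempts > 0 && max_requeues >= 0);
	while (attempts-- > 0) {
		int requeues = 0;
		enum toy_result r;

		while ((r = toy_submit(t)) == TOY_NOT_READY_0401 &&
		       requeues < max_requeues)
			requeues++;
		if (r == TOY_GOOD)
			return true;
	}
	return false;
}
```

With a target that stays not ready for five submissions, caller_retries_only(t, 3) gives up after three submissions, much like the READ CAPACITY failure above, while with_ml_requeue(t, 3, 100) succeeds on the sixth submission.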
> >
> > If the IO was not a REQ_TYPE_BLOCK_PC request, then it would be retried by
> > scsi_io_completion. Same with the other ones like inquiry data changed,
> > report luns data changed, etc.
> >
> > Are we sure we don't want to fix the REQ_TYPE_BLOCK_PC/scsi_execute*
> > users to retry, or to add some new flag that those users can use that
> > tells scsi-ml to retry like it normally would so callers do not have to
> > check for all these errors, or just add these to scsi_decide_disposition?
> >
> Yes, that's definitely a better idea. I've stumbled across this
> issue several times now.
Same here. This seems to come up a lot. We had essentially the same
problem when we introduced a new VID/PID but a customer was running an
OS whose RDAC handler did not yet include that VID/PID. It can cause a
lot of headaches.
I think it should be possible for us to approach this in such a way that
a transient state on the target won't render the SCSI disk unusable (as
is done here). So, by a flag, do you mean we could add something to the
request flags field? We could use this to signify a command that should
keep retrying in the way that I'm looking for here (commands related to
initial discovery, like read capacities, are what I'm thinking of).
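A minimal sketch of the per-request flag idea, assuming the midlayer can already tell whether a sense condition is transient. HYP_REQ_RETRY_TRANSIENT, struct hyp_request and hyp_decide() are hypothetical; the sketch only shows the decision such a flag would drive: ordinary filesystem I/O is retried (as Mike describes for scsi_io_completion), while passthrough (REQ_TYPE_BLOCK_PC / scsi_execute*) commands are retried only when the submitter opts in, e.g. for discovery commands like READ CAPACITY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-in bit; not an existing kernel request flag. */
#define HYP_REQ_RETRY_TRANSIENT (1u << 0)

enum hyp_disposition { HYP_COMPLETE_TO_CALLER, HYP_REQUEUE };

struct hyp_request {
	bool passthrough;	/* e.g. a REQ_TYPE_BLOCK_PC command */
	unsigned int flags;
};

static enum hyp_disposition hyp_decide(const struct hyp_request *rq,
				       bool transient_sense)
{
	assert(rq != NULL);

	if (!transient_sense)
		return HYP_COMPLETE_TO_CALLER;
	/* Filesystem I/O: the midlayer retries on its own. */
	if (!rq->passthrough)
		return HYP_REQUEUE;
	/* Passthrough: retry only if the submitter set the flag. */
	return (rq->flags & HYP_REQ_RETRY_TRANSIENT) ? HYP_REQUEUE
						     : HYP_COMPLETE_TO_CALLER;
}
```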
Thanks,
Sean