All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] I/O errors for ALUA state transitions
@ 2021-09-07  7:16 Hannes Reinecke
  2021-09-07 15:59 ` michael.christie
  0 siblings, 1 reply; 4+ messages in thread
From: Hannes Reinecke @ 2021-09-07  7:16 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, Rajashekhar M A,
	Hannes Reinecke

From: Rajashekhar M A <rajs@netapp.com>

When a host is configured with a few LUNs and IO is running,
injecting FC faults repeatedly leads to path recovery problems.
The LUNs have 4 paths each and 3 of them come back active after
say an FC fault which makes two of the paths go down, instead of
all 4. This happens after several iterations of continuous FC faults.

Reason here is that we're returning an I/O error whenever we're
encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 03a2ff547b22..1185083105ae 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
 		    sshdr.asc == 0x3f && sshdr.ascq == 0x0e)
 			return NEEDS_RETRY;
 		/*
-		 * if the device is in the process of becoming ready, we
-		 * should retry.
+		 * if the device is in the process of becoming ready, or
+		 * transitions between ALUA states, we should retry.
 		 */
-		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
+		if ((sshdr.asc == 0x04) &&
+		    (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a))
 			return NEEDS_RETRY;
 		/*
 		 * if the device is not started, we need to wake
-- 
2.29.2


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] I/O errors for ALUA state transitions
  2021-09-07  7:16 [PATCH] I/O errors for ALUA state transitions Hannes Reinecke
@ 2021-09-07 15:59 ` michael.christie
  2021-09-08  6:12   ` Hannes Reinecke
  0 siblings, 1 reply; 4+ messages in thread
From: michael.christie @ 2021-09-07 15:59 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, Rajashekhar M A

On 9/7/21 2:16 AM, Hannes Reinecke wrote:
> From: Rajashekhar M A <rajs@netapp.com>
> 
> When a host is configured with a few LUNs and IO is running,
> injecting FC faults repeatedly leads to path recovery problems.
> The LUNs have 4 paths each and 3 of them come back active after
> say an FC fault which makes two of the paths go down, instead of
> all 4. This happens after several iterations of continuous FC faults.
> 
> Reason here is that we're returning an I/O error whenever we're
> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.
> > Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/scsi_error.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 03a2ff547b22..1185083105ae 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>  		    sshdr.asc == 0x3f && sshdr.ascq == 0x0e)
>  			return NEEDS_RETRY;
>  		/*
> -		 * if the device is in the process of becoming ready, we
> -		 * should retry.
> +		 * if the device is in the process of becoming ready, or
> +		 * transitions between ALUA states, we should retry.
>  		 */
> -		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
> +		if ((sshdr.asc == 0x04) &&
> +		    (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a))
>  			return NEEDS_RETRY;

Why put this here instead of in alua_check_sense with the other ALUA
state transition check?


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] I/O errors for ALUA state transitions
  2021-09-07 15:59 ` michael.christie
@ 2021-09-08  6:12   ` Hannes Reinecke
  2021-09-08 21:30     ` Ewan Milne
  0 siblings, 1 reply; 4+ messages in thread
From: Hannes Reinecke @ 2021-09-08  6:12 UTC (permalink / raw)
  To: michael.christie, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, linux-scsi, Rajashekhar M A

On 9/7/21 5:59 PM, michael.christie@oracle.com wrote:
> On 9/7/21 2:16 AM, Hannes Reinecke wrote:
>> From: Rajashekhar M A <rajs@netapp.com>
>>
>> When a host is configured with a few LUNs and IO is running,
>> injecting FC faults repeatedly leads to path recovery problems.
>> The LUNs have 4 paths each and 3 of them come back active after
>> say an FC fault which makes two of the paths go down, instead of
>> all 4. This happens after several iterations of continuous FC faults.
>>
>> Reason here is that we're returning an I/O error whenever we're
>> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
>> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.
>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   drivers/scsi/scsi_error.c | 7 ++++---
>>   1 file changed, 4 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 03a2ff547b22..1185083105ae 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
>>   		    sshdr.asc == 0x3f && sshdr.ascq == 0x0e)
>>   			return NEEDS_RETRY;
>>   		/*
>> -		 * if the device is in the process of becoming ready, we
>> -		 * should retry.
>> +		 * if the device is in the process of becoming ready, or
>> +		 * transitions between ALUA states, we should retry.
>>   		 */
>> -		if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
>> +		if ((sshdr.asc == 0x04) &&
>> +		    (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a))
>>   			return NEEDS_RETRY;
> 
> Why put this here instead of in alua_check_sense with the other ALUA
> state transition check?
> 
Good point. Will be updating the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] I/O errors for ALUA state transitions
  2021-09-08  6:12   ` Hannes Reinecke
@ 2021-09-08 21:30     ` Ewan Milne
  0 siblings, 0 replies; 4+ messages in thread
From: Ewan Milne @ 2021-09-08 21:30 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: michael.christie, Martin K. Petersen, Christoph Hellwig,
	James Bottomley, linux-scsi, Rajashekhar M A

alua_check_sense() checks for NOT READY / 04h 0Ah but not
UNIT ATTENTION with the same ASC/ASCQ.

Careful about updating pg->state in this case though because that
may cause subsequent I/O in the probe to fail, we didn't do this before
and we *may* only get the UA once.

-Ewan

On Wed, Sep 8, 2021 at 2:12 AM Hannes Reinecke <hare@suse.de> wrote:
>
> On 9/7/21 5:59 PM, michael.christie@oracle.com wrote:
> > On 9/7/21 2:16 AM, Hannes Reinecke wrote:
> >> From: Rajashekhar M A <rajs@netapp.com>
> >>
> >> When a host is configured with a few LUNs and IO is running,
> >> injecting FC faults repeatedly leads to path recovery problems.
> >> The LUNs have 4 paths each and 3 of them come back active after
> >> say an FC fault which makes two of the paths go down, instead of
> >> all 4. This happens after several iterations of continuous FC faults.
> >>
> >> Reason here is that we're returning an I/O error whenever we're
> >> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
> >> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.
> >>> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >> ---
> >>   drivers/scsi/scsi_error.c | 7 ++++---
> >>   1 file changed, 4 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index 03a2ff547b22..1185083105ae 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -594,10 +594,11 @@ enum scsi_disposition scsi_check_sense(struct scsi_cmnd *scmd)
> >>                  sshdr.asc == 0x3f && sshdr.ascq == 0x0e)
> >>                      return NEEDS_RETRY;
> >>              /*
> >> -             * if the device is in the process of becoming ready, we
> >> -             * should retry.
> >> +             * if the device is in the process of becoming ready, or
> >> +             * transitions between ALUA states, we should retry.
> >>               */
> >> -            if ((sshdr.asc == 0x04) && (sshdr.ascq == 0x01))
> >> +            if ((sshdr.asc == 0x04) &&
> >> +                (sshdr.ascq == 0x01 || sshdr.ascq == 0x0a))
> >>                      return NEEDS_RETRY;
> >
> > Why put this here instead of in alua_check_sense with the other ALUA
> > state transition check?
> >
> Good point. Will be updating the patch.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke                Kernel Storage Architect
> hare@suse.de                              +49 911 74053 688
> SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
> HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer
>


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-09-08 21:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-07  7:16 [PATCH] I/O errors for ALUA state transitions Hannes Reinecke
2021-09-07 15:59 ` michael.christie
2021-09-08  6:12   ` Hannes Reinecke
2021-09-08 21:30     ` Ewan Milne

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.