linux-scsi.vger.kernel.org archive mirror
* [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
@ 2019-10-07 13:57 Hannes Reinecke
  2019-10-07 14:15 ` Laurence Oberman
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Hannes Reinecke @ 2019-10-07 13:57 UTC
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Martin Wilck, linux-scsi,
	Hannes Reinecke

From: Hannes Reinecke <hare@suse.com>

Some arrays are not capable of returning RTPG data during state
transitioning, but rather return an 'LUN not accessible, asymmetric
access state transition' sense code. In these cases we
can set the state to 'transitioning' directly and don't need to
evaluate the RTPG data (which we won't have anyway).

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index 4971104b1817..f32da0ca529e 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -512,6 +512,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 	unsigned int tpg_desc_tbl_off;
 	unsigned char orig_transition_tmo;
 	unsigned long flags;
+	bool transitioning_sense = false;
 
 	if (!pg->expiry) {
 		unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
@@ -572,13 +573,19 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 			goto retry;
 		}
 		/*
-		 * Retry on ALUA state transition or if any
-		 * UNIT ATTENTION occurred.
+		 * If the array returns with 'ALUA state transition'
+		 * sense code here it cannot return RTPG data during
+		 * transition. So set the state to 'transitioning' directly.
 		 */
 		if (sense_hdr.sense_key == NOT_READY &&
-		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
-			err = SCSI_DH_RETRY;
-		else if (sense_hdr.sense_key == UNIT_ATTENTION)
+		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a) {
+			transitioning_sense = true;
+			goto skip_rtpg;
+		}
+		/*
+		 * Retry on any other UNIT ATTENTION occurred.
+		 */
+		if (sense_hdr.sense_key == UNIT_ATTENTION)
 			err = SCSI_DH_RETRY;
 		if (err == SCSI_DH_RETRY &&
 		    pg->expiry != 0 && time_before(jiffies, pg->expiry)) {
@@ -666,7 +673,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
 		off = 8 + (desc[7] * 4);
 	}
 
+ skip_rtpg:
 	spin_lock_irqsave(&pg->lock, flags);
+	if (transitioning_sense)
+		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+
 	sdev_printk(KERN_INFO, sdev,
 		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
 		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
-- 
2.16.4



* Re: [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
  2019-10-07 13:57 [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions Hannes Reinecke
@ 2019-10-07 14:15 ` Laurence Oberman
  2019-10-07 20:45 ` Ewan D. Milne
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Laurence Oberman @ 2019-10-07 14:15 UTC
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Martin Wilck, linux-scsi,
	Hannes Reinecke

On Mon, 2019-10-07 at 15:57 +0200, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> Some arrays are not capable of returning RTPG data during state
> transitioning, but rather return an 'LUN not accessible, asymmetric
> access state transition' sense code. In these cases we
> can set the state to 'transitioning' directly and don't need to
> evaluate the RTPG data (which we won't have anyway).
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 4971104b1817..f32da0ca529e 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -512,6 +512,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  	unsigned int tpg_desc_tbl_off;
>  	unsigned char orig_transition_tmo;
>  	unsigned long flags;
> +	bool transitioning_sense = false;
>  
>  	if (!pg->expiry) {
>  		unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
> @@ -572,13 +573,19 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  			goto retry;
>  		}
>  		/*
> -		 * Retry on ALUA state transition or if any
> -		 * UNIT ATTENTION occurred.
> +		 * If the array returns with 'ALUA state transition'
> +		 * sense code here it cannot return RTPG data during
> +		 * transition. So set the state to 'transitioning' directly.
>  		 */
>  		if (sense_hdr.sense_key == NOT_READY &&
> -		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
> -			err = SCSI_DH_RETRY;
> -		else if (sense_hdr.sense_key == UNIT_ATTENTION)
> +		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a) {
> +			transitioning_sense = true;
> +			goto skip_rtpg;
> +		}
> +		/*
> +		 * Retry on any other UNIT ATTENTION occurred.
> +		 */
> +		if (sense_hdr.sense_key == UNIT_ATTENTION)
>  			err = SCSI_DH_RETRY;
>  		if (err == SCSI_DH_RETRY &&
>  		    pg->expiry != 0 && time_before(jiffies, pg->expiry)) {
> @@ -666,7 +673,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  		off = 8 + (desc[7] * 4);
>  	}
>  
> + skip_rtpg:
>  	spin_lock_irqsave(&pg->lock, flags);
> +	if (transitioning_sense)
> +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> +
>  	sdev_printk(KERN_INFO, sdev,
>  		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
>  		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),

This makes sense to me; this behaviour has affected recovery timeouts in
the past. The code looks correct to me.

Reviewed-by: Laurence Oberman <loberman@redhat.com>



* Re: [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
  2019-10-07 13:57 [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions Hannes Reinecke
  2019-10-07 14:15 ` Laurence Oberman
@ 2019-10-07 20:45 ` Ewan D. Milne
  2019-10-08  6:21   ` Hannes Reinecke
  2019-10-09 16:31 ` Bart Van Assche
  2019-10-10  2:43 ` Martin K. Petersen
  3 siblings, 1 reply; 7+ messages in thread
From: Ewan D. Milne @ 2019-10-07 20:45 UTC
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Martin Wilck, linux-scsi,
	Hannes Reinecke

See below.

On Mon, 2019-10-07 at 15:57 +0200, Hannes Reinecke wrote:
> From: Hannes Reinecke <hare@suse.com>
> 
> Some arrays are not capable of returning RTPG data during state
> transitioning, but rather return an 'LUN not accessible, asymmetric
> access state transition' sense code. In these cases we
> can set the state to 'transitioning' directly and don't need to
> evaluate the RTPG data (which we won't have anyway).
> 
> Signed-off-by: Hannes Reinecke <hare@suse.com>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 21 ++++++++++++++++-----
>  1 file changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 4971104b1817..f32da0ca529e 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -512,6 +512,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  	unsigned int tpg_desc_tbl_off;
>  	unsigned char orig_transition_tmo;
>  	unsigned long flags;
> +	bool transitioning_sense = false;
>  
>  	if (!pg->expiry) {
>  		unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
> @@ -572,13 +573,19 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  			goto retry;
>  		}
>  		/*
> -		 * Retry on ALUA state transition or if any
> -		 * UNIT ATTENTION occurred.
> +		 * If the array returns with 'ALUA state transition'
> +		 * sense code here it cannot return RTPG data during
> +		 * transition. So set the state to 'transitioning' directly.
>  		 */
>  		if (sense_hdr.sense_key == NOT_READY &&
> -		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
> -			err = SCSI_DH_RETRY;
> -		else if (sense_hdr.sense_key == UNIT_ATTENTION)
> +		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a) {
> +			transitioning_sense = true;
> +			goto skip_rtpg;
> +		}
> +		/*
> +		 * Retry on any other UNIT ATTENTION occurred.
> +		 */
> +		if (sense_hdr.sense_key == UNIT_ATTENTION)
>  			err = SCSI_DH_RETRY;
>  		if (err == SCSI_DH_RETRY &&
>  		    pg->expiry != 0 && time_before(jiffies, pg->expiry)) {
> @@ -666,7 +673,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>  		off = 8 + (desc[7] * 4);
>  	}
>  
> + skip_rtpg:
>  	spin_lock_irqsave(&pg->lock, flags);
> +	if (transitioning_sense)
> +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> +
>  	sdev_printk(KERN_INFO, sdev,
>  		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
>  		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),

The patch itself looks OK, but I was wondering about a couple of things:

  - There are other places in scsi_dh_alua where the ASC/ASCQ 04 0A is checked
    and we retry, I understand that this is a particular case you are solving
    but is the changing of the state to -> transitioning (because that's what
    the device said the state was) applicable in those other cases?
  - The code originally seems to have been under the assumption that the
    transitioning state was a transient event, so the retry would pick up
    the eventual state.  Now, some storage arrays spend a long time in the
    transitioning state, but if we don't send another command are we going to
    get the sense (or the UA) that triggers entry to the eventual ALUA state?

-Ewan



* Re: [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
  2019-10-07 20:45 ` Ewan D. Milne
@ 2019-10-08  6:21   ` Hannes Reinecke
  2019-10-08 15:58     ` Ewan D. Milne
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2019-10-08  6:21 UTC
  To: Ewan D. Milne, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Martin Wilck, linux-scsi,
	Hannes Reinecke

On 10/7/19 10:45 PM, Ewan D. Milne wrote:
> See below.
> 
> On Mon, 2019-10-07 at 15:57 +0200, Hannes Reinecke wrote:
>> From: Hannes Reinecke <hare@suse.com>
>>
>> Some arrays are not capable of returning RTPG data during state
>> transitioning, but rather return an 'LUN not accessible, asymmetric
>> access state transition' sense code. In these cases we
>> can set the state to 'transitioning' directly and don't need to
>> evaluate the RTPG data (which we won't have anyway).
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
>> ---
>>  drivers/scsi/device_handler/scsi_dh_alua.c | 21 ++++++++++++++++-----
>>  1 file changed, 16 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
>> index 4971104b1817..f32da0ca529e 100644
>> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
>> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
>> @@ -512,6 +512,7 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>>  	unsigned int tpg_desc_tbl_off;
>>  	unsigned char orig_transition_tmo;
>>  	unsigned long flags;
>> +	bool transitioning_sense = false;
>>  
>>  	if (!pg->expiry) {
>>  		unsigned long transition_tmo = ALUA_FAILOVER_TIMEOUT * HZ;
>> @@ -572,13 +573,19 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>>  			goto retry;
>>  		}
>>  		/*
>> -		 * Retry on ALUA state transition or if any
>> -		 * UNIT ATTENTION occurred.
>> +		 * If the array returns with 'ALUA state transition'
>> +		 * sense code here it cannot return RTPG data during
>> +		 * transition. So set the state to 'transitioning' directly.
>>  		 */
>>  		if (sense_hdr.sense_key == NOT_READY &&
>> -		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
>> -			err = SCSI_DH_RETRY;
>> -		else if (sense_hdr.sense_key == UNIT_ATTENTION)
>> +		    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a) {
>> +			transitioning_sense = true;
>> +			goto skip_rtpg;
>> +		}
>> +		/*
>> +		 * Retry on any other UNIT ATTENTION occurred.
>> +		 */
>> +		if (sense_hdr.sense_key == UNIT_ATTENTION)
>>  			err = SCSI_DH_RETRY;
>>  		if (err == SCSI_DH_RETRY &&
>>  		    pg->expiry != 0 && time_before(jiffies, pg->expiry)) {
>> @@ -666,7 +673,11 @@ static int alua_rtpg(struct scsi_device *sdev, struct alua_port_group *pg)
>>  		off = 8 + (desc[7] * 4);
>>  	}
>>  
>> + skip_rtpg:
>>  	spin_lock_irqsave(&pg->lock, flags);
>> +	if (transitioning_sense)
>> +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
>> +
>>  	sdev_printk(KERN_INFO, sdev,
>>  		    "%s: port group %02x state %c %s supports %c%c%c%c%c%c%c\n",
>>  		    ALUA_DH_NAME, pg->group_id, print_alua_state(pg->state),
> 
> The patch itself looks OK, but I was wondering about a couple of things:
> 
>   - There are other places in scsi_dh_alua where the ASC/ASCQ 04 0A is checked
>     and we retry, I understand that this is a particular case you are solving
>     but is the changing of the state to -> transitioning (because that's what
>     the device said the state was) applicable in those other cases?
No. The original code was built around the assumption that RTPG would
return the status of the device; consequently we would have to retry
RTPG until we get a final status. But as mentioned, there are arrays
which cannot return RTPG data during transitioning, so the code would
never be able to detect a transitioning state.
With this patch we set the state directly once the said sense code is
received.
But this applies _only_ to the RTPG command, as this is required to move
the state machine along.
None of the other commands are affected.
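
The decision described above can be sketched roughly as follows. This is
a user-space illustration only, not the driver code: struct sense_info,
enum rtpg_action and classify_rtpg_sense() are hypothetical names, and
only the sense key values and the 04/0A ASC/ASCQ pair are taken from the
patch and the SCSI standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SCSI sense key values; the names mirror those used in the patch. */
#define NOT_READY       0x02
#define UNIT_ATTENTION  0x06

/* Hypothetical stand-in for the kernel's struct scsi_sense_hdr. */
struct sense_info {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* Hypothetical result type: what the RTPG caller should do next. */
enum rtpg_action {
	RTPG_OTHER_ERROR,        /* leave the error code unchanged */
	RTPG_RETRY,              /* resend the RTPG command */
	RTPG_MARK_TRANSITIONING, /* skip RTPG parsing, set state directly */
};

/* Classify the sense data returned for a failed RTPG command. */
static enum rtpg_action classify_rtpg_sense(const struct sense_info *s)
{
	assert(s != NULL);

	/* 'LUN not accessible, asymmetric access state transition' */
	if (s->sense_key == NOT_READY && s->asc == 0x04 && s->ascq == 0x0a)
		return RTPG_MARK_TRANSITIONING;

	/* Any UNIT ATTENTION is simply retried. */
	if (s->sense_key == UNIT_ATTENTION)
		return RTPG_RETRY;

	return RTPG_OTHER_ERROR;
}
```

The point of returning a separate action for 04/0A is that the caller
can record 'transitioning' without needing any RTPG payload at all.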

>   - The code originally seems to have been under the assumption that the
>     transitioning state was a transient event, so the retry would pick up
>     the eventual state.  Now, some storage arrays spend a long time in the
>     transitioning state, but if we don't send another command are we going to
>     get the sense (or the UA) that triggers entry to the eventual ALUA state?
> 
Note that there are two types of retries.
One is the 'normal' command retry, where we resend a command a given
number of times to retrieve the final status.
This is precisely the behaviour which prompted this patch.

And then there is a scheduled retry; here we essentially poll the array
by sending RTPG at regular intervals until the 'transitioning' state
is gone. (See 'alua_rtpg()' and the handling of the SCSI_DH_RETRY
return value.) With the patch we continue to trigger that second type of
retry, which will eventually clear the transitioning state.
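
A minimal sketch of that second, scheduled kind of retry, assuming a
simulated array: struct fake_array, rtpg_once() and poll_until_settled()
are made-up names for illustration, and a plain loop stands in for the
delayed work the driver would requeue.

```c
#include <assert.h>
#include <stddef.h>

enum access_state { STATE_OPTIMAL, STATE_TRANSITIONING };
enum dh_result { DH_OK, DH_RETRY };

/* Hypothetical array that stays in transition for a number of RTPGs. */
struct fake_array {
	int transitioning_polls;
};

/* Hypothetical, stripped-down port group: only the state is tracked. */
struct port_group {
	enum access_state state;
};

/* One RTPG attempt: record 'transitioning' and ask to be polled again. */
static enum dh_result rtpg_once(struct fake_array *a, struct port_group *pg)
{
	assert(a != NULL && pg != NULL);

	if (a->transitioning_polls > 0) {
		a->transitioning_polls--;
		pg->state = STATE_TRANSITIONING;
		return DH_RETRY;
	}
	pg->state = STATE_OPTIMAL;
	return DH_OK;
}

/*
 * Keep polling while the result is DH_RETRY, up to max_polls attempts.
 * Returns the number of RTPG attempts made.
 */
static int poll_until_settled(struct fake_array *a, struct port_group *pg,
			      int max_polls)
{
	int sent = 0;

	while (sent < max_polls) {
		sent++;
		if (rtpg_once(a, pg) != DH_RETRY)
			break;
		/* The real driver would wait for an interval here. */
	}
	return sent;
}
```

With an array that reports the transition twice, the third poll picks up
the final state; if the poll budget runs out first, the port group is
left marked as transitioning.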

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      Teamlead Storage & Networking
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 247165 (AG München), GF: Felix Imendörffer


* Re: [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
  2019-10-08  6:21   ` Hannes Reinecke
@ 2019-10-08 15:58     ` Ewan D. Milne
  0 siblings, 0 replies; 7+ messages in thread
From: Ewan D. Milne @ 2019-10-08 15:58 UTC
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Martin Wilck, linux-scsi,
	Hannes Reinecke

On Tue, 2019-10-08 at 08:21 +0200, Hannes Reinecke wrote:
> On 10/7/19 10:45 PM, Ewan D. Milne wrote:
> > 
> > The patch itself looks OK, but I was wondering about a couple of things:
> > 
> >   - There are other places in scsi_dh_alua where the ASC/ASCQ 04 0A is checked
> >     and we retry, I understand that this is a particular case you are solving
> >     but is the changing of the state to -> transitioning (because that's what
> >     the device said the state was) applicable in those other cases?
> 
> No. The original code was built around the assumption that RTPG would
> return the status of the device; consequently we would have to retry
> RTPG until we get a final status. But as mentioned, there are arrays
> which cannot return RTPG data during transitioning, so the code would
> never be able to detect a transitioning state.
> With this patch we set the state directly once the said sense code is
> received.
> But this applies _only_ to the RTPG command, as this is required to move
> the state machine along.
> None of the other commands are affected.
> 
> >   - The code originally seems to have been under the assumption that the
> >     transitioning state was a transient event, so the retry would pick up
> >     the eventual state.  Now, some storage arrays spend a long time in the
> >     transitioning state, but if we don't send another command are we going to
> >     get the sense (or the UA) that triggers entry to the eventual ALUA state?
> > 
> 
> Note that there are two types of retries.
> One is the 'normal' command retry, where we resend a command a given
> number of times to retrieve the final status.
> This is precisely the behaviour which prompted this patch.
> 
> And then there is a scheduled retry; here we essentially poll the array
> by sending RTPG at regular intervals until the 'transitioning' state
> is gone. (See 'alua_rtpg()' and the handling of the SCSI_DH_RETRY
> return value.) With the patch we continue to trigger that second type of
> retry, which will eventually clear the transitioning state.
> 
> Cheers,
> 
> Hannes

Thanks for the explanation.  The patch looks good.

Reviewed-by: Ewan D. Milne <emilne@redhat.com>



* Re: [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
  2019-10-07 13:57 [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions Hannes Reinecke
  2019-10-07 14:15 ` Laurence Oberman
  2019-10-07 20:45 ` Ewan D. Milne
@ 2019-10-09 16:31 ` Bart Van Assche
  2019-10-10  2:43 ` Martin K. Petersen
  3 siblings, 0 replies; 7+ messages in thread
From: Bart Van Assche @ 2019-10-09 16:31 UTC
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Martin Wilck, linux-scsi,
	Hannes Reinecke

On 10/7/19 6:57 AM, Hannes Reinecke wrote:
> Some arrays are not capable of returning RTPG data during state
> transitioning, but rather return an 'LUN not accessible, asymmetric
> access state transition' sense code. In these cases we
> can set the state to 'transitioning' directly and don't need to
> evaluate the RTPG data (which we won't have anyway).

Reviewed-by: Bart Van Assche <bvanassche@acm.org>


* Re: [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions
  2019-10-07 13:57 [PATCH] scsi_dh_alua: handle RTPG sense code correctly during state transitions Hannes Reinecke
                   ` (2 preceding siblings ...)
  2019-10-09 16:31 ` Bart Van Assche
@ 2019-10-10  2:43 ` Martin K. Petersen
  3 siblings, 0 replies; 7+ messages in thread
From: Martin K. Petersen @ 2019-10-10  2:43 UTC
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Martin Wilck, linux-scsi, Hannes Reinecke


Hannes,

> Some arrays are not capable of returning RTPG data during state
> transitioning, but rather return an 'LUN not accessible, asymmetric
> access state transition' sense code. In these cases we can set the
> state to 'transitioning' directly and don't need to evaluate the RTPG
> data (which we won't have anyway).

Applied to 5.4/scsi-fixes, thank you!

-- 
Martin K. Petersen	Oracle Linux Engineering

