linux-scsi.vger.kernel.org archive mirror
* [PATCH v2] I/O errors for ALUA state transitions
@ 2024-05-03 19:56 Martin Wilck
  2024-05-06  5:54 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Martin Wilck @ 2024-05-03 19:56 UTC (permalink / raw)
  To: Martin K. Petersen, Christoph Hellwig, Hannes Reinecke,
	James Bottomley, Ewan Milne
  Cc: Bart Van Assche, linux-scsi, Martin Wilck, Rajashekhar M A

When a host is configured with a few LUNs and I/O is running,
repeatedly injecting FC faults leads to path recovery problems.
Each LUN has 4 paths. After an FC fault that takes two of the
paths down, only 3 paths come back active instead of all 4.
This happens after several iterations of continuous FC faults.

The reason is that we return an I/O error whenever we encounter
sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.

mwilck: Resending a modified version of this patch, which was originally
authored by Rajashekhar M A from NetApp and submitted in 2021.
Moved the changes to alua_check_sense() as suggested by Mike Christie [1].
Ewan Milne had raised the question whether pg->state should be set to
transitioning in the UA case [2]. I believe that doing this is
correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O
errors. Our handler schedules an RTPG, which will only result in an I/O
error condition if the transitioning timeout expires.

[1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
[2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Co-authored-by: Rajashekhar M A <rajs@netapp.com>
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 34 +++++++++++++---------
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index a226dc1b65d7..682d5bb53d14 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -414,28 +414,34 @@ static char print_alua_state(unsigned char state)
 	}
 }
 
-static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
-					      struct scsi_sense_hdr *sense_hdr)
+static enum scsi_disposition alua_handle_state_transition(struct scsi_device *sdev)
 {
 	struct alua_dh_data *h = sdev->handler_data;
 	struct alua_port_group *pg;
 
+	/*
+	 * LUN Not Accessible - ALUA state transition
+	 */
+	rcu_read_lock();
+	pg = rcu_dereference(h->pg);
+	if (pg)
+		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+	rcu_read_unlock();
+	alua_check(sdev, false);
+	return NEEDS_RETRY;
+}
+
+static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
+					      struct scsi_sense_hdr *sense_hdr)
+{
 	switch (sense_hdr->sense_key) {
 	case NOT_READY:
-		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
-			/*
-			 * LUN Not Accessible - ALUA state transition
-			 */
-			rcu_read_lock();
-			pg = rcu_dereference(h->pg);
-			if (pg)
-				pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
-			rcu_read_unlock();
-			alua_check(sdev, false);
-			return NEEDS_RETRY;
-		}
+		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)
+			return alua_handle_state_transition(sdev);
 		break;
 	case UNIT_ATTENTION:
+		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)
+			return alua_handle_state_transition(sdev);
 		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
 			/*
 			 * Power On, Reset, or Bus Device Reset.
-- 
2.44.0



* Re: [PATCH v2] I/O errors for ALUA state transitions
  2024-05-03 19:56 [PATCH v2] I/O errors for ALUA state transitions Martin Wilck
@ 2024-05-06  5:54 ` Christoph Hellwig
  2024-05-07  9:10   ` Martin Wilck
  2024-05-06 21:48 ` Mike Christie
  2024-05-07  9:12 ` Damien Le Moal
  2 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2024-05-06  5:54 UTC (permalink / raw)
  To: Martin Wilck
  Cc: Martin K. Petersen, Christoph Hellwig, Hannes Reinecke,
	James Bottomley, Ewan Milne, Bart Van Assche, linux-scsi,
	Martin Wilck, Rajashekhar M A

> -static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
> -					      struct scsi_sense_hdr *sense_hdr)
> +static enum scsi_disposition alua_handle_state_transition(struct scsi_device *sdev)
>  {
>  	struct alua_dh_data *h = sdev->handler_data;
>  	struct alua_port_group *pg;
>  
> +	/*
> +	 * LUN Not Accessible - ALUA state transition
> +	 */
> +	rcu_read_lock();
> +	pg = rcu_dereference(h->pg);
> +	if (pg)
> +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> +	rcu_read_unlock();
> +	alua_check(sdev, false);
> +	return NEEDS_RETRY;

This always returns NEEDS_RETRY, so you can drop the return value
entirely and handle this in the callers.
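
A minimal userspace sketch of that suggestion, assuming stand-in types
and constants rather than the kernel's definitions: the helper becomes
void, and each caller returns NEEDS_RETRY itself. The names
alua_model_dev, handle_state_transition() and the rtpg_scheduled flag
(standing in for the alua_check(sdev, false) call) are hypothetical and
exist only for illustration; the enum values are not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins; the numeric values are illustrative only. */
enum scsi_disposition { NEEDS_RETRY = 1, SCSI_RETURN_NOT_HANDLED = 2 };
enum { NOT_READY = 0x02, UNIT_ATTENTION = 0x06 };	/* SPC sense keys */
enum { STATE_OPTIMAL = 0, STATE_TRANSITIONING = 1 };

struct port_group { int state; };
struct alua_model_dev {
	struct port_group *pg;	/* may be NULL, as in the patch */
	int rtpg_scheduled;	/* models the effect of alua_check() */
};
struct sense_hdr { unsigned char sense_key, asc, ascq; };

/* Returns nothing: the caller decides the disposition. */
static void handle_state_transition(struct alua_model_dev *dev)
{
	assert(dev != NULL);
	if (dev->pg)
		dev->pg->state = STATE_TRANSITIONING;
	dev->rtpg_scheduled = 1;
}

static enum scsi_disposition check_sense(struct alua_model_dev *dev,
					 const struct sense_hdr *s)
{
	switch (s->sense_key) {
	case NOT_READY:
		if (s->asc == 0x04 && s->ascq == 0x0a) {
			/* LUN Not Accessible - ALUA state transition */
			handle_state_transition(dev);
			return NEEDS_RETRY;
		}
		break;
	case UNIT_ATTENTION:
		if (s->asc == 0x04 && s->ascq == 0x0a) {
			/* LUN Not Accessible - ALUA state transition */
			handle_state_transition(dev);
			return NEEDS_RETRY;
		}
		break;
	}
	return SCSI_RETURN_NOT_HANDLED;
}
```

The disposition is now visible at each call site, at the cost of
repeating the "return NEEDS_RETRY" line in both cases.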



* Re: [PATCH v2] I/O errors for ALUA state transitions
  2024-05-03 19:56 [PATCH v2] I/O errors for ALUA state transitions Martin Wilck
  2024-05-06  5:54 ` Christoph Hellwig
@ 2024-05-06 21:48 ` Mike Christie
  2024-05-07  9:09   ` Martin Wilck
  2024-05-07  9:12 ` Damien Le Moal
  2 siblings, 1 reply; 6+ messages in thread
From: Mike Christie @ 2024-05-06 21:48 UTC (permalink / raw)
  To: Martin Wilck, Martin K. Petersen, Christoph Hellwig,
	Hannes Reinecke, James Bottomley, Ewan Milne
  Cc: Bart Van Assche, linux-scsi, Martin Wilck, Rajashekhar M A

On 5/3/24 2:56 PM, Martin Wilck wrote:
>  	case UNIT_ATTENTION:
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)

Do you need to add this check in alua_tur as well? We are checking for
the NOT_READY case.


* Re: [PATCH v2] I/O errors for ALUA state transitions
  2024-05-06 21:48 ` Mike Christie
@ 2024-05-07  9:09   ` Martin Wilck
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Wilck @ 2024-05-07  9:09 UTC (permalink / raw)
  To: Mike Christie, Martin K. Petersen, Christoph Hellwig,
	Hannes Reinecke, James Bottomley, Ewan Milne
  Cc: Bart Van Assche, linux-scsi, Rajashekhar M A

On Mon, 2024-05-06 at 16:48 -0500, Mike Christie wrote:
> On 5/3/24 2:56 PM, Martin Wilck wrote:
> >  	case UNIT_ATTENTION:
> > +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)
> 
> Do you need to add this check in alua_tur as well? We are checking for
> the NOT_READY case.

Good point. I'll add the check; I suppose it can't hurt. But I notice
that scsi_test_unit_ready() tries to "eat" UA conditions, and alua_tur()
calls it with ALUA_FAILOVER_RETRIES (5) retries, so checking the sense
key in alua_tur() probably won't make much of a difference.

[Side note: I am wondering if it makes sense to have
scsi_test_unit_ready() retry on UA when called from alua_tur(). After
all, alua_tur() is only called to check whether another RTPG must be
scheduled. @Hannes?]
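
To illustrate why the sense key rarely reaches the caller, here is a
simplified userspace model of a TEST UNIT READY loop that retries on
UNIT ATTENTION. It is a sketch under assumptions, not the actual
scsi_test_unit_ready() code: the tur_send_fn callback, struct tur_sense
and fake_target are hypothetical stand-ins for the real command
submission path.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT_ATTENTION 0x06	/* SPC sense key value */

/* Hypothetical, trimmed-down sense header. */
struct tur_sense {
	int valid;
	unsigned char sense_key, asc, ascq;
};

/* Hypothetical callback that issues one TEST UNIT READY. */
typedef int (*tur_send_fn)(void *ctx, struct tur_sense *sense);

/* Reissue the command while the target answers with UNIT ATTENTION. */
static int tur_with_ua_retry(tur_send_fn send, void *ctx, int retries,
			     struct tur_sense *sense)
{
	int rc;

	assert(send != NULL && sense != NULL && retries > 0);
	do {
		rc = send(ctx, sense);
	} while (sense->valid && sense->sense_key == UNIT_ATTENTION &&
		 --retries);
	return rc;
}

/* Fake target: reports 06/04/0a for the first ua_pending commands. */
struct fake_target {
	int ua_pending;
	int commands_seen;
};

static int fake_send(void *ctx, struct tur_sense *sense)
{
	struct fake_target *t = ctx;

	t->commands_seen++;
	if (t->ua_pending > 0) {
		t->ua_pending--;
		*sense = (struct tur_sense){ 1, UNIT_ATTENTION, 0x04, 0x0a };
		return -1;
	}
	*sense = (struct tur_sense){ 0, 0, 0, 0 };
	return 0;
}
```

In this model, a target with two pending unit attentions and a budget
of 5 is sent three commands and the caller sees success without ever
observing the UA sense data. Only a target that keeps reporting UA for
all 5 attempts lets the sense key through to the caller.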

Regards,
Martin





* Re: [PATCH v2] I/O errors for ALUA state transitions
  2024-05-06  5:54 ` Christoph Hellwig
@ 2024-05-07  9:10   ` Martin Wilck
  0 siblings, 0 replies; 6+ messages in thread
From: Martin Wilck @ 2024-05-07  9:10 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Hannes Reinecke, James Bottomley, Ewan Milne,
	Bart Van Assche, linux-scsi, Rajashekhar M A

On Mon, 2024-05-06 at 07:54 +0200, Christoph Hellwig wrote:
> > -static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
> > -					      struct scsi_sense_hdr *sense_hdr)
> > +static enum scsi_disposition alua_handle_state_transition(struct scsi_device *sdev)
> >  {
> >  	struct alua_dh_data *h = sdev->handler_data;
> >  	struct alua_port_group *pg;
> >  
> > +	/*
> > +	 * LUN Not Accessible - ALUA state transition
> > +	 */
> > +	rcu_read_lock();
> > +	pg = rcu_dereference(h->pg);
> > +	if (pg)
> > +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> > +	rcu_read_unlock();
> > +	alua_check(sdev, false);
> > +	return NEEDS_RETRY;
> 
> This always returns NEEDS_RETRY, so you can drop the return value
> entirely and handle this in the callers.
> 

I liked being able to write "return alua_handle_state_transition(...)"
in the caller. But np, I'll change it.

Martin



* Re: [PATCH v2] I/O errors for ALUA state transitions
  2024-05-03 19:56 [PATCH v2] I/O errors for ALUA state transitions Martin Wilck
  2024-05-06  5:54 ` Christoph Hellwig
  2024-05-06 21:48 ` Mike Christie
@ 2024-05-07  9:12 ` Damien Le Moal
  2 siblings, 0 replies; 6+ messages in thread
From: Damien Le Moal @ 2024-05-07  9:12 UTC (permalink / raw)
  To: Martin Wilck, Martin K. Petersen, Christoph Hellwig,
	Hannes Reinecke, James Bottomley, Ewan Milne
  Cc: Bart Van Assche, linux-scsi, Martin Wilck, Rajashekhar M A

On 5/4/24 04:56, Martin Wilck wrote:
> When a host is configured with a few LUNs and I/O is running,
> repeatedly injecting FC faults leads to path recovery problems.
> Each LUN has 4 paths. After an FC fault that takes two of the
> paths down, only 3 paths come back active instead of all 4.
> This happens after several iterations of continuous FC faults.
> 
> The reason is that we return an I/O error whenever we encounter
> sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.
> 
> mwilck: Resending a modified version of this patch, which was originally
> authored by Rajashekhar M A from NetApp and submitted in 2021.
> Moved the changes to alua_check_sense() as suggested by Mike Christie [1].
> Ewan Milne had raised the question whether pg->state should be set to
> transitioning in the UA case [2]. I believe that doing this is
> correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause I/O
> errors. Our handler schedules an RTPG, which will only result in an I/O
> error condition if the transitioning timeout expires.
> 
> [1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
> [2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> Co-authored-by: Rajashekhar M A <rajs@netapp.com>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 34 +++++++++++++---------
>  1 file changed, 20 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
> index a226dc1b65d7..682d5bb53d14 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -414,28 +414,34 @@ static char print_alua_state(unsigned char state)
>  	}
>  }
>  
> -static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
> -					      struct scsi_sense_hdr *sense_hdr)
> +static enum scsi_disposition alua_handle_state_transition(struct scsi_device *sdev)
>  {
>  	struct alua_dh_data *h = sdev->handler_data;
>  	struct alua_port_group *pg;
>  
> +	/*
> +	 * LUN Not Accessible - ALUA state transition
> +	 */
> +	rcu_read_lock();
> +	pg = rcu_dereference(h->pg);
> +	if (pg)
> +		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> +	rcu_read_unlock();
> +	alua_check(sdev, false);
> +	return NEEDS_RETRY;
> +}
> +
> +static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
> +					      struct scsi_sense_hdr *sense_hdr)
> +{
>  	switch (sense_hdr->sense_key) {
>  	case NOT_READY:
> -		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
> -			/*
> -			 * LUN Not Accessible - ALUA state transition
> -			 */
> -			rcu_read_lock();
> -			pg = rcu_dereference(h->pg);
> -			if (pg)
> -				pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
> -			rcu_read_unlock();
> -			alua_check(sdev, false);
> -			return NEEDS_RETRY;
> -		}
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)

Please keep the comment that spells out what this asc/ascq is.

> +			return alua_handle_state_transition(sdev);
>  		break;
>  	case UNIT_ATTENTION:
> +		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a)
> +			return alua_handle_state_transition(sdev);
>  		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
>  			/*
>  			 * Power On, Reset, or Bus Device Reset.

-- 
Damien Le Moal
Western Digital Research



