linux-scsi.vger.kernel.org archive mirror
* [PATCH v4] I/O errors for ALUA state transitions
@ 2024-05-08 10:24 Martin Wilck
  2024-05-08 12:39 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Martin Wilck @ 2024-05-08 10:24 UTC
  To: Martin K. Petersen, Christoph Hellwig, Hannes Reinecke,
	James Bottomley, Ewan Milne, Mike Christie, linux-scsi
  Cc: Bart Van Assche, Damien Le Moal, Martin Wilck

From: Rajashekhar M A <rajs@netapp.com>

When a host is configured with a few LUNs and I/O is running,
injecting FC faults repeatedly leads to path recovery problems.
Each LUN has 4 paths. After an FC fault that takes two of the
paths down, only 3 of the 4 paths come back active. This happens
after several iterations of continuous FC faults.

The reason is that we return an I/O error whenever we encounter
sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE, ASYMMETRIC ACCESS
STATE TRANSITION) instead of retrying.

mwilck: Moved this code to alua_check_sense() as suggested by
Mike Christie [1]. Ewan Milne had raised the question whether pg->state
should be set to transitioning in the UA case [2]. I believe that doing
this is correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause
I/O errors. Our handler schedules an RTPG, which will only result in
an I/O error condition if the transitioning timeout expires.

[1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
[2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
---
Changes v3->v4:
- fix a whitespace error (Damien Le Moal)
Changes v2->v3:
- drop return value of alua_handle_state_transition() (Christoph Hellwig)
- handle UNIT ATTENTION in alua_tur(), too (Mike Christie)
- restore comment in alua_check_sense() (Damien Le Moal)
---
 drivers/scsi/device_handler/scsi_dh_alua.c | 31 +++++++++++++++-------
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c b/drivers/scsi/device_handler/scsi_dh_alua.c
index a226dc1b65d7..4eb0837298d4 100644
--- a/drivers/scsi/device_handler/scsi_dh_alua.c
+++ b/drivers/scsi/device_handler/scsi_dh_alua.c
@@ -414,28 +414,40 @@ static char print_alua_state(unsigned char state)
 	}
 }
 
-static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
-					      struct scsi_sense_hdr *sense_hdr)
+static void alua_handle_state_transition(struct scsi_device *sdev)
 {
 	struct alua_dh_data *h = sdev->handler_data;
 	struct alua_port_group *pg;
 
+	rcu_read_lock();
+	pg = rcu_dereference(h->pg);
+	if (pg)
+		pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
+	rcu_read_unlock();
+	alua_check(sdev, false);
+}
+
+static enum scsi_disposition alua_check_sense(struct scsi_device *sdev,
+					      struct scsi_sense_hdr *sense_hdr)
+{
 	switch (sense_hdr->sense_key) {
 	case NOT_READY:
 		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
 			/*
 			 * LUN Not Accessible - ALUA state transition
 			 */
-			rcu_read_lock();
-			pg = rcu_dereference(h->pg);
-			if (pg)
-				pg->state = SCSI_ACCESS_STATE_TRANSITIONING;
-			rcu_read_unlock();
-			alua_check(sdev, false);
+			alua_handle_state_transition(sdev);
 			return NEEDS_RETRY;
 		}
 		break;
 	case UNIT_ATTENTION:
+		if (sense_hdr->asc == 0x04 && sense_hdr->ascq == 0x0a) {
+			/*
+			 * LUN Not Accessible - ALUA state transition
+			 */
+			alua_handle_state_transition(sdev);
+			return NEEDS_RETRY;
+		}
 		if (sense_hdr->asc == 0x29 && sense_hdr->ascq == 0x00) {
 			/*
 			 * Power On, Reset, or Bus Device Reset.
@@ -502,7 +514,8 @@ static int alua_tur(struct scsi_device *sdev)
 
 	retval = scsi_test_unit_ready(sdev, ALUA_FAILOVER_TIMEOUT * HZ,
 				      ALUA_FAILOVER_RETRIES, &sense_hdr);
-	if (sense_hdr.sense_key == NOT_READY &&
+	if ((sense_hdr.sense_key == NOT_READY ||
+	     sense_hdr.sense_key == UNIT_ATTENTION) &&
 	    sense_hdr.asc == 0x04 && sense_hdr.ascq == 0x0a)
 		return SCSI_DH_RETRY;
 	else if (retval)
-- 
2.44.0
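
For readers following the thread, the retry decision described in the
commit message can be modelled in user space. This is only a sketch, not
the driver code: the model_* names are hypothetical, the sense key values
(0x02 NOT READY, 0x06 UNIT ATTENTION) are redefined locally, and the
rtpg_scheduled flag merely stands in for the driver marking the port
group as transitioning and calling alua_check(). The other UNIT ATTENTION
cases that the real alua_check_sense() handles are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's sense key constants (assumption:
 * values follow SPC: NOT READY = 0x02, UNIT ATTENTION = 0x06). */
enum model_sense_key {
	MODEL_NOT_READY      = 0x02,
	MODEL_UNIT_ATTENTION = 0x06,
};

/* Hypothetical dispositions; the kernel uses enum scsi_disposition. */
enum model_disposition {
	MODEL_NEEDS_RETRY,
	MODEL_NOT_HANDLED,
};

struct model_sense {
	uint8_t sense_key;
	uint8_t asc;
	uint8_t ascq;
};

/* True for "LUN not accessible, asymmetric access state transition"
 * (ASC/ASCQ 04/0a) reported with either sense key. */
static bool model_is_alua_transition(const struct model_sense *s)
{
	return (s->sense_key == MODEL_NOT_READY ||
		s->sense_key == MODEL_UNIT_ATTENTION) &&
	       s->asc == 0x04 && s->ascq == 0x0a;
}

/* Retry the command instead of failing it, and record that a state
 * re-check (an RTPG in the real driver) was requested. */
static enum model_disposition
model_check_sense(const struct model_sense *s, bool *rtpg_scheduled)
{
	if (model_is_alua_transition(s)) {
		*rtpg_scheduled = true;
		return MODEL_NEEDS_RETRY;
	}
	return MODEL_NOT_HANDLED;
}
```

With this model, 06/04/0a and 02/04/0a both yield a retry, while an
unrelated code such as 06/29/00 falls through to the other handling.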



* Re: [PATCH v4] I/O errors for ALUA state transitions
  2024-05-08 10:24 [PATCH v4] I/O errors for ALUA state transitions Martin Wilck
@ 2024-05-08 12:39 ` Christoph Hellwig
  2024-05-08 15:48 ` Mike Christie
  2024-05-09  1:47 ` Martin K. Petersen
  2 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2024-05-08 12:39 UTC
  To: Martin Wilck
  Cc: Martin K. Petersen, Christoph Hellwig, Hannes Reinecke,
	James Bottomley, Ewan Milne, Mike Christie, linux-scsi,
	Bart Van Assche, Damien Le Moal, Martin Wilck

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>



* Re: [PATCH v4] I/O errors for ALUA state transitions
  2024-05-08 10:24 [PATCH v4] I/O errors for ALUA state transitions Martin Wilck
  2024-05-08 12:39 ` Christoph Hellwig
@ 2024-05-08 15:48 ` Mike Christie
  2024-05-09  1:47 ` Martin K. Petersen
  2 siblings, 0 replies; 6+ messages in thread
From: Mike Christie @ 2024-05-08 15:48 UTC
  To: Martin Wilck, Martin K. Petersen, Christoph Hellwig,
	Hannes Reinecke, James Bottomley, Ewan Milne, linux-scsi
  Cc: Bart Van Assche, Damien Le Moal, Martin Wilck

On 5/8/24 5:24 AM, Martin Wilck wrote:
> From: Rajashekhar M A <rajs@netapp.com>
> 
> When a host is configured with a few LUNs and IO is running,
> injecting FC faults repeatedly leads to path recovery problems.
> The LUNs have 4 paths each and 3 of them come back active after
> say an FC fault which makes two of the paths go down, instead of
> all 4. This happens after several iterations of continuous FC faults.
> 
> Reason here is that we're returning an I/O error whenever we're
> encountering sense code 06/04/0a (LOGICAL UNIT NOT ACCESSIBLE,
> ASYMMETRIC ACCESS STATE TRANSITION) instead of retrying.
> 
> mwilck: Moved this code to alua_check_sense() as suggested by
> Mike Christie [1]. Ewan Milne had raised the question whether pg->state
> should be set to transitioning in the UA case [2]. I believe that doing
> this is correct. SCSI_ACCESS_STATE_TRANSITIONING by itself doesn't cause
> I/O errors. Our handler schedules an RTPG, which will only result in
> an I/O error condition if the transitioning timeout expires.
> 
> [1] https://lore.kernel.org/all/0bc96e82-fdda-4187-148d-5b34f81d4942@oracle.com/
> [2] https://lore.kernel.org/all/CAGtn9r=kicnTDE2o7Gt5Y=yoidHYD7tG8XdMHEBJTBraVEoOCw@mail.gmail.com/
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
> ---


Reviewed-by: Mike Christie <michael.christie@oracle.com>


* Re: [PATCH v4] I/O errors for ALUA state transitions
  2024-05-08 10:24 [PATCH v4] I/O errors for ALUA state transitions Martin Wilck
  2024-05-08 12:39 ` Christoph Hellwig
  2024-05-08 15:48 ` Mike Christie
@ 2024-05-09  1:47 ` Martin K. Petersen
  2024-05-13 13:51   ` Martin Wilck
  2 siblings, 1 reply; 6+ messages in thread
From: Martin K. Petersen @ 2024-05-09  1:47 UTC
  To: Martin Wilck
  Cc: Martin K. Petersen, Christoph Hellwig, Hannes Reinecke,
	James Bottomley, Ewan Milne, Mike Christie, linux-scsi,
	Bart Van Assche, Damien Le Moal, Martin Wilck, Rajashekhar M A


Hi Martin!

> From: Rajashekhar M A <rajs@netapp.com>

I can't really apply this without a formal SoB from Rajashekhar.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCH v4] I/O errors for ALUA state transitions
  2024-05-09  1:47 ` Martin K. Petersen
@ 2024-05-13 13:51   ` Martin Wilck
  2024-05-13 14:27     ` Martin K. Petersen
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Wilck @ 2024-05-13 13:51 UTC
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Hannes Reinecke, James Bottomley, Ewan Milne,
	Mike Christie, linux-scsi, Bart Van Assche, Damien Le Moal,
	Rajashekhar M A

On Wed, 2024-05-08 at 21:47 -0400, Martin K. Petersen wrote:
> 
> Hi Martin!
> 
> > From: Rajashekhar M A <rajs@netapp.com>
> 
> I can't really apply this without a formal SoB from Rajashekhar.
> 

This will be difficult, as this email is outdated and he's apparently
not with NetApp any more.

The patch looks very different now from what he originally submitted.
Would it be acceptable if I resubmit with myself and Hannes as patch
authors and him in a Co-authored-by: ?

Martin

* Re: [PATCH v4] I/O errors for ALUA state transitions
  2024-05-13 13:51   ` Martin Wilck
@ 2024-05-13 14:27     ` Martin K. Petersen
  0 siblings, 0 replies; 6+ messages in thread
From: Martin K. Petersen @ 2024-05-13 14:27 UTC
  To: Martin Wilck
  Cc: Martin K. Petersen, Christoph Hellwig, Hannes Reinecke,
	James Bottomley, Ewan Milne, Mike Christie, linux-scsi,
	Bart Van Assche, Damien Le Moal, Rajashekhar M A


Martin,

> Would it be acceptable if I resubmit with myself and Hannes as patch
> authors and him in a Co-authored-by: ?

Given the nature of how this patch has developed over time, sure.

-- 
Martin K. Petersen	Oracle Linux Engineering
