linux-scsi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Fixes for device probing on flaky connections
@ 2022-06-15 16:41 mwilck
  2022-06-15 16:41 ` [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning mwilck
  2022-06-15 16:41 ` [PATCH 2/2] scsi: scan: retry INQUIRY after timeout mwilck
  0 siblings, 2 replies; 7+ messages in thread
From: mwilck @ 2022-06-15 16:41 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke, Ming Lei,
	Bart Van Assche, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

We have seen device probing errors in FCoE uplink failover tests.
After device loss and link recovery, spurious packet loss in the fabric may
cause device rediscovery to fail. With the two following patches, the
rediscovery failure could be avoided.

Reviews and comments welcome,
Martin

Hannes Reinecke (1):
  scsi: add BLIST_RETRY_SCAN to ignore errors during scanning

Martin Wilck (1):
  scsi: scan: retry INQUIRY after timeout

 drivers/scsi/scsi_scan.c    | 17 ++++++++++++++---
 include/scsi/scsi_devinfo.h |  4 +++-
 2 files changed, 17 insertions(+), 4 deletions(-)

-- 
2.36.1


^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning
  2022-06-15 16:41 [PATCH 0/2] Fixes for device probing on flaky connections mwilck
@ 2022-06-15 16:41 ` mwilck
  2022-06-22  2:02   ` Martin K. Petersen
  2022-06-15 16:41 ` [PATCH 2/2] scsi: scan: retry INQUIRY after timeout mwilck
  1 sibling, 1 reply; 7+ messages in thread
From: mwilck @ 2022-06-15 16:41 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke, Ming Lei,
	Bart Van Assche

From: Hannes Reinecke <hare@suse.de>

On flaky connections the INQUIRY command might run into a timeout,
but as we already have performed a REPORT LUNS command we know that
a LUN should be present. So ignore errors for the INQUIRY commands,
and retry until we exhaust the number of retries.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/scsi_scan.c    | 12 +++++++++---
 include/scsi/scsi_devinfo.h |  4 +++-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index 91ac901a6682..b9b851ce1b72 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -649,8 +649,6 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	int pass, count, result;
 	struct scsi_sense_hdr sshdr;
 
-	*bflags = 0;
-
 	/* Perform up to 3 passes.  The first pass uses a conservative
 	 * transfer length of 36 unless sdev->inquiry_len specifies a
 	 * different value. */
@@ -697,6 +695,11 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 				    (sshdr.ascq == 0))
 					continue;
 			}
+			if (*bflags & BLIST_RETRY_SCAN) {
+				SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
+						"scsi scan: retry inquiry after REPORT LUNs\n"));
+				continue;
+			}
 		} else if (result == 0) {
 			/*
 			 * if nothing was transferred, we try
@@ -709,6 +712,8 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 		break;
 	}
 
+	*bflags = 0;
+
 	if (result == 0) {
 		scsi_sanitize_inquiry_string(&inq_result[8], 8);
 		scsi_sanitize_inquiry_string(&inq_result[16], 16);
@@ -1531,9 +1536,10 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
 				    " allowed by the host adapter\n", lun);
 		} else {
 			int res;
+			blist_flags_t bflags = BLIST_RETRY_SCAN;
 
 			res = scsi_probe_and_add_lun(starget,
-				lun, NULL, NULL, rescan, NULL);
+				lun, &bflags, NULL, rescan, NULL);
 			if (res == SCSI_SCAN_NO_RESPONSE) {
 				/*
 				 * Got some results, but now none, abort.
diff --git a/include/scsi/scsi_devinfo.h b/include/scsi/scsi_devinfo.h
index 5d14adae21c7..e74a228d73d1 100644
--- a/include/scsi/scsi_devinfo.h
+++ b/include/scsi/scsi_devinfo.h
@@ -68,8 +68,10 @@
 #define BLIST_RETRY_ITF		((__force blist_flags_t)(1ULL << 32))
 /* Always retry ABORTED_COMMAND with ASC 0xc1 */
 #define BLIST_RETRY_ASC_C1	((__force blist_flags_t)(1ULL << 33))
+/* Retry errors during scanning */
+#define BLIST_RETRY_SCAN	((__force blist_flags_t)(1ULL << 34))
 
-#define __BLIST_LAST_USED BLIST_RETRY_ASC_C1
+#define __BLIST_LAST_USED BLIST_RETRY_SCAN
 
 #define __BLIST_HIGH_UNUSED (~(__BLIST_LAST_USED | \
 			       (__force blist_flags_t) \
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* [PATCH 2/2] scsi: scan: retry INQUIRY after timeout
  2022-06-15 16:41 [PATCH 0/2] Fixes for device probing on flaky connections mwilck
  2022-06-15 16:41 ` [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning mwilck
@ 2022-06-15 16:41 ` mwilck
  1 sibling, 0 replies; 7+ messages in thread
From: mwilck @ 2022-06-15 16:41 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, linux-scsi, Hannes Reinecke, Ming Lei,
	Bart Van Assche, Martin Wilck

From: Martin Wilck <mwilck@suse.com>

The SCSI mid layer doesn't retry commands after DID_TIME_OUT (see
scsi_noretry_cmd()). Packet loss in the fabric can cause spurious timeouts
during SCSI device probing, causing device probing to fail. This has been
observed in FCoE uplink failover tests, for example.

This patch fixes the issue by retrying the INQUIRY up to 3 times (in practice,
we never observed more than a single retry),

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 drivers/scsi/scsi_scan.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index b9b851ce1b72..f0a248bd1cd3 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -700,6 +700,11 @@ static int scsi_probe_lun(struct scsi_device *sdev, unsigned char *inq_result,
 						"scsi scan: retry inquiry after REPORT LUNs\n"));
 				continue;
 			}
+			if (host_byte(result) == DID_TIME_OUT) {
+				SCSI_LOG_SCAN_BUS(3, sdev_printk(KERN_INFO, sdev,
+						"scsi scan: retry inquiry after timeout\n"));
+				continue;
+			}
 		} else if (result == 0) {
 			/*
 			 * if nothing was transferred, we try
-- 
2.36.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning
  2022-06-15 16:41 ` [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning mwilck
@ 2022-06-22  2:02   ` Martin K. Petersen
  2022-06-22  8:16     ` Martin Wilck
  0 siblings, 1 reply; 7+ messages in thread
From: Martin K. Petersen @ 2022-06-22  2:02 UTC (permalink / raw)
  To: mwilck
  Cc: Martin K. Petersen, Christoph Hellwig, linux-scsi,
	Hannes Reinecke, Ming Lei, Bart Van Assche


Martin,

> @@ -1531,9 +1536,10 @@ static int scsi_report_lun_scan(struct scsi_target *starget, blist_flags_t bflag
>  				    " allowed by the host adapter\n", lun);
>  		} else {
>  			int res;
> +			blist_flags_t bflags = BLIST_RETRY_SCAN;

I'm not a big fan of using the bflag as carrier of "I was reported and
therefore must exist".

Also: Why isn't patch #2 sufficient?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning
  2022-06-22  2:02   ` Martin K. Petersen
@ 2022-06-22  8:16     ` Martin Wilck
  2022-06-22 10:49       ` Hannes Reinecke
  0 siblings, 1 reply; 7+ messages in thread
From: Martin Wilck @ 2022-06-22  8:16 UTC (permalink / raw)
  To: Martin K. Petersen, Hannes Reinecke
  Cc: Christoph Hellwig, linux-scsi, Ming Lei, Bart Van Assche

On Tue, 2022-06-21 at 22:02 -0400, Martin K. Petersen wrote:
> 
> Martin,
> 
> > @@ -1531,9 +1536,10 @@ static int scsi_report_lun_scan(struct
> > scsi_target *starget, blist_flags_t bflag
> >                                     " allowed by the host
> > adapter\n", lun);
> >                 } else {
> >                         int res;
> > +                       blist_flags_t bflags = BLIST_RETRY_SCAN;
> 
> I'm not a big fan of using the bflag as carrier of "I was reported
> and
> therefore must exist".
> 
> Also: Why isn't patch #2 sufficient?

I think it is. I can resubmit just #2 if you prefer and Hannes agrees.

Regards,
Martin



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning
  2022-06-22  8:16     ` Martin Wilck
@ 2022-06-22 10:49       ` Hannes Reinecke
  2022-06-22 12:05         ` Martin Wilck
  0 siblings, 1 reply; 7+ messages in thread
From: Hannes Reinecke @ 2022-06-22 10:49 UTC (permalink / raw)
  To: Martin Wilck, Martin K. Petersen
  Cc: Christoph Hellwig, linux-scsi, Ming Lei, Bart Van Assche

On 6/22/22 10:16, Martin Wilck wrote:
> On Tue, 2022-06-21 at 22:02 -0400, Martin K. Petersen wrote:
>>
>> Martin,
>>
>>> @@ -1531,9 +1536,10 @@ static int scsi_report_lun_scan(struct
>>> scsi_target *starget, blist_flags_t bflag
>>>                                      " allowed by the host
>>> adapter\n", lun);
>>>                  } else {
>>>                          int res;
>>> +                       blist_flags_t bflags = BLIST_RETRY_SCAN;
>>
>> I'm not a big fan of using the bflag as carrier of "I was reported
>> and
>> therefore must exist".
>>
>> Also: Why isn't patch #2 sufficient?
> 
> I think it is. I can resubmit just #2 if you prefer and Hannes agrees.
> 
I'm fine with just adding #2; #1 is really just there to provide the 
original behaviour. Device probing is one of the most arcane areas
in the SCSI stack due to all the various quirks etc and I didn't want
to change anything here.

But if it's okay, it's okay :-)

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning
  2022-06-22 10:49       ` Hannes Reinecke
@ 2022-06-22 12:05         ` Martin Wilck
  0 siblings, 0 replies; 7+ messages in thread
From: Martin Wilck @ 2022-06-22 12:05 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, linux-scsi, Ming Lei, Bart Van Assche

On Wed, 2022-06-22 at 12:49 +0200, Hannes Reinecke wrote:
> On 6/22/22 10:16, Martin Wilck wrote:
> > On Tue, 2022-06-21 at 22:02 -0400, Martin K. Petersen wrote:
> > > 
> > > Martin,
> > > 
> > > > @@ -1531,9 +1536,10 @@ static int scsi_report_lun_scan(struct
> > > > scsi_target *starget, blist_flags_t bflag
> > > >                                      " allowed by the host
> > > > adapter\n", lun);
> > > >                  } else {
> > > >                          int res;
> > > > +                       blist_flags_t bflags =
> > > > BLIST_RETRY_SCAN;
> > > 
> > > I'm not a big fan of using the bflag as carrier of "I was
> > > reported
> > > and
> > > therefore must exist".
> > > 
> > > Also: Why isn't patch #2 sufficient?
> > 
> > I think it is. I can resubmit just #2 if you prefer and Hannes
> > agrees.
> > 
> I'm fine with just adding #2; #1 is really just there to provide the 
> original behaviour. Device probing is one of the most arcane areas
> in the SCSI stack due to all the various quirks etc and I didn't want
> to change anything here.
> 
> But if it's okay, it's okay :-)

Alright. To be certain, I'll ask our partner for another test.

Martin


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2022-06-22 12:05 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-06-15 16:41 [PATCH 0/2] Fixes for device probing on flaky connections mwilck
2022-06-15 16:41 ` [PATCH 1/2] scsi: add BLIST_RETRY_SCAN to ignore errors during scanning mwilck
2022-06-22  2:02   ` Martin K. Petersen
2022-06-22  8:16     ` Martin Wilck
2022-06-22 10:49       ` Hannes Reinecke
2022-06-22 12:05         ` Martin Wilck
2022-06-15 16:41 ` [PATCH 2/2] scsi: scan: retry INQUIRY after timeout mwilck

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).