All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/4] scsi: Support to handle Intermittent errors
@ 2020-11-05  6:09 Muneendra
  2020-11-05  6:09 ` [PATCH v6 1/4] scsi: Added a new definition in scsi_cmnd.h Muneendra
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Muneendra @ 2020-11-05  6:09 UTC (permalink / raw)
  To: linux-scsi, michael.christie, hare; +Cc: jsmart2021, emilne, mkumar, Muneendra

[-- Attachment #1: Type: text/plain, Size: 3552 bytes --]

This patch adds a support to prevent retries of all the
io's after an abort succeeds on a particular device when transport
connectivity to the device is encountering intermittent errors.

Intermittent connectivity is a condition that can be detected by transport
fabric notifications. A service can monitor the ELS notifications and
take action on all the outstanding io's of a scsi device at that instant.

This feature is intended to be used when the device is part of a multipath
environment. When the service detects the poor connectivity, the multipath
path can be placed in a marginal path group and ignored further io
operations.

After placing a path in the marginal path group,the daemon sets the
port_state to Marginal which sets bit in scmd->state for all the
io's on that particular device with the new sysfs interface
provided in this patch.This prevent retries of all the
io's if an io hits a scsi timeout which inturn issues an abort.
On Abort succeeds on a marginal path the io will be immediately retried on
another active path.On abort fails then the things escalates to existing
target reset sg interface recovery process.

Below is the interface provided to set the port state to Marginal
and Online.
echo "Marginal" >> /sys/class/fc_remote_ports/rport-X\:Y-Z/port_state
echo "Online" >> /sys/class/fc_remote_ports/rport-X\:Y-Z/port_state


The patches were cut against  5.10/scsi-queue tree

---
v6:
Reordered the patches to make patch ordering and more logical.

v5:
Added the DID_TRANSPORT_MARGINAL case to scsi_decide_disposition

Made changes to clear the SCMD_NORETRIES_ABORT bit if the port_state
has changed from marginal to online due to port_delete and port_add
as we need the normal cmd retry behaviour while we are calling the
eh handlers.

Made changes in fc_scsi_scan_rport as we are checking FC_PORTSTATE_ONLINE
instead of FC_PORTSTATE_ONLINE and FC_PORTSTATE_MARGINAL


v4:
Made changes in fc_eh_timed_out callout to set the SCMD_NORETRIES_ABORT if port
state is marginal 

With this change, we  removed the code  to loop over running commands
and fc_remote_port_chkready changes to set the SCMD_NORETRIES_ABORT 

Removed the scsi_cmd argument for fc_remote_port_chkready
and reverted back the patches that addressed this change(argument)

Removed unnecessary comments
Handle the return of errors on failure.

v3:
Removed the port_state from starget attributes.
Enabled the store functionality for port_state under remote port
Added a new argument to scsi_cmd  to fc_remote_port_chkready
Used the existing scsi command iterators scsi_host_busy_iter.
Rearranged the patches
Added new patches to add new argument for fc_remote_port_chkready

v2:
Added new error code DID_TRANSPORT_MARGINAL to handle marginal errors.
Added a new rport_state FC_PORTSTATE_MARGINAL and also added a new
sysfs interface port_state to set the port_state to marginal.
Added the support in lpfc to handle the marginal state.



Muneendra (4):
  scsi: Added a new definition in scsi_cmnd.h
  scsi: No retries on abort success
  scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  scsi_transport_fc: Added store fucntionality to set the rport
    port_state using sysfs

 drivers/scsi/scsi_error.c        |  17 +++++
 drivers/scsi/scsi_lib.c          |   2 +
 drivers/scsi/scsi_transport_fc.c | 108 +++++++++++++++++++++++++------
 include/scsi/scsi.h              |   1 +
 include/scsi/scsi_cmnd.h         |   3 +
 include/scsi/scsi_transport_fc.h |  19 ++++++
 6 files changed, 131 insertions(+), 19 deletions(-)

-- 
2.26.2


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v6 1/4] scsi: Added a new definition in scsi_cmnd.h
  2020-11-05  6:09 [PATCH v6 0/4] scsi: Support to handle Intermittent errors Muneendra
@ 2020-11-05  6:09 ` Muneendra
  2020-11-05  6:09 ` [PATCH v6 2/4] scsi: No retries on abort success Muneendra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: Muneendra @ 2020-11-05  6:09 UTC (permalink / raw)
  To: linux-scsi, michael.christie, hare; +Cc: jsmart2021, emilne, mkumar, Muneendra

[-- Attachment #1: Type: text/plain, Size: 1451 bytes --]

Added a new definition SCMD_NORETRIES_ABORT in scsi_cmnd.h

The SCMD_NORETRIES_ABORT defines the third bit postion
in scmd->state

Clearing the SCMD_NORETRIES_ABORT bit in state flag before
blk_mq_start_request

Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

---
v6:
Rearranged the patch by merging first hunk of the patch2 in v5
to this patch

v5:
No Change

v4:
No Change

v3:
No change

v2:
Modified the commit log
---
 drivers/scsi/scsi_lib.c  | 1 +
 include/scsi/scsi_cmnd.h | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 1a2e9bab42ef..2b5dea07498e 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1660,6 +1660,7 @@ static blk_status_t scsi_queue_rq(struct blk_mq_hw_ctx *hctx,
 		req->rq_flags |= RQF_DONTPREP;
 	} else {
 		clear_bit(SCMD_STATE_COMPLETE, &cmd->state);
+		clear_bit(SCMD_NORETRIES_ABORT, &cmd->state);
 	}
 
 	cmd->flags &= SCMD_PRESERVED_FLAGS;
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index 69ade4fb71aa..8dec4ec6bd5f 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -64,6 +64,9 @@ struct scsi_pointer {
 /* for scmd->state */
 #define SCMD_STATE_COMPLETE	0
 #define SCMD_STATE_INFLIGHT	1
+#define SCMD_NORETRIES_ABORT	2 /* If this bit is set then there won't be any
+				   * retries of scmd on abort success
+				   */
 
 struct scsi_cmnd {
 	struct scsi_request req;
-- 
2.26.2


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 2/4] scsi: No retries on abort success
  2020-11-05  6:09 [PATCH v6 0/4] scsi: Support to handle Intermittent errors Muneendra
  2020-11-05  6:09 ` [PATCH v6 1/4] scsi: Added a new definition in scsi_cmnd.h Muneendra
@ 2020-11-05  6:09 ` Muneendra
  2020-11-05  6:09 ` [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL Muneendra
  2020-11-05  6:09 ` [PATCH v6 4/4] scsi_transport_fc: Added store fucntionality to set the rport port_state using sysfs Muneendra
  3 siblings, 0 replies; 12+ messages in thread
From: Muneendra @ 2020-11-05  6:09 UTC (permalink / raw)
  To: linux-scsi, michael.christie, hare; +Cc: jsmart2021, emilne, mkumar, Muneendra

[-- Attachment #1: Type: text/plain, Size: 3414 bytes --]

Made an additional check in scsi_noretry_cmd to verify whether user has
decided not to do retries on abort success by checking the
SCMD_NORETRIES_ABORT bit

If SCMD_NORETRIES_ABORT bit is set we are making sure there won't be any
retries done on the same path and also setting the host byte as
DID_TRANSPORT_MARGINAL so that the error can be propogated as recoverable
transport error to the blk layers.

Added a new error code DID_TRANSPORT_MARGINAL to handle marginal
errors in scsi.h

Added a code in scsi_result_to_blk_status to translate
a new error DID_TRANSPORT_MARGINAL to the corresponding blk_status_t
i.e BLK_STS_TRANSPORT

Added DID_TRANSPORT_MARGINAL case to scsi_decide_disposition

Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

---
v6:
Rearranged the patch by merging second hunk of the patch2 in v5
to this patch

v5:
added the DID_TRANSPORT_MARGINAL case to
scsi_decide_disposition

v4:
Modified the comments in the code appropriately

v3:
Merged  first part of the previous patch(v2 patch3) with
this patch.

v2:
set the hostbyte as DID_TRANSPORT_MARGINAL instead of
DID_TRANSPORT_FAILFAST.
---
 drivers/scsi/scsi_error.c | 17 +++++++++++++++++
 drivers/scsi/scsi_lib.c   |  1 +
 include/scsi/scsi.h       |  1 +
 3 files changed, 19 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index ae80daa5d831..02dfd70219b2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -1763,6 +1763,16 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 		return 0;
 
 check_type:
+	/*
+	 * Check whether caller has decided not to do retries on
+	 * abort success by checking the SCMD_NORETRIES_ABORT bit
+	 */
+	if ((test_bit(SCMD_NORETRIES_ABORT, &scmd->state)) &&
+		(scmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT)) {
+		set_host_byte(scmd, DID_TRANSPORT_MARGINAL);
+		return 1;
+	}
+
 	/*
 	 * assume caller has checked sense and determined
 	 * the check condition was retryable.
@@ -1861,6 +1871,13 @@ int scsi_decide_disposition(struct scsi_cmnd *scmd)
 		 * the fast io fail tmo fired), so send IO directly upwards.
 		 */
 		return SUCCESS;
+
+	case DID_TRANSPORT_MARGINAL:
+		/*
+		 * caller has decided not to do retries on
+		 * abort success, so send IO directly upwards.
+		 */
+		return SUCCESS;
 	case DID_ERROR:
 		if (msg_byte(scmd->result) == COMMAND_COMPLETE &&
 		    status_byte(scmd->result) == RESERVATION_CONFLICT)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 2b5dea07498e..9606bad1542f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -629,6 +629,7 @@ static blk_status_t scsi_result_to_blk_status(struct scsi_cmnd *cmd, int result)
 			return BLK_STS_OK;
 		return BLK_STS_IOERR;
 	case DID_TRANSPORT_FAILFAST:
+	case DID_TRANSPORT_MARGINAL:
 		return BLK_STS_TRANSPORT;
 	case DID_TARGET_FAILURE:
 		set_host_byte(cmd, DID_OK);
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 5339baadc082..5b287ad8b727 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -159,6 +159,7 @@ static inline int scsi_is_wlun(u64 lun)
 				 * paths might yield different results */
 #define DID_ALLOC_FAILURE 0x12  /* Space allocation on the device failed */
 #define DID_MEDIUM_ERROR  0x13  /* Medium error */
+#define DID_TRANSPORT_MARGINAL 0x14 /* Transport marginal errors */
 #define DRIVER_OK       0x00	/* Driver status                           */
 
 /*
-- 
2.26.2


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05  6:09 [PATCH v6 0/4] scsi: Support to handle Intermittent errors Muneendra
  2020-11-05  6:09 ` [PATCH v6 1/4] scsi: Added a new definition in scsi_cmnd.h Muneendra
  2020-11-05  6:09 ` [PATCH v6 2/4] scsi: No retries on abort success Muneendra
@ 2020-11-05  6:09 ` Muneendra
  2020-11-05 15:59   ` Mike Christie
  2020-11-05  6:09 ` [PATCH v6 4/4] scsi_transport_fc: Added store fucntionality to set the rport port_state using sysfs Muneendra
  3 siblings, 1 reply; 12+ messages in thread
From: Muneendra @ 2020-11-05  6:09 UTC (permalink / raw)
  To: linux-scsi, michael.christie, hare; +Cc: jsmart2021, emilne, mkumar, Muneendra

[-- Attachment #1: Type: text/plain, Size: 7518 bytes --]

Added a new rport state FC_PORTSTATE_MARGINAL.

Added a new inline function fc_rport_chkmarginal_set_noretries
which will set the SCMD_NORETRIES_ABORT bit in cmd->state if rport state
is marginal.

Made changes in fc_eh_timed_out to call fc_rport_chkmarginal_set_noretries
Also made changes in fc_remote_port_delete,fc_user_scan_tgt,
fc_timeout_deleted_rport functions  to handle the new rport state
FC_PORTSTATE_MARGINAL.

Made changes in fc_block_scsi_eh to clear the SCMD_NORETRIES_ABORT bit
if the port_state is not  marginal

Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

---
v6:
No change

v5:
Made changes to clear the SCMD_NORETRIES_ABORT bit if the port_state
has changed from marginal to online due to port_delete and port_add
as we need the normal cmd retry behaviour

Made changes in fc_scsi_scan_rport as we are checking FC_PORTSTATE_ONLINE
instead of FC_PORTSTATE_ONLINE and FC_PORTSTATE_MARGINAL

v4:
Made changes in fc_eh_timed_out to call fc_rport_chkmarginal_set_noretries
so that SCMD_NORETRIES_ABORT bit in cmd->state is set if rport state
is marginal.

Removed the newly added scsi_cmd argument to fc_remote_port_chkready
as the current patch handles only SCSI EH timeout/abort case.

v3:
Rearranged the patch so that all the changes with respect to new
rport state is part of this patch.
Added a new argument to scsi_cmd  to fc_remote_port_chkready

v2:
New patch
---
 drivers/scsi/scsi_transport_fc.c | 52 +++++++++++++++++++++-----------
 include/scsi/scsi_transport_fc.h | 19 ++++++++++++
 2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 2ff7f06203da..276826db0832 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -142,20 +142,23 @@ fc_enum_name_search(host_event_code, fc_host_event_code,
 static struct {
 	enum fc_port_state	value;
 	char			*name;
+	int			matchlen;
 } fc_port_state_names[] = {
-	{ FC_PORTSTATE_UNKNOWN,		"Unknown" },
-	{ FC_PORTSTATE_NOTPRESENT,	"Not Present" },
-	{ FC_PORTSTATE_ONLINE,		"Online" },
-	{ FC_PORTSTATE_OFFLINE,		"Offline" },
-	{ FC_PORTSTATE_BLOCKED,		"Blocked" },
-	{ FC_PORTSTATE_BYPASSED,	"Bypassed" },
-	{ FC_PORTSTATE_DIAGNOSTICS,	"Diagnostics" },
-	{ FC_PORTSTATE_LINKDOWN,	"Linkdown" },
-	{ FC_PORTSTATE_ERROR,		"Error" },
-	{ FC_PORTSTATE_LOOPBACK,	"Loopback" },
-	{ FC_PORTSTATE_DELETED,		"Deleted" },
+	{ FC_PORTSTATE_UNKNOWN,		"Unknown", 7},
+	{ FC_PORTSTATE_NOTPRESENT,	"Not Present", 11 },
+	{ FC_PORTSTATE_ONLINE,		"Online", 6 },
+	{ FC_PORTSTATE_OFFLINE,		"Offline", 7 },
+	{ FC_PORTSTATE_BLOCKED,		"Blocked", 7 },
+	{ FC_PORTSTATE_BYPASSED,	"Bypassed", 8 },
+	{ FC_PORTSTATE_DIAGNOSTICS,	"Diagnostics", 11 },
+	{ FC_PORTSTATE_LINKDOWN,	"Linkdown", 8 },
+	{ FC_PORTSTATE_ERROR,		"Error", 5 },
+	{ FC_PORTSTATE_LOOPBACK,	"Loopback", 8 },
+	{ FC_PORTSTATE_DELETED,		"Deleted", 7 },
+	{ FC_PORTSTATE_MARGINAL,	"Marginal", 8 },
 };
 fc_enum_name_search(port_state, fc_port_state, fc_port_state_names)
+fc_enum_name_match(port_state, fc_port_state, fc_port_state_names)
 #define FC_PORTSTATE_MAX_NAMELEN	20
 
 
@@ -2074,6 +2077,7 @@ fc_eh_timed_out(struct scsi_cmnd *scmd)
 	if (rport->port_state == FC_PORTSTATE_BLOCKED)
 		return BLK_EH_RESET_TIMER;
 
+	fc_rport_chkmarginal_set_noretries(rport, scmd);
 	return BLK_EH_DONE;
 }
 EXPORT_SYMBOL(fc_eh_timed_out);
@@ -2095,7 +2099,8 @@ fc_user_scan_tgt(struct Scsi_Host *shost, uint channel, uint id, u64 lun)
 		if (rport->scsi_target_id == -1)
 			continue;
 
-		if (rport->port_state != FC_PORTSTATE_ONLINE)
+		if ((rport->port_state != FC_PORTSTATE_ONLINE) &&
+			(rport->port_state != FC_PORTSTATE_MARGINAL))
 			continue;
 
 		if ((channel == rport->channel) &&
@@ -2958,7 +2963,8 @@ fc_remote_port_delete(struct fc_rport  *rport)
 
 	spin_lock_irqsave(shost->host_lock, flags);
 
-	if (rport->port_state != FC_PORTSTATE_ONLINE) {
+	if ((rport->port_state != FC_PORTSTATE_ONLINE) &&
+		(rport->port_state != FC_PORTSTATE_MARGINAL)) {
 		spin_unlock_irqrestore(shost->host_lock, flags);
 		return;
 	}
@@ -3100,7 +3106,8 @@ fc_timeout_deleted_rport(struct work_struct *work)
 	 * target, validate it still is. If not, tear down the
 	 * scsi_target on it.
 	 */
-	if ((rport->port_state == FC_PORTSTATE_ONLINE) &&
+	if (((rport->port_state == FC_PORTSTATE_ONLINE) ||
+		(rport->port_state == FC_PORTSTATE_MARGINAL)) &&
 	    (rport->scsi_target_id != -1) &&
 	    !(rport->roles & FC_PORT_ROLE_FCP_TARGET)) {
 		dev_printk(KERN_ERR, &rport->dev,
@@ -3243,7 +3250,8 @@ fc_scsi_scan_rport(struct work_struct *work)
 	struct fc_internal *i = to_fc_internal(shost->transportt);
 	unsigned long flags;
 
-	if ((rport->port_state == FC_PORTSTATE_ONLINE) &&
+	if (((rport->port_state == FC_PORTSTATE_ONLINE) ||
+		(rport->port_state == FC_PORTSTATE_MARGINAL)) &&
 	    (rport->roles & FC_PORT_ROLE_FCP_TARGET) &&
 	    !(i->f->disable_target_scan)) {
 		scsi_scan_target(&rport->dev, rport->channel,
@@ -3308,11 +3316,20 @@ EXPORT_SYMBOL(fc_block_rport);
 int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 {
 	struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
+	int ret = 0;
 
 	if (WARN_ON_ONCE(!rport))
 		return FAST_IO_FAIL;
 
-	return fc_block_rport(rport);
+	ret = fc_block_rport(rport);
+	/*
+	 * Clear the SCMD_NORETRIES_ABORT bit if the Port state has
+	 * changed from marginal to online due to
+	 * fc_remote_port_delete and fc_remote_port_add
+	 */
+	if (rport->port_state != FC_PORTSTATE_MARGINAL)
+		clear_bit(SCMD_NORETRIES_ABORT, &cmnd->state);
+	return ret;
 }
 EXPORT_SYMBOL(fc_block_scsi_eh);
 
@@ -3747,7 +3764,8 @@ static blk_status_t fc_bsg_rport_prep(struct fc_rport *rport)
 	    !(rport->flags & FC_RPORT_FAST_FAIL_TIMEDOUT))
 		return BLK_STS_RESOURCE;
 
-	if (rport->port_state != FC_PORTSTATE_ONLINE)
+	if ((rport->port_state != FC_PORTSTATE_ONLINE) &&
+		(rport->port_state != FC_PORTSTATE_MARGINAL))
 		return BLK_STS_IOERR;
 
 	return BLK_STS_OK;
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index 1c7dd35cb7a0..829bade13b89 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -14,6 +14,7 @@
 #include <linux/bsg-lib.h>
 #include <asm/unaligned.h>
 #include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_netlink.h>
 #include <scsi/scsi_host.h>
 
@@ -67,6 +68,7 @@ enum fc_port_state {
 	FC_PORTSTATE_ERROR,
 	FC_PORTSTATE_LOOPBACK,
 	FC_PORTSTATE_DELETED,
+	FC_PORTSTATE_MARGINAL,
 };
 
 
@@ -707,6 +709,22 @@ struct fc_function_template {
 	unsigned long	disable_target_scan:1;
 };
 
+/**
+ * fc_rport_chkmarginal_set_noretries - Set the SCMD_NORETRIES_ABORT bit
+ * in cmd->state if port state is marginal
+ * @rport:	remote port to be checked
+ * @scmd:	scsi_cmd to set/clear the SCMD_NORETRIES_ABORT bit on Marginal state
+ **/
+static inline void
+fc_rport_chkmarginal_set_noretries(struct fc_rport *rport, struct scsi_cmnd *cmd)
+{
+	if ((rport->port_state == FC_PORTSTATE_MARGINAL) &&
+		 (cmd->request->cmd_flags & REQ_FAILFAST_TRANSPORT))
+		set_bit(SCMD_NORETRIES_ABORT, &cmd->state);
+	else
+		clear_bit(SCMD_NORETRIES_ABORT, &cmd->state);
+
+}
 
 /**
  * fc_remote_port_chkready - called to validate the remote port state
@@ -723,6 +741,7 @@ fc_remote_port_chkready(struct fc_rport *rport)
 
 	switch (rport->port_state) {
 	case FC_PORTSTATE_ONLINE:
+	case FC_PORTSTATE_MARGINAL:
 		if (rport->roles & FC_PORT_ROLE_FCP_TARGET)
 			result = 0;
 		else if (rport->flags & FC_RPORT_DEVLOSS_PENDING)
-- 
2.26.2


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v6 4/4] scsi_transport_fc: Added store fucntionality to set the rport port_state using sysfs
  2020-11-05  6:09 [PATCH v6 0/4] scsi: Support to handle Intermittent errors Muneendra
                   ` (2 preceding siblings ...)
  2020-11-05  6:09 ` [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL Muneendra
@ 2020-11-05  6:09 ` Muneendra
  3 siblings, 0 replies; 12+ messages in thread
From: Muneendra @ 2020-11-05  6:09 UTC (permalink / raw)
  To: linux-scsi, michael.christie, hare; +Cc: jsmart2021, emilne, mkumar, Muneendra

[-- Attachment #1: Type: text/plain, Size: 3864 bytes --]

Added a store functionality to set rport port_state using sysfs
under  fc_remote_ports/rport-*/port_state

With this functionality the user can move the port_state from
Marginal -> Online and Online->Marginal.

On Marginal :This interface will set SCMD_NORETRIES_ABORT bit in
scmd->state for all the pending io's on the scsi device associated
with target port.

On Online :This interface will clear SCMD_NORETRIES_ABORT bit in
scmd->state for all the pending io's on the scsi device associated
with target port.

Below is the interface provided to set the port state to Marginal
and Online.

echo "Marginal" >> /sys/class/fc_remote_ports/rport-X\:Y-Z/port_state
echo "Online" >> /sys/class/fc_remote_ports/rport-X\:Y-Z/port_state

Signed-off-by: Muneendra <muneendra.kumar@broadcom.com>

---
v6:
No change

v5:
No change

v4:
Addressed the error reported by kernel test robot
Removed the code needed to traverse all the devices under rport
to set/clear SCMD_NORETRIES_ABORT
Removed unncessary comments.
Return the error values on failure while setting the port_state

v3:
Removed the port_state from starget attributes.
Enabled the store functionality for port_state under remote port.
used the starget_for_each_device to traverse around all the devices
under rport

v2:
Changed from a noretries_abort attribute under fc_transport/target*/ to
port_state for changing the port_state to a marginal state
---
 drivers/scsi/scsi_transport_fc.c | 56 ++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 276826db0832..764bff609c5d 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -943,7 +943,59 @@ show_fc_rport_roles (struct device *dev, struct device_attribute *attr,
 static FC_DEVICE_ATTR(rport, roles, S_IRUGO,
 		show_fc_rport_roles, NULL);
 
-fc_private_rport_rd_enum_attr(port_state, FC_PORTSTATE_MAX_NAMELEN);
+static ssize_t fc_rport_set_marginal_state(struct device *dev,
+						struct device_attribute *attr,
+						const char *buf, size_t count)
+{
+	struct fc_rport *rport = transport_class_to_rport(dev);
+	enum fc_port_state port_state;
+	int ret = 0;
+
+	ret = get_fc_port_state_match(buf, &port_state);
+	if (ret)
+		return -EINVAL;
+	if (port_state == FC_PORTSTATE_MARGINAL) {
+		/*
+		 * Change the state to marginal only if the
+		 * current rport state is Online
+		 * Allow only Online->marginal
+		 */
+		if (rport->port_state == FC_PORTSTATE_ONLINE)
+			rport->port_state = port_state;
+		else
+			return -EINVAL;
+	} else if (port_state == FC_PORTSTATE_ONLINE) {
+		/*
+		 * Change the state to Online only if the
+		 * current rport state is Marginal
+		 * Allow only  MArginal->Online
+		 */
+		if (rport->port_state == FC_PORTSTATE_MARGINAL)
+			rport->port_state = port_state;
+		else
+			return -EINVAL;
+	} else
+		return -EINVAL;
+	return count;
+}
+
+static ssize_t
+show_fc_rport_port_state(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	const char *name;
+	struct fc_rport *rport = transport_class_to_rport(dev);
+
+	name = get_fc_port_state_name(rport->port_state);
+	if (!name)
+		return -EINVAL;
+
+	return snprintf(buf, 20, "%s\n", name);
+}
+
+static FC_DEVICE_ATTR(rport, port_state, 0444 | 0200,
+			show_fc_rport_port_state, fc_rport_set_marginal_state);
+
 fc_private_rport_rd_attr(scsi_target_id, "%d\n", 20);
 
 /*
@@ -2267,7 +2319,7 @@ fc_attach_transport(struct fc_function_template *ft)
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_name);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_id);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(roles);
-	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(port_state);
+	SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(port_state);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RD(scsi_target_id);
 	SETUP_PRIVATE_RPORT_ATTRIBUTE_RW(fast_io_fail_tmo);
 
-- 
2.26.2


[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05  6:09 ` [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL Muneendra
@ 2020-11-05 15:59   ` Mike Christie
  2020-11-05 17:27     ` Muneendra Kumar M
  2020-11-05 20:01     ` Muneendra Kumar M
  0 siblings, 2 replies; 12+ messages in thread
From: Mike Christie @ 2020-11-05 15:59 UTC (permalink / raw)
  To: Muneendra, linux-scsi, hare; +Cc: jsmart2021, emilne, mkumar

On 11/5/20 12:09 AM, Muneendra wrote:
>   int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
>   {
>   	struct fc_rport *rport = starget_to_rport(scsi_target(cmnd->device));
> +	int ret = 0;
>   
>   	if (WARN_ON_ONCE(!rport))
>   		return FAST_IO_FAIL;
>   
> -	return fc_block_rport(rport);
> +	ret = fc_block_rport(rport);
> +	/*
> +	 * Clear the SCMD_NORETRIES_ABORT bit if the Port state has
> +	 * changed from marginal to online due to
> +	 * fc_remote_port_delete and fc_remote_port_add
> +	 */
> +	if (rport->port_state != FC_PORTSTATE_MARGINAL)
> +		clear_bit(SCMD_NORETRIES_ABORT, &cmnd->state);
> +	return ret;
>   }


Hey sorry for the late reply. I was trying to test some things out but 
am not sure if all drivers work the same.

For the code above, what will happen if we have passed that check in the 
driver, then the driver does the report del and add sequence? Let's say 
it's initially calling the abort callout, and we passed that check, we 
then do the del/add seqeuence, what will happen next? Do the fc drivers 
return success or failure for the abort call. What happens for the other 
callouts too?

If failure, then the eh escalates and when we call the next callout, and 
we hit the check above and will clear it, so we are ok.

If success then we would not get a chance to clear it right? If this is 
the case, then I think you need to instead go the route where you add 
the eh cmd completion/decide_disposition callout. You would call it in 
scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc when we are 
deciding if we want to retry/fail the command. In this approach you do 
not need the eh_timed_out changes, since we only seem to care about the 
port state after the eh callout has completed.



^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05 15:59   ` Mike Christie
@ 2020-11-05 17:27     ` Muneendra Kumar M
  2020-11-05 19:15       ` Mike Christie
  2020-11-05 20:01     ` Muneendra Kumar M
  1 sibling, 1 reply; 12+ messages in thread
From: Muneendra Kumar M @ 2020-11-05 17:27 UTC (permalink / raw)
  To: Mike Christie, linux-scsi, hare; +Cc: jsmart2021, emilne, mkumar

[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

Hi Mike,
Thanks for the input.
Below are my replies.


>Hey sorry for the late reply. I was trying to test some things out but am
>not sure if all drivers work the same.

>For the code above, what will happen if we have passed that check in the
>driver, then the driver does the report del and add sequence? Let's say
>it's initially calling the abort callout, and we passed that check, we then
>do the >del/add seqeuence, what will happen next? Do the fc drivers return
>success or failure for the abort call. What happens for the other callouts
>too?

>If failure, then the eh escalates and when we call the next callout, and we
>hit the check above and will clear it, so we are ok.

If success then we would not get a chance to clear it right?
[Muneendra]Agreed. So what about clearing the flags in fc_remote_port_del. I
think this should address all the concerns?

> If this is the case, then I think you need to instead go the route where
> you add the eh cmd completion/decide_disposition callout. You would call
> it in scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc when we are
> deciding if we want to retry/fail the command.
[Muneendra]Sorry I didn't get what you are saying could you please elaborate
on the same.

In this approach you do not need the eh_timed_out changes, since we only
seem to care about the port state after the eh callout has completed.
[Muneendra]what about setting the SCMD_NORETRIES_ABORT bit?

Regards,
Muneendra.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05 17:27     ` Muneendra Kumar M
@ 2020-11-05 19:15       ` Mike Christie
  2020-11-05 20:02         ` Muneendra Kumar M
  2020-11-06  7:23         ` Hannes Reinecke
  0 siblings, 2 replies; 12+ messages in thread
From: Mike Christie @ 2020-11-05 19:15 UTC (permalink / raw)
  To: Muneendra Kumar M, linux-scsi, hare; +Cc: jsmart2021, emilne, mkumar

On 11/5/20 11:27 AM, Muneendra Kumar M wrote:
> Hi Mike,
> Thanks for the input.
> Below are my replies.
> 
> 
>> Hey sorry for the late reply. I was trying to test some things out but am
>> not sure if all drivers work the same.
> 
>> For the code above, what will happen if we have passed that check in the
>> driver, then the driver does the report del and add sequence? Let's say
>> it's initially calling the abort callout, and we passed that check, we then
>> do the >del/add seqeuence, what will happen next? Do the fc drivers return
>> success or failure for the abort call. What happens for the other callouts
>> too?
> 
>> If failure, then the eh escalates and when we call the next callout, and we
>> hit the check above and will clear it, so we are ok.
> 
> If success then we would not get a chance to clear it right?
> [Muneendra]Agreed. So what about clearing the flags in fc_remote_port_del. I
> think this should address all the concerns?
> 
>> If this is the case, then I think you need to instead go the route where
>> you add the eh cmd completion/decide_disposition callout. You would call
>> it in scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc when we are
>> deciding if we want to retry/fail the command.
> [Muneendra]Sorry I didn't get what you are saying could you please elaborate
> on the same.
> 
> In this approach you do not need the eh_timed_out changes, since we only
> seem to care about the port state after the eh callout has completed.
> [Muneendra]what about setting the SCMD_NORETRIES_ABORT bit?
> 

I don't think you need it. It sounds like we only care about the port state
when the cmd is completing. For example we have:

1. the case where the cmd times out, we do aborts/resets, then the
port state goes into marginal, then the aborts/resets complete. We want to
fail the cmds without retries.

2. If the port state is in marginal, the cmd times out, we do the aborts/resets
and when we are done if the port state is still marginal we want to fail the
cmd without retries.

3. If the port state is marginal (or any value), before or after the cmd
initially times out, but the port state goes back to online, then when the
aborts/resets complete we want to retry the cmd.

So can we just add a callout to check the port state when the eh has completed
like the untested unfinished patch below:


diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 983eeb0..8ad3a9a 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -6041,6 +6041,7 @@ struct scsi_host_template lpfc_template = {
 	.info			= lpfc_info,
 	.queuecommand		= lpfc_queuecommand,
 	.eh_timed_out		= fc_eh_timed_out,
+	.eh_timed_out		= fc_eh_should_retry_cmd,
 	.eh_abort_handler	= lpfc_abort_handler,
 	.eh_device_reset_handler = lpfc_device_reset_handler,
 	.eh_target_reset_handler = lpfc_target_reset_handler,
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f11f51e..7c66d17 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -140,6 +140,7 @@ static bool scsi_cmd_retry_allowed(struct scsi_cmnd *cmd)
 	struct scsi_cmnd *scmd =
 		container_of(work, struct scsi_cmnd, abort_work.work);
 	struct scsi_device *sdev = scmd->device;
+	struct Scsi_Host *host = sdev->host;
 	int rtn;
 
 	if (scsi_host_eh_past_deadline(sdev->host)) {
@@ -159,7 +160,8 @@ static bool scsi_cmd_retry_allowed(struct scsi_cmnd *cmd)
 						    "eh timeout, not retrying "
 						    "aborted command\n"));
 			} else if (!scsi_noretry_cmd(scmd) &&
-				   scsi_cmd_retry_allowed(scmd)) {
+				   scsi_cmd_retry_allowed(scmd) &&
+				   host->hostt->eh_should_retry_cmd(scmd)) {
 				SCSI_LOG_ERROR_RECOVERY(3,
 					scmd_printk(KERN_WARNING, scmd,
 						    "retry aborted command\n"));
@@ -2105,7 +2107,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
 		list_del_init(&scmd->eh_entry);
 		if (scsi_device_online(scmd->device) &&
-		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd)) {
+		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) &&
+		    host->hostt->eh_should_retry_cmd(scmd)) {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				scmd_printk(KERN_INFO, scmd,
 					     "%s: flush retry cmd\n",
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index 2ff7f06..7011963 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2043,6 +2043,18 @@ static int fc_vport_match(struct attribute_container *cont,
 	return &i->vport_attr_cont.ac == cont;
 }
 
+bool fc_eh_should_retry_cmd(struct scsi_cmnd *scmd)
+{
+	struct fc_rport *rport = starget_to_rport(scsi_target(scmd->device));
+
+	if (rport->port_state == FC_PORTSTATE_MARGINAL)
+		return false;
+
+	/* Other port states will set the sdev state */
+	/* TODO check comment above */ 
+	return true;
+}
+EXPORT_SYMBOL_GPL(fc_eh_should_retry_cmd);
 
 /**
  * fc_eh_timed_out - FC Transport I/O timeout intercept handler
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 701f178..51d5af0 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -315,6 +315,13 @@ struct scsi_host_template {
 	 */
 	enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);
 
+	/*
+	 * Optional routine that allows the transport to decide if a cmd is
+	 * retryable. Return true if the transport is in a state the cmd
+	 * should be retried on.
+	 */
+	bool (*eh_should_retry_cmd)(struct scsi_cmnd *);
+
 	/* This is an optional routine that allows transport to initiate
 	 * LLD adapter or firmware reset using sysfs attribute.
 	 *
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index 1c7dd35..f21b583 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -803,6 +803,7 @@ struct fc_vport *fc_vport_create(struct Scsi_Host *shost, int channel,
 int fc_block_rport(struct fc_rport *rport);
 int fc_block_scsi_eh(struct scsi_cmnd *cmnd);
 enum blk_eh_timer_return fc_eh_timed_out(struct scsi_cmnd *scmd);
+bool fc_eh_should_retry_cmd(struct scsi_cmnd *scmd);
 
 static inline struct Scsi_Host *fc_bsg_to_shost(struct bsg_job *job)
 {



^ permalink raw reply related	[flat|nested] 12+ messages in thread

* RE: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05 15:59   ` Mike Christie
  2020-11-05 17:27     ` Muneendra Kumar M
@ 2020-11-05 20:01     ` Muneendra Kumar M
  1 sibling, 0 replies; 12+ messages in thread
From: Muneendra Kumar M @ 2020-11-05 20:01 UTC (permalink / raw)
  To: Mike Christie, linux-scsi, hare; +Cc: jsmart2021, emilne, mkumar

[-- Attachment #1: Type: text/plain, Size: 2202 bytes --]

Hi Mike,
Thanks for the detailed input.
I will apply the below changes and will resubmit the patches.

Regards,
Muneendra.

-----Original Message-----
From: Mike Christie [mailto:michael.christie@oracle.com]
Sent: Thursday, November 5, 2020 9:29 PM
To: Muneendra <muneendra.kumar@broadcom.com>; linux-scsi@vger.kernel.org;
hare@suse.de
Cc: jsmart2021@gmail.com; emilne@redhat.com; mkumar@redhat.com
Subject: Re: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state
FC_PORTSTATE_MARGINAL

On 11/5/20 12:09 AM, Muneendra wrote:
>   int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
>   {
>   	struct fc_rport *rport =
> starget_to_rport(scsi_target(cmnd->device));
> +	int ret = 0;
>
>   	if (WARN_ON_ONCE(!rport))
>   		return FAST_IO_FAIL;
>
> -	return fc_block_rport(rport);
> +	ret = fc_block_rport(rport);
> +	/*
> +	 * Clear the SCMD_NORETRIES_ABORT bit if the Port state has
> +	 * changed from marginal to online due to
> +	 * fc_remote_port_delete and fc_remote_port_add
> +	 */
> +	if (rport->port_state != FC_PORTSTATE_MARGINAL)
> +		clear_bit(SCMD_NORETRIES_ABORT, &cmnd->state);
> +	return ret;
>   }


Hey sorry for the late reply. I was trying to test some things out but am
not sure if all drivers work the same.

For the code above, what will happen if we have passed that check in the
driver, then the driver does the report del and add sequence? Let's say it's
initially calling the abort callout, and we passed that check, we then do
the del/add seqeuence, what will happen next? Do the fc drivers return
success or failure for the abort call. What happens for the other callouts
too?

If failure, then the eh escalates and when we call the next callout, and we
hit the check above and will clear it, so we are ok.

If success then we would not get a chance to clear it right? If this is the
case, then I think you need to instead go the route where you add the eh cmd
completion/decide_disposition callout. You would call it in
scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc when we are deciding if
we want to retry/fail the command. In this approach you do not need the
eh_timed_out changes, since we only seem to care about the port state after
the eh callout has completed.

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply	[flat|nested] 12+ messages in thread

* RE: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05 19:15       ` Mike Christie
@ 2020-11-05 20:02         ` Muneendra Kumar M
  2020-11-06  7:23         ` Hannes Reinecke
  1 sibling, 0 replies; 12+ messages in thread
From: Muneendra Kumar M @ 2020-11-05 20:02 UTC (permalink / raw)
  To: Mike Christie, linux-scsi, hare; +Cc: jsmart2021, emilne, mkumar

[-- Attachment #1: Type: text/plain, Size: 6699 bytes --]

Hi Mike,
Thanks for the detailed input.
I will apply the below changes and will resubmit the patches.

Regards,
Muneendra.

-----Original Message-----
From: Mike Christie [mailto:michael.christie@oracle.com]
Sent: Friday, November 6, 2020 12:46 AM
To: Muneendra Kumar M <muneendra.kumar@broadcom.com>;
linux-scsi@vger.kernel.org; hare@suse.de
Cc: jsmart2021@gmail.com; emilne@redhat.com; mkumar@redhat.com
Subject: Re: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state
FC_PORTSTATE_MARGINAL

On 11/5/20 11:27 AM, Muneendra Kumar M wrote:
> Hi Mike,
> Thanks for the input.
> Below are my replies.
>
>
>> Hey sorry for the late reply. I was trying to test some things out
>> but am not sure if all drivers work the same.
>
>> For the code above, what will happen if we have passed that check in
>> the driver, then the driver does the report del and add sequence?
>> Let's say it's initially calling the abort callout, and we passed
>> that check, we then do the >del/add seqeuence, what will happen next?
>> Do the fc drivers return success or failure for the abort call. What
>> happens for the other callouts too?
>
>> If failure, then the eh escalates and when we call the next callout,
>> and we hit the check above and will clear it, so we are ok.
>
> If success then we would not get a chance to clear it right?
> [Muneendra]Agreed. So what about clearing the flags in
> fc_remote_port_del. I think this should address all the concerns?
>
>> If this is the case, then I think you need to instead go the route
>> where you add the eh cmd completion/decide_disposition callout. You
>> would call it in scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc
>> when we are deciding if we want to retry/fail the command.
> [Muneendra]Sorry I didn't get what you are saying could you please
> elaborate on the same.
>
> In this approach you do not need the eh_timed_out changes, since we
> only seem to care about the port state after the eh callout has completed.
> [Muneendra]what about setting the SCMD_NORETRIES_ABORT bit?
>

I don't think you need it. It sounds like we only care about the port state
when the cmd is completing. For example we have:

1. the case where the cmd times out, we do aborts/resets, then the port
state goes into marginal, then the aborts/resets complete. We want to fail
the cmds without retries.

2. If the port state is in marginal, the cmd times out, we do the
aborts/resets and when we are done if the port state is still marginal we
want to fail the cmd without retries.

3. If the port state is marginal (or any value), before or after the cmd
initially times out, but the port state goes back to online, then when the
aborts/resets complete we want to retry the cmd.

So can we just add a callout to check the port state when the eh has
completed like the untested unfinished patch below:


diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 983eeb0..8ad3a9a 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -6041,6 +6041,7 @@ struct scsi_host_template lpfc_template = {
 	.info			= lpfc_info,
 	.queuecommand		= lpfc_queuecommand,
 	.eh_timed_out		= fc_eh_timed_out,
+	.eh_timed_out		= fc_eh_should_retry_cmd,
 	.eh_abort_handler	= lpfc_abort_handler,
 	.eh_device_reset_handler = lpfc_device_reset_handler,
 	.eh_target_reset_handler = lpfc_target_reset_handler, diff --git
a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index
f11f51e..7c66d17 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -140,6 +140,7 @@ static bool scsi_cmd_retry_allowed(struct scsi_cmnd
*cmd)
 	struct scsi_cmnd *scmd =
 		container_of(work, struct scsi_cmnd, abort_work.work);
 	struct scsi_device *sdev = scmd->device;
+	struct Scsi_Host *host = sdev->host;
 	int rtn;

 	if (scsi_host_eh_past_deadline(sdev->host)) { @@ -159,7 +160,8 @@ static
bool scsi_cmd_retry_allowed(struct scsi_cmnd *cmd)
 						    "eh timeout, not retrying "
 						    "aborted command\n"));
 			} else if (!scsi_noretry_cmd(scmd) &&
-				   scsi_cmd_retry_allowed(scmd)) {
+				   scsi_cmd_retry_allowed(scmd) &&
+				   host->hostt->eh_should_retry_cmd(scmd)) {
 				SCSI_LOG_ERROR_RECOVERY(3,
 					scmd_printk(KERN_WARNING, scmd,
 						    "retry aborted command\n"));
@@ -2105,7 +2107,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q)
 	list_for_each_entry_safe(scmd, next, done_q, eh_entry) {
 		list_del_init(&scmd->eh_entry);
 		if (scsi_device_online(scmd->device) &&
-		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd)) {
+		    !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) &&
+		    host->hostt->eh_should_retry_cmd(scmd)) {
 			SCSI_LOG_ERROR_RECOVERY(3,
 				scmd_printk(KERN_INFO, scmd,
 					     "%s: flush retry cmd\n",
diff --git a/drivers/scsi/scsi_transport_fc.c
b/drivers/scsi/scsi_transport_fc.c
index 2ff7f06..7011963 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2043,6 +2043,18 @@ static int fc_vport_match(struct attribute_container
*cont,
 	return &i->vport_attr_cont.ac == cont;  }

+bool fc_eh_should_retry_cmd(struct scsi_cmnd *scmd) {
+	struct fc_rport *rport = starget_to_rport(scsi_target(scmd->device));
+
+	if (rport->port_state == FC_PORTSTATE_MARGINAL)
+		return false;
+
+	/* Other port states will set the sdev state */
+	/* TODO check comment above */
+	return true;
+}
+EXPORT_SYMBOL_GPL(fc_eh_should_retry_cmd);

 /**
  * fc_eh_timed_out - FC Transport I/O timeout intercept handler diff --git
a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h index 701f178..51d5af0
100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -315,6 +315,13 @@ struct scsi_host_template {
 	 */
 	enum blk_eh_timer_return (*eh_timed_out)(struct scsi_cmnd *);

+	/*
+	 * Optional routine that allows the transport to decide if a cmd is
+	 * retryable. Return true if the transport is in a state the cmd
+	 * should be retried on.
+	 */
+	bool (*eh_should_retry_cmd)(struct scsi_cmnd *);
+
 	/* This is an optional routine that allows transport to initiate
 	 * LLD adapter or firmware reset using sysfs attribute.
 	 *
diff --git a/include/scsi/scsi_transport_fc.h
b/include/scsi/scsi_transport_fc.h
index 1c7dd35..f21b583 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -803,6 +803,7 @@ struct fc_vport *fc_vport_create(struct Scsi_Host
*shost, int channel,  int fc_block_rport(struct fc_rport *rport);  int
fc_block_scsi_eh(struct scsi_cmnd *cmnd);  enum blk_eh_timer_return
fc_eh_timed_out(struct scsi_cmnd *scmd);
+bool fc_eh_should_retry_cmd(struct scsi_cmnd *scmd);

 static inline struct Scsi_Host *fc_bsg_to_shost(struct bsg_job *job)  {

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4177 bytes --]

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-05 19:15       ` Mike Christie
  2020-11-05 20:02         ` Muneendra Kumar M
@ 2020-11-06  7:23         ` Hannes Reinecke
  2020-11-07  1:18           ` Mike Christie
  1 sibling, 1 reply; 12+ messages in thread
From: Hannes Reinecke @ 2020-11-06  7:23 UTC (permalink / raw)
  To: Mike Christie, Muneendra Kumar M, linux-scsi; +Cc: jsmart2021, emilne, mkumar

On 11/5/20 8:15 PM, Mike Christie wrote:
> On 11/5/20 11:27 AM, Muneendra Kumar M wrote:
>> Hi Mike,
>> Thanks for the input.
>> Below are my replies.
>>
>>
>>> Hey sorry for the late reply. I was trying to test some things out but am
>>> not sure if all drivers work the same.
>>
>>> For the code above, what will happen if we have passed that check in the
>>> driver, then the driver does the report del and add sequence? Let's say
>>> it's initially calling the abort callout, and we passed that check, we then
>>> do the >del/add seqeuence, what will happen next? Do the fc drivers return
>>> success or failure for the abort call. What happens for the other callouts
>>> too?
>>
>>> If failure, then the eh escalates and when we call the next callout, and we
>>> hit the check above and will clear it, so we are ok.
>>
>> If success then we would not get a chance to clear it right?
>> [Muneendra]Agreed. So what about clearing the flags in fc_remote_port_del. I
>> think this should address all the concerns?
>>
>>> If this is the case, then I think you need to instead go the route where
>>> you add the eh cmd completion/decide_disposition callout. You would call
>>> it in scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc when we are
>>> deciding if we want to retry/fail the command.
>> [Muneendra]Sorry I didn't get what you are saying could you please elaborate
>> on the same.
>>
>> In this approach you do not need the eh_timed_out changes, since we only
>> seem to care about the port state after the eh callout has completed.
>> [Muneendra]what about setting the SCMD_NORETRIES_ABORT bit?
>>
> 
> I don't think you need it. It sounds like we only care about the port state
> when the cmd is completing. For example we have:
> 
> 1. the case where the cmd times out, we do aborts/resets, then the
> port state goes into marginal, then the aborts/resets complete. We want to
> fail the cmds without retries.
> 
> 2. If the port state is in marginal, the cmd times out, we do the aborts/resets
> and when we are done if the port state is still marginal we want to fail the
> cmd without retries.
> 
> 3. If the port state is marginal (or any value), before or after the cmd
> initially times out, but the port state goes back to online, then when the
> aborts/resets complete we want to retry the cmd.
> 
Actually, I don't think the third case can (or should) happen.
A transition from marginal to online should always include a link bounce 
(ie a rport_del/rport_add sequence), which would cancel all outstanding 
commands anyway.
And if we have the above provision we could clear the flag in 
rport_del() and everything would be dandy.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL
  2020-11-06  7:23         ` Hannes Reinecke
@ 2020-11-07  1:18           ` Mike Christie
  0 siblings, 0 replies; 12+ messages in thread
From: Mike Christie @ 2020-11-07  1:18 UTC (permalink / raw)
  To: Hannes Reinecke, Muneendra Kumar M, linux-scsi; +Cc: jsmart2021, emilne, mkumar

On 11/6/20 1:23 AM, Hannes Reinecke wrote:
> On 11/5/20 8:15 PM, Mike Christie wrote:
>> On 11/5/20 11:27 AM, Muneendra Kumar M wrote:
>>> Hi Mike,
>>> Thanks for the input.
>>> Below are my replies.
>>>
>>>
>>>> Hey sorry for the late reply. I was trying to test some things out but am
>>>> not sure if all drivers work the same.
>>>
>>>> For the code above, what will happen if we have passed that check in the
>>>> driver, then the driver does the report del and add sequence? Let's say
>>>> it's initially calling the abort callout, and we passed that check, we then
>>>> do the >del/add seqeuence, what will happen next? Do the fc drivers return
>>>> success or failure for the abort call. What happens for the other callouts
>>>> too?
>>>
>>>> If failure, then the eh escalates and when we call the next callout, and we
>>>> hit the check above and will clear it, so we are ok.
>>>
>>> If success then we would not get a chance to clear it right?
>>> [Muneendra]Agreed. So what about clearing the flags in fc_remote_port_del. I
>>> think this should address all the concerns?
>>>
>>>> If this is the case, then I think you need to instead go the route where
>>>> you add the eh cmd completion/decide_disposition callout. You would call
>>>> it in scmd_eh_abort_handler, scsi_eh_bus_device_reset, etc when we are
>>>> deciding if we want to retry/fail the command.
>>> [Muneendra]Sorry I didn't get what you are saying could you please elaborate
>>> on the same.
>>>
>>> In this approach you do not need the eh_timed_out changes, since we only
>>> seem to care about the port state after the eh callout has completed.
>>> [Muneendra]what about setting the SCMD_NORETRIES_ABORT bit?
>>>
>>
>> I don't think you need it. It sounds like we only care about the port state
>> when the cmd is completing. For example we have:
>>
>> 1. the case where the cmd times out, we do aborts/resets, then the
>> port state goes into marginal, then the aborts/resets complete. We want to
>> fail the cmds without retries.
>>
>> 2. If the port state is in marginal, the cmd times out, we do the aborts/resets
>> and when we are done if the port state is still marginal we want to fail the
>> cmd without retries.
>>
>> 3. If the port state is marginal (or any value), before or after the cmd
>> initially times out, but the port state goes back to online, then when the
>> aborts/resets complete we want to retry the cmd.
>>
> Actually, I don't think the third case can (or should) happen.
> A transition from marginal to online should always include a link bounce (ie a rport_del/rport_add sequence), which would cancel all outstanding commands anyway.
> And if we have the above provision we could clear the flag in rport_del() and everything would be dandy.

That is the part I didn't like or I thought could have issues:

1. What if we go back into marginal after the add() but before we have done scsi_eh_flush_done_q? You need the original looping code and some code for del() right?

2. My issue with #1 is why do we want to add code that loops over commands when we only care about the port state and when scsi_eh_flush_done_q and the abort completion code is already looping over them.

3. This is probably a matter of preference, and I can see why people would not like an extra callout. However, I like the idea that we have an eh callout for the transport class that checks if it's even worth it to run the eh code based on the port state and then we have a eh completion callout that can also check the port state and determine if it's ok.

If we had a marginal scsi_device state or even a bit then you would not actually need #3. The fc class could just set the device state/bit and we could check that in the eh completion code paths.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2020-11-07  1:20 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-05  6:09 [PATCH v6 0/4] scsi: Support to handle Intermittent errors Muneendra
2020-11-05  6:09 ` [PATCH v6 1/4] scsi: Added a new definition in scsi_cmnd.h Muneendra
2020-11-05  6:09 ` [PATCH v6 2/4] scsi: No retries on abort success Muneendra
2020-11-05  6:09 ` [PATCH v6 3/4] scsi_transport_fc: Added a new rport state FC_PORTSTATE_MARGINAL Muneendra
2020-11-05 15:59   ` Mike Christie
2020-11-05 17:27     ` Muneendra Kumar M
2020-11-05 19:15       ` Mike Christie
2020-11-05 20:02         ` Muneendra Kumar M
2020-11-06  7:23         ` Hannes Reinecke
2020-11-07  1:18           ` Mike Christie
2020-11-05 20:01     ` Muneendra Kumar M
2020-11-05  6:09 ` [PATCH v6 4/4] scsi_transport_fc: Added store fucntionality to set the rport port_state using sysfs Muneendra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.