All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv3 0/6] SCSI EH cleanup
@ 2017-03-01  9:15 Hannes Reinecke
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
                   ` (5 more replies)
  0 siblings, 6 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

Hi all,

this is a resend of a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additionally there's a cleanup for handle failed EH commands, and
to detect retries of failed commands.
I've also included the patch to correctly reset the media access
timeout counter to avoid it being triggered inadvertedly by several
commands timing out for the same device.

As usual, comments and reviews are welcome.

Changes to v1:
- Include reviews from Christoph

Changes to v2:
- Include reviews from Bart
- Add medium access timeout patch


Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (5):
  scsi_error: count medium access timeout only once per EH run
  scsi: make eh_eflags persistent
  scsi_error: do not escalate failed EH command
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make asynchronous aborts mandatory

 Documentation/scsi/scsi_eh.txt      |  30 +++-----
 drivers/scsi/libsas/sas_scsi_host.c |   5 --
 drivers/scsi/scsi_error.c           | 143 +++++++++---------------------------
 drivers/scsi/scsi_lib.c             |   4 +-
 drivers/scsi/scsi_priv.h            |   3 +-
 drivers/scsi/sd.c                   |  21 +++++-
 drivers/scsi/sd.h                   |   1 +
 include/scsi/scsi_driver.h          |   2 +-
 include/scsi/scsi_eh.h              |   1 +
 include/scsi/scsi_host.h            |   5 --
 10 files changed, 68 insertions(+), 147 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
@ 2017-03-01  9:15 ` Hannes Reinecke
  2017-03-01 13:50   ` Steffen Maier
                     ` (3 more replies)
  2017-03-01  9:15 ` [PATCHv3 2/6] libsas: allow async aborts Hannes Reinecke
                   ` (4 subsequent siblings)
  5 siblings, 4 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke, Ewan Milne, Lawrence Obermann, Benjamin Block,
	Steffen Maier, Hannes Reinecke

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single failure.
Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@de.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi_error.c  | 16 +++++++++++++++-
 drivers/scsi/sd.c          | 21 +++++++++++++++++----
 drivers/scsi/sd.h          |  1 +
 include/scsi/scsi_driver.h |  2 +-
 4 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae..cec439c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -58,6 +58,7 @@
 static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
 static int scsi_try_to_abort_cmd(struct scsi_host_template *,
 				 struct scsi_cmnd *);
+static int scsi_eh_reset(struct scsi_cmnd *scmd);
 
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
@@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
+	scsi_eh_reset(scmd);
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
@@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
 	if (!blk_rq_is_passthrough(scmd->request)) {
 		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
 		if (sdrv->eh_action)
-			rtn = sdrv->eh_action(scmd, rtn);
+			rtn = sdrv->eh_action(scmd, rtn, false);
+	}
+	return rtn;
+}
+
+static int scsi_eh_reset(struct scsi_cmnd *scmd)
+{
+	int rtn = SUCCESS;
+
+	if (!blk_rq_is_passthrough(scmd->request)) {
+		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+		if (sdrv->eh_action)
+			rtn = sdrv->eh_action(scmd, rtn, true);
 	}
 	return rtn;
 }
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c7839f6..c794686 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -115,7 +115,7 @@
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
 static int sd_done(struct scsi_cmnd *);
-static int sd_eh_action(struct scsi_cmnd *, int);
+static int sd_eh_action(struct scsi_cmnd *, int, bool);
 static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
 static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
@@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
  *	sd_eh_action - error handling callback
  *	@scmd:		sd-issued command that has failed
  *	@eh_disp:	The recovery disposition suggested by the midlayer
+ *	@reset:		Reset medium access counter
  *
  *	This function is called by the SCSI midlayer upon completion of an
  *	error test command (currently TEST UNIT READY). The result of sending
  *	the eh command is passed in eh_disp.  We're looking for devices that
  *	fail medium access commands but are OK with non access commands like
  *	test unit ready (so wrongly see the device as having a successful
- *	recovery)
+ *	recovery).
+ *	We have to be careful to count a medium access failure only once
+ *	per SCSI EH run; there might be several timed out commands which
+ *	will cause the 'max_medium_access_timeouts' counter to trigger
+ *	after the first SCSI EH run already and set the device to offline.
  **/
-static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
+static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset)
 {
 	struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
 
+	if (reset) {
+		/* New SCSI EH run, reset gate variable */
+		sdkp->medium_access_reset = 0;
+		return eh_disp;
+	}
 	if (!scsi_device_online(scmd->device) ||
 	    !scsi_medium_access_command(scmd) ||
 	    host_byte(scmd->result) != DID_TIME_OUT ||
@@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
 	 * process of recovering or has it suffered an internal failure
 	 * that prevents access to the storage medium.
 	 */
-	sdkp->medium_access_timed_out++;
+	if (!sdkp->medium_access_reset) {
+		sdkp->medium_access_timed_out++;
+		sdkp->medium_access_reset = 1;
+	}
 
 	/*
 	 * If the device keeps failing read/write commands but TEST UNIT
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..6a4f75a 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
 	unsigned	rc_basis: 2;
 	unsigned	zoned: 2;
 	unsigned	urswrz : 1;
+	unsigned	medium_access_reset : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index 891a658..d5e0012 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -15,7 +15,7 @@ struct scsi_driver {
 	int (*init_command)(struct scsi_cmnd *);
 	void (*uninit_command)(struct scsi_cmnd *);
 	int (*done)(struct scsi_cmnd *);
-	int (*eh_action)(struct scsi_cmnd *, int);
+	int (*eh_action)(struct scsi_cmnd *, int, bool);
 };
 #define to_scsi_driver(drv) \
 	container_of((drv), struct scsi_driver, gendrv)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv3 2/6] libsas: allow async aborts
  2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
@ 2017-03-01  9:15 ` Hannes Reinecke
  2017-03-01  9:15 ` [PATCHv3 3/6] scsi: make eh_eflags persistent Hannes Reinecke
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi

From: Christoph Hellwig <hch@lst.de>

We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/libsas/sas_scsi_host.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9bd55bc..ee6b39a 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
 	struct Scsi_Host *host = cmd->device->host;
 	struct sas_internal *i = to_sas_internal(host->transportt);
 
-	if (current != host->ehandler)
-		return FAILED;
-
 	if (!i->dft->lldd_abort_task)
 		return FAILED;
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv3 3/6] scsi: make eh_eflags persistent
  2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
  2017-03-01  9:15 ` [PATCHv3 2/6] libsas: allow async aborts Hannes Reinecke
@ 2017-03-01  9:15 ` Hannes Reinecke
  2017-03-01 23:29   ` Bart Van Assche
  2017-03-14 17:51   ` Benjamin Block
  2017-03-01  9:15 ` [PATCHv3 4/6] scsi_error: do not escalate failed EH command Hannes Reinecke
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

To detect if a failed command has been retried we must not
clear scmd->eh_eflags when EH finishes.
The flag should be persistent throughout the lifetime
of the command.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 Documentation/scsi/scsi_eh.txt      | 14 +++++++-------
 drivers/scsi/libsas/sas_scsi_host.c |  2 --
 drivers/scsi/scsi_error.c           |  4 ++--
 include/scsi/scsi_eh.h              |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 37eca00..4edb9c1c 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -105,11 +105,14 @@ function
 
  2. If the host supports asynchronous completion (as indicated by the
     no_async_abort setting in the host template) scsi_abort_command()
-    is invoked to schedule an asynchrous abort. If that fails
-    Step #3 is taken.
+    is invoked to schedule an asynchrous abort.
+    Asynchronous abort are not invoked for commands which the
+    SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
+    already had been aborted once, and this is a retry which failed),
+    or when the EH deadline is expired. In these case Step #3 is taken.
 
- 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
-    command.  See [1-3] for more information.
+ 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
+    command.  See [1-4] for more information.
 
 [1-3] Asynchronous command aborts
 
@@ -263,7 +266,6 @@ scmd->allowed.
 
  3. scmd recovered
     ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-	- clear scmd->eh_eflags
 	- scsi_setup_cmd_retry()
 	- move from local eh_work_q to local eh_done_q
     LOCKING: none
@@ -456,8 +458,6 @@ except for #1 must be implemented by eh_strategy_handler().
 
  - shost->host_failed is zero.
 
- - Each scmd's eh_eflags field is cleared.
-
  - Each scmd is in such a state that scsi_setup_cmd_retry() on the
    scmd doesn't make any difference.
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index ee6b39a..87e5079 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -613,8 +613,6 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
 		SAS_DPRINTK("trying to find task 0x%p\n", task);
 		res = sas_scsi_find_task(task);
 
-		cmd->eh_eflags = 0;
-
 		switch (res) {
 		case TASK_IS_DONE:
 			SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cec439c..e1ca3b8 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -189,7 +189,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		/*
 		 * Retry after abort failed, escalate to next level.
 		 */
-		scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "previous abort failed\n"));
@@ -933,6 +932,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->result = scmd->result;
 	ses->underflow = scmd->underflow;
 	ses->prot_op = scmd->prot_op;
+	ses->eh_eflags = scmd->eh_eflags;
 
 	scmd->prot_op = SCSI_PROT_NORMAL;
 	scmd->eh_eflags = 0;
@@ -996,6 +996,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 	scmd->result = ses->result;
 	scmd->underflow = ses->underflow;
 	scmd->prot_op = ses->prot_op;
+	scmd->eh_eflags = ses->eh_eflags;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
@@ -1140,7 +1141,6 @@ static int scsi_eh_reset(struct scsi_cmnd *scmd)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-	scmd->eh_eflags = 0;
 	list_move_tail(&scmd->eh_entry, done_q);
 }
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 98d366b..a25b328 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, int sb_len,
 struct scsi_eh_save {
 	/* saved state */
 	int result;
+	int eh_eflags;
 	enum dma_data_direction data_direction;
 	unsigned underflow;
 	unsigned char cmd_len;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
                   ` (2 preceding siblings ...)
  2017-03-01  9:15 ` [PATCHv3 3/6] scsi: make eh_eflags persistent Hannes Reinecke
@ 2017-03-01  9:15 ` Hannes Reinecke
  2017-03-14 17:56   ` Benjamin Block
  2017-03-01  9:15 ` [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
  2017-03-01  9:15 ` [PATCHv3 6/6] scsi: make asynchronous aborts mandatory Hannes Reinecke
  5 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

When a command is sent as part of the error handling there
is not point whatsoever to start EH escalation when that
command fails; we are _already_ in the error handler,
and the escalation is about to commence anyway.
So just call 'scsi_try_to_abort_cmd()' to abort outstanding
commands and let the main EH routine handle the rest.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 drivers/scsi/scsi_error.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index e1ca3b8..4613aa1 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
 	return hostt->eh_abort_handler(scmd);
 }
 
-static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
-{
-	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
-		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
-			if (scsi_try_target_reset(scmd) != SUCCESS)
-				if (scsi_try_bus_reset(scmd) != SUCCESS)
-					scsi_try_host_reset(scmd);
-}
-
 /**
  * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
  * @scmd:       SCSI command structure to hijack
@@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 			break;
 		}
 	} else if (rtn != FAILED) {
-		scsi_abort_eh_cmnd(scmd);
+		scsi_try_to_abort_cmd(shost->hostt, scmd);
 		rtn = FAILED;
 	}
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed
  2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
                   ` (3 preceding siblings ...)
  2017-03-01  9:15 ` [PATCHv3 4/6] scsi_error: do not escalate failed EH command Hannes Reinecke
@ 2017-03-01  9:15 ` Hannes Reinecke
  2017-03-01 23:34   ` Bart Van Assche
  2017-03-14 18:05   ` Benjamin Block
  2017-03-01  9:15 ` [PATCHv3 6/6] scsi: make asynchronous aborts mandatory Hannes Reinecke
  5 siblings, 2 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_error.c | 41 +++++++++++++----------------------------
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4613aa1..802a081 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -163,13 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		}
 	}
 
-	if (!scsi_eh_scmd_add(scmd, 0)) {
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_WARNING, scmd,
-				    "terminate aborted command\n"));
-		set_host_byte(scmd, DID_TIME_OUT);
-		scsi_finish_command(scmd);
-	}
+	scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -224,28 +218,23 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
- *
- * Return value:
- *	0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
-	int ret = 0;
+	int ret;
 
-	if (!shost->ehandler)
-		return 0;
+	WARN_ON_ONCE(!shost->ehandler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (scsi_host_set_state(shost, SHOST_RECOVERY))
-		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-			goto out_unlock;
-
+	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
+		ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
+		WARN_ON_ONCE(ret);
+	}
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	ret = 1;
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
@@ -253,9 +242,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
- out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	return ret;
 }
 
 /**
@@ -284,13 +271,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (!host->hostt->no_async_abort &&
-		    scsi_abort_command(scmd) == SUCCESS)
-			return BLK_EH_NOT_HANDLED;
-
-		set_host_byte(scmd, DID_TIME_OUT);
-		if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-			rtn = BLK_EH_HANDLED;
+		if (host->hostt->no_async_abort ||
+		    scsi_abort_command(scmd) != SUCCESS) {
+			set_host_byte(scmd, DID_TIME_OUT);
+			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+		}
 	}
 
 	return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index f5e45a2..0735a46 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1593,8 +1593,8 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			if (!scsi_eh_scmd_add(cmd, 0))
-				scsi_finish_command(cmd);
+			scsi_eh_scmd_add(cmd, 0);
+			break;
 	}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 99bfc98..5be6cbf6 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -72,7 +72,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCHv3 6/6] scsi: make asynchronous aborts mandatory
  2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
                   ` (4 preceding siblings ...)
  2017-03-01  9:15 ` [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
@ 2017-03-01  9:15 ` Hannes Reinecke
  2017-03-14 17:33   ` Benjamin Block
  5 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-01  9:15 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Hannes Reinecke

There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
---
 Documentation/scsi/scsi_eh.txt | 18 ++++------
 drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
 drivers/scsi/scsi_lib.c        |  2 +-
 drivers/scsi/scsi_priv.h       |  3 +-
 include/scsi/scsi_host.h       |  5 ---
 5 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 4edb9c1c..11e447b 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -70,7 +70,7 @@ with the command.
 	scmd is requeued to blk queue.
 
  - otherwise
-	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
+	scsi_eh_scmd_add(scmd) is invoked for the command.  See
 	[1-3] for details of this function.
 
 
@@ -103,9 +103,7 @@ function
         eh_timed_out() callback did not handle the command.
 	Step #2 is taken.
 
- 2. If the host supports asynchronous completion (as indicated by the
-    no_async_abort setting in the host template) scsi_abort_command()
-    is invoked to schedule an asynchrous abort.
+ 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
     Asynchronous abort are not invoked for commands which the
     SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
     already had been aborted once, and this is a retry which failed),
@@ -127,16 +125,13 @@ function
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
- 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
-    completions and SCSI_EH_CANCEL_CMD for timeouts.
+ 1. Links scmd->eh_entry to shost->eh_cmd_q
 
- 2. Links scmd->eh_entry to shost->eh_cmd_q
+ 2. Sets SHOST_RECOVERY bit in shost->shost_state
 
- 3. Sets SHOST_RECOVERY bit in shost->shost_state
+ 3. Increments shost->host_failed
 
- 4. Increments shost->host_failed
-
- 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
+ 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
 
  As can be seen above, once any scmd is added to shost->eh_cmd_q,
 SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
@@ -252,7 +247,6 @@ scmd->allowed.
 
  1. Error completion / time out
     ACTION: scsi_eh_scmd_add() is invoked for scmd
-	- set scmd->eh_eflags
 	- add scmd to shost->eh_cmd_q
 	- set SHOST_RECOVERY
 	- shost->host_failed++
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 802a081..7b70ee9 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		}
 	}
 
-	scsi_eh_scmd_add(scmd, 0);
+	scsi_eh_scmd_add(scmd);
 }
 
 /**
@@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
- * @eh_flag:	optional SCSI_EH flag.
  */
-void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
@@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
-		eh_flag &= ~SCSI_EH_CANCEL_CMD;
-	scmd->eh_eflags |= eh_flag;
 	scsi_eh_reset(scmd);
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
@@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (host->hostt->no_async_abort ||
-		    scsi_abort_command(scmd) != SUCCESS) {
+		if (scsi_abort_command(scmd) != SUCCESS) {
 			set_host_byte(scmd, DID_TIME_OUT);
-			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+			scsi_eh_scmd_add(scmd);
 		}
 	}
 
@@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
 		list_for_each_entry(scmd, work_q, eh_entry) {
 			if (scmd->device == sdev) {
 				++total_failures;
-				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
+				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 					++cmd_cancel;
 				else
 					++cmd_failed;
@@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
 	 * should not get sense.
 	 */
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
-		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
+		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
 		    SCSI_SENSE_VALID(scmd))
 			continue;
 
@@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
 	return list_empty(work_q);
 }
 
-
-/**
- * scsi_eh_abort_cmds - abort pending commands.
- * @work_q:	&list_head for pending commands.
- * @done_q:	&list_head for processed commands.
- *
- * Decription:
- *    Try and see whether or not it makes sense to try and abort the
- *    running command.  This only works out to be the case if we have one
- *    command that has timed out.  If the command simply failed, it makes
- *    no sense to try and abort the command, since as far as the shost
- *    adapter is concerned, it isn't running.
- */
-static int scsi_eh_abort_cmds(struct list_head *work_q,
-			      struct list_head *done_q)
-{
-	struct scsi_cmnd *scmd, *next;
-	LIST_HEAD(check_list);
-	int rtn;
-	struct Scsi_Host *shost;
-
-	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
-			continue;
-		shost = scmd->device->host;
-		if (scsi_host_eh_past_deadline(shost)) {
-			list_splice_init(&check_list, work_q);
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "%s: skip aborting cmd, past eh deadline\n",
-					    current->comm));
-			return list_empty(work_q);
-		}
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_INFO, scmd,
-				     "%s: aborting cmd\n", current->comm));
-		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
-		if (rtn == FAILED) {
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "%s: aborting cmd failed\n",
-					     current->comm));
-			list_splice_init(&check_list, work_q);
-			return list_empty(work_q);
-		}
-		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
-		if (rtn == FAST_IO_FAIL)
-			scsi_eh_finish_cmd(scmd, done_q);
-		else
-			list_move_tail(&scmd->eh_entry, &check_list);
-	}
-
-	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
-}
-
 /**
  * scsi_eh_try_stu - Send START_UNIT to device.
  * @scmd:	&scsi_cmnd to send START_UNIT
@@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
 			    "not ready after error recovery\n");
 		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
-		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
-			/*
-			 * FIXME: Handle lost cmds.
-			 */
-		}
 		scsi_eh_finish_cmd(scmd, done_q);
 	}
 	return;
@@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
 
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
-		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
-			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->eh_deadline != -1)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 0735a46..98b2df8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			scsi_eh_scmd_add(cmd, 0);
+			scsi_eh_scmd_add(cmd);
 			break;
 	}
 }
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 5be6cbf6..e20ab10 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -18,7 +18,6 @@
 /*
  * Scsi Error Handler Flags
  */
-#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
 #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
 
 #define SCSI_SENSE_VALID(scmd) \
@@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3cd8c3b..afb0481 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -452,11 +452,6 @@ struct scsi_host_template {
 	unsigned no_write_same:1;
 
 	/*
-	 * True if asynchronous aborts are not supported
-	 */
-	unsigned no_async_abort:1;
-
-	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
 	unsigned int max_host_blocked;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
@ 2017-03-01 13:50   ` Steffen Maier
  2017-03-01 23:24   ` Bart Van Assche
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 29+ messages in thread
From: Steffen Maier @ 2017-03-01 13:50 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Bart van Assche, linux-scsi,
	Ewan Milne, Lawrence Obermann, Benjamin Block, Steffen Maier,
	Hannes Reinecke

Hannes, thanks for posting this.

I just found time to review the description so far:

> Subject: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run

"... to prevent false positives"

Might be nice for understanding but also makes the short log quite long.

On 03/01/2017 10:15 AM, Hannes Reinecke wrote:
> The current medium access timeout counter will be increased for
> each command, so if there are enough failed commands we'll hit

"_Enough_ failed commands" sounds a bit vague to me, especially since we 
seem to know exactly, see below.

> the medium access timeout for even a single failure.
> Fix this by making the timeout per EH run, ie the counter will
> only be increased once per device and EH run.

I miss a description of the user-visible consequences of the bug we fix, 
so any user can understand when they need this.

Just an example
[FYI: users have been misinterpreting "medium access timeouts" for a 
time(out) duration value rather than an event count which it actually is]:

"
Otherwise, any abort due to SCSI command timeout on a SCSI device, 
having at least a *number* of max_medium_access_timeouts [scsi_disk 
sysfs attribute, defaults to 2] SCSI commands pending, causes a false 
positive which erroneously sets the SCSI device offline with the 
following error kernel message:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!
"

> Cc: Ewan Milne <emilne@redhat.com>
> Cc: Lawrence Obermann <loberman@redhat.com>
> Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
> Cc: Steffen Maier <maier@de.ibm.com>

I would prefer <maier@linux.vnet.ibm.com>.

> Signed-off-by: Hannes Reinecke <hare@suse.com>

Martin introduced this feature (18a4d0a22ed6c54b67af7718c305cd010f09ddf8),
James did a fix (2451079bc2ae1334058be8babd44be03ecfa7041),
and David did the most recent fix (2a863ba8f6f5d72e4905a91c6281d575809fed5b)

Did Martin's original commit work without these false positives?
If so, some of the other commits might have caused a regression, maybe 
2451079bc2ae1334058be8babd44be03ecfa7041?

If so, we would need a Fixes: tag.

In general I would like to see this tagged for stable due to the already 
evident impact.

> ---
>  drivers/scsi/scsi_error.c  | 16 +++++++++++++++-
>  drivers/scsi/sd.c          | 21 +++++++++++++++++----
>  drivers/scsi/sd.h          |  1 +
>  include/scsi/scsi_driver.h |  2 +-
>  4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f2cafae..cec439c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -58,6 +58,7 @@
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>  				 struct scsi_cmnd *);
> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>  	scmd->eh_eflags |= eh_flag;
> +	scsi_eh_reset(scmd);
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
>  	scsi_eh_wakeup(shost);
> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
>  	if (!blk_rq_is_passthrough(scmd->request)) {
>  		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>  		if (sdrv->eh_action)
> -			rtn = sdrv->eh_action(scmd, rtn);
> +			rtn = sdrv->eh_action(scmd, rtn, false);
> +	}
> +	return rtn;
> +}
> +
> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
> +{
> +	int rtn = SUCCESS;
> +
> +	if (!blk_rq_is_passthrough(scmd->request)) {
> +		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
> +		if (sdrv->eh_action)
> +			rtn = sdrv->eh_action(scmd, rtn, true);
>  	}
>  	return rtn;
>  }
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c7839f6..c794686 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -115,7 +115,7 @@
>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>  static int sd_done(struct scsi_cmnd *);
> -static int sd_eh_action(struct scsi_cmnd *, int);
> +static int sd_eh_action(struct scsi_cmnd *, int, bool);
>  static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
>  static void scsi_disk_release(struct device *cdev);
>  static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
>   *	sd_eh_action - error handling callback
>   *	@scmd:		sd-issued command that has failed
>   *	@eh_disp:	The recovery disposition suggested by the midlayer
> + *	@reset:		Reset medium access counter
>   *
>   *	This function is called by the SCSI midlayer upon completion of an
>   *	error test command (currently TEST UNIT READY). The result of sending
>   *	the eh command is passed in eh_disp.  We're looking for devices that
>   *	fail medium access commands but are OK with non access commands like
>   *	test unit ready (so wrongly see the device as having a successful
> - *	recovery)
> + *	recovery).
> + *	We have to be careful to count a medium access failure only once
> + *	per SCSI EH run; there might be several timed out commands which
> + *	will cause the 'max_medium_access_timeouts' counter to trigger
> + *	after the first SCSI EH run already and set the device to offline.
>   **/
> -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
> +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset)
>  {
>  	struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
>
> +	if (reset) {
> +		/* New SCSI EH run, reset gate variable */
> +		sdkp->medium_access_reset = 0;
> +		return eh_disp;
> +	}
>  	if (!scsi_device_online(scmd->device) ||
>  	    !scsi_medium_access_command(scmd) ||
>  	    host_byte(scmd->result) != DID_TIME_OUT ||
> @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
>  	 * process of recovering or has it suffered an internal failure
>  	 * that prevents access to the storage medium.
>  	 */
> -	sdkp->medium_access_timed_out++;
> +	if (!sdkp->medium_access_reset) {
> +		sdkp->medium_access_timed_out++;
> +		sdkp->medium_access_reset = 1;
> +	}
>
>  	/*
>  	 * If the device keeps failing read/write commands but TEST UNIT
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 4dac35e..6a4f75a 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -106,6 +106,7 @@ struct scsi_disk {
>  	unsigned	rc_basis: 2;
>  	unsigned	zoned: 2;
>  	unsigned	urswrz : 1;
> +	unsigned	medium_access_reset : 1;
>  };
>  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>
> diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
> index 891a658..d5e0012 100644
> --- a/include/scsi/scsi_driver.h
> +++ b/include/scsi/scsi_driver.h
> @@ -15,7 +15,7 @@ struct scsi_driver {
>  	int (*init_command)(struct scsi_cmnd *);
>  	void (*uninit_command)(struct scsi_cmnd *);
>  	int (*done)(struct scsi_cmnd *);
> -	int (*eh_action)(struct scsi_cmnd *, int);
> +	int (*eh_action)(struct scsi_cmnd *, int, bool);
>  };
>  #define to_scsi_driver(drv) \
>  	container_of((drv), struct scsi_driver, gendrv)
>

-- 
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
  2017-03-01 13:50   ` Steffen Maier
@ 2017-03-01 23:24   ` Bart Van Assche
  2017-03-02  8:02     ` Hannes Reinecke
  2017-03-02 20:16   ` Benjamin Block
  2017-03-13 13:37   ` Mauricio Faria de Oliveira
  3 siblings, 1 reply; 29+ messages in thread
From: Bart Van Assche @ 2017-03-01 23:24 UTC (permalink / raw)
  To: hare, martin.petersen
  Cc: bblock, hch, linux-scsi, hare, james.bottomley, emilne, loberman, maier

On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote:
> The current medium access timeout counter will be increased for
> each command, so if there are enough failed commands we'll hit
> the medium access timeout for even a single failure.

This sentence describes multiple failed commands as a single failure.
That's confusing to me. Did you perhaps intend "for a single device
failure"?

> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f2cafae..cec439c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -58,6 +58,7 @@
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>  				 struct scsi_cmnd *);
> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>  
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>  	scmd->eh_eflags |= eh_flag;
> +	scsi_eh_reset(scmd);
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
>  	scsi_eh_wakeup(shost);
> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
>  	if (!blk_rq_is_passthrough(scmd->request)) {
>  		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>  		if (sdrv->eh_action)
> -			rtn = sdrv->eh_action(scmd, rtn);
> +			rtn = sdrv->eh_action(scmd, rtn, false);
> +	}
> +	return rtn;
> +}
> +
> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
> +{
> +	int rtn = SUCCESS;
> +
> +	if (!blk_rq_is_passthrough(scmd->request)) {
> +		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
> +		if (sdrv->eh_action)
> +			rtn = sdrv->eh_action(scmd, rtn, true);
>  	}
>  	return rtn;
>  }

Can this function be moved up such that we don't need a new forward declaration?

> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
>   *	sd_eh_action - error handling callback
>   *	@scmd:		sd-issued command that has failed
>   *	@eh_disp:	The recovery disposition suggested by the midlayer
> + *	@reset:		Reset medium access counter

It seems to me that @reset does not trigger a reset of the medium access counter
but rather of the variable that prevents the medium access error counter to be
incremented?

> + *	recovery).
> + *	We have to be careful to count a medium access failure only once
> + *	per SCSI EH run; there might be several timed out commands which

Did you perhaps intend "once per device per SCSI EH run"?

> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -106,6 +106,7 @@ struct scsi_disk {
>  	unsigned	rc_basis: 2;
>  	unsigned	zoned: 2;
>  	unsigned	urswrz : 1;
> +	unsigned	medium_access_reset : 1;

The name of this new member is confusing to me. How about using the name
"ignore_medium_access_errors" instead? And since this is a boolean, how
about using true and false in assignments to this variable?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 3/6] scsi: make eh_eflags persistent
  2017-03-01  9:15 ` [PATCHv3 3/6] scsi: make eh_eflags persistent Hannes Reinecke
@ 2017-03-01 23:29   ` Bart Van Assche
  2017-03-14 17:51   ` Benjamin Block
  1 sibling, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-03-01 23:29 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi

On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote:
> To detect if a failed command has been retried we must not
> clear scmd->eh_eflags when EH finishes.
> The flag should be persistent throughout the lifetime
> of the command.
> 
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Hannes Reinecke <hare@suse.de>

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed
  2017-03-01  9:15 ` [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
@ 2017-03-01 23:34   ` Bart Van Assche
  2017-03-14 18:05   ` Benjamin Block
  1 sibling, 0 replies; 29+ messages in thread
From: Bart Van Assche @ 2017-03-01 23:34 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi

On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote:
> scsi_eh_scmd_add() currently only will fail if no
> error handler thread is started (which will never be the
> case) or if the state machine encounters an illegal transition.
> 
> But if we're encountering an invalid state transition
> chances is we cannot fixup things with the error handler.
> So better add a WARN_ON for illegal host states and
> make scsi_dh_scmd_add() a void function.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-01 23:24   ` Bart Van Assche
@ 2017-03-02  8:02     ` Hannes Reinecke
  0 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-02  8:02 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen
  Cc: bblock, hch, linux-scsi, hare, james.bottomley, emilne, loberman, maier

On 03/02/2017 12:24 AM, Bart Van Assche wrote:
> On Wed, 2017-03-01 at 10:15 +0100, Hannes Reinecke wrote:
>> The current medium access timeout counter will be increased for
>> each command, so if there are enough failed commands we'll hit
>> the medium access timeout for even a single failure.
>
> This sentence describes multiple failed commands as a single failure.
> That's confusing to me. Did you perhaps intend "for a single device
> failure"?
>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index f2cafae..cec439c 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -58,6 +58,7 @@
>>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>>  				 struct scsi_cmnd *);
>> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>>
>>  /* called with shost->host_lock held */
>>  void scsi_eh_wakeup(struct Scsi_Host *shost)
>> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>>  	scmd->eh_eflags |= eh_flag;
>> +	scsi_eh_reset(scmd);
>>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>>  	shost->host_failed++;
>>  	scsi_eh_wakeup(shost);
>> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
>>  	if (!blk_rq_is_passthrough(scmd->request)) {
>>  		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>>  		if (sdrv->eh_action)
>> -			rtn = sdrv->eh_action(scmd, rtn);
>> +			rtn = sdrv->eh_action(scmd, rtn, false);
>> +	}
>> +	return rtn;
>> +}
>> +
>> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
>> +{
>> +	int rtn = SUCCESS;
>> +
>> +	if (!blk_rq_is_passthrough(scmd->request)) {
>> +		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>> +		if (sdrv->eh_action)
>> +			rtn = sdrv->eh_action(scmd, rtn, true);
>>  	}
>>  	return rtn;
>>  }
>
> Can this function be moved up such that we don't need a new forward declaration?
>
Why, but of course.
I just put is here to be next to the original scsi_eh_action() function.

>> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
>>   *	sd_eh_action - error handling callback
>>   *	@scmd:		sd-issued command that has failed
>>   *	@eh_disp:	The recovery disposition suggested by the midlayer
>> + *	@reset:		Reset medium access counter
>
> It seems to me that @reset does not trigger a reset of the medium access counter
> but rather of the variable that prevents the medium access error counter to be
> incremented?
>
Correct. Will be fixing up the description.

>> + *	recovery).
>> + *	We have to be careful to count a medium access failure only once
>> + *	per SCSI EH run; there might be several timed out commands which
>
> Did you perhaps intend "once per device per SCSI EH run"?
>
Yes.

>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -106,6 +106,7 @@ struct scsi_disk {
>>  	unsigned	rc_basis: 2;
>>  	unsigned	zoned: 2;
>>  	unsigned	urswrz : 1;
>> +	unsigned	medium_access_reset : 1;
>
> The name of this new member is confusing to me. How about using the name
> "ignore_medium_access_errors" instead? And since this is a boolean, how
> about using true and false in assignments to this variable?
>
Ok, will be doing so.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
  2017-03-01 13:50   ` Steffen Maier
  2017-03-01 23:24   ` Bart Van Assche
@ 2017-03-02 20:16   ` Benjamin Block
  2017-03-13 10:20     ` Hannes Reinecke
  2017-03-13 13:37   ` Mauricio Faria de Oliveira
  3 siblings, 1 reply; 29+ messages in thread
From: Benjamin Block @ 2017-03-02 20:16 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, Ewan Milne, Lawrence Obermann,
	Steffen Maier, Hannes Reinecke

Hej Hannes,

On Wed, Mar 01, 2017 at 10:15:15AM +0100, Hannes Reinecke wrote:
> The current medium access timeout counter will be increased for
> each command, so if there are enough failed commands we'll hit
> the medium access timeout for even a single failure.
> Fix this by making the timeout per EH run, ie the counter will
> only be increased once per device and EH run.
>
> Cc: Ewan Milne <emilne@redhat.com>
> Cc: Lawrence Obermann <loberman@redhat.com>
> Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
> Cc: Steffen Maier <maier@de.ibm.com>
> Signed-off-by: Hannes Reinecke <hare@suse.com>

Steffen already suggested it, It would be nice to have a stable-tag
here. This already caused multiple real-world false-positive outages, I
think that qualifies for stable :)

> ---
>  drivers/scsi/scsi_error.c  | 16 +++++++++++++++-
>  drivers/scsi/sd.c          | 21 +++++++++++++++++----
>  drivers/scsi/sd.h          |  1 +
>  include/scsi/scsi_driver.h |  2 +-
>  4 files changed, 34 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f2cafae..cec439c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -58,6 +58,7 @@
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>  				 struct scsi_cmnd *);
> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>  	scmd->eh_eflags |= eh_flag;
> +	scsi_eh_reset(scmd);
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
>  	scsi_eh_wakeup(shost);
> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
>  	if (!blk_rq_is_passthrough(scmd->request)) {
>  		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>  		if (sdrv->eh_action)
> -			rtn = sdrv->eh_action(scmd, rtn);
> +			rtn = sdrv->eh_action(scmd, rtn, false);
> +	}
> +	return rtn;
> +}
> +
> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
> +{
> +	int rtn = SUCCESS;
> +
> +	if (!blk_rq_is_passthrough(scmd->request)) {
> +		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
> +		if (sdrv->eh_action)
> +			rtn = sdrv->eh_action(scmd, rtn, true);
>  	}
>  	return rtn;
>  }
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c7839f6..c794686 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -115,7 +115,7 @@
>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>  static int sd_done(struct scsi_cmnd *);
> -static int sd_eh_action(struct scsi_cmnd *, int);
> +static int sd_eh_action(struct scsi_cmnd *, int, bool);
>  static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
>  static void scsi_disk_release(struct device *cdev);
>  static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
>   *	sd_eh_action - error handling callback
>   *	@scmd:		sd-issued command that has failed
>   *	@eh_disp:	The recovery disposition suggested by the midlayer
> + *	@reset:		Reset medium access counter
>   *
>   *	This function is called by the SCSI midlayer upon completion of an
>   *	error test command (currently TEST UNIT READY). The result of sending
>   *	the eh command is passed in eh_disp.  We're looking for devices that
>   *	fail medium access commands but are OK with non access commands like
>   *	test unit ready (so wrongly see the device as having a successful
> - *	recovery)
> + *	recovery).
> + *	We have to be careful to count a medium access failure only once
> + *	per SCSI EH run; there might be several timed out commands which
> + *	will cause the 'max_medium_access_timeouts' counter to trigger
> + *	after the first SCSI EH run already and set the device to offline.
>   **/
> -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
> +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset)
>  {
>  	struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
>
> +	if (reset) {
> +		/* New SCSI EH run, reset gate variable */
> +		sdkp->medium_access_reset = 0;
> +		return eh_disp;
> +	}
>  	if (!scsi_device_online(scmd->device) ||
>  	    !scsi_medium_access_command(scmd) ||
>  	    host_byte(scmd->result) != DID_TIME_OUT ||
> @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
>  	 * process of recovering or has it suffered an internal failure
>  	 * that prevents access to the storage medium.
>  	 */
> -	sdkp->medium_access_timed_out++;
> +	if (!sdkp->medium_access_reset) {
> +		sdkp->medium_access_timed_out++;
> +		sdkp->medium_access_reset = 1;
> +	}
>
>  	/*
>  	 * If the device keeps failing read/write commands but TEST UNIT
> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
> index 4dac35e..6a4f75a 100644
> --- a/drivers/scsi/sd.h
> +++ b/drivers/scsi/sd.h
> @@ -106,6 +106,7 @@ struct scsi_disk {
>  	unsigned	rc_basis: 2;
>  	unsigned	zoned: 2;
>  	unsigned	urswrz : 1;
> +	unsigned	medium_access_reset : 1;
>  };
>  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>
> diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
> index 891a658..d5e0012 100644
> --- a/include/scsi/scsi_driver.h
> +++ b/include/scsi/scsi_driver.h
> @@ -15,7 +15,7 @@ struct scsi_driver {
>  	int (*init_command)(struct scsi_cmnd *);
>  	void (*uninit_command)(struct scsi_cmnd *);
>  	int (*done)(struct scsi_cmnd *);
> -	int (*eh_action)(struct scsi_cmnd *, int);
> +	int (*eh_action)(struct scsi_cmnd *, int, bool);
>  };
>  #define to_scsi_driver(drv) \
>  	container_of((drv), struct scsi_driver, gendrv)
> --
> 1.8.5.6
>

Bart already made some suggestions that I think are good, otherwise I
have nothing big to add. Should work just fine.

The only thing that still bugs me about this is the corner-case we
talked about the last time we talked about this. No show-stopper for me,
but I though I might as well mention it.

When we do the different escalation-stages in scsi_unjam_host(), we
always have the same basic pattern:

  for_all_in(work_q)) {
      cmd = get_next(eh_work_q);

      do_action(cmd); /* might be abort, lun_reset, target_reset, ... */

      if (was_ok()) {
          move_to(check_list, cmd);
	  move_same_scope_to(check_list, work_q);
      }
  }

  for_all_in(check_list) {
      cmd  = get_next(eh_work_q);
      sdev = get_sdev(cmd);

      test_device(sdev); /* TUR and maybe STU */

      for_all_in_same_scope_in(check_list) {
          if (test_was_ok() && scsi_eh_action(scmd, SUCCESS) == SUCCESS)
              move_to(done_q, cmd)
          else
              move_to(work_q, cmd)
      }
  }

  return list_empty(eh_work_q)

(I hope I have this right, irritatingly this 'looks' different for no
reason I can see for the different stages).

The corner-case being, if we have a cmd that failes in scsi_eh_action(),
it will be put back into the work_q. But all other command for the same
scope will be put into the done_q because for them, sd_eh_action()
will early-return with SUCCESS (sdev is not online anymore).

Then, because the work_q is not empty, we will escalate to the next
step. Worst case being, we escalate to host_reset, although EH is
actually already done - we already decided that this sdev is offline in
SD and even when we go through host-reset it will not become running
anymore..  which makes the whole host-reset a big waste of time.

AFAIK the escalate to Host-Reset can only happen if the LLD has no
pointer for STU, LUN-, Target- and Bus-Reset, so its actually not as
bad, but I still think its something we should correct.

If you don't wanna change this here, I can probably send a part of the
patch I made for this some time ago.



                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-02 20:16   ` Benjamin Block
@ 2017-03-13 10:20     ` Hannes Reinecke
  0 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-13 10:20 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, Ewan Milne, Lawrence Obermann,
	Steffen Maier, Hannes Reinecke

On 03/02/2017 09:16 PM, Benjamin Block wrote:
> Hej Hannes,
> 
> On Wed, Mar 01, 2017 at 10:15:15AM +0100, Hannes Reinecke wrote:
>> The current medium access timeout counter will be increased for
>> each command, so if there are enough failed commands we'll hit
>> the medium access timeout for even a single failure.
>> Fix this by making the timeout per EH run, ie the counter will
>> only be increased once per device and EH run.
>>
>> Cc: Ewan Milne <emilne@redhat.com>
>> Cc: Lawrence Obermann <loberman@redhat.com>
>> Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
>> Cc: Steffen Maier <maier@de.ibm.com>
>> Signed-off-by: Hannes Reinecke <hare@suse.com>
> 
> Steffen already suggested it, It would be nice to have a stable-tag
> here. This already caused multiple real-world false-positive outages, I
> think that qualifies for stable :)
> 
>> ---
>>  drivers/scsi/scsi_error.c  | 16 +++++++++++++++-
>>  drivers/scsi/sd.c          | 21 +++++++++++++++++----
>>  drivers/scsi/sd.h          |  1 +
>>  include/scsi/scsi_driver.h |  2 +-
>>  4 files changed, 34 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index f2cafae..cec439c 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -58,6 +58,7 @@
>>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>>  				 struct scsi_cmnd *);
>> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>>
>>  /* called with shost->host_lock held */
>>  void scsi_eh_wakeup(struct Scsi_Host *shost)
>> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>>  	scmd->eh_eflags |= eh_flag;
>> +	scsi_eh_reset(scmd);
>>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>>  	shost->host_failed++;
>>  	scsi_eh_wakeup(shost);
>> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
>>  	if (!blk_rq_is_passthrough(scmd->request)) {
>>  		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>>  		if (sdrv->eh_action)
>> -			rtn = sdrv->eh_action(scmd, rtn);
>> +			rtn = sdrv->eh_action(scmd, rtn, false);
>> +	}
>> +	return rtn;
>> +}
>> +
>> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
>> +{
>> +	int rtn = SUCCESS;
>> +
>> +	if (!blk_rq_is_passthrough(scmd->request)) {
>> +		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>> +		if (sdrv->eh_action)
>> +			rtn = sdrv->eh_action(scmd, rtn, true);
>>  	}
>>  	return rtn;
>>  }
>> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
>> index c7839f6..c794686 100644
>> --- a/drivers/scsi/sd.c
>> +++ b/drivers/scsi/sd.c
>> @@ -115,7 +115,7 @@
>>  static int sd_init_command(struct scsi_cmnd *SCpnt);
>>  static void sd_uninit_command(struct scsi_cmnd *SCpnt);
>>  static int sd_done(struct scsi_cmnd *);
>> -static int sd_eh_action(struct scsi_cmnd *, int);
>> +static int sd_eh_action(struct scsi_cmnd *, int, bool);
>>  static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
>>  static void scsi_disk_release(struct device *cdev);
>>  static void sd_print_sense_hdr(struct scsi_disk *, struct scsi_sense_hdr *);
>> @@ -1689,18 +1689,28 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
>>   *	sd_eh_action - error handling callback
>>   *	@scmd:		sd-issued command that has failed
>>   *	@eh_disp:	The recovery disposition suggested by the midlayer
>> + *	@reset:		Reset medium access counter
>>   *
>>   *	This function is called by the SCSI midlayer upon completion of an
>>   *	error test command (currently TEST UNIT READY). The result of sending
>>   *	the eh command is passed in eh_disp.  We're looking for devices that
>>   *	fail medium access commands but are OK with non access commands like
>>   *	test unit ready (so wrongly see the device as having a successful
>> - *	recovery)
>> + *	recovery).
>> + *	We have to be careful to count a medium access failure only once
>> + *	per SCSI EH run; there might be several timed out commands which
>> + *	will cause the 'max_medium_access_timeouts' counter to trigger
>> + *	after the first SCSI EH run already and set the device to offline.
>>   **/
>> -static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
>> +static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp, bool reset)
>>  {
>>  	struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
>>
>> +	if (reset) {
>> +		/* New SCSI EH run, reset gate variable */
>> +		sdkp->medium_access_reset = 0;
>> +		return eh_disp;
>> +	}
>>  	if (!scsi_device_online(scmd->device) ||
>>  	    !scsi_medium_access_command(scmd) ||
>>  	    host_byte(scmd->result) != DID_TIME_OUT ||
>> @@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
>>  	 * process of recovering or has it suffered an internal failure
>>  	 * that prevents access to the storage medium.
>>  	 */
>> -	sdkp->medium_access_timed_out++;
>> +	if (!sdkp->medium_access_reset) {
>> +		sdkp->medium_access_timed_out++;
>> +		sdkp->medium_access_reset = 1;
>> +	}
>>
>>  	/*
>>  	 * If the device keeps failing read/write commands but TEST UNIT
>> diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
>> index 4dac35e..6a4f75a 100644
>> --- a/drivers/scsi/sd.h
>> +++ b/drivers/scsi/sd.h
>> @@ -106,6 +106,7 @@ struct scsi_disk {
>>  	unsigned	rc_basis: 2;
>>  	unsigned	zoned: 2;
>>  	unsigned	urswrz : 1;
>> +	unsigned	medium_access_reset : 1;
>>  };
>>  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
>>
>> diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
>> index 891a658..d5e0012 100644
>> --- a/include/scsi/scsi_driver.h
>> +++ b/include/scsi/scsi_driver.h
>> @@ -15,7 +15,7 @@ struct scsi_driver {
>>  	int (*init_command)(struct scsi_cmnd *);
>>  	void (*uninit_command)(struct scsi_cmnd *);
>>  	int (*done)(struct scsi_cmnd *);
>> -	int (*eh_action)(struct scsi_cmnd *, int);
>> +	int (*eh_action)(struct scsi_cmnd *, int, bool);
>>  };
>>  #define to_scsi_driver(drv) \
>>  	container_of((drv), struct scsi_driver, gendrv)
>> --
>> 1.8.5.6
>>
> 
> Bart already made some suggestions that I think are good, otherwise I
> have nothing big to add. Should work just fine.
> 
> The only thing that still bugs me about this is the corner-case we
> talked about the last time we talked about this. No show-stopper for me,
> but I though I might as well mention it.
> 
> When we do the different escalation-stages in scsi_unjam_host(), we
> always have the same basic pattern:
> 
>   for_all_in(work_q)) {
>       cmd = get_next(eh_work_q);
> 
>       do_action(cmd); /* might be abort, lun_reset, target_reset, ... */
> 
>       if (was_ok()) {
>           move_to(check_list, cmd);
> 	  move_same_scope_to(check_list, work_q);
>       }
>   }
> 
>   for_all_in(check_list) {
>       cmd  = get_next(eh_work_q);
>       sdev = get_sdev(cmd);
> 
>       test_device(sdev); /* TUR and maybe STU */
> 
>       for_all_in_same_scope_in(check_list) {
>           if (test_was_ok() && scsi_eh_action(scmd, SUCCESS) == SUCCESS)
>               move_to(done_q, cmd)
>           else
>               move_to(work_q, cmd)
>       }
>   }
> 
>   return list_empty(eh_work_q)
> 
> (I hope I have this right, irritatingly this 'looks' different for no
> reason I can see for the different stages).
> 
> The corner-case being, if we have a cmd that failes in scsi_eh_action(),
> it will be put back into the work_q. But all other command for the same
> scope will be put into the done_q because for them, sd_eh_action()
> will early-return with SUCCESS (sdev is not online anymore).
> 
> Then, because the work_q is not empty, we will escalate to the next
> step. Worst case being, we escalate to host_reset, although EH is
> actually already done - we already decided that this sdev is offline in
> SD and even when we go through host-reset it will not become running
> anymore..  which makes the whole host-reset a big waste of time.
> 
> AFAIK the escalate to Host-Reset can only happen if the LLD has no
> pointer for STU, LUN-, Target- and Bus-Reset, so its actually not as
> bad, but I still think its something we should correct.
> 
> If you don't wanna change this here, I can probably send a part of the
> patch I made for this some time ago.
> 
Hmm. Shouldn't this resolved by simply returning 'SUCCESS' in
sd_eh_action() when it decides to take the device offline?

IE doing something like this:

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c794686..311ac3c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1739,7 +1739,7 @@ static int sd_eh_action(struct scsi_cmnd *scmd,
int eh_dis
p, bool reset)
                            "Medium access timeout failure. Offlining
disk!\n");
                scsi_device_set_state(scmd->device, SDEV_OFFLINE);

-               return FAILED;
+               return SUCCESS;
        }

        return eh_disp;

would help, no?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
                     ` (2 preceding siblings ...)
  2017-03-02 20:16   ` Benjamin Block
@ 2017-03-13 13:37   ` Mauricio Faria de Oliveira
  2017-03-13 14:48     ` Hannes Reinecke
  3 siblings, 1 reply; 29+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-03-13 13:37 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, Ewan Milne, Lawrence Obermann,
	Benjamin Block, Steffen Maier, Hannes Reinecke

Hannes,

On 03/01/2017 06:15 AM, Hannes Reinecke wrote:
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index f2cafae..cec439c 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -58,6 +58,7 @@
>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>  				 struct scsi_cmnd *);
> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>  	scmd->eh_eflags |= eh_flag;
> +	scsi_eh_reset(scmd);
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
>  	scsi_eh_wakeup(shost);
> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
>  	if (!blk_rq_is_passthrough(scmd->request)) {
>  		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>  		if (sdrv->eh_action)
> -			rtn = sdrv->eh_action(scmd, rtn);
> +			rtn = sdrv->eh_action(scmd, rtn, false);
> +	}
> +	return rtn;
> +}
> +
> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
> +{
> +	int rtn = SUCCESS;
> +
> +	if (!blk_rq_is_passthrough(scmd->request)) {
> +		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
> +		if (sdrv->eh_action)
> +			rtn = sdrv->eh_action(scmd, rtn, true);
>  	}
>  	return rtn;
>  }

Apparently we can de-duplicate code in scsi_eh_reset()/scsi_eh_action(),
and change less of sd_eh_action() (i.e., no new parameter).

Something like this.  Thoughts?


- Deduplicate code in scsi_eh_reset() and scsi_eh_action()
   - A call to scsi_eh_reset() with reset == false calls into eh_action()

   - A call to scsi_eh_reset() with reset == true returns early,
     (as happens with/if sd_eh_action() is called in your patch)

   - A call to scsi_eh_reset() in scsi_eh_scmd_add() uses SUCCESS just
     for consistency with scsi_eh_reset() in your patch, as the return
     value is ignored in it.

- Less changes to sd_eh_action()
   - Removed one chunk in sd_eh_action() ('reset gate variable')
   - No more parameter changes in sd_eh_action() (no 'reset' parameter)

- Removed forward declaration of scsi_eh_reset(), as already suggested



static int scsi_eh_reset(struct scsi_cmnd *scmd, int eh_disp, bool reset)
{
	int rtn = eh_disp;

	if (!blk_rq_is_passthrough(scmd->request)) {
		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
		if (sdrv->eh_action) {
			if (reset) {
			 	struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);

				/* New SCSI EH run, reset gate variable */
				sdkp->medium_access_reset = 0;
				return rtn;
			}
			rtn = sdrv->eh_action(scmd, rtn);
		}
  	}
  	return rtn;
}

Plus,

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae..cec439c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int 
eh_flag)
  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
  	scmd->eh_eflags |= eh_flag;
+	scsi_eh_reset(scmd, SUCCESS, true); // return value is ignored (early 
exit due to reset)
  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
  	shost->host_failed++;
  	scsi_eh_wakeup(shost);
@@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, 
int rtn)
-	if (!blk_rq_is_passthrough(scmd->request)) {
-		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
-		if (sdrv->eh_action)
-			rtn = sdrv->eh_action(scmd, rtn);
-	}
-	return rtn;
+	return scsi_eh_reset(scmd, rtn, false);
  }


And the rest, without the 'reset' parameter.

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c7839f6..c794686 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1689,X +1689,Y @@ static int sd_pr_clear(struct block_device *bdev, 
u64 key)
   *	sd_eh_action - error handling callback
   *	@scmd:		sd-issued command that has failed
   *	@eh_disp:	The recovery disposition suggested by the midlayer
   *
   *	This function is called by the SCSI midlayer upon completion of an
   *	error test command (currently TEST UNIT READY). The result of sending
   *	the eh command is passed in eh_disp.  We're looking for devices that
   *	fail medium access commands but are OK with non access commands like
   *	test unit ready (so wrongly see the device as having a successful
- *	recovery)
+ *	recovery).
+ *	We have to be careful to count a medium access failure only once
+ *	per SCSI EH run; there might be several timed out commands which
+ *	will cause the 'max_medium_access_timeouts' counter to trigger
+ *	after the first SCSI EH run already and set the device to offline.
   **/
@@ -1714,7 +1724,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, 
int eh_disp)
  	 * process of recovering or has it suffered an internal failure
  	 * that prevents access to the storage medium.
  	 */
-	sdkp->medium_access_timed_out++;
+	if (!sdkp->medium_access_reset) {
+		sdkp->medium_access_timed_out++;
+		sdkp->medium_access_reset = 1;
+	}

  	/*
  	 * If the device keeps failing read/write commands but TEST UNIT
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..6a4f75a 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
  	unsigned	rc_basis: 2;
  	unsigned	zoned: 2;
  	unsigned	urswrz : 1;
+	unsigned	medium_access_reset : 1;
  };
  #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)











-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-13 13:37   ` Mauricio Faria de Oliveira
@ 2017-03-13 14:48     ` Hannes Reinecke
  2017-03-13 15:54       ` Mauricio Faria de Oliveira
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-13 14:48 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, Ewan Milne, Lawrence Obermann,
	Benjamin Block, Steffen Maier, Hannes Reinecke

On 03/13/2017 02:37 PM, Mauricio Faria de Oliveira wrote:
> Hannes,
> 
> On 03/01/2017 06:15 AM, Hannes Reinecke wrote:
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index f2cafae..cec439c 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -58,6 +58,7 @@
>>  static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
>>  static int scsi_try_to_abort_cmd(struct scsi_host_template *,
>>                   struct scsi_cmnd *);
>> +static int scsi_eh_reset(struct scsi_cmnd *scmd);
>>
>>  /* called with shost->host_lock held */
>>  void scsi_eh_wakeup(struct Scsi_Host *shost)
>> @@ -249,6 +250,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int
>> eh_flag)
>>      if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>          eh_flag &= ~SCSI_EH_CANCEL_CMD;
>>      scmd->eh_eflags |= eh_flag;
>> +    scsi_eh_reset(scmd);
>>      list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>>      shost->host_failed++;
>>      scsi_eh_wakeup(shost);
>> @@ -1107,7 +1109,19 @@ static int scsi_eh_action(struct scsi_cmnd
>> *scmd, int rtn)
>>      if (!blk_rq_is_passthrough(scmd->request)) {
>>          struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>>          if (sdrv->eh_action)
>> -            rtn = sdrv->eh_action(scmd, rtn);
>> +            rtn = sdrv->eh_action(scmd, rtn, false);
>> +    }
>> +    return rtn;
>> +}
>> +
>> +static int scsi_eh_reset(struct scsi_cmnd *scmd)
>> +{
>> +    int rtn = SUCCESS;
>> +
>> +    if (!blk_rq_is_passthrough(scmd->request)) {
>> +        struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>> +        if (sdrv->eh_action)
>> +            rtn = sdrv->eh_action(scmd, rtn, true);
>>      }
>>      return rtn;
>>  }
> 
> Apparently we can de-duplicate code in scsi_eh_reset()/scsi_eh_action(),
> and change less of sd_eh_action() (i.e., no new parameter).
> 
> Something like this.  Thoughts?
> 
> 
> - Deduplicate code in scsi_eh_reset() and scsi_eh_action()
>   - A call to scsi_eh_reset() with reset == false calls into eh_action()
> 
>   - A call to scsi_eh_reset() with reset == true returns early,
>     (as happens with/if sd_eh_action() is called in your patch)
> 
>   - A call to scsi_eh_reset() in scsi_eh_scmd_add() uses SUCCESS just
>     for consistency with scsi_eh_reset() in your patch, as the return
>     value is ignored in it.
> 
> - Less changes to sd_eh_action()
>   - Removed one chunk in sd_eh_action() ('reset gate variable')
>   - No more parameter changes in sd_eh_action() (no 'reset' parameter)
> 
> - Removed forward declaration of scsi_eh_reset(), as already suggested
> 
> 
> 
> static int scsi_eh_reset(struct scsi_cmnd *scmd, int eh_disp, bool reset)
> {
>     int rtn = eh_disp;
> 
>     if (!blk_rq_is_passthrough(scmd->request)) {
>         struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
>         if (sdrv->eh_action) {
>             if (reset) {
>                  struct scsi_disk *sdkp =
> scsi_disk(scmd->request->rq_disk);
> 
>                 /* New SCSI EH run, reset gate variable */
>                 sdkp->medium_access_reset = 0;
>                 return rtn;
>             }
>             rtn = sdrv->eh_action(scmd, rtn);
>         }
>      }
>      return rtn;
> }
> 
Nope.

This is assuming that we're always running on a scsi_disk, and that
scsi_disk is the only one implementing 'eh_action'.
Neither of which is necessarily true.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run
  2017-03-13 14:48     ` Hannes Reinecke
@ 2017-03-13 15:54       ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 29+ messages in thread
From: Mauricio Faria de Oliveira @ 2017-03-13 15:54 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi, Ewan Milne, Lawrence Obermann,
	Benjamin Block, Steffen Maier, Hannes Reinecke

On 03/13/2017 11:48 AM, Hannes Reinecke wrote:
> This is assuming that we're always running on a scsi_disk, and that
> scsi_disk is the only one implementing 'eh_action'.
> Neither of which is necessarily true.

Ah, OK. Thanks for explaining.


-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory
  2017-03-01  9:15 ` [PATCHv3 6/6] scsi: make asynchronous aborts mandatory Hannes Reinecke
@ 2017-03-14 17:33   ` Benjamin Block
  2017-03-15 13:54     ` Hannes Reinecke
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Block @ 2017-03-14 17:33 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
> There hasn't been any reports for HBAs where asynchronous abort
> would not work, so we should make it mandatory and remove
> the fallback.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  Documentation/scsi/scsi_eh.txt | 18 ++++------
>  drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
>  drivers/scsi/scsi_lib.c        |  2 +-
>  drivers/scsi/scsi_priv.h       |  3 +-
>  include/scsi/scsi_host.h       |  5 ---
>  5 files changed, 15 insertions(+), 94 deletions(-)
> 
> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
> index 4edb9c1c..11e447b 100644
> --- a/Documentation/scsi/scsi_eh.txt
> +++ b/Documentation/scsi/scsi_eh.txt
> @@ -70,7 +70,7 @@ with the command.
>  	scmd is requeued to blk queue.
> 
>   - otherwise
> -	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
> +	scsi_eh_scmd_add(scmd) is invoked for the command.  See
>  	[1-3] for details of this function.
> 
> 
> @@ -103,9 +103,7 @@ function
>          eh_timed_out() callback did not handle the command.
>  	Step #2 is taken.
> 
> - 2. If the host supports asynchronous completion (as indicated by the
> -    no_async_abort setting in the host template) scsi_abort_command()
> -    is invoked to schedule an asynchrous abort.
> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
>      Asynchronous abort are not invoked for commands which the
>      SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
>      already had been aborted once, and this is a retry which failed),
> @@ -127,16 +125,13 @@ function
> 
>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
> 
> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
> -    completions and SCSI_EH_CANCEL_CMD for timeouts.
> + 1. Links scmd->eh_entry to shost->eh_cmd_q
> 
> - 2. Links scmd->eh_entry to shost->eh_cmd_q
> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
> 
> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
> + 3. Increments shost->host_failed
> 
> - 4. Increments shost->host_failed
> -
> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> 
>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
> @@ -252,7 +247,6 @@ scmd->allowed.
> 
>   1. Error completion / time out
>      ACTION: scsi_eh_scmd_add() is invoked for scmd
> -	- set scmd->eh_eflags
>  	- add scmd to shost->eh_cmd_q
>  	- set SHOST_RECOVERY
>  	- shost->host_failed++
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 802a081..7b70ee9 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>  		}
>  	}
> 
> -	scsi_eh_scmd_add(scmd, 0);
> +	scsi_eh_scmd_add(scmd);
>  }
> 
>  /**
> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>  /**
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:	scmd to run eh on.
> - * @eh_flag:	optional SCSI_EH flag.
>   */
> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>  {
>  	struct Scsi_Host *shost = scmd->device->host;
>  	unsigned long flags;
> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	if (shost->eh_deadline != -1 && !shost->last_reset)
>  		shost->last_reset = jiffies;
> 
> -	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
> -		eh_flag &= ~SCSI_EH_CANCEL_CMD;
> -	scmd->eh_eflags |= eh_flag;
>  	scsi_eh_reset(scmd);
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  		rtn = host->hostt->eh_timed_out(scmd);
> 
>  	if (rtn == BLK_EH_NOT_HANDLED) {
> -		if (host->hostt->no_async_abort ||
> -		    scsi_abort_command(scmd) != SUCCESS) {
> +		if (scsi_abort_command(scmd) != SUCCESS) {
>  			set_host_byte(scmd, DID_TIME_OUT);
> -			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
> +			scsi_eh_scmd_add(scmd);
>  		}
>  	}
> 
> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
>  		list_for_each_entry(scmd, work_q, eh_entry) {
>  			if (scmd->device == sdev) {
>  				++total_failures;
> -				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
> +				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  					++cmd_cancel;
>  				else
>  					++cmd_failed;
> @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
>  	 * should not get sense.
>  	 */
>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
> -		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
> +		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>  		    SCSI_SENSE_VALID(scmd))
>  			continue;
> 
> @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
>  	return list_empty(work_q);
>  }
> 
> -
> -/**
> - * scsi_eh_abort_cmds - abort pending commands.
> - * @work_q:	&list_head for pending commands.
> - * @done_q:	&list_head for processed commands.
> - *
> - * Decription:
> - *    Try and see whether or not it makes sense to try and abort the
> - *    running command.  This only works out to be the case if we have one
> - *    command that has timed out.  If the command simply failed, it makes
> - *    no sense to try and abort the command, since as far as the shost
> - *    adapter is concerned, it isn't running.
> - */
> -static int scsi_eh_abort_cmds(struct list_head *work_q,
> -			      struct list_head *done_q)
> -{
> -	struct scsi_cmnd *scmd, *next;
> -	LIST_HEAD(check_list);
> -	int rtn;
> -	struct Scsi_Host *shost;
> -
> -	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> -		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
> -			continue;
> -		shost = scmd->device->host;
> -		if (scsi_host_eh_past_deadline(shost)) {
> -			list_splice_init(&check_list, work_q);
> -			SCSI_LOG_ERROR_RECOVERY(3,
> -				scmd_printk(KERN_INFO, scmd,
> -					    "%s: skip aborting cmd, past eh deadline\n",
> -					    current->comm));
> -			return list_empty(work_q);
> -		}
> -		SCSI_LOG_ERROR_RECOVERY(3,
> -			scmd_printk(KERN_INFO, scmd,
> -				     "%s: aborting cmd\n", current->comm));
> -		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
> -		if (rtn == FAILED) {
> -			SCSI_LOG_ERROR_RECOVERY(3,
> -				scmd_printk(KERN_INFO, scmd,
> -					    "%s: aborting cmd failed\n",
> -					     current->comm));
> -			list_splice_init(&check_list, work_q);
> -			return list_empty(work_q);
> -		}
> -		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
> -		if (rtn == FAST_IO_FAIL)
> -			scsi_eh_finish_cmd(scmd, done_q);
> -		else
> -			list_move_tail(&scmd->eh_entry, &check_list);
> -	}
> -
> -	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
> -}
> -
>  /**
>   * scsi_eh_try_stu - Send START_UNIT to device.
>   * @scmd:	&scsi_cmnd to send START_UNIT
> @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
>  		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
>  			    "not ready after error recovery\n");
>  		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
> -		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
> -			/*
> -			 * FIXME: Handle lost cmds.
> -			 */
> -		}
>  		scsi_eh_finish_cmd(scmd, done_q);
>  	}
>  	return;
> @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
> 
>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
> -		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
> -			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
> +		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
> 
>  	spin_lock_irqsave(shost->host_lock, flags);
>  	if (shost->eh_deadline != -1)
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index 0735a46..98b2df8 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
>  			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>  			break;
>  		default:
> -			scsi_eh_scmd_add(cmd, 0);
> +			scsi_eh_scmd_add(cmd);
>  			break;
>  	}
>  }
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 5be6cbf6..e20ab10 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -18,7 +18,6 @@
>  /*
>   * Scsi Error Handler Flags
>   */
> -#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
>  #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
> 
>  #define SCSI_SENSE_VALID(scmd) \
> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
>  extern int scsi_error_handler(void *host);
>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
> +extern void scsi_eh_scmd_add(struct scsi_cmnd *);
>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
>  			struct list_head *work_q,
>  			struct list_head *done_q);
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index 3cd8c3b..afb0481 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -452,11 +452,6 @@ struct scsi_host_template {
>  	unsigned no_write_same:1;
> 
>  	/*
> -	 * True if asynchronous aborts are not supported
> -	 */
> -	unsigned no_async_abort:1;
> -
> -	/*
>  	 * Countdown for host blocking with no commands outstanding.
>  	 */
>  	unsigned int max_host_blocked;
> -- 
> 1.8.5.6
> 

Hmm so, I guess we compromise in terms of how granular we want to
recover? 

When say an abort for command A on LUN 1 behind Port α fails for some
reason, then we also skip all possible aborts for command B on LUN 2
behind Port α and command C on LUN 1 behind Port β? (The host might
already be in recovery by the time command B and C fail)

I mean, in the end, all traffic holds on that host anyway, as long as we
get into EH - which is true as soon as one abort fails. Might as well
skip the wait time till we get there.

I would like it, if we could document that fact a bit more direct.
Otherwise I think that's good. The code also looks good.


Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
-- 
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 3/6] scsi: make eh_eflags persistent
  2017-03-01  9:15 ` [PATCHv3 3/6] scsi: make eh_eflags persistent Hannes Reinecke
  2017-03-01 23:29   ` Bart Van Assche
@ 2017-03-14 17:51   ` Benjamin Block
  1 sibling, 0 replies; 29+ messages in thread
From: Benjamin Block @ 2017-03-14 17:51 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

Hello Hannes,

sry that I only now reply to the other patches of the series.

On Wed, Mar 01, 2017 at 10:15:17AM +0100, Hannes Reinecke wrote:
> To detect if a failed command has been retried we must not
> clear scmd->eh_eflags when EH finishes.
> The flag should be persistent throughout the lifetime
> of the command.
>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  Documentation/scsi/scsi_eh.txt      | 14 +++++++-------
>  drivers/scsi/libsas/sas_scsi_host.c |  2 --
>  drivers/scsi/scsi_error.c           |  4 ++--
>  include/scsi/scsi_eh.h              |  1 +
>  4 files changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
> index 37eca00..4edb9c1c 100644
> --- a/Documentation/scsi/scsi_eh.txt
> +++ b/Documentation/scsi/scsi_eh.txt
> @@ -105,11 +105,14 @@ function
>
>   2. If the host supports asynchronous completion (as indicated by the
>      no_async_abort setting in the host template) scsi_abort_command()
> -    is invoked to schedule an asynchrous abort. If that fails
> -    Step #3 is taken.
> +    is invoked to schedule an asynchrous abort.
> +    Asynchronous abort are not invoked for commands which the
> +    SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
> +    already had been aborted once, and this is a retry which failed),
> +    or when the EH deadline is expired. In these case Step #3 is taken.
>
> - 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
> -    command.  See [1-3] for more information.
> + 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
> +    command.  See [1-4] for more information.
>
>  [1-3] Asynchronous command aborts
>
> @@ -263,7 +266,6 @@ scmd->allowed.
>
>   3. scmd recovered
>      ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
> -	- clear scmd->eh_eflags
>  	- scsi_setup_cmd_retry()
>  	- move from local eh_work_q to local eh_done_q
>      LOCKING: none
> @@ -456,8 +458,6 @@ except for #1 must be implemented by eh_strategy_handler().
>
>   - shost->host_failed is zero.
>
> - - Each scmd's eh_eflags field is cleared.
> -
>   - Each scmd is in such a state that scsi_setup_cmd_retry() on the
>     scmd doesn't make any difference.
>
> diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
> index ee6b39a..87e5079 100644
> --- a/drivers/scsi/libsas/sas_scsi_host.c
> +++ b/drivers/scsi/libsas/sas_scsi_host.c
> @@ -613,8 +613,6 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
>  		SAS_DPRINTK("trying to find task 0x%p\n", task);
>  		res = sas_scsi_find_task(task);
>
> -		cmd->eh_eflags = 0;
> -
>  		switch (res) {
>  		case TASK_IS_DONE:
>  			SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index cec439c..e1ca3b8 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -189,7 +189,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>  		/*
>  		 * Retry after abort failed, escalate to next level.
>  		 */
> -		scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
>  		SCSI_LOG_ERROR_RECOVERY(3,
>  			scmd_printk(KERN_INFO, scmd,
>  				    "previous abort failed\n"));
> @@ -933,6 +932,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
>  	ses->result = scmd->result;
>  	ses->underflow = scmd->underflow;
>  	ses->prot_op = scmd->prot_op;
> +	ses->eh_eflags = scmd->eh_eflags;
>
>  	scmd->prot_op = SCSI_PROT_NORMAL;
>  	scmd->eh_eflags = 0;
> @@ -996,6 +996,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
>  	scmd->result = ses->result;
>  	scmd->underflow = ses->underflow;
>  	scmd->prot_op = ses->prot_op;
> +	scmd->eh_eflags = ses->eh_eflags;
>  }
>  EXPORT_SYMBOL(scsi_eh_restore_cmnd);
>
> @@ -1140,7 +1141,6 @@ static int scsi_eh_reset(struct scsi_cmnd *scmd)
>   */
>  void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
>  {
> -	scmd->eh_eflags = 0;
>  	list_move_tail(&scmd->eh_entry, done_q);
>  }
>  EXPORT_SYMBOL(scsi_eh_finish_cmd);
> diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
> index 98d366b..a25b328 100644
> --- a/include/scsi/scsi_eh.h
> +++ b/include/scsi/scsi_eh.h
> @@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, int sb_len,
>  struct scsi_eh_save {
>  	/* saved state */
>  	int result;
> +	int eh_eflags;
>  	enum dma_data_direction data_direction;
>  	unsigned underflow;
>  	unsigned char cmd_len;
> --
> 1.8.5.6
>

So I think the code is good. The only thing I am missing is a bit of
reasoning why we want to escalate immediately after an already retried
command fails again. Anyway, would not require code-changes.


Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-01  9:15 ` [PATCHv3 4/6] scsi_error: do not escalate failed EH command Hannes Reinecke
@ 2017-03-14 17:56   ` Benjamin Block
  2017-03-15 13:54     ` Hannes Reinecke
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Block @ 2017-03-14 17:56 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> When a command is sent as part of the error handling there
> is not point whatsoever to start EH escalation when that
> command fails; we are _already_ in the error handler,
> and the escalation is about to commence anyway.
> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> commands and let the main EH routine handle the rest.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> ---
>  drivers/scsi/scsi_error.c | 11 +----------
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index e1ca3b8..4613aa1 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
>  	return hostt->eh_abort_handler(scmd);
>  }
>
> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> -{
> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> -			if (scsi_try_target_reset(scmd) != SUCCESS)
> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
> -					scsi_try_host_reset(scmd);
> -}
> -
>  /**
>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>   * @scmd:       SCSI command structure to hijack
> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>  			break;
>  		}
>  	} else if (rtn != FAILED) {
> -		scsi_abort_eh_cmnd(scmd);
> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
>  		rtn = FAILED;
>  	}

The idea is sound, but this implementation would cause "use-after-free"s.

I only know our own LLD well enough to judge, but with zFCP there will
always be a chance that an abort fails - be it memory pressure,
hardware/firmware behavior or internal EH in zFCP.

Calling queuecommand() will mean for us in the LLD, that we allocate a
unique internal request struct for the scsi_cmnd (struct
zfcp_fsf_request) and add that to our internal hash-table with
outstanding commands. We assume this scsi_cmnd-pointer is ours till we
complete it via scsi_done are yield it via successful EH-actions.

In case the abort fails, you fail to take back the ownership over the
scsi command. Which in turn means possible "use-after-free"s when we
still thinks the scsi command is ours, but EH has already overwritten
the scsi-command with the original one. When we still get an answer or
otherwise use the scsi_cmnd-pointer we would access an invalid one.

I guess this might as well be true for other LLDs.


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block

>
> --
> 1.8.5.6
>

--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed
  2017-03-01  9:15 ` [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
  2017-03-01 23:34   ` Bart Van Assche
@ 2017-03-14 18:05   ` Benjamin Block
  1 sibling, 0 replies; 29+ messages in thread
From: Benjamin Block @ 2017-03-14 18:05 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

Hello Hannes,

On Wed, Mar 01, 2017 at 10:15:19AM +0100, Hannes Reinecke wrote:
> scsi_eh_scmd_add() currently only will fail if no
> error handler thread is started (which will never be the
> case) or if the state machine encounters an illegal transition.
> 
> But if we're encountering an invalid state transition
> chances is we cannot fixup things with the error handler.
> So better add a WARN_ON for illegal host states and
> make scsi_dh_scmd_add() a void function.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> ---
>  drivers/scsi/scsi_error.c | 41 +++++++++++++----------------------------
>  drivers/scsi/scsi_lib.c   |  4 ++--
>  drivers/scsi/scsi_priv.h  |  2 +-
>  3 files changed, 16 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 4613aa1..802a081 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -163,13 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>  		}
>  	}
> 
> -	if (!scsi_eh_scmd_add(scmd, 0)) {
> -		SCSI_LOG_ERROR_RECOVERY(3,
> -			scmd_printk(KERN_WARNING, scmd,
> -				    "terminate aborted command\n"));
> -		set_host_byte(scmd, DID_TIME_OUT);
> -		scsi_finish_command(scmd);
> -	}
> +	scsi_eh_scmd_add(scmd, 0);
>  }
> 
>  /**
> @@ -224,28 +218,23 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>   * @scmd:	scmd to run eh on.
>   * @eh_flag:	optional SCSI_EH flag.
> - *
> - * Return value:
> - *	0 on failure.
>   */
> -int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  {
>  	struct Scsi_Host *shost = scmd->device->host;
>  	unsigned long flags;
> -	int ret = 0;
> +	int ret;
> 
> -	if (!shost->ehandler)
> -		return 0;
> +	WARN_ON_ONCE(!shost->ehandler);
>

So I saw that you already changed stuff here after Bart reviewed it. Do
you think it would be useful to make the warning per-host-instance,
rather than per-linux-instance?

Like, if for some reason the EH thread for one SCSI host dies - however
that might happen - we would get an individual warning in the klog for
this one host and could maybe even save the setup by
disabling/re-enabling the whole host - effectively removing the host and
adding a new one, plus a new EH thread for it.

With this WARN_ON_ONCE we end up in a broken setup w/o any information
what exactly broke. Previously we would get at least a SCSI-logging
message. Which would also help with analysis of such bugs - correlate
data etc.


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block

> 
>  	spin_lock_irqsave(shost->host_lock, flags);
> -	if (scsi_host_set_state(shost, SHOST_RECOVERY))
> -		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
> -			goto out_unlock;
> -
> +	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
> +		ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
> +		WARN_ON_ONCE(ret);
> +	}
>  	if (shost->eh_deadline != -1 && !shost->last_reset)
>  		shost->last_reset = jiffies;
> 
> -	ret = 1;
>  	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>  		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>  	scmd->eh_eflags |= eh_flag;
> @@ -253,9 +242,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>  	shost->host_failed++;
>  	scsi_eh_wakeup(shost);
> - out_unlock:
>  	spin_unlock_irqrestore(shost->host_lock, flags);
> -	return ret;
>  }
> 
>  /**
> @@ -284,13 +271,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  		rtn = host->hostt->eh_timed_out(scmd);
> 
>  	if (rtn == BLK_EH_NOT_HANDLED) {
> -		if (!host->hostt->no_async_abort &&
> -		    scsi_abort_command(scmd) == SUCCESS)
> -			return BLK_EH_NOT_HANDLED;
> -
> -		set_host_byte(scmd, DID_TIME_OUT);
> -		if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
> -			rtn = BLK_EH_HANDLED;
> +		if (host->hostt->no_async_abort ||
> +		    scsi_abort_command(scmd) != SUCCESS) {
> +			set_host_byte(scmd, DID_TIME_OUT);
> +			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
> +		}
>  	}
> 
>  	return rtn;
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index f5e45a2..0735a46 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -1593,8 +1593,8 @@ static void scsi_softirq_done(struct request *rq)
>  			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>  			break;
>  		default:
> -			if (!scsi_eh_scmd_add(cmd, 0))
> -				scsi_finish_command(cmd);
> +			scsi_eh_scmd_add(cmd, 0);
> +			break;
>  	}
>  }
> 
> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> index 99bfc98..5be6cbf6 100644
> --- a/drivers/scsi/scsi_priv.h
> +++ b/drivers/scsi/scsi_priv.h
> @@ -72,7 +72,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
>  extern int scsi_error_handler(void *host);
>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
> -extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
> +extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
>  			struct list_head *work_q,
>  			struct list_head *done_q);
> -- 
> 1.8.5.6
> 

-- 
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory
  2017-03-14 17:33   ` Benjamin Block
@ 2017-03-15 13:54     ` Hannes Reinecke
  2017-03-15 17:55       ` Benjamin Block
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-15 13:54 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

On 03/14/2017 06:33 PM, Benjamin Block wrote:
> Hello Hannes,
> 
> On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
>> There hasn't been any reports for HBAs where asynchronous abort
>> would not work, so we should make it mandatory and remove
>> the fallback.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> ---
>>  Documentation/scsi/scsi_eh.txt | 18 ++++------
>>  drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
>>  drivers/scsi/scsi_lib.c        |  2 +-
>>  drivers/scsi/scsi_priv.h       |  3 +-
>>  include/scsi/scsi_host.h       |  5 ---
>>  5 files changed, 15 insertions(+), 94 deletions(-)
>>
>> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
>> index 4edb9c1c..11e447b 100644
>> --- a/Documentation/scsi/scsi_eh.txt
>> +++ b/Documentation/scsi/scsi_eh.txt
>> @@ -70,7 +70,7 @@ with the command.
>>  	scmd is requeued to blk queue.
>>
>>   - otherwise
>> -	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
>> +	scsi_eh_scmd_add(scmd) is invoked for the command.  See
>>  	[1-3] for details of this function.
>>
>>
>> @@ -103,9 +103,7 @@ function
>>          eh_timed_out() callback did not handle the command.
>>  	Step #2 is taken.
>>
>> - 2. If the host supports asynchronous completion (as indicated by the
>> -    no_async_abort setting in the host template) scsi_abort_command()
>> -    is invoked to schedule an asynchrous abort.
>> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
>>      Asynchronous abort are not invoked for commands which the
>>      SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
>>      already had been aborted once, and this is a retry which failed),
>> @@ -127,16 +125,13 @@ function
>>
>>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
>>
>> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
>> -    completions and SCSI_EH_CANCEL_CMD for timeouts.
>> + 1. Links scmd->eh_entry to shost->eh_cmd_q
>>
>> - 2. Links scmd->eh_entry to shost->eh_cmd_q
>> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
>>
>> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
>> + 3. Increments shost->host_failed
>>
>> - 4. Increments shost->host_failed
>> -
>> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
>> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
>>
>>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
>>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
>> @@ -252,7 +247,6 @@ scmd->allowed.
>>
>>   1. Error completion / time out
>>      ACTION: scsi_eh_scmd_add() is invoked for scmd
>> -	- set scmd->eh_eflags
>>  	- add scmd to shost->eh_cmd_q
>>  	- set SHOST_RECOVERY
>>  	- shost->host_failed++
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index 802a081..7b70ee9 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>>  		}
>>  	}
>>
>> -	scsi_eh_scmd_add(scmd, 0);
>> +	scsi_eh_scmd_add(scmd);
>>  }
>>
>>  /**
>> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>>  /**
>>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>>   * @scmd:	scmd to run eh on.
>> - * @eh_flag:	optional SCSI_EH flag.
>>   */
>> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>>  {
>>  	struct Scsi_Host *shost = scmd->device->host;
>>  	unsigned long flags;
>> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>  	if (shost->eh_deadline != -1 && !shost->last_reset)
>>  		shost->last_reset = jiffies;
>>
>> -	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>> -		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>> -	scmd->eh_eflags |= eh_flag;
>>  	scsi_eh_reset(scmd);
>>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>>  	shost->host_failed++;
>> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>>  		rtn = host->hostt->eh_timed_out(scmd);
>>
>>  	if (rtn == BLK_EH_NOT_HANDLED) {
>> -		if (host->hostt->no_async_abort ||
>> -		    scsi_abort_command(scmd) != SUCCESS) {
>> +		if (scsi_abort_command(scmd) != SUCCESS) {
>>  			set_host_byte(scmd, DID_TIME_OUT);
>> -			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
>> +			scsi_eh_scmd_add(scmd);
>>  		}
>>  	}
>>
>> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
>>  		list_for_each_entry(scmd, work_q, eh_entry) {
>>  			if (scmd->device == sdev) {
>>  				++total_failures;
>> -				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
>> +				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>  					++cmd_cancel;
>>  				else
>>  					++cmd_failed;
>> @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
>>  	 * should not get sense.
>>  	 */
>>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>> -		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
>> -		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>> +		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>>  		    SCSI_SENSE_VALID(scmd))
>>  			continue;
>>
>> @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
>>  	return list_empty(work_q);
>>  }
>>
>> -
>> -/**
>> - * scsi_eh_abort_cmds - abort pending commands.
>> - * @work_q:	&list_head for pending commands.
>> - * @done_q:	&list_head for processed commands.
>> - *
>> - * Decription:
>> - *    Try and see whether or not it makes sense to try and abort the
>> - *    running command.  This only works out to be the case if we have one
>> - *    command that has timed out.  If the command simply failed, it makes
>> - *    no sense to try and abort the command, since as far as the shost
>> - *    adapter is concerned, it isn't running.
>> - */
>> -static int scsi_eh_abort_cmds(struct list_head *work_q,
>> -			      struct list_head *done_q)
>> -{
>> -	struct scsi_cmnd *scmd, *next;
>> -	LIST_HEAD(check_list);
>> -	int rtn;
>> -	struct Scsi_Host *shost;
>> -
>> -	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>> -		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
>> -			continue;
>> -		shost = scmd->device->host;
>> -		if (scsi_host_eh_past_deadline(shost)) {
>> -			list_splice_init(&check_list, work_q);
>> -			SCSI_LOG_ERROR_RECOVERY(3,
>> -				scmd_printk(KERN_INFO, scmd,
>> -					    "%s: skip aborting cmd, past eh deadline\n",
>> -					    current->comm));
>> -			return list_empty(work_q);
>> -		}
>> -		SCSI_LOG_ERROR_RECOVERY(3,
>> -			scmd_printk(KERN_INFO, scmd,
>> -				     "%s: aborting cmd\n", current->comm));
>> -		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
>> -		if (rtn == FAILED) {
>> -			SCSI_LOG_ERROR_RECOVERY(3,
>> -				scmd_printk(KERN_INFO, scmd,
>> -					    "%s: aborting cmd failed\n",
>> -					     current->comm));
>> -			list_splice_init(&check_list, work_q);
>> -			return list_empty(work_q);
>> -		}
>> -		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
>> -		if (rtn == FAST_IO_FAIL)
>> -			scsi_eh_finish_cmd(scmd, done_q);
>> -		else
>> -			list_move_tail(&scmd->eh_entry, &check_list);
>> -	}
>> -
>> -	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
>> -}
>> -
>>  /**
>>   * scsi_eh_try_stu - Send START_UNIT to device.
>>   * @scmd:	&scsi_cmnd to send START_UNIT
>> @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
>>  		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
>>  			    "not ready after error recovery\n");
>>  		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
>> -		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
>> -			/*
>> -			 * FIXME: Handle lost cmds.
>> -			 */
>> -		}
>>  		scsi_eh_finish_cmd(scmd, done_q);
>>  	}
>>  	return;
>> @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
>>
>>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
>> -		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
>> -			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>> +		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>>
>>  	spin_lock_irqsave(shost->host_lock, flags);
>>  	if (shost->eh_deadline != -1)
>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>> index 0735a46..98b2df8 100644
>> --- a/drivers/scsi/scsi_lib.c
>> +++ b/drivers/scsi/scsi_lib.c
>> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
>>  			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>>  			break;
>>  		default:
>> -			scsi_eh_scmd_add(cmd, 0);
>> +			scsi_eh_scmd_add(cmd);
>>  			break;
>>  	}
>>  }
>> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
>> index 5be6cbf6..e20ab10 100644
>> --- a/drivers/scsi/scsi_priv.h
>> +++ b/drivers/scsi/scsi_priv.h
>> @@ -18,7 +18,6 @@
>>  /*
>>   * Scsi Error Handler Flags
>>   */
>> -#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
>>  #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
>>
>>  #define SCSI_SENSE_VALID(scmd) \
>> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
>>  extern int scsi_error_handler(void *host);
>>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
>>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
>> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
>> +extern void scsi_eh_scmd_add(struct scsi_cmnd *);
>>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
>>  			struct list_head *work_q,
>>  			struct list_head *done_q);
>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>> index 3cd8c3b..afb0481 100644
>> --- a/include/scsi/scsi_host.h
>> +++ b/include/scsi/scsi_host.h
>> @@ -452,11 +452,6 @@ struct scsi_host_template {
>>  	unsigned no_write_same:1;
>>
>>  	/*
>> -	 * True if asynchronous aborts are not supported
>> -	 */
>> -	unsigned no_async_abort:1;
>> -
>> -	/*
>>  	 * Countdown for host blocking with no commands outstanding.
>>  	 */
>>  	unsigned int max_host_blocked;
>> -- 
>> 1.8.5.6
>>
> 
> Hmm so, I guess we compromise in terms of how granular we want to
> recover? 
> 
> When say an abort for command A on LUN 1 behind Port α fails for some
> reason, then we also skip all possible aborts for command B on LUN 2
> behind Port α and command C on LUN 1 behind Port β? (The host might
> already be in recovery by the time command B and C fail)
> 
No.
Recovery will only be started once all commands have been completed or
aborted.
Hence will be be aborting all commands before entering SCSI EH.

> I mean, in the end, all traffic holds on that host anyway, as long as we
> get into EH - which is true as soon as one abort fails. Might as well
> skip the wait time till we get there.
> 
There still is a benefit to be had from sending individual aborts; after
all, the commands are supposed to be independently, and sending an abort
serves as a pointer to the LLDD to remove any reference to it.

> I would like it, if we could document that fact a bit more direct.
> Otherwise I think that's good. The code also looks good.
> 
> 
> Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
> 
Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-14 17:56   ` Benjamin Block
@ 2017-03-15 13:54     ` Hannes Reinecke
  2017-03-16 11:01       ` Benjamin Block
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-15 13:54 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

On 03/14/2017 06:56 PM, Benjamin Block wrote:
> Hello Hannes,
> 
> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
>> When a command is sent as part of the error handling there
>> is not point whatsoever to start EH escalation when that
>> command fails; we are _already_ in the error handler,
>> and the escalation is about to commence anyway.
>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
>> commands and let the main EH routine handle the rest.
>>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> ---
>>  drivers/scsi/scsi_error.c | 11 +----------
>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>
>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>> index e1ca3b8..4613aa1 100644
>> --- a/drivers/scsi/scsi_error.c
>> +++ b/drivers/scsi/scsi_error.c
>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
>>  	return hostt->eh_abort_handler(scmd);
>>  }
>>
>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
>> -{
>> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
>> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
>> -			if (scsi_try_target_reset(scmd) != SUCCESS)
>> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
>> -					scsi_try_host_reset(scmd);
>> -}
>> -
>>  /**
>>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>>   * @scmd:       SCSI command structure to hijack
>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>  			break;
>>  		}
>>  	} else if (rtn != FAILED) {
>> -		scsi_abort_eh_cmnd(scmd);
>> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
>>  		rtn = FAILED;
>>  	}
> 
> The idea is sound, but this implementation would cause "use-after-free"s.
> 
> I only know our own LLD well enough to judge, but with zFCP there will
> always be a chance that an abort fails - be it memory pressure,
> hardware/firmware behavior or internal EH in zFCP.
> 
> Calling queuecommand() will mean for us in the LLD, that we allocate a
> unique internal request struct for the scsi_cmnd (struct
> zfcp_fsf_request) and add that to our internal hash-table with
> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
> complete it via scsi_done are yield it via successful EH-actions.
> 
> In case the abort fails, you fail to take back the ownership over the
> scsi command. Which in turn means possible "use-after-free"s when we
> still thinks the scsi command is ours, but EH has already overwritten
> the scsi-command with the original one. When we still get an answer or
> otherwise use the scsi_cmnd-pointer we would access an invalid one.
> 
That is actually not try.
As soon as we're calling 'scsi_try_to_abort_command()' ownership is
assumed to reside in the SCSI midlayer; also, the command used for
recovery here is actually using the same structure than the failed
command, so if the command abort failed the command is already in the
list of failed commands, and will be recovered after SCSI EH returned.

So no use-after-free here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory
  2017-03-15 13:54     ` Hannes Reinecke
@ 2017-03-15 17:55       ` Benjamin Block
  2017-03-16 14:06         ` Hannes Reinecke
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Block @ 2017-03-15 17:55 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

On Wed, Mar 15, 2017 at 02:54:09PM +0100, Hannes Reinecke wrote:
> On 03/14/2017 06:33 PM, Benjamin Block wrote:
> > Hello Hannes,
> >
> > On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
> >> There hasn't been any reports for HBAs where asynchronous abort
> >> would not work, so we should make it mandatory and remove
> >> the fallback.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> >> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >> ---
> >>  Documentation/scsi/scsi_eh.txt | 18 ++++------
> >>  drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
> >>  drivers/scsi/scsi_lib.c        |  2 +-
> >>  drivers/scsi/scsi_priv.h       |  3 +-
> >>  include/scsi/scsi_host.h       |  5 ---
> >>  5 files changed, 15 insertions(+), 94 deletions(-)
> >>
> >> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
> >> index 4edb9c1c..11e447b 100644
> >> --- a/Documentation/scsi/scsi_eh.txt
> >> +++ b/Documentation/scsi/scsi_eh.txt
> >> @@ -70,7 +70,7 @@ with the command.
> >>  	scmd is requeued to blk queue.
> >>
> >>   - otherwise
> >> -	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
> >> +	scsi_eh_scmd_add(scmd) is invoked for the command.  See
> >>  	[1-3] for details of this function.
> >>
> >>
> >> @@ -103,9 +103,7 @@ function
> >>          eh_timed_out() callback did not handle the command.
> >>  	Step #2 is taken.
> >>
> >> - 2. If the host supports asynchronous completion (as indicated by the
> >> -    no_async_abort setting in the host template) scsi_abort_command()
> >> -    is invoked to schedule an asynchrous abort.
> >> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
> >>      Asynchronous abort are not invoked for commands which the
> >>      SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
> >>      already had been aborted once, and this is a retry which failed),
> >> @@ -127,16 +125,13 @@ function
> >>
> >>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
> >>
> >> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
> >> -    completions and SCSI_EH_CANCEL_CMD for timeouts.
> >> + 1. Links scmd->eh_entry to shost->eh_cmd_q
> >>
> >> - 2. Links scmd->eh_entry to shost->eh_cmd_q
> >> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
> >>
> >> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
> >> + 3. Increments shost->host_failed
> >>
> >> - 4. Increments shost->host_failed
> >> -
> >> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> >> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
> >>
> >>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
> >>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
> >> @@ -252,7 +247,6 @@ scmd->allowed.
> >>
> >>   1. Error completion / time out
> >>      ACTION: scsi_eh_scmd_add() is invoked for scmd
> >> -	- set scmd->eh_eflags
> >>  	- add scmd to shost->eh_cmd_q
> >>  	- set SHOST_RECOVERY
> >>  	- shost->host_failed++
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index 802a081..7b70ee9 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
> >>  		}
> >>  	}
> >>
> >> -	scsi_eh_scmd_add(scmd, 0);
> >> +	scsi_eh_scmd_add(scmd);
> >>  }
> >>
> >>  /**
> >> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
> >>  /**
> >>   * scsi_eh_scmd_add - add scsi cmd to error handling.
> >>   * @scmd:	scmd to run eh on.
> >> - * @eh_flag:	optional SCSI_EH flag.
> >>   */
> >> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> >> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
> >>  {
> >>  	struct Scsi_Host *shost = scmd->device->host;
> >>  	unsigned long flags;
> >> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
> >>  	if (shost->eh_deadline != -1 && !shost->last_reset)
> >>  		shost->last_reset = jiffies;
> >>
> >> -	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
> >> -		eh_flag &= ~SCSI_EH_CANCEL_CMD;
> >> -	scmd->eh_eflags |= eh_flag;
> >>  	scsi_eh_reset(scmd);
> >>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
> >>  	shost->host_failed++;
> >> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
> >>  		rtn = host->hostt->eh_timed_out(scmd);
> >>
> >>  	if (rtn == BLK_EH_NOT_HANDLED) {
> >> -		if (host->hostt->no_async_abort ||
> >> -		    scsi_abort_command(scmd) != SUCCESS) {
> >> +		if (scsi_abort_command(scmd) != SUCCESS) {
> >>  			set_host_byte(scmd, DID_TIME_OUT);
> >> -			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
> >> +			scsi_eh_scmd_add(scmd);
> >>  		}
> >>  	}
> >>
> >> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
> >>  		list_for_each_entry(scmd, work_q, eh_entry) {
> >>  			if (scmd->device == sdev) {
> >>  				++total_failures;
> >> -				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
> >> +				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
> >>  					++cmd_cancel;
> >>  				else
> >>  					++cmd_failed;
> >> @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
> >>  	 * should not get sense.
> >>  	 */
> >>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> >> -		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
> >> -		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
> >> +		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
> >>  		    SCSI_SENSE_VALID(scmd))
> >>  			continue;
> >>
> >> @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
> >>  	return list_empty(work_q);
> >>  }
> >>
> >> -
> >> -/**
> >> - * scsi_eh_abort_cmds - abort pending commands.
> >> - * @work_q:	&list_head for pending commands.
> >> - * @done_q:	&list_head for processed commands.
> >> - *
> >> - * Decription:
> >> - *    Try and see whether or not it makes sense to try and abort the
> >> - *    running command.  This only works out to be the case if we have one
> >> - *    command that has timed out.  If the command simply failed, it makes
> >> - *    no sense to try and abort the command, since as far as the shost
> >> - *    adapter is concerned, it isn't running.
> >> - */
> >> -static int scsi_eh_abort_cmds(struct list_head *work_q,
> >> -			      struct list_head *done_q)
> >> -{
> >> -	struct scsi_cmnd *scmd, *next;
> >> -	LIST_HEAD(check_list);
> >> -	int rtn;
> >> -	struct Scsi_Host *shost;
> >> -
> >> -	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> >> -		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
> >> -			continue;
> >> -		shost = scmd->device->host;
> >> -		if (scsi_host_eh_past_deadline(shost)) {
> >> -			list_splice_init(&check_list, work_q);
> >> -			SCSI_LOG_ERROR_RECOVERY(3,
> >> -				scmd_printk(KERN_INFO, scmd,
> >> -					    "%s: skip aborting cmd, past eh deadline\n",
> >> -					    current->comm));
> >> -			return list_empty(work_q);
> >> -		}
> >> -		SCSI_LOG_ERROR_RECOVERY(3,
> >> -			scmd_printk(KERN_INFO, scmd,
> >> -				     "%s: aborting cmd\n", current->comm));
> >> -		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
> >> -		if (rtn == FAILED) {
> >> -			SCSI_LOG_ERROR_RECOVERY(3,
> >> -				scmd_printk(KERN_INFO, scmd,
> >> -					    "%s: aborting cmd failed\n",
> >> -					     current->comm));
> >> -			list_splice_init(&check_list, work_q);
> >> -			return list_empty(work_q);
> >> -		}
> >> -		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
> >> -		if (rtn == FAST_IO_FAIL)
> >> -			scsi_eh_finish_cmd(scmd, done_q);
> >> -		else
> >> -			list_move_tail(&scmd->eh_entry, &check_list);
> >> -	}
> >> -
> >> -	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
> >> -}
> >> -
> >>  /**
> >>   * scsi_eh_try_stu - Send START_UNIT to device.
> >>   * @scmd:	&scsi_cmnd to send START_UNIT
> >> @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
> >>  		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
> >>  			    "not ready after error recovery\n");
> >>  		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
> >> -		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
> >> -			/*
> >> -			 * FIXME: Handle lost cmds.
> >> -			 */
> >> -		}
> >>  		scsi_eh_finish_cmd(scmd, done_q);
> >>  	}
> >>  	return;
> >> @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
> >>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
> >>
> >>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
> >> -		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
> >> -			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
> >> +		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
> >>
> >>  	spin_lock_irqsave(shost->host_lock, flags);
> >>  	if (shost->eh_deadline != -1)
> >> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> >> index 0735a46..98b2df8 100644
> >> --- a/drivers/scsi/scsi_lib.c
> >> +++ b/drivers/scsi/scsi_lib.c
> >> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
> >>  			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
> >>  			break;
> >>  		default:
> >> -			scsi_eh_scmd_add(cmd, 0);
> >> +			scsi_eh_scmd_add(cmd);
> >>  			break;
> >>  	}
> >>  }
> >> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
> >> index 5be6cbf6..e20ab10 100644
> >> --- a/drivers/scsi/scsi_priv.h
> >> +++ b/drivers/scsi/scsi_priv.h
> >> @@ -18,7 +18,6 @@
> >>  /*
> >>   * Scsi Error Handler Flags
> >>   */
> >> -#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
> >>  #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
> >>
> >>  #define SCSI_SENSE_VALID(scmd) \
> >> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
> >>  extern int scsi_error_handler(void *host);
> >>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
> >>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
> >> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
> >> +extern void scsi_eh_scmd_add(struct scsi_cmnd *);
> >>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
> >>  			struct list_head *work_q,
> >>  			struct list_head *done_q);
> >> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> >> index 3cd8c3b..afb0481 100644
> >> --- a/include/scsi/scsi_host.h
> >> +++ b/include/scsi/scsi_host.h
> >> @@ -452,11 +452,6 @@ struct scsi_host_template {
> >>  	unsigned no_write_same:1;
> >>
> >>  	/*
> >> -	 * True if asynchronous aborts are not supported
> >> -	 */
> >> -	unsigned no_async_abort:1;
> >> -
> >> -	/*
> >>  	 * Countdown for host blocking with no commands outstanding.
> >>  	 */
> >>  	unsigned int max_host_blocked;
> >> --
> >> 1.8.5.6
> >>
> >
> > Hmm so, I guess we compromise in terms of how granular we want to
> > recover?
> >
> > When say an abort for command A on LUN 1 behind Port α fails for some
> > reason, then we also skip all possible aborts for command B on LUN 2
> > behind Port α and command C on LUN 1 behind Port β? (The host might
> > already be in recovery by the time command B and C fail)
> >
> No.
> Recovery will only be started once all commands have been completed or
> aborted.
> Hence will be be aborting all commands before entering SCSI EH.
>

When you call scsi_abort_command() after a timeout happend, it will
check with scsi_host_in_recovery() whether the host has recovery already
set or not. And only if not it will schedule the abort.

When the first command A times out, it will schedule a async abort, and
if that fails it will set the state of the host in scsi_eh_scmd_add() to
SHOST_RECOVERY before even the EH thread is kicked. So if command B and
C timeout later than that, there won't be any abort scheduled.

That is with state tag v4.10, and your patchset. And I have actually
observed this behavior in real life already (granted, the timeouts
happened after injects), with the difference that previously the abort
would still happen in the EH handling.


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block

>
> > I mean, in the end, all traffic holds on that host anyway, as long as we
> > get into EH - which is true as soon as one abort fails. Might as well
> > skip the wait time till we get there.
> >
> There still is a benefit to be had from sending individual aborts; after
> all, the commands are supposed to be independently, and sending an abort
> serves as a pointer to the LLDD to remove any reference to it.
>
> > I would like it, if we could document that fact a bit more direct.
> > Otherwise I think that's good. The code also looks good.
> >
> >
> > Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
> >

--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-15 13:54     ` Hannes Reinecke
@ 2017-03-16 11:01       ` Benjamin Block
  2017-03-16 11:53         ` Hannes Reinecke
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Block @ 2017-03-16 11:01 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen, James Bottomley
  Cc: Christoph Hellwig, Bart van Assche, linux-scsi

On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
> On 03/14/2017 06:56 PM, Benjamin Block wrote:
> > Hello Hannes,
> >
> > On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
> >> When a command is sent as part of the error handling there
> >> is not point whatsoever to start EH escalation when that
> >> command fails; we are _already_ in the error handler,
> >> and the escalation is about to commence anyway.
> >> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> >> commands and let the main EH routine handle the rest.
> >>
> >> Signed-off-by: Hannes Reinecke <hare@suse.de>
> >> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
> >> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
> >> ---
> >>  drivers/scsi/scsi_error.c | 11 +----------
> >>  1 file changed, 1 insertion(+), 10 deletions(-)
> >>
> >> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> >> index e1ca3b8..4613aa1 100644
> >> --- a/drivers/scsi/scsi_error.c
> >> +++ b/drivers/scsi/scsi_error.c
> >> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
> >>  	return hostt->eh_abort_handler(scmd);
> >>  }
> >>
> >> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
> >> -{
> >> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
> >> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
> >> -			if (scsi_try_target_reset(scmd) != SUCCESS)
> >> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
> >> -					scsi_try_host_reset(scmd);
> >> -}
> >> -
> >>  /**
> >>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
> >>   * @scmd:       SCSI command structure to hijack
> >> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
> >>  			break;
> >>  		}
> >>  	} else if (rtn != FAILED) {
> >> -		scsi_abort_eh_cmnd(scmd);
> >> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
> >>  		rtn = FAILED;
> >>  	}
> >
> > The idea is sound, but this implementation would cause "use-after-free"s.
> >
> > I only know our own LLD well enough to judge, but with zFCP there will
> > always be a chance that an abort fails - be it memory pressure,
> > hardware/firmware behavior or internal EH in zFCP.
> >
> > Calling queuecommand() will mean for us in the LLD, that we allocate a
> > unique internal request struct for the scsi_cmnd (struct
> > zfcp_fsf_request) and add that to our internal hash-table with
> > outstanding commands. We assume this scsi_cmnd-pointer is ours till we
> > complete it via scsi_done are yield it via successful EH-actions.
> >
> > In case the abort fails, you fail to take back the ownership over the
> > scsi command. Which in turn means possible "use-after-free"s when we
> > still thinks the scsi command is ours, but EH has already overwritten
> > the scsi-command with the original one. When we still get an answer or
> > otherwise use the scsi_cmnd-pointer we would access an invalid one.
> >
> That is actually not try.
> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
> assumed to reside in the SCSI midlayer;

That can not be true. First of all, look at the function itself (v4.10):

	static int scsi_try_to_abort_cmd...
	{
		if (!hostt->eh_abort_handler)
			return FAILED;

		return hostt->eh_abort_handler(scmd);
	}

If what you say is true, then this whole API of LLDs providing or
choosing not to provide implementations for these function would be
fundamentally broken.
The function itself returns FAILED when there is no such function.. how
is a LLD that does not implement it ever to know that you took ownership
by calling scsi_try_to_abort_cmd()?

Then look at the function-comment:

	/**
	 * scsi_try_to_abort_cmd - ...
	 * ...
	 * Notes:
	 *    SUCCESS does not necessarily indicate that the command
	 *    has been aborted; it only indicates that the LLDDs
	 *    has cleared all references to that command.
	 *    LLDDs should return FAILED only if an abort was required
	 *    but could not be executed. LLDDs should return FAST_IO_FAIL
	 *    if the device is temporarily unavailable (eg due to a
	 *    link down on FibreChannel)
	 */

While not written directly, it says that SUCCESS means the references are
cleared, ownership transferred.

Then look at the scsi_eh.txt:

	3. If !list_empty(&eh_work_q), invoke scsi_eh_abort_cmds().

	<<scsi_eh_abort_cmds>>

	    This action is taken for each timed out command when
	    no_async_abort is enabled in the host template.
	    hostt->eh_abort_handler() is invoked for each scmd.  The
	    handler returns SUCCESS if it has succeeded to make LLDD and
	    all related hardware forget about the scmd.

	    If a timedout scmd is successfully aborted and the sdev is
	    either offline or ready, scsi_eh_finish_cmd() is invoked for
	    the scmd.  Otherwise, the scmd is left in eh_work_q for
	    higher-severity actions.

Same as the function-comment, SUCCESS signals ownership transfer.

> also, the command used for
> recovery here is actually using the same structure than the failed
> command, so if the command abort failed the command is already in the
> list of failed commands, and will be recovered after SCSI EH returned.
>

That doesn't change the fact that LLDs may or may not allocate separate
internal buffers for it. We have to, to send the new FCP command, you
asking us to send. So should for some reason a reply arrive for the
eh-scsi-command that you are asking us to work on and fail to abort, we
will remember the SCSI-command pointer and use it. 
And 'reply arrive' is in case of zFCP not even hard, because the same
code-path is used when we at some point trigger an internal
adapter-recovery (independent of SCSI EH). That would cancel all
outstanding commands and would, as we think we still own the SCSI
command, set a appropriate state and finish the command with
scsi_done().


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block

>
> So no use-after-free here.
>

--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-16 11:01       ` Benjamin Block
@ 2017-03-16 11:53         ` Hannes Reinecke
  2017-03-21 19:05           ` Benjamin Block
  0 siblings, 1 reply; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-16 11:53 UTC (permalink / raw)
  To: Benjamin Block, Martin K. Petersen, James Bottomley
  Cc: Christoph Hellwig, Bart van Assche, linux-scsi

On 03/16/2017 12:01 PM, Benjamin Block wrote:
> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
>> On 03/14/2017 06:56 PM, Benjamin Block wrote:
>>> Hello Hannes,
>>>
>>> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
>>>> When a command is sent as part of the error handling there
>>>> is not point whatsoever to start EH escalation when that
>>>> command fails; we are _already_ in the error handler,
>>>> and the escalation is about to commence anyway.
>>>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
>>>> commands and let the main EH routine handle the rest.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>>>> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>> ---
>>>>  drivers/scsi/scsi_error.c | 11 +----------
>>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>> index e1ca3b8..4613aa1 100644
>>>> --- a/drivers/scsi/scsi_error.c
>>>> +++ b/drivers/scsi/scsi_error.c
>>>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
>>>>  	return hostt->eh_abort_handler(scmd);
>>>>  }
>>>>
>>>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
>>>> -{
>>>> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
>>>> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
>>>> -			if (scsi_try_target_reset(scmd) != SUCCESS)
>>>> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
>>>> -					scsi_try_host_reset(scmd);
>>>> -}
>>>> -
>>>>  /**
>>>>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>>>>   * @scmd:       SCSI command structure to hijack
>>>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>>>  			break;
>>>>  		}
>>>>  	} else if (rtn != FAILED) {
>>>> -		scsi_abort_eh_cmnd(scmd);
>>>> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
>>>>  		rtn = FAILED;
>>>>  	}
>>>
>>> The idea is sound, but this implementation would cause "use-after-free"s.
>>>
>>> I only know our own LLD well enough to judge, but with zFCP there will
>>> always be a chance that an abort fails - be it memory pressure,
>>> hardware/firmware behavior or internal EH in zFCP.
>>>
>>> Calling queuecommand() will mean for us in the LLD, that we allocate a
>>> unique internal request struct for the scsi_cmnd (struct
>>> zfcp_fsf_request) and add that to our internal hash-table with
>>> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
>>> complete it via scsi_done are yield it via successful EH-actions.
>>>
>>> In case the abort fails, you fail to take back the ownership over the
>>> scsi command. Which in turn means possible "use-after-free"s when we
>>> still thinks the scsi command is ours, but EH has already overwritten
>>> the scsi-command with the original one. When we still get an answer or
>>> otherwise use the scsi_cmnd-pointer we would access an invalid one.
>>>
>> That is actually not try.
>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
>> assumed to reside in the SCSI midlayer;
> 
> That can not be true. First of all, look at the function itself (v4.10):
> 
> 	static int scsi_try_to_abort_cmd...
> 	{
> 		if (!hostt->eh_abort_handler)
> 			return FAILED;
> 
> 		return hostt->eh_abort_handler(scmd);
> 	}
> 
> If what you say is true, then this whole API of LLDs providing or
> choosing not to provide implementations for these function would be
> fundamentally broken.
> The function itself returns FAILED when there is no such function.. how
> is a LLD that does not implement it ever to know that you took ownership
> by calling scsi_try_to_abort_cmd()?
> 
Well. Ok.

_Actually_, the whole concept of 'ownership' in SCSI EH is a bit flaky.

There are two ways of entering the error handler:
- SCSI command timeout
- Failing to evaluate the SCSI command status

For the latter case ownership already _is_ with the SCSI midlayer, as
the LLDD called 'scsi_done' and with that moved ownership to the midlayer.

The interesting part is command timeout.
Once a command timeout triggers the block layer is calling
'blk_mark_rq_complete' to avoid any concurrent completions.
IE any calls to scsi_done() will be short-circuited with that,
effectively transferring ownership to SCSI midlayer.

Now the SCSI midlayer has to inform the LLDD that it has taken
ownership; for that it calls the various eh_XXX callbacks into the LLDD.
While it's quite clear that SUCCESS signals a transfer of ownership to
SCSI ML, it's less clear what happens in the case of FAILED.
Problem here is that the eh_XXX callbacks actually serve a dual purpose;
one it to signal the transfer of ownership to SCSI ML and the other is
to actually _do_ some action on that command.

But as FAILED is just _one_ value we have no idea in the midlayer if the
change of ownership actually took place.

Which leads to the curious effect that _ultimatively_ control still
resides with the LLDD when host_reset fails, so we actually should
_never_ release the scsi command once host reset fails.

With scsi_try_to_abort() things are slightly different in the way that
it's called _without_ SCSI EH being engaged.
However, as scsi_try_to_abort() is called from the timeout handler
(which assumes that ownership does now reside with the midlayer) I don't
see a problem with that.

Where you are right, in fact, is that we should not return FAILED when
calling scsi_try_to_abort() when cleaning up EH commands; if the driver
does not implement this function then no cleanup can be done, so calling
scsi_try_to_abort() is just us being nice.

And I actually can see a problem with cleaning up EH commands if
scsi_try_to_abort() returns FAILED; then the LLDD has potential _two_
stale references, one for the original command and one for the command
send from SCSI EH.
The only way I would imagine this ever worked was by _reusing_ the
references to the original command, effectively sending the TMF with the
same credentials the original SCSI command had.
If a driver (for whatever reason) does _not_ do this things will fall
apart indeed.

However, this was always a problem with SCSI EH; even with the original
codepath we would have this problem, so I don't think it's a problem
with this patchset.

Nevertheless, I'll be adding a fix for eh_try_to_abort() in the context
of cleaning up EH commands.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 6/6] scsi: make asynchronous aborts mandatory
  2017-03-15 17:55       ` Benjamin Block
@ 2017-03-16 14:06         ` Hannes Reinecke
  0 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-16 14:06 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, Christoph Hellwig, James Bottomley,
	Bart van Assche, linux-scsi

On 03/15/2017 06:55 PM, Benjamin Block wrote:
> On Wed, Mar 15, 2017 at 02:54:09PM +0100, Hannes Reinecke wrote:
>> On 03/14/2017 06:33 PM, Benjamin Block wrote:
>>> Hello Hannes,
>>>
>>> On Wed, Mar 01, 2017 at 10:15:20AM +0100, Hannes Reinecke wrote:
>>>> There hasn't been any reports for HBAs where asynchronous abort
>>>> would not work, so we should make it mandatory and remove
>>>> the fallback.
>>>>
>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>>>> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>> ---
>>>>  Documentation/scsi/scsi_eh.txt | 18 ++++------
>>>>  drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
>>>>  drivers/scsi/scsi_lib.c        |  2 +-
>>>>  drivers/scsi/scsi_priv.h       |  3 +-
>>>>  include/scsi/scsi_host.h       |  5 ---
>>>>  5 files changed, 15 insertions(+), 94 deletions(-)
>>>>
>>>> diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
>>>> index 4edb9c1c..11e447b 100644
>>>> --- a/Documentation/scsi/scsi_eh.txt
>>>> +++ b/Documentation/scsi/scsi_eh.txt
>>>> @@ -70,7 +70,7 @@ with the command.
>>>>  	scmd is requeued to blk queue.
>>>>
>>>>   - otherwise
>>>> -	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
>>>> +	scsi_eh_scmd_add(scmd) is invoked for the command.  See
>>>>  	[1-3] for details of this function.
>>>>
>>>>
>>>> @@ -103,9 +103,7 @@ function
>>>>          eh_timed_out() callback did not handle the command.
>>>>  	Step #2 is taken.
>>>>
>>>> - 2. If the host supports asynchronous completion (as indicated by the
>>>> -    no_async_abort setting in the host template) scsi_abort_command()
>>>> -    is invoked to schedule an asynchrous abort.
>>>> + 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
>>>>      Asynchronous abort are not invoked for commands which the
>>>>      SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
>>>>      already had been aborted once, and this is a retry which failed),
>>>> @@ -127,16 +125,13 @@ function
>>>>
>>>>   scmds enter EH via scsi_eh_scmd_add(), which does the following.
>>>>
>>>> - 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
>>>> -    completions and SCSI_EH_CANCEL_CMD for timeouts.
>>>> + 1. Links scmd->eh_entry to shost->eh_cmd_q
>>>>
>>>> - 2. Links scmd->eh_entry to shost->eh_cmd_q
>>>> + 2. Sets SHOST_RECOVERY bit in shost->shost_state
>>>>
>>>> - 3. Sets SHOST_RECOVERY bit in shost->shost_state
>>>> + 3. Increments shost->host_failed
>>>>
>>>> - 4. Increments shost->host_failed
>>>> -
>>>> - 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
>>>> + 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
>>>>
>>>>   As can be seen above, once any scmd is added to shost->eh_cmd_q,
>>>>  SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
>>>> @@ -252,7 +247,6 @@ scmd->allowed.
>>>>
>>>>   1. Error completion / time out
>>>>      ACTION: scsi_eh_scmd_add() is invoked for scmd
>>>> -	- set scmd->eh_eflags
>>>>  	- add scmd to shost->eh_cmd_q
>>>>  	- set SHOST_RECOVERY
>>>>  	- shost->host_failed++
>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>> index 802a081..7b70ee9 100644
>>>> --- a/drivers/scsi/scsi_error.c
>>>> +++ b/drivers/scsi/scsi_error.c
>>>> @@ -163,7 +163,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>>>>  		}
>>>>  	}
>>>>
>>>> -	scsi_eh_scmd_add(scmd, 0);
>>>> +	scsi_eh_scmd_add(scmd);
>>>>  }
>>>>
>>>>  /**
>>>> @@ -217,9 +217,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
>>>>  /**
>>>>   * scsi_eh_scmd_add - add scsi cmd to error handling.
>>>>   * @scmd:	scmd to run eh on.
>>>> - * @eh_flag:	optional SCSI_EH flag.
>>>>   */
>>>> -void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>>> +void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
>>>>  {
>>>>  	struct Scsi_Host *shost = scmd->device->host;
>>>>  	unsigned long flags;
>>>> @@ -235,9 +234,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
>>>>  	if (shost->eh_deadline != -1 && !shost->last_reset)
>>>>  		shost->last_reset = jiffies;
>>>>
>>>> -	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>>> -		eh_flag &= ~SCSI_EH_CANCEL_CMD;
>>>> -	scmd->eh_eflags |= eh_flag;
>>>>  	scsi_eh_reset(scmd);
>>>>  	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
>>>>  	shost->host_failed++;
>>>> @@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>>>>  		rtn = host->hostt->eh_timed_out(scmd);
>>>>
>>>>  	if (rtn == BLK_EH_NOT_HANDLED) {
>>>> -		if (host->hostt->no_async_abort ||
>>>> -		    scsi_abort_command(scmd) != SUCCESS) {
>>>> +		if (scsi_abort_command(scmd) != SUCCESS) {
>>>>  			set_host_byte(scmd, DID_TIME_OUT);
>>>> -			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
>>>> +			scsi_eh_scmd_add(scmd);
>>>>  		}
>>>>  	}
>>>>
>>>> @@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
>>>>  		list_for_each_entry(scmd, work_q, eh_entry) {
>>>>  			if (scmd->device == sdev) {
>>>>  				++total_failures;
>>>> -				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
>>>> +				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
>>>>  					++cmd_cancel;
>>>>  				else
>>>>  					++cmd_failed;
>>>> @@ -1153,8 +1148,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
>>>>  	 * should not get sense.
>>>>  	 */
>>>>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>>> -		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
>>>> -		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>>>> +		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
>>>>  		    SCSI_SENSE_VALID(scmd))
>>>>  			continue;
>>>>
>>>> @@ -1294,61 +1288,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
>>>>  	return list_empty(work_q);
>>>>  }
>>>>
>>>> -
>>>> -/**
>>>> - * scsi_eh_abort_cmds - abort pending commands.
>>>> - * @work_q:	&list_head for pending commands.
>>>> - * @done_q:	&list_head for processed commands.
>>>> - *
>>>> - * Decription:
>>>> - *    Try and see whether or not it makes sense to try and abort the
>>>> - *    running command.  This only works out to be the case if we have one
>>>> - *    command that has timed out.  If the command simply failed, it makes
>>>> - *    no sense to try and abort the command, since as far as the shost
>>>> - *    adapter is concerned, it isn't running.
>>>> - */
>>>> -static int scsi_eh_abort_cmds(struct list_head *work_q,
>>>> -			      struct list_head *done_q)
>>>> -{
>>>> -	struct scsi_cmnd *scmd, *next;
>>>> -	LIST_HEAD(check_list);
>>>> -	int rtn;
>>>> -	struct Scsi_Host *shost;
>>>> -
>>>> -	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
>>>> -		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
>>>> -			continue;
>>>> -		shost = scmd->device->host;
>>>> -		if (scsi_host_eh_past_deadline(shost)) {
>>>> -			list_splice_init(&check_list, work_q);
>>>> -			SCSI_LOG_ERROR_RECOVERY(3,
>>>> -				scmd_printk(KERN_INFO, scmd,
>>>> -					    "%s: skip aborting cmd, past eh deadline\n",
>>>> -					    current->comm));
>>>> -			return list_empty(work_q);
>>>> -		}
>>>> -		SCSI_LOG_ERROR_RECOVERY(3,
>>>> -			scmd_printk(KERN_INFO, scmd,
>>>> -				     "%s: aborting cmd\n", current->comm));
>>>> -		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
>>>> -		if (rtn == FAILED) {
>>>> -			SCSI_LOG_ERROR_RECOVERY(3,
>>>> -				scmd_printk(KERN_INFO, scmd,
>>>> -					    "%s: aborting cmd failed\n",
>>>> -					     current->comm));
>>>> -			list_splice_init(&check_list, work_q);
>>>> -			return list_empty(work_q);
>>>> -		}
>>>> -		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
>>>> -		if (rtn == FAST_IO_FAIL)
>>>> -			scsi_eh_finish_cmd(scmd, done_q);
>>>> -		else
>>>> -			list_move_tail(&scmd->eh_entry, &check_list);
>>>> -	}
>>>> -
>>>> -	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
>>>> -}
>>>> -
>>>>  /**
>>>>   * scsi_eh_try_stu - Send START_UNIT to device.
>>>>   * @scmd:	&scsi_cmnd to send START_UNIT
>>>> @@ -1691,11 +1630,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
>>>>  		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
>>>>  			    "not ready after error recovery\n");
>>>>  		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
>>>> -		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
>>>> -			/*
>>>> -			 * FIXME: Handle lost cmds.
>>>> -			 */
>>>> -		}
>>>>  		scsi_eh_finish_cmd(scmd, done_q);
>>>>  	}
>>>>  	return;
>>>> @@ -2139,8 +2073,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
>>>>  	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
>>>>
>>>>  	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
>>>> -		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
>>>> -			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>>>> +		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
>>>>
>>>>  	spin_lock_irqsave(shost->host_lock, flags);
>>>>  	if (shost->eh_deadline != -1)
>>>> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
>>>> index 0735a46..98b2df8 100644
>>>> --- a/drivers/scsi/scsi_lib.c
>>>> +++ b/drivers/scsi/scsi_lib.c
>>>> @@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
>>>>  			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
>>>>  			break;
>>>>  		default:
>>>> -			scsi_eh_scmd_add(cmd, 0);
>>>> +			scsi_eh_scmd_add(cmd);
>>>>  			break;
>>>>  	}
>>>>  }
>>>> diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
>>>> index 5be6cbf6..e20ab10 100644
>>>> --- a/drivers/scsi/scsi_priv.h
>>>> +++ b/drivers/scsi/scsi_priv.h
>>>> @@ -18,7 +18,6 @@
>>>>  /*
>>>>   * Scsi Error Handler Flags
>>>>   */
>>>> -#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
>>>>  #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
>>>>
>>>>  #define SCSI_SENSE_VALID(scmd) \
>>>> @@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
>>>>  extern int scsi_error_handler(void *host);
>>>>  extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
>>>>  extern void scsi_eh_wakeup(struct Scsi_Host *shost);
>>>> -extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
>>>> +extern void scsi_eh_scmd_add(struct scsi_cmnd *);
>>>>  void scsi_eh_ready_devs(struct Scsi_Host *shost,
>>>>  			struct list_head *work_q,
>>>>  			struct list_head *done_q);
>>>> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
>>>> index 3cd8c3b..afb0481 100644
>>>> --- a/include/scsi/scsi_host.h
>>>> +++ b/include/scsi/scsi_host.h
>>>> @@ -452,11 +452,6 @@ struct scsi_host_template {
>>>>  	unsigned no_write_same:1;
>>>>
>>>>  	/*
>>>> -	 * True if asynchronous aborts are not supported
>>>> -	 */
>>>> -	unsigned no_async_abort:1;
>>>> -
>>>> -	/*
>>>>  	 * Countdown for host blocking with no commands outstanding.
>>>>  	 */
>>>>  	unsigned int max_host_blocked;
>>>> --
>>>> 1.8.5.6
>>>>
>>>
>>> Hmm so, I guess we compromise in terms of how granular we want to
>>> recover?
>>>
>>> When say an abort for command A on LUN 1 behind Port α fails for some
>>> reason, then we also skip all possible aborts for command B on LUN 2
>>> behind Port α and command C on LUN 1 behind Port β? (The host might
>>> already be in recovery by the time command B and C fail)
>>>
>> No.
>> Recovery will only be started once all commands have been completed or
>> aborted.
>> Hence will be be aborting all commands before entering SCSI EH.
>>
> 
> When you call scsi_abort_command() after a timeout happend, it will
> check with scsi_host_in_recovery() whether the host has recovery already
> set or not. And only if not it will schedule the abort.
> 
Ah. Umm.

Yes, you are right...

> When the first command A times out, it will schedule a async abort, and
> if that fails it will set the state of the host in scsi_eh_scmd_add() to
> SHOST_RECOVERY before even the EH thread is kicked. So if command B and
> C timeout later than that, there won't be any abort scheduled.
> 
> That is with state tag v4.10, and your patchset. And I have actually
> observed this behavior in real life already (granted, the timeouts
> happened after injects), with the difference that previously the abort
> would still happen in the EH handling.
> 
Hmm. Okay. But then we should _not_ be checking if the host is in
recovery when sending aborts, but rather trying to do an abort anyway.
We only need to terminate/not send aborts if eh_deadline has triggered,
as then we're doing a host reset and aborts are meaningless anyway.

Will be updating the patchset.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-16 11:53         ` Hannes Reinecke
@ 2017-03-21 19:05           ` Benjamin Block
  2017-03-23 13:11             ` Hannes Reinecke
  0 siblings, 1 reply; 29+ messages in thread
From: Benjamin Block @ 2017-03-21 19:05 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	Bart van Assche, linux-scsi

On Thu, Mar 16, 2017 at 12:53:45PM +0100, Hannes Reinecke wrote:
>On 03/16/2017 12:01 PM, Benjamin Block wrote:
>> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
>>> On 03/14/2017 06:56 PM, Benjamin Block wrote:
>>>> Hello Hannes,
>>>>
>>>> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
>>>>> When a command is sent as part of the error handling there
>>>>> is not point whatsoever to start EH escalation when that
>>>>> command fails; we are _already_ in the error handler,
>>>>> and the escalation is about to commence anyway.
>>>>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
>>>>> commands and let the main EH routine handle the rest.
>>>>>
>>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>>>>> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>>> ---
>>>>>  drivers/scsi/scsi_error.c | 11 +----------
>>>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>>> index e1ca3b8..4613aa1 100644
>>>>> --- a/drivers/scsi/scsi_error.c
>>>>> +++ b/drivers/scsi/scsi_error.c
>>>>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
>>>>>  	return hostt->eh_abort_handler(scmd);
>>>>>  }
>>>>>
>>>>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
>>>>> -{
>>>>> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
>>>>> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
>>>>> -			if (scsi_try_target_reset(scmd) != SUCCESS)
>>>>> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
>>>>> -					scsi_try_host_reset(scmd);
>>>>> -}
>>>>> -
>>>>>  /**
>>>>>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>>>>>   * @scmd:       SCSI command structure to hijack
>>>>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>>>>  			break;
>>>>>  		}
>>>>>  	} else if (rtn != FAILED) {
>>>>> -		scsi_abort_eh_cmnd(scmd);
>>>>> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
>>>>>  		rtn = FAILED;
>>>>>  	}
>>>>
>>>> The idea is sound, but this implementation would cause "use-after-free"s.
>>>>
>>>> I only know our own LLD well enough to judge, but with zFCP there will
>>>> always be a chance that an abort fails - be it memory pressure,
>>>> hardware/firmware behavior or internal EH in zFCP.
>>>>
>>>> Calling queuecommand() will mean for us in the LLD, that we allocate a
>>>> unique internal request struct for the scsi_cmnd (struct
>>>> zfcp_fsf_request) and add that to our internal hash-table with
>>>> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
>>>> complete it via scsi_done are yield it via successful EH-actions.
>>>>
>>>> In case the abort fails, you fail to take back the ownership over the
>>>> scsi command. Which in turn means possible "use-after-free"s when we
>>>> still thinks the scsi command is ours, but EH has already overwritten
>>>> the scsi-command with the original one. When we still get an answer or
>>>> otherwise use the scsi_cmnd-pointer we would access an invalid one.
>>>>
>>> That is actually not try.
>>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
>>> assumed to reside in the SCSI midlayer;
>>
>> That can not be true. First of all, look at the function itself (v4.10):
>>
>> 	static int scsi_try_to_abort_cmd...
>> 	{
>> 		if (!hostt->eh_abort_handler)
>> 			return FAILED;
>>
>> 		return hostt->eh_abort_handler(scmd);
>> 	}
>>
>> If what you say is true, then this whole API of LLDs providing or
>> choosing not to provide implementations for these function would be
>> fundamentally broken.
>> The function itself returns FAILED when there is no such function.. how
>> is a LLD that does not implement it ever to know that you took ownership
>> by calling scsi_try_to_abort_cmd()?
>>
> Well. Ok.
>
> _Actually_, the whole concept of 'ownership' in SCSI EH is a bit flaky.
>
> There are two ways of entering the error handler:
> - SCSI command timeout
> - Failing to evaluate the SCSI command status
>
> For the latter case ownership already _is_ with the SCSI midlayer, as
> the LLDD called 'scsi_done' and with that moved ownership to the midlayer.
>
> The interesting part is command timeout.
> Once a command timeout triggers the block layer is calling
> 'blk_mark_rq_complete' to avoid any concurrent completions.
> IE any calls to scsi_done() will be short-circuited with that,
> effectively transferring ownership to SCSI midlayer.
>
> Now the SCSI midlayer has to inform the LLDD that it has taken
> ownership; for that it calls the various eh_XXX callbacks into the LLDD.
> While it's quite clear that SUCCESS signals a transfer of ownership to
> SCSI ML, it's less clear what happens in the case of FAILED.
> Problem here is that the eh_XXX callbacks actually serve a dual purpose;
> one it to signal the transfer of ownership to SCSI ML and the other is
> to actually _do_ some action on that command.
>
> But as FAILED is just _one_ value we have no idea in the midlayer if the
> change of ownership actually took place.
>
> Which leads to the curious effect that _ultimatively_ control still
> resides with the LLDD when host_reset fails, so we actually should
> _never_ release the scsi command once host reset fails.
>
> With scsi_try_to_abort() things are slightly different in the way that
> it's called _without_ SCSI EH being engaged.
> However, as scsi_try_to_abort() is called from the timeout handler
> (which assumes that ownership does now reside with the midlayer) I don't
> see a problem with that.
>

Yeah, I was aware of these things, but I think we are talking about 2
different things/implications when we talk about 'ownership' in this
context here. Which makes things seem to be worse than they are.

But we seem to agree on this: if eh_XXX returns FAILED, ownership
ultimatively stays with the LLD. In which case the midlayer has to
accommodate for the possibility that for example scsi_done is called.

Anyway, see below.

>
> Where you are right, in fact, is that we should not return FAILED when
> calling scsi_try_to_abort() when cleaning up EH commands; if the driver
> does not implement this function then no cleanup can be done, so calling
> scsi_try_to_abort() is just us being nice.
>
> And I actually can see a problem with cleaning up EH commands if
> scsi_try_to_abort() returns FAILED; then the LLDD has potential _two_
> stale references, one for the original command and one for the command
> send from SCSI EH.
> The only way I would imagine this ever worked was by _reusing_ the
> references to the original command, effectively sending the TMF with the
> same credentials the original SCSI command had.
> If a driver (for whatever reason) does _not_ do this things will fall
> apart indeed.
>
> However, this was always a problem with SCSI EH; even with the original
> codepath we would have this problem, so I don't think it's a problem
> with this patchset.
>
> Nevertheless, I'll be adding a fix for eh_try_to_abort() in the context
> of cleaning up EH commands.
>

I slept over this a night or two and then I remembered that to get to
this point in the first place, at least one eh_XXX callback must have
returned SUCCESS for the command that is reused to send the EH command.
So the original reference to it should be forgotten already, if not,
then that is indeed a LLD bug.

That makes it much less troubling. But then again, sorry to say, but
that still leaves me with one objection:

When you ignore the FAIL return for the abort, it opens up the
possibility to have an unrelated EH-Command completion being triggered
from a previous EH-Command reference.

Lets say we have a command A for LUN 1 behind Port α, command B for LUN
2 behind Port β; and lets assume both FAIL the abort, so they get to the
stage where EH-Commands might be send. Both will be successfully 'freed'
after a LUN-Reset is issued for both 1 and 2.

After each LUN-Reset we send a TUR, which is the EH-Command. If A_{TUR}
times out and the abort fails (again), then the LLD might still have a
reference to A_{TUR} and with that also to its overwritten scsi_done
function-pointer.

When we later send B_{TUR}, A_{TUR} might concurrently complete, call
the scsi_done function-pointer to scsi_eh_done and complete the
host-wide EH-completion that was setup for B_{TUR} (shost->eh_action).
This is possible because shost->eh_action has a value different from
NULL for B_{TUR}.

And apart from this clear problem, I also find it still troubling that
we call scsi_eh_restore_cmnd() before being sure that the reference to
the EH-Command was forgotten. Because of that, we all of a sudden
'restore' a reference to a command that the LLD already has forgotten
before, just via a different 'LLD-Object'. And fields like
scsi_cmnd->result from A become writeable again, although the
'LLD-Object' should only point to A_{TUR}. And at the same time we
'forget' references to parts of A_{TUR} that we queued that command
with.. such as cmnd, cmnd_len, data_direction, ... . That is kinda the
original point I complained about.

All that was handled by the original code that escalated throughout the
whole process, as ugly as it might be, before restoring the command and
continuing with further possible EH-Commands. And yes, you are right,
host-reset might also fail in the old code, leaving us essentially in
the same mess. But then again, if host-reset fails, SCSI EH is pretty
much lost anyway...


                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
--
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCHv3 4/6] scsi_error: do not escalate failed EH command
  2017-03-21 19:05           ` Benjamin Block
@ 2017-03-23 13:11             ` Hannes Reinecke
  0 siblings, 0 replies; 29+ messages in thread
From: Hannes Reinecke @ 2017-03-23 13:11 UTC (permalink / raw)
  To: Benjamin Block
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	Bart van Assche, linux-scsi

On 03/21/2017 08:05 PM, Benjamin Block wrote:
> On Thu, Mar 16, 2017 at 12:53:45PM +0100, Hannes Reinecke wrote:
>> On 03/16/2017 12:01 PM, Benjamin Block wrote:
>>> On Wed, Mar 15, 2017 at 02:54:16PM +0100, Hannes Reinecke wrote:
>>>> On 03/14/2017 06:56 PM, Benjamin Block wrote:
>>>>> Hello Hannes,
>>>>>
>>>>> On Wed, Mar 01, 2017 at 10:15:18AM +0100, Hannes Reinecke wrote:
>>>>>> When a command is sent as part of the error handling there
>>>>>> is not point whatsoever to start EH escalation when that
>>>>>> command fails; we are _already_ in the error handler,
>>>>>> and the escalation is about to commence anyway.
>>>>>> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
>>>>>> commands and let the main EH routine handle the rest.
>>>>>>
>>>>>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>>>>>> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
>>>>>> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
>>>>>> ---
>>>>>>  drivers/scsi/scsi_error.c | 11 +----------
>>>>>>  1 file changed, 1 insertion(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
>>>>>> index e1ca3b8..4613aa1 100644
>>>>>> --- a/drivers/scsi/scsi_error.c
>>>>>> +++ b/drivers/scsi/scsi_error.c
>>>>>> @@ -889,15 +889,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
>>>>>>  	return hostt->eh_abort_handler(scmd);
>>>>>>  }
>>>>>>
>>>>>> -static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
>>>>>> -{
>>>>>> -	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
>>>>>> -		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
>>>>>> -			if (scsi_try_target_reset(scmd) != SUCCESS)
>>>>>> -				if (scsi_try_bus_reset(scmd) != SUCCESS)
>>>>>> -					scsi_try_host_reset(scmd);
>>>>>> -}
>>>>>> -
>>>>>>  /**
>>>>>>   * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
>>>>>>   * @scmd:       SCSI command structure to hijack
>>>>>> @@ -1082,7 +1073,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
>>>>>>  			break;
>>>>>>  		}
>>>>>>  	} else if (rtn != FAILED) {
>>>>>> -		scsi_abort_eh_cmnd(scmd);
>>>>>> +		scsi_try_to_abort_cmd(shost->hostt, scmd);
>>>>>>  		rtn = FAILED;
>>>>>>  	}
>>>>>
>>>>> The idea is sound, but this implementation would cause "use-after-free"s.
>>>>>
>>>>> I only know our own LLD well enough to judge, but with zFCP there will
>>>>> always be a chance that an abort fails - be it memory pressure,
>>>>> hardware/firmware behavior or internal EH in zFCP.
>>>>>
>>>>> Calling queuecommand() will mean for us in the LLD, that we allocate a
>>>>> unique internal request struct for the scsi_cmnd (struct
>>>>> zfcp_fsf_request) and add that to our internal hash-table with
>>>>> outstanding commands. We assume this scsi_cmnd-pointer is ours till we
>>>>> complete it via scsi_done are yield it via successful EH-actions.
>>>>>
>>>>> In case the abort fails, you fail to take back the ownership over the
>>>>> scsi command. Which in turn means possible "use-after-free"s when we
>>>>> still thinks the scsi command is ours, but EH has already overwritten
>>>>> the scsi-command with the original one. When we still get an answer or
>>>>> otherwise use the scsi_cmnd-pointer we would access an invalid one.
>>>>>
>>>> That is actually not try.
>>>> As soon as we're calling 'scsi_try_to_abort_command()' ownership is
>>>> assumed to reside in the SCSI midlayer;
>>>
>>> That can not be true. First of all, look at the function itself (v4.10):
>>>
>>> 	static int scsi_try_to_abort_cmd...
>>> 	{
>>> 		if (!hostt->eh_abort_handler)
>>> 			return FAILED;
>>>
>>> 		return hostt->eh_abort_handler(scmd);
>>> 	}
>>>
>>> If what you say is true, then this whole API of LLDs providing or
>>> choosing not to provide implementations for these function would be
>>> fundamentally broken.
>>> The function itself returns FAILED when there is no such function.. how
>>> is a LLD that does not implement it ever to know that you took ownership
>>> by calling scsi_try_to_abort_cmd()?
>>>
>> Well. Ok.
>>
>> _Actually_, the whole concept of 'ownership' in SCSI EH is a bit flaky.
>>
>> There are two ways of entering the error handler:
>> - SCSI command timeout
>> - Failing to evaluate the SCSI command status
>>
>> For the latter case ownership already _is_ with the SCSI midlayer, as
>> the LLDD called 'scsi_done' and with that moved ownership to the midlayer.
>>
>> The interesting part is command timeout.
>> Once a command timeout triggers the block layer is calling
>> 'blk_mark_rq_complete' to avoid any concurrent completions.
>> IE any calls to scsi_done() will be short-circuited with that,
>> effectively transferring ownership to SCSI midlayer.
>>
>> Now the SCSI midlayer has to inform the LLDD that it has taken
>> ownership; for that it calls the various eh_XXX callbacks into the LLDD.
>> While it's quite clear that SUCCESS signals a transfer of ownership to
>> SCSI ML, it's less clear what happens in the case of FAILED.
>> Problem here is that the eh_XXX callbacks actually serve a dual purpose;
>> one it to signal the transfer of ownership to SCSI ML and the other is
>> to actually _do_ some action on that command.
>>
>> But as FAILED is just _one_ value we have no idea in the midlayer if the
>> change of ownership actually took place.
>>
>> Which leads to the curious effect that _ultimatively_ control still
>> resides with the LLDD when host_reset fails, so we actually should
>> _never_ release the scsi command once host reset fails.
>>
>> With scsi_try_to_abort() things are slightly different in the way that
>> it's called _without_ SCSI EH being engaged.
>> However, as scsi_try_to_abort() is called from the timeout handler
>> (which assumes that ownership does now reside with the midlayer) I don't
>> see a problem with that.
>>
> 
> Yeah, I was aware of these things, but I think we are talking about 2
> different things/implications when we talk about 'ownership' in this
> context here. Which makes things seem to be worse than they are.
> 
> But we seem to agree on this: if eh_XXX returns FAILED, ownership
> ultimatively stays with the LLD. In which case the midlayer has to
> accommodate for the possibility that for example scsi_done is called.
> 
> Anyway, see below.
> 
>>
>> Where you are right, in fact, is that we should not return FAILED when
>> calling scsi_try_to_abort() when cleaning up EH commands; if the driver
>> does not implement this function then no cleanup can be done, so calling
>> scsi_try_to_abort() is just us being nice.
>>
>> And I actually can see a problem with cleaning up EH commands if
>> scsi_try_to_abort() returns FAILED; then the LLDD has potential _two_
>> stale references, one for the original command and one for the command
>> send from SCSI EH.
>> The only way I would imagine this ever worked was by _reusing_ the
>> references to the original command, effectively sending the TMF with the
>> same credentials the original SCSI command had.
>> If a driver (for whatever reason) does _not_ do this things will fall
>> apart indeed.
>>
>> However, this was always a problem with SCSI EH; even with the original
>> codepath we would have this problem, so I don't think it's a problem
>> with this patchset.
>>
>> Nevertheless, I'll be adding a fix for eh_try_to_abort() in the context
>> of cleaning up EH commands.
>>
> 
> I slept over this a night or two and then I remembered that to get to
> this point in the first place, at least one eh_XXX callback must have
> returned SUCCESS for the command that is reused to send the EH command.
> So the original reference to it should be forgotten already, if not,
> then that is indeed a LLD bug.
> 
> That makes it much less troubling. But then again, sorry to say, but
> that still leaves me with one objection:
> 
> When you ignore the FAIL return for the abort, it opens up the
> possibility to have an unrelated EH-Command completion being triggered
> from a previous EH-Command reference.
> 
> Lets say we have a command A for LUN 1 behind Port α, command B for LUN
> 2 behind Port β; and lets assume both FAIL the abort, so they get to the
> stage where EH-Commands might be send. Both will be successfully 'freed'
> after a LUN-Reset is issued for both 1 and 2.
> 
> After each LUN-Reset we send a TUR, which is the EH-Command. If A_{TUR}
> times out and the abort fails (again), then the LLD might still have a
> reference to A_{TUR} and with that also to its overwritten scsi_done
> function-pointer.
> 
> When we later send B_{TUR}, A_{TUR} might concurrently complete, call
> the scsi_done function-pointer to scsi_eh_done and complete the
> host-wide EH-completion that was setup for B_{TUR} (shost->eh_action).
> This is possible because shost->eh_action has a value different from
> NULL for B_{TUR}.
> 
Hmm. Yes, that's true. But that can be fixed by setting some new
eh_eflag in the command to indicate a TUR timeout.

> And apart from this clear problem, I also find it still troubling that
> we call scsi_eh_restore_cmnd() before being sure that the reference to
> the EH-Command was forgotten. Because of that, we all of a sudden
> 'restore' a reference to a command that the LLD already has forgotten
> before, just via a different 'LLD-Object'. And fields like
> scsi_cmnd->result from A become writeable again, although the
> 'LLD-Object' should only point to A_{TUR}. And at the same time we
> 'forget' references to parts of A_{TUR} that we queued that command
> with.. such as cmnd, cmnd_len, data_direction, ... . That is kinda the
> original point I complained about.
> 
Yeah, I see your point.

> All that was handled by the original code that escalated throughout the
> whole process, as ugly as it might be, before restoring the command and
> continuing with further possible EH-Commands. And yes, you are right,
> host-reset might also fail in the old code, leaving us essentially in
> the same mess. But then again, if host-reset fails, SCSI EH is pretty
> much lost anyway...
> 
Okay. So let's drop this patch until we've sorted this out.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2017-03-23 13:11 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-01  9:15 [PATCHv3 0/6] SCSI EH cleanup Hannes Reinecke
2017-03-01  9:15 ` [PATCHv3 1/6] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
2017-03-01 13:50   ` Steffen Maier
2017-03-01 23:24   ` Bart Van Assche
2017-03-02  8:02     ` Hannes Reinecke
2017-03-02 20:16   ` Benjamin Block
2017-03-13 10:20     ` Hannes Reinecke
2017-03-13 13:37   ` Mauricio Faria de Oliveira
2017-03-13 14:48     ` Hannes Reinecke
2017-03-13 15:54       ` Mauricio Faria de Oliveira
2017-03-01  9:15 ` [PATCHv3 2/6] libsas: allow async aborts Hannes Reinecke
2017-03-01  9:15 ` [PATCHv3 3/6] scsi: make eh_eflags persistent Hannes Reinecke
2017-03-01 23:29   ` Bart Van Assche
2017-03-14 17:51   ` Benjamin Block
2017-03-01  9:15 ` [PATCHv3 4/6] scsi_error: do not escalate failed EH command Hannes Reinecke
2017-03-14 17:56   ` Benjamin Block
2017-03-15 13:54     ` Hannes Reinecke
2017-03-16 11:01       ` Benjamin Block
2017-03-16 11:53         ` Hannes Reinecke
2017-03-21 19:05           ` Benjamin Block
2017-03-23 13:11             ` Hannes Reinecke
2017-03-01  9:15 ` [PATCHv3 5/6] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
2017-03-01 23:34   ` Bart Van Assche
2017-03-14 18:05   ` Benjamin Block
2017-03-01  9:15 ` [PATCHv3 6/6] scsi: make asynchronous aborts mandatory Hannes Reinecke
2017-03-14 17:33   ` Benjamin Block
2017-03-15 13:54     ` Hannes Reinecke
2017-03-15 17:55       ` Benjamin Block
2017-03-16 14:06         ` Hannes Reinecke

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.