All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv6 0/8] SCSI EH cleanup
@ 2017-04-06 13:36 Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 1/8] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
                   ` (9 more replies)
  0 siblings, 10 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke

Hi all,

this is a resend of a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additionally there's a cleanup to detect retries of failed commands and
to inline commands aborts.
I've also included the patch to correctly reset the media access
timeout counter to avoid it being triggered inadvertedly by several
commands timing out for the same device.

As usual, comments and reviews are welcome.

Changes to v1:
- Include reviews from Christoph

Changes to v2:
- Include reviews from Bart
- Add medium access timeout patch

Changes to v3:
- Add patch to inline command abort as per suggestion from Christoph
- Drop patch to not escalate failed EH commands

Changes to v4:
- Split off sdrv->eh_action into two functions

Changes to v5:
- Simplify 'scsi: inline command aborts'
- Add Reviewed-by tags

Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (7):
  scsi_error: count medium access timeout only once per EH run
  sd: Return SUCCESS in sd_eh_action() after device offline
  scsi: always send command aborts
  scsi: make eh_eflags persistent
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make asynchronous aborts mandatory
  scsi: inline command aborts

 Documentation/scsi/scsi_eh.txt      |  30 ++---
 drivers/scsi/libsas/sas_scsi_host.c |   5 -
 drivers/scsi/scsi.c                 |   2 -
 drivers/scsi/scsi_error.c           | 213 ++++++++++++------------------------
 drivers/scsi/scsi_lib.c             |   6 +-
 drivers/scsi/scsi_priv.h            |   4 +-
 drivers/scsi/sd.c                   |  29 ++++-
 drivers/scsi/sd.h                   |   1 +
 include/scsi/scsi_cmnd.h            |   1 -
 include/scsi/scsi_driver.h          |   1 +
 include/scsi/scsi_eh.h              |   1 +
 include/scsi/scsi_host.h            |   5 -
 12 files changed, 113 insertions(+), 185 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCHv6 1/8] scsi_error: count medium access timeout only once per EH run
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 2/8] sd: Return SUCCESS in sd_eh_action() after device offline Hannes Reinecke
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke, Ewan Milne, Lawrence Obermann, Benjamin Block,
	Steffen Maier, Hannes Reinecke

The current medium access timeout counter will be increased for
each command, so if there are enough failed commands we'll hit
the medium access timeout for even a single device failure and
the following kernel message is displayed:

sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk!

Fix this by making the timeout per EH run, ie the counter will
only be increased once per device and EH run.

Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands")
Cc: Ewan Milne <emilne@redhat.com>
Cc: Lawrence Obermann <loberman@redhat.com>
Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Cc: Steffen Maier <maier@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_error.c  | 18 ++++++++++++++++++
 drivers/scsi/sd.c          | 27 ++++++++++++++++++++++++++-
 drivers/scsi/sd.h          |  1 +
 include/scsi/scsi_driver.h |  1 +
 4 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index f2cafae..370f6c0 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -221,6 +221,23 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 }
 
 /**
+ * scsi_eh_reset - call into ->eh_action to reset internal counters
+ * @scmd:	scmd to run eh on.
+ *
+ * The scsi driver might be carrying internal state about the
+ * devices, so we need to call into the driver to reset the
+ * internal state once the error handler is started.
+ */
+static void scsi_eh_reset(struct scsi_cmnd *scmd)
+{
+	if (!blk_rq_is_passthrough(scmd->request)) {
+		struct scsi_driver *sdrv = scsi_cmd_to_driver(scmd);
+		if (sdrv->eh_reset)
+			sdrv->eh_reset(scmd);
+	}
+}
+
+/**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
@@ -249,6 +266,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
+	scsi_eh_reset(scmd);
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d277e86..bd2a38e 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -115,6 +115,7 @@
 static int sd_init_command(struct scsi_cmnd *SCpnt);
 static void sd_uninit_command(struct scsi_cmnd *SCpnt);
 static int sd_done(struct scsi_cmnd *);
+static void sd_eh_reset(struct scsi_cmnd *);
 static int sd_eh_action(struct scsi_cmnd *, int);
 static void sd_read_capacity(struct scsi_disk *sdkp, unsigned char *buffer);
 static void scsi_disk_release(struct device *cdev);
@@ -532,6 +533,7 @@ static void sd_set_flush_flag(struct scsi_disk *sdkp)
 	.uninit_command		= sd_uninit_command,
 	.done			= sd_done,
 	.eh_action		= sd_eh_action,
+	.eh_reset		= sd_eh_reset,
 };
 
 /*
@@ -1686,6 +1688,26 @@ static int sd_pr_clear(struct block_device *bdev, u64 key)
 };
 
 /**
+ *	sd_eh_reset - reset error handling callback
+ *	@scmd:		sd-issued command that has failed
+ *
+ *	This function is called by the SCSI midlayer before starting
+ *	SCSI EH. When counting medium access failures we have to be
+ *	careful to register it only only once per device and SCSI EH run;
+ *	there might be several timed out commands which will cause the
+ *	'max_medium_access_timeouts' counter to trigger after the first
+ *	SCSI EH run already and set the device to offline.
+ *	So this function resets the internal counter before starting SCSI EH.
+ **/
+static void sd_eh_reset(struct scsi_cmnd *scmd)
+{
+	struct scsi_disk *sdkp = scsi_disk(scmd->request->rq_disk);
+
+	/* New SCSI EH run, reset gate variable */
+	sdkp->ignore_medium_access_errors = false;
+}
+
+/**
  *	sd_eh_action - error handling callback
  *	@scmd:		sd-issued command that has failed
  *	@eh_disp:	The recovery disposition suggested by the midlayer
@@ -1714,7 +1736,10 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
 	 * process of recovering or has it suffered an internal failure
 	 * that prevents access to the storage medium.
 	 */
-	sdkp->medium_access_timed_out++;
+	if (!sdkp->ignore_medium_access_errors) {
+		sdkp->medium_access_timed_out++;
+		sdkp->ignore_medium_access_errors = true;
+	}
 
 	/*
 	 * If the device keeps failing read/write commands but TEST UNIT
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4dac35e..0cf9680 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -106,6 +106,7 @@ struct scsi_disk {
 	unsigned	rc_basis: 2;
 	unsigned	zoned: 2;
 	unsigned	urswrz : 1;
+	unsigned	ignore_medium_access_errors : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
diff --git a/include/scsi/scsi_driver.h b/include/scsi/scsi_driver.h
index 891a658..a5534cc 100644
--- a/include/scsi/scsi_driver.h
+++ b/include/scsi/scsi_driver.h
@@ -16,6 +16,7 @@ struct scsi_driver {
 	void (*uninit_command)(struct scsi_cmnd *);
 	int (*done)(struct scsi_cmnd *);
 	int (*eh_action)(struct scsi_cmnd *, int);
+	void (*eh_reset)(struct scsi_cmnd *);
 };
 #define to_scsi_driver(drv) \
 	container_of((drv), struct scsi_driver, gendrv)
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 2/8] sd: Return SUCCESS in sd_eh_action() after device offline
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 1/8] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-10 23:54   ` Bart Van Assche
  2017-04-06 13:36 ` [PATCHv6 3/8] scsi: always send command aborts Hannes Reinecke
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke, Benjamin Block, Hannes Reinecke

If sd_eh_action() decides to take the device offline there is
no point in returning FAILED, as taking the device offline
is the ultimate step in SCSI EH anyway.
So further escalation via SCSI EH is not likely to make a
difference and we can as well return SUCCESS.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/sd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index bd2a38e..00b168b 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1751,7 +1751,7 @@ static int sd_eh_action(struct scsi_cmnd *scmd, int eh_disp)
 			    "Medium access timeout failure. Offlining disk!\n");
 		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
 
-		return FAILED;
+		return SUCCESS;
 	}
 
 	return eh_disp;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 3/8] scsi: always send command aborts
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 1/8] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 2/8] sd: Return SUCCESS in sd_eh_action() after device offline Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-10 23:56   ` Bart Van Assche
  2017-04-06 13:36 ` [PATCHv6 4/8] libsas: allow async aborts Hannes Reinecke
                   ` (6 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke, Benjamin Block, Hannes Reinecke

When a command has timed out we always should be sending an
abort; with the previous code a failed abort might signal
SCSI EH to start, and all other timed out commands will
never be aborted, even though they might belong to a
different ITL nexus.

Cc: Benjamin Block <bblock@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_error.c | 12 ------------
 1 file changed, 12 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 370f6c0..cff7d9d 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -196,19 +196,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		return FAILED;
 	}
 
-	/*
-	 * Do not try a command abort if
-	 * SCSI EH has already started.
-	 */
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (scsi_host_in_recovery(shost)) {
-		spin_unlock_irqrestore(shost->host_lock, flags);
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_INFO, scmd,
-				    "not aborting, host in recovery\n"));
-		return FAILED;
-	}
-
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 	spin_unlock_irqrestore(shost->host_lock, flags);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 4/8] libsas: allow async aborts
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (2 preceding siblings ...)
  2017-04-06 13:36 ` [PATCHv6 3/8] scsi: always send command aborts Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 5/8] scsi: make eh_eflags persistent Hannes Reinecke
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi

From: Christoph Hellwig <hch@lst.de>

We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/libsas/sas_scsi_host.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9bd55bc..ee6b39a 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
 	struct Scsi_Host *host = cmd->device->host;
 	struct sas_internal *i = to_sas_internal(host->transportt);
 
-	if (current != host->ehandler)
-		return FAILED;
-
 	if (!i->dft->lldd_abort_task)
 		return FAILED;
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 5/8] scsi: make eh_eflags persistent
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (3 preceding siblings ...)
  2017-04-06 13:36 ` [PATCHv6 4/8] libsas: allow async aborts Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-10 23:58   ` Bart Van Assche
  2017-04-06 13:36 ` [PATCHv6 6/8] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke

If a failed command is retried and fails again we need
to enter SCSI EH, otherwise we will never be able to
recover the command.
To detect this situation we must not clear scmd->eh_eflags
when EH finishes but rather make it persistent throughout
the lifetime of the command.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/scsi/scsi_eh.txt      | 14 +++++++-------
 drivers/scsi/libsas/sas_scsi_host.c |  2 --
 drivers/scsi/scsi_error.c           |  4 ++--
 include/scsi/scsi_eh.h              |  1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 37eca00..4edb9c1c 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -105,11 +105,14 @@ function
 
  2. If the host supports asynchronous completion (as indicated by the
     no_async_abort setting in the host template) scsi_abort_command()
-    is invoked to schedule an asynchrous abort. If that fails
-    Step #3 is taken.
+    is invoked to schedule an asynchrous abort.
+    Asynchronous abort are not invoked for commands which the
+    SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
+    already had been aborted once, and this is a retry which failed),
+    or when the EH deadline is expired. In these case Step #3 is taken.
 
- 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
-    command.  See [1-3] for more information.
+ 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
+    command.  See [1-4] for more information.
 
 [1-3] Asynchronous command aborts
 
@@ -263,7 +266,6 @@ scmd->allowed.
 
  3. scmd recovered
     ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-	- clear scmd->eh_eflags
 	- scsi_setup_cmd_retry()
 	- move from local eh_work_q to local eh_done_q
     LOCKING: none
@@ -456,8 +458,6 @@ except for #1 must be implemented by eh_strategy_handler().
 
  - shost->host_failed is zero.
 
- - Each scmd's eh_eflags field is cleared.
-
  - Each scmd is in such a state that scsi_setup_cmd_retry() on the
    scmd doesn't make any difference.
 
diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index ee6b39a..87e5079 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -613,8 +613,6 @@ static void sas_eh_handle_sas_errors(struct Scsi_Host *shost, struct list_head *
 		SAS_DPRINTK("trying to find task 0x%p\n", task);
 		res = sas_scsi_find_task(task);
 
-		cmd->eh_eflags = 0;
-
 		switch (res) {
 		case TASK_IS_DONE:
 			SAS_DPRINTK("%s: task 0x%p is done\n", __func__,
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index cff7d9d..4d26ff2 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -188,7 +188,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		/*
 		 * Retry after abort failed, escalate to next level.
 		 */
-		scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "previous abort failed\n"));
@@ -937,6 +936,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->result = scmd->result;
 	ses->underflow = scmd->underflow;
 	ses->prot_op = scmd->prot_op;
+	ses->eh_eflags = scmd->eh_eflags;
 
 	scmd->prot_op = SCSI_PROT_NORMAL;
 	scmd->eh_eflags = 0;
@@ -1000,6 +1000,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 	scmd->result = ses->result;
 	scmd->underflow = ses->underflow;
 	scmd->prot_op = ses->prot_op;
+	scmd->eh_eflags = ses->eh_eflags;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
@@ -1132,7 +1133,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-	scmd->eh_eflags = 0;
 	list_move_tail(&scmd->eh_entry, done_q);
 }
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 98d366b..a25b328 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, int sb_len,
 struct scsi_eh_save {
 	/* saved state */
 	int result;
+	int eh_eflags;
 	enum dma_data_direction data_direction;
 	unsigned underflow;
 	unsigned char cmd_len;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 6/8] scsi: make scsi_eh_scmd_add() always succeed
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (4 preceding siblings ...)
  2017-04-06 13:36 ` [PATCHv6 5/8] scsi: make eh_eflags persistent Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 7/8] scsi: make asynchronous aborts mandatory Hannes Reinecke
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke

scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/scsi_error.c | 41 +++++++++++++----------------------------
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 4d26ff2..bbc4318 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,13 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		}
 	}
 
-	if (!scsi_eh_scmd_add(scmd, 0)) {
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_WARNING, scmd,
-				    "terminate aborted command\n"));
-		set_host_byte(scmd, DID_TIME_OUT);
-		scsi_finish_command(scmd);
-	}
+	scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -228,28 +222,23 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
- *
- * Return value:
- *	0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
-	int ret = 0;
+	int ret;
 
-	if (!shost->ehandler)
-		return 0;
+	WARN_ON_ONCE(!shost->ehandler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (scsi_host_set_state(shost, SHOST_RECOVERY))
-		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-			goto out_unlock;
-
+	if (scsi_host_set_state(shost, SHOST_RECOVERY)) {
+		ret = scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
+		WARN_ON_ONCE(ret);
+	}
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	ret = 1;
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
@@ -257,9 +246,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
- out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	return ret;
 }
 
 /**
@@ -288,13 +275,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (!host->hostt->no_async_abort &&
-		    scsi_abort_command(scmd) == SUCCESS)
-			return BLK_EH_NOT_HANDLED;
-
-		set_host_byte(scmd, DID_TIME_OUT);
-		if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-			rtn = BLK_EH_HANDLED;
+		if (host->hostt->no_async_abort ||
+		    scsi_abort_command(scmd) != SUCCESS) {
+			set_host_byte(scmd, DID_TIME_OUT);
+			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+		}
 	}
 
 	return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba22866..ba8da90 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1593,8 +1593,8 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			if (!scsi_eh_scmd_add(cmd, 0))
-				scsi_finish_command(cmd);
+			scsi_eh_scmd_add(cmd, 0);
+			break;
 	}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 99bfc98..5be6cbf6 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -72,7 +72,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 7/8] scsi: make asynchronous aborts mandatory
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (5 preceding siblings ...)
  2017-04-06 13:36 ` [PATCHv6 6/8] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-06 13:36 ` [PATCHv6 8/8] scsi: inline command aborts Hannes Reinecke
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke

There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 Documentation/scsi/scsi_eh.txt | 18 ++++------
 drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
 drivers/scsi/scsi_lib.c        |  2 +-
 drivers/scsi/scsi_priv.h       |  3 +-
 include/scsi/scsi_host.h       |  5 ---
 5 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 4edb9c1c..11e447b 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -70,7 +70,7 @@ with the command.
 	scmd is requeued to blk queue.
 
  - otherwise
-	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
+	scsi_eh_scmd_add(scmd) is invoked for the command.  See
 	[1-3] for details of this function.
 
 
@@ -103,9 +103,7 @@ function
         eh_timed_out() callback did not handle the command.
 	Step #2 is taken.
 
- 2. If the host supports asynchronous completion (as indicated by the
-    no_async_abort setting in the host template) scsi_abort_command()
-    is invoked to schedule an asynchrous abort.
+ 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
     Asynchronous abort are not invoked for commands which the
     SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
     already had been aborted once, and this is a retry which failed),
@@ -127,16 +125,13 @@ function
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
- 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
-    completions and SCSI_EH_CANCEL_CMD for timeouts.
+ 1. Links scmd->eh_entry to shost->eh_cmd_q
 
- 2. Links scmd->eh_entry to shost->eh_cmd_q
+ 2. Sets SHOST_RECOVERY bit in shost->shost_state
 
- 3. Sets SHOST_RECOVERY bit in shost->shost_state
+ 3. Increments shost->host_failed
 
- 4. Increments shost->host_failed
-
- 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
+ 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
 
  As can be seen above, once any scmd is added to shost->eh_cmd_q,
 SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
@@ -252,7 +247,6 @@ scmd->allowed.
 
  1. Error completion / time out
     ACTION: scsi_eh_scmd_add() is invoked for scmd
-	- set scmd->eh_eflags
 	- add scmd to shost->eh_cmd_q
 	- set SHOST_RECOVERY
 	- shost->host_failed++
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index bbc4318..53e3343 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,7 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		}
 	}
 
-	scsi_eh_scmd_add(scmd, 0);
+	scsi_eh_scmd_add(scmd);
 }
 
 /**
@@ -221,9 +221,8 @@ static void scsi_eh_reset(struct scsi_cmnd *scmd)
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
- * @eh_flag:	optional SCSI_EH flag.
  */
-void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
@@ -239,9 +238,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
-		eh_flag &= ~SCSI_EH_CANCEL_CMD;
-	scmd->eh_eflags |= eh_flag;
 	scsi_eh_reset(scmd);
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
@@ -275,10 +271,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (host->hostt->no_async_abort ||
-		    scsi_abort_command(scmd) != SUCCESS) {
+		if (scsi_abort_command(scmd) != SUCCESS) {
 			set_host_byte(scmd, DID_TIME_OUT);
-			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+			scsi_eh_scmd_add(scmd);
 		}
 	}
 
@@ -331,7 +326,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
 		list_for_each_entry(scmd, work_q, eh_entry) {
 			if (scmd->device == sdev) {
 				++total_failures;
-				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
+				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 					++cmd_cancel;
 				else
 					++cmd_failed;
@@ -1154,8 +1149,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
 	 * should not get sense.
 	 */
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
-		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
+		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
 		    SCSI_SENSE_VALID(scmd))
 			continue;
 
@@ -1295,61 +1289,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
 	return list_empty(work_q);
 }
 
-
-/**
- * scsi_eh_abort_cmds - abort pending commands.
- * @work_q:	&list_head for pending commands.
- * @done_q:	&list_head for processed commands.
- *
- * Decription:
- *    Try and see whether or not it makes sense to try and abort the
- *    running command.  This only works out to be the case if we have one
- *    command that has timed out.  If the command simply failed, it makes
- *    no sense to try and abort the command, since as far as the shost
- *    adapter is concerned, it isn't running.
- */
-static int scsi_eh_abort_cmds(struct list_head *work_q,
-			      struct list_head *done_q)
-{
-	struct scsi_cmnd *scmd, *next;
-	LIST_HEAD(check_list);
-	int rtn;
-	struct Scsi_Host *shost;
-
-	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
-			continue;
-		shost = scmd->device->host;
-		if (scsi_host_eh_past_deadline(shost)) {
-			list_splice_init(&check_list, work_q);
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "%s: skip aborting cmd, past eh deadline\n",
-					    current->comm));
-			return list_empty(work_q);
-		}
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_INFO, scmd,
-				     "%s: aborting cmd\n", current->comm));
-		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
-		if (rtn == FAILED) {
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "%s: aborting cmd failed\n",
-					     current->comm));
-			list_splice_init(&check_list, work_q);
-			return list_empty(work_q);
-		}
-		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
-		if (rtn == FAST_IO_FAIL)
-			scsi_eh_finish_cmd(scmd, done_q);
-		else
-			list_move_tail(&scmd->eh_entry, &check_list);
-	}
-
-	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
-}
-
 /**
  * scsi_eh_try_stu - Send START_UNIT to device.
  * @scmd:	&scsi_cmnd to send START_UNIT
@@ -1692,11 +1631,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
 			    "not ready after error recovery\n");
 		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
-		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
-			/*
-			 * FIXME: Handle lost cmds.
-			 */
-		}
 		scsi_eh_finish_cmd(scmd, done_q);
 	}
 	return;
@@ -2140,8 +2074,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
 
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
-		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
-			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->eh_deadline != -1)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ba8da90..9822fde 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1593,7 +1593,7 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			scsi_eh_scmd_add(cmd, 0);
+			scsi_eh_scmd_add(cmd);
 			break;
 	}
 }
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 5be6cbf6..e20ab10 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -18,7 +18,6 @@
 /*
  * Scsi Error Handler Flags
  */
-#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
 #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
 
 #define SCSI_SENSE_VALID(scmd) \
@@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 3cd8c3b..afb0481 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -452,11 +452,6 @@ struct scsi_host_template {
 	unsigned no_write_same:1;
 
 	/*
-	 * True if asynchronous aborts are not supported
-	 */
-	unsigned no_async_abort:1;
-
-	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
 	unsigned int max_host_blocked;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCHv6 8/8] scsi: inline command aborts
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (6 preceding siblings ...)
  2017-04-06 13:36 ` [PATCHv6 7/8] scsi: make asynchronous aborts mandatory Hannes Reinecke
@ 2017-04-06 13:36 ` Hannes Reinecke
  2017-04-10  7:53   ` Christoph Hellwig
  2017-04-11  0:19   ` Bart Van Assche
  2017-04-06 17:09 ` [PATCHv6 0/8] SCSI EH cleanup Martin K. Petersen
  2017-04-06 17:52 ` Benjamin Block
  9 siblings, 2 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-06 13:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: James Bottomley, Christoph Hellwig, Bart van Assche, linux-scsi,
	Hannes Reinecke, Hannes Reinecke

The block layer always calls the timeout function from a workqueue
context, so there is no need to have yet another workqueue for
running command aborts.

Signed-off-by: Hannes Reinecke <hare@suse.com>
---
 drivers/scsi/scsi.c       |  2 --
 drivers/scsi/scsi_error.c | 83 +++++++++++++++++++++++------------------------
 drivers/scsi/scsi_lib.c   |  2 --
 drivers/scsi/scsi_priv.h  |  1 -
 include/scsi/scsi_cmnd.h  |  1 -
 5 files changed, 41 insertions(+), 48 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 7bfbcfa..fdec73e 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -115,8 +115,6 @@ void scsi_put_command(struct scsi_cmnd *cmd)
 	BUG_ON(list_empty(&cmd->list));
 	list_del_init(&cmd->list);
 	spin_unlock_irqrestore(&cmd->device->list_lock, flags);
-
-	BUG_ON(delayed_work_pending(&cmd->abort_work));
 }
 
 #ifdef CONFIG_SCSI_LOGGING
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 53e3343..7ce1268 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -115,11 +115,9 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
  * scmd_eh_abort_handler - Handle command aborts
  * @work:	command to be aborted.
  */
-void
-scmd_eh_abort_handler(struct work_struct *work)
+int
+scmd_eh_abort_handler(struct scsi_cmnd *scmd)
 {
-	struct scsi_cmnd *scmd =
-		container_of(work, struct scsi_cmnd, abort_work.work);
 	struct scsi_device *sdev = scmd->device;
 	int rtn;
 
@@ -127,42 +125,41 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "eh timeout, not aborting\n"));
-	} else {
+		return FAILED;
+	}
+	SCSI_LOG_ERROR_RECOVERY(3,
+		scmd_printk(KERN_INFO, scmd,
+			    "aborting command\n"));
+	rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
+	if (rtn != SUCCESS) {
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
-				    "aborting command\n"));
-		rtn = scsi_try_to_abort_cmd(sdev->host->hostt, scmd);
-		if (rtn == SUCCESS) {
-			set_host_byte(scmd, DID_TIME_OUT);
-			if (scsi_host_eh_past_deadline(sdev->host)) {
-				SCSI_LOG_ERROR_RECOVERY(3,
-					scmd_printk(KERN_INFO, scmd,
-						    "eh timeout, not retrying "
-						    "aborted command\n"));
-			} else if (!scsi_noretry_cmd(scmd) &&
-			    (++scmd->retries <= scmd->allowed)) {
-				SCSI_LOG_ERROR_RECOVERY(3,
-					scmd_printk(KERN_WARNING, scmd,
-						    "retry aborted command\n"));
-				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
-				return;
-			} else {
-				SCSI_LOG_ERROR_RECOVERY(3,
-					scmd_printk(KERN_WARNING, scmd,
-						    "finish aborted command\n"));
-				scsi_finish_command(scmd);
-				return;
-			}
-		} else {
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "cmd abort %s\n",
-					    (rtn == FAST_IO_FAIL) ?
-					    "not send" : "failed"));
-		}
+				    "cmd abort %s\n",
+				    (rtn == FAST_IO_FAIL) ?
+				    "not send" : "failed"));
+		return rtn;
 	}
-
-	scsi_eh_scmd_add(scmd);
+	set_host_byte(scmd, DID_TIME_OUT);
+	if (scsi_host_eh_past_deadline(sdev->host)) {
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_INFO, scmd,
+				    "eh timeout, not retrying "
+				    "aborted command\n"));
+		return FAILED;
+	}
+	if (!scsi_noretry_cmd(scmd) &&
+		   (++scmd->retries <= scmd->allowed)) {
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_WARNING, scmd,
+				    "retry aborted command\n"));
+		scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+	} else {
+		SCSI_LOG_ERROR_RECOVERY(3,
+			scmd_printk(KERN_WARNING, scmd,
+				    "finish aborted command\n"));
+		scsi_finish_command(scmd);
+	}
+	return SUCCESS;
 }
 
 /**
@@ -185,7 +182,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "previous abort failed\n"));
-		BUG_ON(delayed_work_pending(&scmd->abort_work));
 		return FAILED;
 	}
 
@@ -197,8 +193,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 	scmd->eh_eflags |= SCSI_EH_ABORT_SCHEDULED;
 	SCSI_LOG_ERROR_RECOVERY(3,
 		scmd_printk(KERN_INFO, scmd, "abort scheduled\n"));
-	queue_delayed_work(shost->tmf_work_q, &scmd->abort_work, HZ / 100);
-	return SUCCESS;
+	return scmd_eh_abort_handler(scmd);
 }
 
 /**
@@ -271,10 +266,14 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (scsi_abort_command(scmd) != SUCCESS) {
+		int ret;
+
+		ret = scsi_abort_command(scmd);
+		if (ret == FAILED) {
 			set_host_byte(scmd, DID_TIME_OUT);
 			scsi_eh_scmd_add(scmd);
-		}
+		} else if (ret == FAST_IO_FAIL)
+			rtn = BLK_EH_RESET_TIMER;
 	}
 
 	return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 9822fde..2ae00b8 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1146,7 +1146,6 @@ void scsi_init_command(struct scsi_device *dev, struct scsi_cmnd *cmd)
 	cmd->device = dev;
 	cmd->sense_buffer = buf;
 	cmd->prot_sdb = prot;
-	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
 	cmd->jiffies_at_alloc = jiffies;
 
 	spin_lock_irqsave(&dev->list_lock, flags);
@@ -1863,7 +1862,6 @@ static int scsi_mq_prep_fn(struct request *req)
 	cmd->prot_op = SCSI_PROT_NORMAL;
 
 	INIT_LIST_HEAD(&cmd->list);
-	INIT_DELAYED_WORK(&cmd->abort_work, scmd_eh_abort_handler);
 	cmd->jiffies_at_alloc = jiffies;
 
 	if (shost->use_cmd_list) {
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index e20ab10..e7f43b9 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -66,7 +66,6 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern void scsi_exit_devinfo(void);
 
 /* scsi_error.c */
-extern void scmd_eh_abort_handler(struct work_struct *work);
 extern enum blk_eh_timer_return scsi_times_out(struct request *req);
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
diff --git a/include/scsi/scsi_cmnd.h b/include/scsi/scsi_cmnd.h
index b379f93..a718fda 100644
--- a/include/scsi/scsi_cmnd.h
+++ b/include/scsi/scsi_cmnd.h
@@ -62,7 +62,6 @@ struct scsi_cmnd {
 	struct scsi_device *device;
 	struct list_head list;  /* scsi_cmnd participates in queue lists */
 	struct list_head eh_entry; /* entry for the host eh_cmd_q */
-	struct delayed_work abort_work;
 	int eh_eflags;		/* Used by error handlr */
 
 	/*
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 0/8] SCSI EH cleanup
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (7 preceding siblings ...)
  2017-04-06 13:36 ` [PATCHv6 8/8] scsi: inline command aborts Hannes Reinecke
@ 2017-04-06 17:09 ` Martin K. Petersen
  2017-04-06 17:52 ` Benjamin Block
  9 siblings, 0 replies; 18+ messages in thread
From: Martin K. Petersen @ 2017-04-06 17:09 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	Bart van Assche, linux-scsi

Hannes Reinecke <hare@suse.de> writes:

Hannes,

> this is a resend of a small patchset for cleaning up SCSI EH.  Primary
> goal is to make asynchronous aborts mandatory; there hasn't been a
> single report so far where asynchronous abort won't work, so the
> 'no_async_abort' flag has never been used and will be removed with
> this patchset.  Additionally there's a cleanup to detect retries of
> failed commands and to inline commands aborts.  I've also included the
> patch to correctly reset the media access timeout counter to avoid it
> being triggered inadvertedly by several commands timing out for the
> same device.

I would still have liked to see the medium access failures be tracked on
a per-command basis. But since this is my concoction, I guess I'll have
to fix that up after I get back from vacation.

Applied 1-7 to 4.12/scsi-queue. Patch 8 needs reviews.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 0/8] SCSI EH cleanup
  2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
                   ` (8 preceding siblings ...)
  2017-04-06 17:09 ` [PATCHv6 0/8] SCSI EH cleanup Martin K. Petersen
@ 2017-04-06 17:52 ` Benjamin Block
  9 siblings, 0 replies; 18+ messages in thread
From: Benjamin Block @ 2017-04-06 17:52 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	Bart van Assche, linux-scsi

I really would have liked a stable-tag and -send for patches 1 and 2
(both Medium Access Timeout fixes). I think I or at least Steffen also
asked for that.

We saw several real-life occurrences with this bug and I think that
would have qualified for stable just fine.



                                                    Beste Grüße / Best regards,
                                                      - Benjamin Block
-- 
Linux on z Systems Development         /         IBM Systems & Technology Group
		  IBM Deutschland Research & Development GmbH 
Vorsitz. AufsR.: Martina Koederitz     /        Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen / Registergericht: AmtsG Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 8/8] scsi: inline command aborts
  2017-04-06 13:36 ` [PATCHv6 8/8] scsi: inline command aborts Hannes Reinecke
@ 2017-04-10  7:53   ` Christoph Hellwig
  2017-04-11  0:19   ` Bart Van Assche
  1 sibling, 0 replies; 18+ messages in thread
From: Christoph Hellwig @ 2017-04-10  7:53 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Martin K. Petersen, James Bottomley, Christoph Hellwig,
	Bart van Assche, linux-scsi, Hannes Reinecke

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 2/8] sd: Return SUCCESS in sd_eh_action() after device offline
  2017-04-06 13:36 ` [PATCHv6 2/8] sd: Return SUCCESS in sd_eh_action() after device offline Hannes Reinecke
@ 2017-04-10 23:54   ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-04-10 23:54 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, bblock, hare

On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
> If sd_eh_action() decides to take the device offline there is
> no point in returning FAILED, as taking the device offline
> is the ultimate step in SCSI EH anyway.
> So further escalation via SCSI EH is not likely to make a
> difference and we can as well return SUCCESS.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 3/8] scsi: always send command aborts
  2017-04-06 13:36 ` [PATCHv6 3/8] scsi: always send command aborts Hannes Reinecke
@ 2017-04-10 23:56   ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-04-10 23:56 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, bblock, hare

On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
> When a command has timed out we always should be sending an
> abort; with the previous code a failed abort might signal
> SCSI EH to start, and all other timed out commands will
> never be aborted, even though they might belong to a
> different ITL nexus.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 5/8] scsi: make eh_eflags persistent
  2017-04-06 13:36 ` [PATCHv6 5/8] scsi: make eh_eflags persistent Hannes Reinecke
@ 2017-04-10 23:58   ` Bart Van Assche
  0 siblings, 0 replies; 18+ messages in thread
From: Bart Van Assche @ 2017-04-10 23:58 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi

On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
> +    is invoked to schedule an asynchrous abort.
                                 ^^^^^^^^^^
Sorry that I hadn't noticed this before but if you have to repost this patch
please fix the spelling of this word.

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 8/8] scsi: inline command aborts
  2017-04-06 13:36 ` [PATCHv6 8/8] scsi: inline command aborts Hannes Reinecke
  2017-04-10  7:53   ` Christoph Hellwig
@ 2017-04-11  0:19   ` Bart Van Assche
  2017-04-20  2:14     ` Martin K. Petersen
  1 sibling, 1 reply; 18+ messages in thread
From: Bart Van Assche @ 2017-04-11  0:19 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, hare

On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
> The block layer always calls the timeout function from a workqueue
> context, so there is no need to have yet another workqueue for
> running command aborts.
> 
> [ ... ]
> @@ -271,10 +266,14 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>  		rtn = host->hostt->eh_timed_out(scmd);
>  
>  	if (rtn == BLK_EH_NOT_HANDLED) {
> -		if (scsi_abort_command(scmd) != SUCCESS) {
> +		int ret;
> +
> +		ret = scsi_abort_command(scmd);
> +		if (ret == FAILED) {
>  			set_host_byte(scmd, DID_TIME_OUT);
>  			scsi_eh_scmd_add(scmd);
> -		}
> +		} else if (ret == FAST_IO_FAIL)
> +			rtn = BLK_EH_RESET_TIMER;
>  	}

Has this patch been tested with the traditional block layer? For the
traditional block layer scsi_times_out() is called with the queue lock
held. Does this patch cause .eh_abort_handler(), a function that may
sleep, to be called with the queue lock held?

Bart.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 8/8] scsi: inline command aborts
  2017-04-11  0:19   ` Bart Van Assche
@ 2017-04-20  2:14     ` Martin K. Petersen
  2017-04-24 12:41       ` Hannes Reinecke
  0 siblings, 1 reply; 18+ messages in thread
From: Martin K. Petersen @ 2017-04-20  2:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: hare, martin.petersen, hch, james.bottomley, linux-scsi, hare

Bart Van Assche <Bart.VanAssche@sandisk.com> writes:

> On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
>> The block layer always calls the timeout function from a workqueue
>> context, so there is no need to have yet another workqueue for
>> running command aborts.
>> 
>> [ ... ]
>> @@ -271,10 +266,14 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>>  		rtn = host->hostt->eh_timed_out(scmd);
>>  
>>  	if (rtn == BLK_EH_NOT_HANDLED) {
>> -		if (scsi_abort_command(scmd) != SUCCESS) {
>> +		int ret;
>> +
>> +		ret = scsi_abort_command(scmd);
>> +		if (ret == FAILED) {
>>  			set_host_byte(scmd, DID_TIME_OUT);
>>  			scsi_eh_scmd_add(scmd);
>> -		}
>> +		} else if (ret == FAST_IO_FAIL)
>> +			rtn = BLK_EH_RESET_TIMER;
>>  	}
>
> Has this patch been tested with the traditional block layer? For the
> traditional block layer scsi_times_out() is called with the queue lock
> held. Does this patch cause .eh_abort_handler(), a function that may
> sleep, to be called with the queue lock held?

Hannes: Ping!

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCHv6 8/8] scsi: inline command aborts
  2017-04-20  2:14     ` Martin K. Petersen
@ 2017-04-24 12:41       ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2017-04-24 12:41 UTC (permalink / raw)
  To: Martin K. Petersen, Bart Van Assche
  Cc: hch, james.bottomley, linux-scsi, hare

On 04/20/2017 04:14 AM, Martin K. Petersen wrote:
> Bart Van Assche <Bart.VanAssche@sandisk.com> writes:
> 
>> On Thu, 2017-04-06 at 15:36 +0200, Hannes Reinecke wrote:
>>> The block layer always calls the timeout function from a workqueue
>>> context, so there is no need to have yet another workqueue for
>>> running command aborts.
>>>
>>> [ ... ]
>>> @@ -271,10 +266,14 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
>>>  		rtn = host->hostt->eh_timed_out(scmd);
>>>  
>>>  	if (rtn == BLK_EH_NOT_HANDLED) {
>>> -		if (scsi_abort_command(scmd) != SUCCESS) {
>>> +		int ret;
>>> +
>>> +		ret = scsi_abort_command(scmd);
>>> +		if (ret == FAILED) {
>>>  			set_host_byte(scmd, DID_TIME_OUT);
>>>  			scsi_eh_scmd_add(scmd);
>>> -		}
>>> +		} else if (ret == FAST_IO_FAIL)
>>> +			rtn = BLK_EH_RESET_TIMER;
>>>  	}
>>
>> Has this patch been tested with the traditional block layer? For the
>> traditional block layer scsi_times_out() is called with the queue lock
>> held. Does this patch cause .eh_abort_handler(), a function that may
>> sleep, to be called with the queue lock held?
> 
> Hannes: Ping!
> 
Looks like Bart's right. Will be updating the patch.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-04-24 12:41 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-06 13:36 [PATCHv6 0/8] SCSI EH cleanup Hannes Reinecke
2017-04-06 13:36 ` [PATCHv6 1/8] scsi_error: count medium access timeout only once per EH run Hannes Reinecke
2017-04-06 13:36 ` [PATCHv6 2/8] sd: Return SUCCESS in sd_eh_action() after device offline Hannes Reinecke
2017-04-10 23:54   ` Bart Van Assche
2017-04-06 13:36 ` [PATCHv6 3/8] scsi: always send command aborts Hannes Reinecke
2017-04-10 23:56   ` Bart Van Assche
2017-04-06 13:36 ` [PATCHv6 4/8] libsas: allow async aborts Hannes Reinecke
2017-04-06 13:36 ` [PATCHv6 5/8] scsi: make eh_eflags persistent Hannes Reinecke
2017-04-10 23:58   ` Bart Van Assche
2017-04-06 13:36 ` [PATCHv6 6/8] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
2017-04-06 13:36 ` [PATCHv6 7/8] scsi: make asynchronous aborts mandatory Hannes Reinecke
2017-04-06 13:36 ` [PATCHv6 8/8] scsi: inline command aborts Hannes Reinecke
2017-04-10  7:53   ` Christoph Hellwig
2017-04-11  0:19   ` Bart Van Assche
2017-04-20  2:14     ` Martin K. Petersen
2017-04-24 12:41       ` Hannes Reinecke
2017-04-06 17:09 ` [PATCHv6 0/8] SCSI EH cleanup Martin K. Petersen
2017-04-06 17:52 ` Benjamin Block

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.