All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHv2 0/5] SCSI EH cleanup
@ 2017-02-22 16:07 Hannes Reinecke
  2017-02-22 16:07 ` [PATCHv2 1/5] libsas: allow async aborts Hannes Reinecke
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-22 16:07 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn,
	linux-scsi, Hannes Reinecke

Hi all,

this is a resend of a small patchset for cleaning up SCSI EH.
Primary goal is to make asynchronous aborts mandatory; there hasn't
been a single report so far where asynchronous abort won't work, so
the 'no_async_abort' flag has never been used and will be removed
with this patchset.
Additionally there's a cleanup for handle failed EH commands, and
to detect retries of failed commands.

As usual, comments and reviews are welcome.

Changes to v1:
- Include reviews from Christoph

Christoph Hellwig (1):
  libsas: allow async aborts

Hannes Reinecke (4):
  scsi: make eh_eflags persistent
  scsi_error: do not escalate failed EH command
  scsi: make scsi_eh_scmd_add() always succeed
  scsi: make asynchronous aborts mandatory

 Documentation/scsi/scsi_eh.txt      |  30 ++++-----
 drivers/scsi/libsas/sas_scsi_host.c |   3 -
 drivers/scsi/scsi_error.c           | 125 ++++++------------------------------
 drivers/scsi/scsi_lib.c             |   4 +-
 drivers/scsi/scsi_priv.h            |   3 +-
 include/scsi/scsi_eh.h              |   1 +
 include/scsi/scsi_host.h            |   5 --
 7 files changed, 34 insertions(+), 137 deletions(-)

-- 
1.8.5.6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCHv2 1/5] libsas: allow async aborts
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
@ 2017-02-22 16:07 ` Hannes Reinecke
  2017-02-22 16:07 ` [PATCHv2 2/5] scsi: make eh_eflags persistent Hannes Reinecke
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-22 16:07 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn, linux-scsi

From: Christoph Hellwig <hch@lst.de>

We now first try to call ->eh_abort_handler from a work queue, but libsas
was always failing that for no good reason.  Allow async aborts.

Reviewed-by: Johannes Thumshirn <jth@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 drivers/scsi/libsas/sas_scsi_host.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/scsi/libsas/sas_scsi_host.c b/drivers/scsi/libsas/sas_scsi_host.c
index 9bd55bc..ee6b39a 100644
--- a/drivers/scsi/libsas/sas_scsi_host.c
+++ b/drivers/scsi/libsas/sas_scsi_host.c
@@ -491,9 +491,6 @@ int sas_eh_abort_handler(struct scsi_cmnd *cmd)
 	struct Scsi_Host *host = cmd->device->host;
 	struct sas_internal *i = to_sas_internal(host->transportt);
 
-	if (current != host->ehandler)
-		return FAILED;
-
 	if (!i->dft->lldd_abort_task)
 		return FAILED;
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCHv2 2/5] scsi: make eh_eflags persistent
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
  2017-02-22 16:07 ` [PATCHv2 1/5] libsas: allow async aborts Hannes Reinecke
@ 2017-02-22 16:07 ` Hannes Reinecke
  2017-02-27 22:15   ` Bart Van Assche
  2017-02-22 16:07 ` [PATCHv2 3/5] scsi_error: do not escalate failed EH command Hannes Reinecke
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-22 16:07 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn,
	linux-scsi, Hannes Reinecke

To detect if a failed command has been retried we must not
clear scmd->eh_eflags when EH finishes.
The flag should be persistent throughout the lifetime
of the command.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 Documentation/scsi/scsi_eh.txt | 14 +++++++-------
 drivers/scsi/scsi_error.c      |  4 ++--
 include/scsi/scsi_eh.h         |  1 +
 3 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 37eca00..4edb9c1c 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -105,11 +105,14 @@ function
 
  2. If the host supports asynchronous completion (as indicated by the
     no_async_abort setting in the host template) scsi_abort_command()
-    is invoked to schedule an asynchrous abort. If that fails
-    Step #3 is taken.
+    is invoked to schedule an asynchrous abort.
+    Asynchronous abort are not invoked for commands which the
+    SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
+    already had been aborted once, and this is a retry which failed),
+    or when the EH deadline is expired. In these case Step #3 is taken.
 
- 2. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
-    command.  See [1-3] for more information.
+ 3. scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD) is invoked for the
+    command.  See [1-4] for more information.
 
 [1-3] Asynchronous command aborts
 
@@ -263,7 +266,6 @@ scmd->allowed.
 
  3. scmd recovered
     ACTION: scsi_eh_finish_cmd() is invoked to EH-finish scmd
-	- clear scmd->eh_eflags
 	- scsi_setup_cmd_retry()
 	- move from local eh_work_q to local eh_done_q
     LOCKING: none
@@ -456,8 +458,6 @@ except for #1 must be implemented by eh_strategy_handler().
 
  - shost->host_failed is zero.
 
- - Each scmd's eh_eflags field is cleared.
-
  - Each scmd is in such a state that scsi_setup_cmd_retry() on the
    scmd doesn't make any difference.
 
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 9d7bfbb..023f882 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -188,7 +188,6 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		/*
 		 * Retry after abort failed, escalate to next level.
 		 */
-		scmd->eh_eflags &= ~SCSI_EH_ABORT_SCHEDULED;
 		SCSI_LOG_ERROR_RECOVERY(3,
 			scmd_printk(KERN_INFO, scmd,
 				    "previous abort failed\n"));
@@ -931,6 +930,7 @@ void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd, struct scsi_eh_save *ses,
 	ses->result = scmd->result;
 	ses->underflow = scmd->underflow;
 	ses->prot_op = scmd->prot_op;
+	ses->eh_eflags = scmd->eh_eflags;
 
 	scmd->prot_op = SCSI_PROT_NORMAL;
 	scmd->eh_eflags = 0;
@@ -994,6 +994,7 @@ void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd, struct scsi_eh_save *ses)
 	scmd->result = ses->result;
 	scmd->underflow = ses->underflow;
 	scmd->prot_op = ses->prot_op;
+	scmd->eh_eflags = ses->eh_eflags;
 }
 EXPORT_SYMBOL(scsi_eh_restore_cmnd);
 
@@ -1126,7 +1127,6 @@ static int scsi_eh_action(struct scsi_cmnd *scmd, int rtn)
  */
 void scsi_eh_finish_cmd(struct scsi_cmnd *scmd, struct list_head *done_q)
 {
-	scmd->eh_eflags = 0;
 	list_move_tail(&scmd->eh_entry, done_q);
 }
 EXPORT_SYMBOL(scsi_eh_finish_cmd);
diff --git a/include/scsi/scsi_eh.h b/include/scsi/scsi_eh.h
index 98d366b..a25b328 100644
--- a/include/scsi/scsi_eh.h
+++ b/include/scsi/scsi_eh.h
@@ -31,6 +31,7 @@ extern int scsi_get_sense_info_fld(const u8 * sense_buffer, int sb_len,
 struct scsi_eh_save {
 	/* saved state */
 	int result;
+	int eh_eflags;
 	enum dma_data_direction data_direction;
 	unsigned underflow;
 	unsigned char cmd_len;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCHv2 3/5] scsi_error: do not escalate failed EH command
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
  2017-02-22 16:07 ` [PATCHv2 1/5] libsas: allow async aborts Hannes Reinecke
  2017-02-22 16:07 ` [PATCHv2 2/5] scsi: make eh_eflags persistent Hannes Reinecke
@ 2017-02-22 16:07 ` Hannes Reinecke
  2017-02-27 22:21   ` Bart Van Assche
  2017-02-22 16:07 ` [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-22 16:07 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn,
	linux-scsi, Hannes Reinecke

When a command is sent as part of the error handling there
is not point whatsoever to start EH escalation when that
command fails; we are _already_ in the error handler,
and the escalation is about to commence anyway.
So just call 'scsi_try_to_abort_cmd()' to abort outstanding
commands and let the main EH routine handle the rest.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_error.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 023f882..d0ecdef 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -887,15 +887,6 @@ static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
 	return hostt->eh_abort_handler(scmd);
 }
 
-static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
-{
-	if (scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd) != SUCCESS)
-		if (scsi_try_bus_device_reset(scmd) != SUCCESS)
-			if (scsi_try_target_reset(scmd) != SUCCESS)
-				if (scsi_try_bus_reset(scmd) != SUCCESS)
-					scsi_try_host_reset(scmd);
-}
-
 /**
  * scsi_eh_prep_cmnd  - Save a scsi command info as part of error recovery
  * @scmd:       SCSI command structure to hijack
@@ -1080,7 +1071,7 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 			break;
 		}
 	} else if (rtn != FAILED) {
-		scsi_abort_eh_cmnd(scmd);
+		scsi_try_to_abort_cmd(shost->hostt, scmd);
 		rtn = FAILED;
 	}
 
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
                   ` (2 preceding siblings ...)
  2017-02-22 16:07 ` [PATCHv2 3/5] scsi_error: do not escalate failed EH command Hannes Reinecke
@ 2017-02-22 16:07 ` Hannes Reinecke
  2017-02-27 22:27   ` Bart Van Assche
  2017-02-22 16:07 ` [PATCHv2 5/5] scsi: make asynchronous aborts mandatory Hannes Reinecke
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-22 16:07 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn,
	linux-scsi, Hannes Reinecke

scsi_eh_scmd_add() currently only will fail if no
error handler thread is started (which will never be the
case) or if the state machine encounters an illegal transition.

But if we're encountering an invalid state transition
chances is we cannot fixup things with the error handler.
So better add a WARN_ON for illegal host states and
make scsi_dh_scmd_add() a void function.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 drivers/scsi/scsi_error.c | 39 +++++++++++++--------------------------
 drivers/scsi/scsi_lib.c   |  4 ++--
 drivers/scsi/scsi_priv.h  |  2 +-
 3 files changed, 16 insertions(+), 29 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index d0ecdef..381d642 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,13 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		}
 	}
 
-	if (!scsi_eh_scmd_add(scmd, 0)) {
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_WARNING, scmd,
-				    "terminate aborted command\n"));
-		set_host_byte(scmd, DID_TIME_OUT);
-		scsi_finish_command(scmd);
-	}
+	scsi_eh_scmd_add(scmd, 0);
 }
 
 /**
@@ -223,37 +217,32 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
  * @eh_flag:	optional SCSI_EH flag.
- *
- * Return value:
- *	0 on failure.
  */
-int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
-	int ret = 0;
 
-	if (!shost->ehandler)
-		return 0;
+	WARN_ON(!shost->ehandler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
+	WARN_ON(shost->shost_state != SHOST_RUNNING &&
+		shost->shost_state != SHOST_CANCEL &&
+		shost->shost_state != SHOST_RECOVERY &&
+		shost->shost_state != SHOST_CANCEL_RECOVERY);
 	if (scsi_host_set_state(shost, SHOST_RECOVERY))
-		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
-			goto out_unlock;
+		scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
 
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	ret = 1;
 	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 		eh_flag &= ~SCSI_EH_CANCEL_CMD;
 	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
- out_unlock:
 	spin_unlock_irqrestore(shost->host_lock, flags);
-	return ret;
 }
 
 /**
@@ -282,13 +271,11 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (!host->hostt->no_async_abort &&
-		    scsi_abort_command(scmd) == SUCCESS)
-			return BLK_EH_NOT_HANDLED;
-
-		set_host_byte(scmd, DID_TIME_OUT);
-		if (!scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD))
-			rtn = BLK_EH_HANDLED;
+		if (host->hostt->no_async_abort ||
+		    scsi_abort_command(scmd) != SUCCESS) {
+			set_host_byte(scmd, DID_TIME_OUT);
+			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+		}
 	}
 
 	return rtn;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c35b6de..03e9d40 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1571,8 +1571,8 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			if (!scsi_eh_scmd_add(cmd, 0))
-				scsi_finish_command(cmd);
+			scsi_eh_scmd_add(cmd, 0);
+			break;
 	}
 }
 
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 193636a..e4e3483 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -72,7 +72,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCHv2 5/5] scsi: make asynchronous aborts mandatory
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
                   ` (3 preceding siblings ...)
  2017-02-22 16:07 ` [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
@ 2017-02-22 16:07 ` Hannes Reinecke
  2017-02-27 22:53   ` Bart Van Assche
  2017-02-22 19:44 ` [PATCHv2 0/5] SCSI EH cleanup Bart Van Assche
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-22 16:07 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn,
	linux-scsi, Hannes Reinecke

There hasn't been any reports for HBAs where asynchronous abort
would not work, so we should make it mandatory and remove
the fallback.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 Documentation/scsi/scsi_eh.txt | 18 ++++------
 drivers/scsi/scsi_error.c      | 81 ++++--------------------------------------
 drivers/scsi/scsi_lib.c        |  2 +-
 drivers/scsi/scsi_priv.h       |  3 +-
 include/scsi/scsi_host.h       |  5 ---
 5 files changed, 15 insertions(+), 94 deletions(-)

diff --git a/Documentation/scsi/scsi_eh.txt b/Documentation/scsi/scsi_eh.txt
index 4edb9c1c..11e447b 100644
--- a/Documentation/scsi/scsi_eh.txt
+++ b/Documentation/scsi/scsi_eh.txt
@@ -70,7 +70,7 @@ with the command.
 	scmd is requeued to blk queue.
 
  - otherwise
-	scsi_eh_scmd_add(scmd, 0) is invoked for the command.  See
+	scsi_eh_scmd_add(scmd) is invoked for the command.  See
 	[1-3] for details of this function.
 
 
@@ -103,9 +103,7 @@ function
         eh_timed_out() callback did not handle the command.
 	Step #2 is taken.
 
- 2. If the host supports asynchronous completion (as indicated by the
-    no_async_abort setting in the host template) scsi_abort_command()
-    is invoked to schedule an asynchrous abort.
+ 2. scsi_abort_command() is invoked to schedule an asynchrous abort.
     Asynchronous abort are not invoked for commands which the
     SCSI_EH_ABORT_SCHEDULED flag is set (this indicates that the command
     already had been aborted once, and this is a retry which failed),
@@ -127,16 +125,13 @@ function
 
  scmds enter EH via scsi_eh_scmd_add(), which does the following.
 
- 1. Turns on scmd->eh_eflags as requested.  It's 0 for error
-    completions and SCSI_EH_CANCEL_CMD for timeouts.
+ 1. Links scmd->eh_entry to shost->eh_cmd_q
 
- 2. Links scmd->eh_entry to shost->eh_cmd_q
+ 2. Sets SHOST_RECOVERY bit in shost->shost_state
 
- 3. Sets SHOST_RECOVERY bit in shost->shost_state
+ 3. Increments shost->host_failed
 
- 4. Increments shost->host_failed
-
- 5. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
+ 4. Wakes up SCSI EH thread if shost->host_busy == shost->host_failed
 
  As can be seen above, once any scmd is added to shost->eh_cmd_q,
 SHOST_RECOVERY shost_state bit is turned on.  This prevents any new
@@ -252,7 +247,6 @@ scmd->allowed.
 
  1. Error completion / time out
     ACTION: scsi_eh_scmd_add() is invoked for scmd
-	- set scmd->eh_eflags
 	- add scmd to shost->eh_cmd_q
 	- set SHOST_RECOVERY
 	- shost->host_failed++
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 381d642..aa6876d 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -162,7 +162,7 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 		}
 	}
 
-	scsi_eh_scmd_add(scmd, 0);
+	scsi_eh_scmd_add(scmd);
 }
 
 /**
@@ -216,9 +216,8 @@ static int scsi_host_eh_past_deadline(struct Scsi_Host *shost)
 /**
  * scsi_eh_scmd_add - add scsi cmd to error handling.
  * @scmd:	scmd to run eh on.
- * @eh_flag:	optional SCSI_EH flag.
  */
-void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
+void scsi_eh_scmd_add(struct scsi_cmnd *scmd)
 {
 	struct Scsi_Host *shost = scmd->device->host;
 	unsigned long flags;
@@ -236,9 +235,6 @@ void scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	if (shost->eh_deadline != -1 && !shost->last_reset)
 		shost->last_reset = jiffies;
 
-	if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
-		eh_flag &= ~SCSI_EH_CANCEL_CMD;
-	scmd->eh_eflags |= eh_flag;
 	list_add_tail(&scmd->eh_entry, &shost->eh_cmd_q);
 	shost->host_failed++;
 	scsi_eh_wakeup(shost);
@@ -271,10 +267,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	if (rtn == BLK_EH_NOT_HANDLED) {
-		if (host->hostt->no_async_abort ||
-		    scsi_abort_command(scmd) != SUCCESS) {
+		if (scsi_abort_command(scmd) != SUCCESS) {
 			set_host_byte(scmd, DID_TIME_OUT);
-			scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD);
+			scsi_eh_scmd_add(scmd);
 		}
 	}
 
@@ -327,7 +322,7 @@ static inline void scsi_eh_prt_fail_stats(struct Scsi_Host *shost,
 		list_for_each_entry(scmd, work_q, eh_entry) {
 			if (scmd->device == sdev) {
 				++total_failures;
-				if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD)
+				if (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED)
 					++cmd_cancel;
 				else
 					++cmd_failed;
@@ -1141,8 +1136,7 @@ int scsi_eh_get_sense(struct list_head *work_q,
 	 * should not get sense.
 	 */
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-		if ((scmd->eh_eflags & SCSI_EH_CANCEL_CMD) ||
-		    (scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
+		if ((scmd->eh_eflags & SCSI_EH_ABORT_SCHEDULED) ||
 		    SCSI_SENSE_VALID(scmd))
 			continue;
 
@@ -1282,61 +1276,6 @@ static int scsi_eh_test_devices(struct list_head *cmd_list,
 	return list_empty(work_q);
 }
 
-
-/**
- * scsi_eh_abort_cmds - abort pending commands.
- * @work_q:	&list_head for pending commands.
- * @done_q:	&list_head for processed commands.
- *
- * Decription:
- *    Try and see whether or not it makes sense to try and abort the
- *    running command.  This only works out to be the case if we have one
- *    command that has timed out.  If the command simply failed, it makes
- *    no sense to try and abort the command, since as far as the shost
- *    adapter is concerned, it isn't running.
- */
-static int scsi_eh_abort_cmds(struct list_head *work_q,
-			      struct list_head *done_q)
-{
-	struct scsi_cmnd *scmd, *next;
-	LIST_HEAD(check_list);
-	int rtn;
-	struct Scsi_Host *shost;
-
-	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-		if (!(scmd->eh_eflags & SCSI_EH_CANCEL_CMD))
-			continue;
-		shost = scmd->device->host;
-		if (scsi_host_eh_past_deadline(shost)) {
-			list_splice_init(&check_list, work_q);
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "%s: skip aborting cmd, past eh deadline\n",
-					    current->comm));
-			return list_empty(work_q);
-		}
-		SCSI_LOG_ERROR_RECOVERY(3,
-			scmd_printk(KERN_INFO, scmd,
-				     "%s: aborting cmd\n", current->comm));
-		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
-		if (rtn == FAILED) {
-			SCSI_LOG_ERROR_RECOVERY(3,
-				scmd_printk(KERN_INFO, scmd,
-					    "%s: aborting cmd failed\n",
-					     current->comm));
-			list_splice_init(&check_list, work_q);
-			return list_empty(work_q);
-		}
-		scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
-		if (rtn == FAST_IO_FAIL)
-			scsi_eh_finish_cmd(scmd, done_q);
-		else
-			list_move_tail(&scmd->eh_entry, &check_list);
-	}
-
-	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
-}
-
 /**
  * scsi_eh_try_stu - Send START_UNIT to device.
  * @scmd:	&scsi_cmnd to send START_UNIT
@@ -1679,11 +1618,6 @@ static void scsi_eh_offline_sdevs(struct list_head *work_q,
 		sdev_printk(KERN_INFO, scmd->device, "Device offlined - "
 			    "not ready after error recovery\n");
 		scsi_device_set_state(scmd->device, SDEV_OFFLINE);
-		if (scmd->eh_eflags & SCSI_EH_CANCEL_CMD) {
-			/*
-			 * FIXME: Handle lost cmds.
-			 */
-		}
 		scsi_eh_finish_cmd(scmd, done_q);
 	}
 	return;
@@ -2127,8 +2061,7 @@ static void scsi_unjam_host(struct Scsi_Host *shost)
 	SCSI_LOG_ERROR_RECOVERY(1, scsi_eh_prt_fail_stats(shost, &eh_work_q));
 
 	if (!scsi_eh_get_sense(&eh_work_q, &eh_done_q))
-		if (!scsi_eh_abort_cmds(&eh_work_q, &eh_done_q))
-			scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
+		scsi_eh_ready_devs(shost, &eh_work_q, &eh_done_q);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 	if (shost->eh_deadline != -1)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 03e9d40..a656d27 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1571,7 +1571,7 @@ static void scsi_softirq_done(struct request *rq)
 			scsi_queue_insert(cmd, SCSI_MLQUEUE_DEVICE_BUSY);
 			break;
 		default:
-			scsi_eh_scmd_add(cmd, 0);
+			scsi_eh_scmd_add(cmd);
 			break;
 	}
 }
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index e4e3483..827172e 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -18,7 +18,6 @@
 /*
  * Scsi Error Handler Flags
  */
-#define SCSI_EH_CANCEL_CMD	0x0001	/* Cancel this cmd */
 #define SCSI_EH_ABORT_SCHEDULED	0x0002	/* Abort has been scheduled */
 
 #define SCSI_SENSE_VALID(scmd) \
@@ -72,7 +71,7 @@ extern int scsi_dev_info_list_add_keyed(int compatible, char *vendor,
 extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
-extern void scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern void scsi_eh_scmd_add(struct scsi_cmnd *);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 36680f1..ed7026c 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -452,11 +452,6 @@ struct scsi_host_template {
 	unsigned no_write_same:1;
 
 	/*
-	 * True if asynchronous aborts are not supported
-	 */
-	unsigned no_async_abort:1;
-
-	/*
 	 * Countdown for host blocking with no commands outstanding.
 	 */
 	unsigned int max_host_blocked;
-- 
1.8.5.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
                   ` (4 preceding siblings ...)
  2017-02-22 16:07 ` [PATCHv2 5/5] scsi: make asynchronous aborts mandatory Hannes Reinecke
@ 2017-02-22 19:44 ` Bart Van Assche
  2017-02-23  7:27   ` Hannes Reinecke
  2017-02-22 21:21 ` Bart Van Assche
  2017-02-24  4:01 ` Bart Van Assche
  7 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2017-02-22 19:44 UTC (permalink / raw)
  To: Hannes Reinecke, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn, linux-scsi

On 02/22/2017 08:07 AM, Hannes Reinecke wrote:
> this is a resend of a small patchset for cleaning up SCSI EH.
> Primary goal is to make asynchronous aborts mandatory; there hasn't
> been a single report so far where asynchronous abort won't work, so
> the 'no_async_abort' flag has never been used and will be removed
> with this patchset.
> Additionally there's a cleanup for handle failed EH commands, and
> to detect retries of failed commands.
> 
> As usual, comments and reviews are welcome.

Hello Hannes,

Supporting asynchronous aborts means that if no response is received for
an abort after a certain time that the abort has to be considered as
failed. A SCSI LLD may have to allocate resources before a TMF can be
sent (e.g. ib_srp has to do that). These resources have to be cleaned up
if a TMF times out. We don't want to see a TMF timeout handler in every
LLD that supports asynchronous aborts. So we need a TMF timeout handler
in the SCSI core. Have you considered to allocate a new SCSI request for
submitting a TMF instead of overwriting fields in an existing scsi_cmnd
structure? In that case we will be able to reuse the block layer timeout
handler.

Something else I noticed is that the access of .eh_action in
scsi_eh_done() races with the shost->eh_action = NULL assigment in
scsi_send_eh_cmnd(). Shouldn't these accesses be protected by locking?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
                   ` (5 preceding siblings ...)
  2017-02-22 19:44 ` [PATCHv2 0/5] SCSI EH cleanup Bart Van Assche
@ 2017-02-22 21:21 ` Bart Van Assche
  2017-02-23  7:35   ` Hannes Reinecke
  2017-02-24  4:01 ` Bart Van Assche
  7 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2017-02-22 21:21 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
> this is a resend of a small patchset for cleaning up SCSI EH.
> Primary goal is to make asynchronous aborts mandatory; there hasn't
> been a single report so far where asynchronous abort won't work, so
> the 'no_async_abort' flag has never been used and will be removed
> with this patchset.
> Additionally there's a cleanup for handle failed EH commands, and
> to detect retries of failed commands.

An additional comment: I don't think it is safe to execute a host
reset while any asynchronous aborts are ongoing. I think the SCSI EH
will get really confused if after a host reset has been performed a
response is received from an asynchronous abort that was issued
before the host reset started. Although a SCSI target must not send
a LU RESET response before all previously submitted SCSI commands
and TMFs have finished, what prevents in e.g. a multipath topology
that an abort response and a LU RESET response are received out of
order?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-22 19:44 ` [PATCHv2 0/5] SCSI EH cleanup Bart Van Assche
@ 2017-02-23  7:27   ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-23  7:27 UTC (permalink / raw)
  To: Bart Van Assche, Martin K. Petersen
  Cc: Christoph Hellwig, James Bottomley, Johannes Thumshirn, linux-scsi

On 02/22/2017 08:44 PM, Bart Van Assche wrote:
> On 02/22/2017 08:07 AM, Hannes Reinecke wrote:
>> this is a resend of a small patchset for cleaning up SCSI EH.
>> Primary goal is to make asynchronous aborts mandatory; there hasn't
>> been a single report so far where asynchronous abort won't work, so
>> the 'no_async_abort' flag has never been used and will be removed
>> with this patchset.
>> Additionally there's a cleanup for handle failed EH commands, and
>> to detect retries of failed commands.
>>
>> As usual, comments and reviews are welcome.
> 
> Hello Hannes,
> 
> Supporting asynchronous aborts means that if no response is received for
> an abort after a certain time that the abort has to be considered as
> failed. A SCSI LLD may have to allocate resources before a TMF can be
> sent (e.g. ib_srp has to do that). These resources have to be cleaned up
> if a TMF times out. We don't want to see a TMF timeout handler in every
> LLD that supports asynchronous aborts. So we need a TMF timeout handler
> in the SCSI core. Have you considered to allocate a new SCSI request for
> submitting a TMF instead of overwriting fields in an existing scsi_cmnd
> structure? In that case we will be able to reuse the block layer timeout
> handler.
> 
TMF timeout is a completely different issue, and isn't touched at all by
this patchset.

ATM the SCSI core expects any of the eh_XXX callback functions to be
synchronous, ie any TMF timeout handling is delegated to the driver.

I plan to change that for the eh_abort() call, as most HBAs will be
receiving a completion for both commands, the TMF and the associated
command. So for these drivers there is not point in assuming a
synchronous call; we can easily make this call asynchronous and wait for
the completion to deliver the final status.
And indeed have a generic TMF timeout handling in the SCSI core.

And yes, I have considered allocating a separate command for TMFs.
In fact, I'll be looking at implementing a RQF_RESET meta request (much
like the RQF_FLUSH request we already have).
That would neatly solve the problem sg_reset has nowadays (namely that
it doesn't have a SCSI command attached to it), and it would also
decouple the failed command from the error handling itself.
With that we could return the command to the block layer even when EH is
still running, giving us better failover response times.

> Something else I noticed is that the access of .eh_action in
> scsi_eh_done() races with the shost->eh_action = NULL assigment in
> scsi_send_eh_cmnd(). Shouldn't these accesses be protected by locking?
> 
No, I don't think so.
The only place where this can race is once wait_for_completion_timeout()
has timed out.
There indeed is a chance that scsi_eh_done() is called just before
shost->eh_action = NULL.
But then this is not really critical as the 'done' structure itself is
still valid until scsi_send_eh_cmnd() itself is finished.
And calling 'complete' at any time before that will just be a no-op as
the waiting is already terminated.

One _might_ consider a really fucked-up architecture where the assignment
	eh_action = scmd->device->host->eh_action;
is executed before 'shost->eh_action = NULL', but the final
	complete(eh_action)
is executed after the entire scsi_send_eh_cmnd() function has completed.
So from that point we should be issuing a 'wmb' or somesuch after
shost->eh_action = NULL.
However, I do consider this highly unlikely.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-22 21:21 ` Bart Van Assche
@ 2017-02-23  7:35   ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-23  7:35 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On 02/22/2017 10:21 PM, Bart Van Assche wrote:
> On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>> this is a resend of a small patchset for cleaning up SCSI EH.
>> Primary goal is to make asynchronous aborts mandatory; there hasn't
>> been a single report so far where asynchronous abort won't work, so
>> the 'no_async_abort' flag has never been used and will be removed
>> with this patchset.
>> Additionally there's a cleanup for handle failed EH commands, and
>> to detect retries of failed commands.
> 
> An additional comment: I don't think it is safe to execute a host
> reset while any asynchronous aborts are ongoing. I think the SCSI EH
> will get really confused if after a host reset has been performed a
> response is received from an asynchronous abort that was issued
> before the host reset started. Although a SCSI target must not send
> a LU RESET response before all previously submitted SCSI commands
> and TMFs have finished, what prevents in e.g. a multipath topology
> that an abort response and a LU RESET response are received out of
> order?
> 
Hmm.
Not sure this can happen nowadays.
eh_abort() is synchronous, and the command is considered in-flight until
eh_abort() returns.
And as SCSI EH only starts until all commands are completed or timed out
we will not have any stuck TMFs before sending a host reset.
Things are different when calling 'sg_reset' manually, but I'll be
revisiting that anyway in the near future as it has it's own set of
problems.

Cheers,

Hannes
> Thanks,
> 
> Bart.
> 


-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
                   ` (6 preceding siblings ...)
  2017-02-22 21:21 ` Bart Van Assche
@ 2017-02-24  4:01 ` Bart Van Assche
  2017-02-24  7:15   ` Hannes Reinecke
  7 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2017-02-24  4:01 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
> Primary goal is to make asynchronous aborts mandatory; there hasn't
> been a single report so far where asynchronous abort won't work, so
> the 'no_async_abort' flag has never been used and will be removed
> with this patchset.

Hello Hannes,

There is a problem with asynchronous aborts. Some SCSI drivers, e.g. ib_srp,
support fast error recovery by performing a transport layer reconnect without
reporting this event as a failure to the SCSI core. While such a reconnect is
ongoing it is important that no attempt is made to use the data structures
that represent the connection. Hence the scsi_target_block() call in
srp_reconnect_rport(). This blocks most .queuecommand() calls except those
that originate from the SCSI EH. Hence the if (current == shost->ehandler)
mutex_lock(&rport->mutex) code in srp_queuecommand(). Asynchronous aborts
break this code because the asynchronous abort code submits an abort from
another context than the SCSI EH thread. I know that this way of detecting
the SCSI EH context is not an optimal solution. A few years ago I have tried
to modify the SCSI EH such that reconnects and .queuecommand() calls could
be serialized but James was not interested in such patches at that time.

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-24  4:01 ` Bart Van Assche
@ 2017-02-24  7:15   ` Hannes Reinecke
  2017-02-27 22:10     ` Bart Van Assche
  0 siblings, 1 reply; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-24  7:15 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On 02/24/2017 05:01 AM, Bart Van Assche wrote:
> On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>> Primary goal is to make asynchronous aborts mandatory; there hasn't
>> been a single report so far where asynchronous abort won't work, so
>> the 'no_async_abort' flag has never been used and will be removed
>> with this patchset.
> 
> Hello Hannes,
> 
> There is a problem with asynchronous aborts. Some SCSI drivers, e.g. ib_srp,
> support fast error recovery by performing a transport layer reconnect without
> reporting this event as a failure to the SCSI core. While such a reconnect is
> ongoing it is important that no attempt is made to use the data structures
> that represent the connection. Hence the scsi_target_block() call in
> srp_reconnect_rport(). This blocks most .queuecommand() calls except those
> that originate from the SCSI EH. Hence the if (current == shost->ehandler)
> mutex_lock(&rport->mutex) code in srp_queuecommand(). Asynchronous aborts
> break this code because the asynchronous abort code submits an abort from
> another context than the SCSI EH thread. I know that this way of detecting
> the SCSI EH context is not an optimal solution. A few years ago I have tried
> to modify the SCSI EH such that reconnects and .queuecommand() calls could
> be serialized but James was not interested in such patches at that time.
> 
Don't think that's much of an issue.
scsi_abort_command() (ie the call which triggers async aborts) is only
called if the .eh_timed_out() callback returns BLK_EH_NOT_HANDLED.
So during reconnects we will not schedule any calls to async aborts.
We _might_ have scheduled an async abort prior to a call to
srp_reconnect_rport(), but that would be equivalent with calling
srp_reconnect_rport() with commands still in flight.
Is that even possible?
If so, how do you handle these commands after reconnect returns?
Any I_T_L nexus will be gone from the target, right?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 0/5] SCSI EH cleanup
  2017-02-24  7:15   ` Hannes Reinecke
@ 2017-02-27 22:10     ` Bart Van Assche
  0 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2017-02-27 22:10 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Fri, 2017-02-24 at 08:15 +0100, Hannes Reinecke wrote:
> scsi_abort_command() (ie the call which triggers async aborts) is only
> called if the .eh_timed_out() callback returns BLK_EH_NOT_HANDLED.
> So during reconnects we will not schedule any calls to async aborts.
> We _might_ have scheduled an async abort prior to a call to
> srp_reconnect_rport(), but that would be equivalent with calling
> srp_reconnect_rport() with commands still in flight.
> Is that even possible?
> If so, how do you handle these commands after reconnect returns?
> Any I_T_L nexus will be gone from the target, right?

srp_reconnect_rport() can proceed while commands are in flight. What will
happen is that no response will be received for the commands that are in
flight and hence that these commands will time out at the initiator side.
However, any newly submitted commands will be processed normally.

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 2/5] scsi: make eh_eflags persistent
  2017-02-22 16:07 ` [PATCHv2 2/5] scsi: make eh_eflags persistent Hannes Reinecke
@ 2017-02-27 22:15   ` Bart Van Assche
  2017-02-28 12:09     ` Hannes Reinecke
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2017-02-27 22:15 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
> To detect if a failed command has been retried we must not
> clear scmd->eh_eflags when EH finishes.
> The flag should be persistent throughout the lifetime
> of the command.

Hello Hannes,

Is it on purpose that this patch does not remove the following statement from
drivers/scsi/libsas/sas_scsi_host.c?

		cmd->eh_eflags = 0;

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 3/5] scsi_error: do not escalate failed EH command
  2017-02-22 16:07 ` [PATCHv2 3/5] scsi_error: do not escalate failed EH command Hannes Reinecke
@ 2017-02-27 22:21   ` Bart Van Assche
  0 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2017-02-27 22:21 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
> When a command is sent as part of the error handling there
> is not point whatsoever to start EH escalation when that
> command fails; we are _already_ in the error handler,
> and the escalation is about to commence anyway.
> So just call 'scsi_try_to_abort_cmd()' to abort outstanding
> commands and let the main EH routine handle the rest.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed
  2017-02-22 16:07 ` [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
@ 2017-02-27 22:27   ` Bart Van Assche
  2017-02-28 12:16     ` Hannes Reinecke
  0 siblings, 1 reply; 19+ messages in thread
From: Bart Van Assche @ 2017-02-27 22:27 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>  	spin_lock_irqsave(shost->host_lock, flags);
> +	WARN_ON(shost->shost_state != SHOST_RUNNING &&
> +		shost->shost_state != SHOST_CANCEL &&
> +		shost->shost_state != SHOST_RECOVERY &&
> +		shost->shost_state != SHOST_CANCEL_RECOVERY);
>  	if (scsi_host_set_state(shost, SHOST_RECOVERY))
> -		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
> -			goto out_unlock;
> +		scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);

Please issue a warning if the second scsi_host_set_state() fails. And once
that failure triggers a warning, I don't think we need the newly added
WARN_ON() statement anymore. Something else that surprised me is that you
consistently use WARN_ON() instead of WARN_ON_ONCE() in this patch?
  
Otherwise this patch looks fine to me.

Bart.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 5/5] scsi: make asynchronous aborts mandatory
  2017-02-22 16:07 ` [PATCHv2 5/5] scsi: make asynchronous aborts mandatory Hannes Reinecke
@ 2017-02-27 22:53   ` Bart Van Assche
  0 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2017-02-27 22:53 UTC (permalink / raw)
  To: hare, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
> There hasn't been any reports for HBAs where asynchronous abort
> would not work, so we should make it mandatory and remove
> the fallback.

Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 2/5] scsi: make eh_eflags persistent
  2017-02-27 22:15   ` Bart Van Assche
@ 2017-02-28 12:09     ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-28 12:09 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On 02/27/2017 11:15 PM, Bart Van Assche wrote:
> On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>> To detect if a failed command has been retried we must not
>> clear scmd->eh_eflags when EH finishes.
>> The flag should be persistent throughout the lifetime
>> of the command.
> 
> Hello Hannes,
> 
> Is it on purpose that this patch does not remove the following statement from
> drivers/scsi/libsas/sas_scsi_host.c?
> 
> 		cmd->eh_eflags = 0;
> 
Oh, yes, you are right.

Will be fixing it up.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed
  2017-02-27 22:27   ` Bart Van Assche
@ 2017-02-28 12:16     ` Hannes Reinecke
  0 siblings, 0 replies; 19+ messages in thread
From: Hannes Reinecke @ 2017-02-28 12:16 UTC (permalink / raw)
  To: Bart Van Assche, martin.petersen; +Cc: hch, james.bottomley, linux-scsi, jth

On 02/27/2017 11:27 PM, Bart Van Assche wrote:
> On Wed, 2017-02-22 at 17:07 +0100, Hannes Reinecke wrote:
>>  	spin_lock_irqsave(shost->host_lock, flags);
>> +	WARN_ON(shost->shost_state != SHOST_RUNNING &&
>> +		shost->shost_state != SHOST_CANCEL &&
>> +		shost->shost_state != SHOST_RECOVERY &&
>> +		shost->shost_state != SHOST_CANCEL_RECOVERY);
>>  	if (scsi_host_set_state(shost, SHOST_RECOVERY))
>> -		if (scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY))
>> -			goto out_unlock;
>> +		scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY);
> 
> Please issue a warning if the second scsi_host_set_state() fails. And once
> that failure triggers a warning, I don't think we need the newly added
> WARN_ON() statement anymore. Something else that surprised me is that you
> consistently use WARN_ON() instead of WARN_ON_ONCE() in this patch?
> 
Okay, will be fixing it up.

> Otherwise this patch looks fine to me.
> 
Thanks.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-02-28 12:53 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-22 16:07 [PATCHv2 0/5] SCSI EH cleanup Hannes Reinecke
2017-02-22 16:07 ` [PATCHv2 1/5] libsas: allow async aborts Hannes Reinecke
2017-02-22 16:07 ` [PATCHv2 2/5] scsi: make eh_eflags persistent Hannes Reinecke
2017-02-27 22:15   ` Bart Van Assche
2017-02-28 12:09     ` Hannes Reinecke
2017-02-22 16:07 ` [PATCHv2 3/5] scsi_error: do not escalate failed EH command Hannes Reinecke
2017-02-27 22:21   ` Bart Van Assche
2017-02-22 16:07 ` [PATCHv2 4/5] scsi: make scsi_eh_scmd_add() always succeed Hannes Reinecke
2017-02-27 22:27   ` Bart Van Assche
2017-02-28 12:16     ` Hannes Reinecke
2017-02-22 16:07 ` [PATCHv2 5/5] scsi: make asynchronous aborts mandatory Hannes Reinecke
2017-02-27 22:53   ` Bart Van Assche
2017-02-22 19:44 ` [PATCHv2 0/5] SCSI EH cleanup Bart Van Assche
2017-02-23  7:27   ` Hannes Reinecke
2017-02-22 21:21 ` Bart Van Assche
2017-02-23  7:35   ` Hannes Reinecke
2017-02-24  4:01 ` Bart Van Assche
2017-02-24  7:15   ` Hannes Reinecke
2017-02-27 22:10     ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.