All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] New FC timeout handler
@ 2013-05-24  9:50 Hannes Reinecke
  2013-05-24  9:50 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
                   ` (4 more replies)
  0 siblings, 5 replies; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-24  9:50 UTC (permalink / raw)
  To: linux-scsi
  Cc: Ewan Milne, James Smart, Bryn Reeves, Roland Dreier, Hannes Reinecke

Hi all,

this is the first step towards a new FC error handler.
This patch implements a new FC command timeout handler
which will be sending command aborts inline without
engaging SCSI EH.

In addition the commands will be returned directly
if the command abort succeeded, cutting down recovery
times dramatically.

With the original scsi error recovery I got:
# time dd if=/dev/zero of=/mnt/test.blk bs=512 count=2048 oflag=sync
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB) copied, 3.72732 s, 281 kB/s

real	2m14.475s
user	0m0.000s
sys	0m0.104s

with this patchset I got:
# time dd if=/dev/zero of=/mnt/test.blk bs=512 count=2048 oflag=sync
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB) copied, 79.6376 s, 13.2 kB/s

real	1m19.642s
user	0m0.000s
sys	0m0.100s

Test was to disable RSCN on the target port, disable the
target port, and then start the 'dd' command as indicated.

Comments etc are welcome.

Hannes Reinecke (4):
  scsi: move initialization of scmd->eh_entry
  blk-timeout: add BLK_EH_SCHEDULED return code
  scsi: export functions for new fc timeout handler
  scsi_transport_fc: FC timeout handler

 drivers/scsi/scsi_error.c        |  8 ++++-
 drivers/scsi/scsi_lib.c          |  6 ++--
 drivers/scsi/scsi_priv.h         |  2 ++
 drivers/scsi/scsi_transport_fc.c | 63 +++++++++++++++++++++++++++++++++++++++-
 include/linux/blkdev.h           |  1 +
 include/scsi/scsi_transport_fc.h |  2 ++
 6 files changed, 78 insertions(+), 4 deletions(-)

-- 
1.7.12.4


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] scsi: move initialization of scmd->eh_entry
  2013-05-24  9:50 [PATCH 0/4] New FC timeout handler Hannes Reinecke
@ 2013-05-24  9:50 ` Hannes Reinecke
  2013-05-24 16:57   ` Jörn Engel
  2013-05-24  9:50 ` [PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code Hannes Reinecke
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-24  9:50 UTC (permalink / raw)
  To: linux-scsi
  Cc: Ewan Milne, James Smart, Bryn Reeves, Roland Dreier, Hannes Reinecke

The 'eh_entry' list might be used even before scsi_softirq_done()
is called. Hence we should rather initialize it together with
the other eh-related variables.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index c31187d..a5515ec 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -318,6 +318,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
 	memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
 	if (cmd->cmd_len == 0)
 		cmd->cmd_len = scsi_command_size(cmd->cmnd);
+	INIT_LIST_HEAD(&cmd->eh_entry);
+	cmd->eh_eflags = 0;
 }
 
 void scsi_device_unbusy(struct scsi_device *sdev)
@@ -1483,8 +1485,6 @@ static void scsi_softirq_done(struct request *rq)
 	unsigned long wait_for = (cmd->allowed + 1) * rq->timeout;
 	int disposition;
 
-	INIT_LIST_HEAD(&cmd->eh_entry);
-
 	atomic_inc(&cmd->device->iodone_cnt);
 	if (cmd->result)
 		atomic_inc(&cmd->device->ioerr_cnt);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code
  2013-05-24  9:50 [PATCH 0/4] New FC timeout handler Hannes Reinecke
  2013-05-24  9:50 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
@ 2013-05-24  9:50 ` Hannes Reinecke
  2013-05-24  9:50 ` [PATCH 3/4] scsi: export functions for new fc timeout handler Hannes Reinecke
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-24  9:50 UTC (permalink / raw)
  To: linux-scsi
  Cc: Ewan Milne, James Smart, Bryn Reeves, Roland Dreier, Hannes Reinecke

Add a 'BLK_EH_SCHEDULED' return code for blk-timeout to indicate
that a delayed error recovery has been initiated.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 3 +++
 include/linux/blkdev.h    | 1 +
 2 files changed, 4 insertions(+)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index c1b05a8..bef6799e 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -145,6 +145,9 @@ enum blk_eh_timer_return scsi_times_out(struct request *req)
 		rtn = host->hostt->eh_timed_out(scmd);
 
 	scmd->result |= DID_TIME_OUT << 16;
+	/* Check for delayed EH scheduling */
+	if (rtn == BLK_EH_SCHEDULED)
+		return BLK_EH_NOT_HANDLED;
 
 	if (unlikely(rtn == BLK_EH_NOT_HANDLED &&
 		     !scsi_eh_scmd_add(scmd, SCSI_EH_CANCEL_CMD)))
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 78feda9..0a38211 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -238,6 +238,7 @@ enum blk_eh_timer_return {
 	BLK_EH_NOT_HANDLED,
 	BLK_EH_HANDLED,
 	BLK_EH_RESET_TIMER,
+	BLK_EH_SCHEDULED,
 };
 
 typedef enum blk_eh_timer_return (rq_timed_out_fn)(struct request *);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 3/4] scsi: export functions for new fc timeout handler
  2013-05-24  9:50 [PATCH 0/4] New FC timeout handler Hannes Reinecke
  2013-05-24  9:50 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
  2013-05-24  9:50 ` [PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code Hannes Reinecke
@ 2013-05-24  9:50 ` Hannes Reinecke
  2013-05-24  9:50 ` [PATCH 4/4] scsi_transport_fc: FC " Hannes Reinecke
  2013-05-30  8:37 ` [PATCH 0/4] New " Ren Mingxin
  4 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-24  9:50 UTC (permalink / raw)
  To: linux-scsi
  Cc: Ewan Milne, James Smart, Bryn Reeves, Roland Dreier, Hannes Reinecke

Export some more functions which are used by the new FC timeout
handler.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_error.c | 5 ++++-
 drivers/scsi/scsi_lib.c   | 2 ++
 drivers/scsi/scsi_priv.h  | 2 ++
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index bef6799e..f21ad7c 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -119,6 +119,7 @@ int scsi_eh_scmd_add(struct scsi_cmnd *scmd, int eh_flag)
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(scsi_eh_scmd_add);
 
 /**
  * scsi_times_out - Timeout function for normal scsi commands.
@@ -659,13 +660,14 @@ static int scsi_try_bus_device_reset(struct scsi_cmnd *scmd)
 	return rtn;
 }
 
-static int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd)
+int scsi_try_to_abort_cmd(struct scsi_host_template *hostt, struct scsi_cmnd *scmd)
 {
 	if (!hostt->eh_abort_handler)
 		return FAILED;
 
 	return hostt->eh_abort_handler(scmd);
 }
+EXPORT_SYMBOL_GPL(scsi_try_to_abort_cmd);
 
 static void scsi_abort_eh_cmnd(struct scsi_cmnd *scmd)
 {
@@ -1412,6 +1414,7 @@ int scsi_noretry_cmd(struct scsi_cmnd *scmd)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(scsi_noretry_cmd);
 
 /**
  * scsi_decide_disposition - Disposition a cmd on return from LLD.
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a5515ec..7306aaa 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -210,6 +210,8 @@ void scsi_queue_insert(struct scsi_cmnd *cmd, int reason)
 {
 	__scsi_queue_insert(cmd, reason, 1);
 }
+EXPORT_SYMBOL_GPL(scsi_queue_insert);
+
 /**
  * scsi_execute - insert request and wait for the result
  * @sdev:	scsi device
diff --git a/drivers/scsi/scsi_priv.h b/drivers/scsi/scsi_priv.h
index 8f9a0ca..1d8d7d1 100644
--- a/drivers/scsi/scsi_priv.h
+++ b/drivers/scsi/scsi_priv.h
@@ -71,6 +71,8 @@ extern int scsi_error_handler(void *host);
 extern int scsi_decide_disposition(struct scsi_cmnd *cmd);
 extern void scsi_eh_wakeup(struct Scsi_Host *shost);
 extern int scsi_eh_scmd_add(struct scsi_cmnd *, int);
+extern int scsi_try_to_abort_cmd(struct scsi_host_template *hostt,
+				 struct scsi_cmnd *scmd);
 void scsi_eh_ready_devs(struct Scsi_Host *shost,
 			struct list_head *work_q,
 			struct list_head *done_q);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* [PATCH 4/4] scsi_transport_fc: FC timeout handler
  2013-05-24  9:50 [PATCH 0/4] New FC timeout handler Hannes Reinecke
                   ` (2 preceding siblings ...)
  2013-05-24  9:50 ` [PATCH 3/4] scsi: export functions for new fc timeout handler Hannes Reinecke
@ 2013-05-24  9:50 ` Hannes Reinecke
  2013-05-25  5:08   ` Christoph Hellwig
  2013-05-30  8:37 ` [PATCH 0/4] New " Ren Mingxin
  4 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-24  9:50 UTC (permalink / raw)
  To: linux-scsi
  Cc: Ewan Milne, James Smart, Bryn Reeves, Roland Dreier, Hannes Reinecke

When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.

Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.

This patch implements a new FC timeout handler which invokes
the 'eh_abort_handler' directly when the timeout occurs without
entering the error handler.

If the 'eh_abort_handler' returns SUCCESS or FAST_IO_FAIL the
command will be retried if possible. If no retries are allowed
the command will be returned immediately, as we have to assume
the TMF succeeded and the command is completed with the LLDD.
For any other return code from 'eh_abort_handler' the command
will be pushed onto the existing SCSI EH handler, or aborted
with an error if that fails.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_transport_fc.c | 63 +++++++++++++++++++++++++++++++++++++++-
 include/scsi/scsi_transport_fc.h |  2 ++
 2 files changed, 64 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index e106c27..370b263 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2048,6 +2048,51 @@ static int fc_vport_match(struct attribute_container *cont,
 	return &i->vport_attr_cont.ac == cont;
 }
 
+/**
+ * fc_rport_eh_handler - FC-specific error handler
+ * @work:	remote port on which the error occurred.
+ */
+static void
+fc_rport_eh_handler(struct work_struct *work)
+{
+	struct fc_rport *rport =
+		container_of(work, struct fc_rport, rport_eh_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct scsi_cmnd *scmd, *tmp;
+	unsigned long flags;
+	int rtn;
+
+	spin_lock_irqsave(shost->host_lock, flags);
+	list_for_each_entry_safe(scmd, tmp, &rport->eh_work_q, eh_entry) {
+		list_del_init(&scmd->eh_entry);
+		spin_unlock_irqrestore(shost->host_lock, flags);
+		dev_printk(KERN_WARNING, &rport->dev,
+			   "trying to abort scmd %p", scmd);
+		rtn = scsi_try_to_abort_cmd(shost->hostt, scmd);
+		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
+			if (!scsi_noretry_cmd(scmd) &&
+			    (++scmd->retries <= scmd->allowed)) {
+				dev_printk(KERN_WARNING, &rport->dev,
+					   "retry scmd %p", scmd);
+				scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY);
+			} else {
+				dev_printk(KERN_WARNING, &rport->dev,
+					   "fast fail scmd %p", scmd);
+				scmd->result |= DID_TRANSPORT_FAILFAST << 16;
+				scsi_finish_command(scmd);
+			}
+		} else {
+			if (!scsi_eh_scmd_add(scmd, 0)) {
+				dev_printk(KERN_WARNING, &rport->dev,
+					   "terminate scmd %p", scmd);
+				scmd->result |= DID_TIME_OUT << 16;
+				scsi_finish_command(scmd);
+			}
+		}
+		spin_lock_irqsave(shost->host_lock, flags);
+	}
+	spin_unlock_irqrestore(shost->host_lock, flags);
+}
 
 /**
  * fc_timed_out - FC Transport I/O timeout intercept handler
@@ -2075,11 +2120,25 @@ static enum blk_eh_timer_return
 fc_timed_out(struct scsi_cmnd *scmd)
 {
 	struct fc_rport *rport = starget_to_rport(scsi_target(scmd->device));
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	unsigned long flags;
+	int kick_worker = 0;
 
 	if (rport->port_state == FC_PORTSTATE_BLOCKED)
 		return BLK_EH_RESET_TIMER;
 
-	return BLK_EH_NOT_HANDLED;
+	spin_lock_irqsave(shost->host_lock, flags);
+	if (list_empty(&rport->eh_work_q))
+		kick_worker = 1;
+	list_add(&scmd->eh_entry, &rport->eh_work_q);
+	dev_printk(KERN_WARNING, &rport->dev,
+		   "scmd %p added to eh queue\n", scmd);
+	spin_unlock_irqrestore(shost->host_lock, flags);
+
+	if (kick_worker)
+		fc_queue_work(shost, &rport->rport_eh_work);
+
+	return BLK_EH_SCHEDULED;
 }
 
 /*
@@ -2630,11 +2689,13 @@ fc_rport_create(struct Scsi_Host *shost, int channel,
 	rport->channel = channel;
 	rport->fast_io_fail_tmo = -1;
 
+	INIT_LIST_HEAD(&rport->eh_work_q);
 	INIT_DELAYED_WORK(&rport->dev_loss_work, fc_timeout_deleted_rport);
 	INIT_DELAYED_WORK(&rport->fail_io_work, fc_timeout_fail_rport_io);
 	INIT_WORK(&rport->scan_work, fc_scsi_scan_rport);
 	INIT_WORK(&rport->stgt_delete_work, fc_starget_delete);
 	INIT_WORK(&rport->rport_delete_work, fc_rport_final_delete);
+	INIT_WORK(&rport->rport_eh_work, fc_rport_eh_handler);
 
 	spin_lock_irqsave(shost->host_lock, flags);
 
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index b797e8f..ecf8934 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -345,12 +345,14 @@ struct fc_rport {	/* aka fc_starget_attrs */
 	u32 number;
 	u8 flags;
 	struct list_head peers;
+	struct list_head eh_work_q;
 	struct device dev;
  	struct delayed_work dev_loss_work;
  	struct work_struct scan_work;
  	struct delayed_work fail_io_work;
  	struct work_struct stgt_delete_work;
 	struct work_struct rport_delete_work;
+	struct work_struct rport_eh_work;
 	struct request_queue *rqst_q;	/* bsg support */
 } __attribute__((aligned(sizeof(unsigned long))));
 
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] scsi: move initialization of scmd->eh_entry
  2013-05-24  9:50 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
@ 2013-05-24 16:57   ` Jörn Engel
  2013-05-25  8:47     ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Jörn Engel @ 2013-05-24 16:57 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, Ewan Milne, James Smart, Bryn Reeves, Roland Dreier

On Fri, 24 May 2013 11:50:47 +0200, Hannes Reinecke wrote:
> 
> The 'eh_entry' list might be used even before scsi_softirq_done()
> is called. Hence we should rather initialize it together with
> the other eh-related variables.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  drivers/scsi/scsi_lib.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index c31187d..a5515ec 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -318,6 +318,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
>  	memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
>  	if (cmd->cmd_len == 0)
>  		cmd->cmd_len = scsi_command_size(cmd->cmnd);
> +	INIT_LIST_HEAD(&cmd->eh_entry);
> +	cmd->eh_eflags = 0;

I always suspect something subtle going on when a variable is
initialized to 0 three lines below a memset.  Can line this go away?

Jörn

--
Those who come seeking peace without a treaty are plotting.
-- Sun Tzu
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler
  2013-05-24  9:50 ` [PATCH 4/4] scsi_transport_fc: FC " Hannes Reinecke
@ 2013-05-25  5:08   ` Christoph Hellwig
  2013-05-25  9:38     ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2013-05-25  5:08 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, Ewan Milne, James Smart, Bryn Reeves, Roland Dreier

This looks like a good start, but why would we make this FC specific?


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler
  2013-05-25  9:38     ` Hannes Reinecke
@ 2013-05-25  8:42       ` Christoph Hellwig
  2013-05-25  9:55         ` Hannes Reinecke
  0 siblings, 1 reply; 16+ messages in thread
From: Christoph Hellwig @ 2013-05-25  8:42 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, linux-scsi, Ewan Milne, James Smart,
	Bryn Reeves, Roland Dreier

On Sat, May 25, 2013 at 11:38:55AM +0200, Hannes Reinecke wrote:
> On 05/25/2013 07:08 AM, Christoph Hellwig wrote:
> >This looks like a good start, but why would we make this FC specific?
> >
> Because James B. said so :-)

I can't remember that at all, but I'll happily let him speak.

> No, seriously:
> You would need to revisit the good old SCSI parallel LLDDs to figure
> out if this approach works for them.
> Eg aic7xxx basically has to stop the entire HBA even for an abort as
> it has to look through its internal queues etc to locate the
> command.
> So for them it wouldn't make much difference.
> Plus no-one has the knowledge nor the equipment to check _all_ SPI
> LLDDs to ensure this will work for them.

This is not what I suggested.  But by making the new code a common
library we can also make other modern transports like SAS or iSCSI
benefit from the saner error handling.   With motivated enough
maintainers even modern SPI adapters - basically make it the same kind
of transition from the old "old" EH to the current "new" EH back in the
2.2 days.

With a bit of luck 10 years down the road all drivers requiring current
EH will have faded away.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] scsi: move initialization of scmd->eh_entry
  2013-05-24 16:57   ` Jörn Engel
@ 2013-05-25  8:47     ` Christoph Hellwig
  0 siblings, 0 replies; 16+ messages in thread
From: Christoph Hellwig @ 2013-05-25  8:47 UTC (permalink / raw)
  To: J??rn Engel
  Cc: Hannes Reinecke, linux-scsi, Ewan Milne, James Smart,
	Bryn Reeves, Roland Dreier

On Fri, May 24, 2013 at 12:57:40PM -0400, J??rn Engel wrote:
> >  	memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
> >  	if (cmd->cmd_len == 0)
> >  		cmd->cmd_len = scsi_command_size(cmd->cmnd);
> > +	INIT_LIST_HEAD(&cmd->eh_entry);
> > +	cmd->eh_eflags = 0;
> 
> I always suspect something subtle going on when a variable is
> initialized to 0 three lines below a memset.  Can line this go away?

The memset only zeroes the sense_buffer, not the whole scsi command
which would cause a major havoc for the callers.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler
  2013-05-25  5:08   ` Christoph Hellwig
@ 2013-05-25  9:38     ` Hannes Reinecke
  2013-05-25  8:42       ` Christoph Hellwig
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-25  9:38 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-scsi, Ewan Milne, James Smart, Bryn Reeves, Roland Dreier

On 05/25/2013 07:08 AM, Christoph Hellwig wrote:
> This looks like a good start, but why would we make this FC specific?
>
Because James B. said so :-)

No, seriously:
You would need to revisit the good old SCSI parallel LLDDs to figure out 
if this approach works for them.
Eg aic7xxx basically has to stop the entire HBA even for an abort as it 
has to look through its internal queues etc to locate the command.
So for them it wouldn't make much difference.
Plus no-one has the knowledge nor the equipment to check _all_ SPI LLDDs 
to ensure this will work for them.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler
  2013-05-25  8:42       ` Christoph Hellwig
@ 2013-05-25  9:55         ` Hannes Reinecke
  2013-06-05 21:55           ` Jörn Engel
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-25  9:55 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-scsi, Ewan Milne, James Smart, Bryn Reeves, Roland Dreier

On 05/25/2013 10:42 AM, Christoph Hellwig wrote:
> On Sat, May 25, 2013 at 11:38:55AM +0200, Hannes Reinecke wrote:
>> On 05/25/2013 07:08 AM, Christoph Hellwig wrote:
>>> This looks like a good start, but why would we make this FC specific?
>>>
>> Because James B. said so :-)
>
> I can't remember that at all, but I'll happily let him speak.
>
>> No, seriously:
>> You would need to revisit the good old SCSI parallel LLDDs to figure
>> out if this approach works for them.
>> Eg aic7xxx basically has to stop the entire HBA even for an abort as
>> it has to look through its internal queues etc to locate the
>> command.
>> So for them it wouldn't make much difference.
>> Plus no-one has the knowledge nor the equipment to check _all_ SPI
>> LLDDs to ensure this will work for them.
>
> This is not what I suggested.  But by making the new code a common
> library we can also make other modern transports like SAS or iSCSI
> benefit from the saner error handling.   With motivated enough
> maintainers even modern SPI adapters - basically make it the same kind
> of transition from the old "old" EH to the current "new" EH back in the
> 2.2 days.
>
> With a bit of luck 10 years down the road all drivers requiring current
> EH will have faded away.
>
It should be possible to move the code into scsi_lib and just have small 
hooks for the individual transports to use this.

The main reason I didn't do this atm is that I'm still working on the 
remaining bits:
- Use 'ABORT TASK SET' instead of 'LUN RESET'
- Do not stop the host when issuing 'ABORT TASK SET'; it should be
   sufficient to set the sdev to 'QUIESCE' and let the other LUNs
   unaffected.
- Use 'reset I_T nexus' instead of 'TARGET RESET'
- Similarly, do not stop the host when running 'reset I_T nexus',
   but use existing transport mechanisms (eg setting the rport
   to 'BLOCKED' for FC) to block I/O to the affected target.

So the plan is to implement the above for FC, then see how much can be 
leveraged for other transports, and then move these bits to the common 
layer.

Cheers,

Hannes


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] New FC timeout handler
  2013-05-24  9:50 [PATCH 0/4] New FC timeout handler Hannes Reinecke
                   ` (3 preceding siblings ...)
  2013-05-24  9:50 ` [PATCH 4/4] scsi_transport_fc: FC " Hannes Reinecke
@ 2013-05-30  8:37 ` Ren Mingxin
  2013-05-30 12:01   ` Hannes Reinecke
  4 siblings, 1 reply; 16+ messages in thread
From: Ren Mingxin @ 2013-05-30  8:37 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, Ewan Milne, James Smart, Bryn Reeves, Roland Dreier,
	Christoph Hellwig

Hi, Hannes:

On 05/24/2013 05:50 PM, Hannes Reinecke wrote:
> this is the first step towards a new FC error handler.
> This patch implements a new FC command timeout handler
> which will be sending command aborts inline without
> engaging SCSI EH.
>
> In addition the commands will be returned directly
> if the command abort succeeded, cutting down recovery
> times dramatically.

To the commands which can be aborted successfully, I guess your
patchset has solved the problem "the error handler can't even be
called until host_failed == host_busy", because it needn't to
wait for the scheduling of EH threads(without engaging SCSI EH
as you said) now, right?

> For any other return code from 'eh_abort_handler' the command
> will be pushed onto the existing SCSI EH handler, or aborted
> with an error if that fails.

To the commands which can NOT be aborted successfully, there is
not any improvements for the SCSI EH will be invoked as usual.
But should we consider the repetitive/time-consuming issue for
the commands will be tried to abort again in the SCSI EH handler?

Thanks,
Ren

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 0/4] New FC timeout handler
  2013-05-30  8:37 ` [PATCH 0/4] New " Ren Mingxin
@ 2013-05-30 12:01   ` Hannes Reinecke
  0 siblings, 0 replies; 16+ messages in thread
From: Hannes Reinecke @ 2013-05-30 12:01 UTC (permalink / raw)
  To: Ren Mingxin
  Cc: linux-scsi, Ewan Milne, James Smart, Bryn Reeves, Roland Dreier,
	Christoph Hellwig

On 05/30/2013 10:37 AM, Ren Mingxin wrote:
> Hi, Hannes:
>
> On 05/24/2013 05:50 PM, Hannes Reinecke wrote:
>> this is the first step towards a new FC error handler.
>> This patch implements a new FC command timeout handler
>> which will be sending command aborts inline without
>> engaging SCSI EH.
>>
>> In addition the commands will be returned directly
>> if the command abort succeeded, cutting down recovery
>> times dramatically.
>
> To the commands which can be aborted successfully, I guess your
> patchset has solved the problem "the error handler can't even be
> called until host_failed == host_busy", because it needn't to
> wait for the scheduling of EH threads(without engaging SCSI EH
> as you said) now, right?
>
Yes.

>> For any other return code from 'eh_abort_handler' the command
>> will be pushed onto the existing SCSI EH handler, or aborted
>> with an error if that fails.
>
> To the commands which can NOT be aborted successfully, there is
> not any improvements for the SCSI EH will be invoked as usual.

Correct. But I'm working on a patchset to improve that.

> But should we consider the repetitive/time-consuming issue for
> the commands will be tried to abort again in the SCSI EH handler?
>
I doubt this is an issue. With this patchset even aborted commands
will be retried, with the usual retry count.
And for an successful abort we don't need to invoke the error handler
as target itself is considered alive.
If the abort itself failed the command will be pushed onto the SCSI EH 
handler, instructing the EH handler _not_ to abort the command.

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/4] scsi_transport_fc: FC timeout handler
  2013-05-25  9:55         ` Hannes Reinecke
@ 2013-06-05 21:55           ` Jörn Engel
  0 siblings, 0 replies; 16+ messages in thread
From: Jörn Engel @ 2013-06-05 21:55 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, linux-scsi, Ewan Milne, James Smart,
	Bryn Reeves, Roland Dreier

On Sat, 25 May 2013 11:55:58 +0200, Hannes Reinecke wrote:
> >
> It should be possible to move the code into scsi_lib and just have
> small hooks for the individual transports to use this.

We have done something a bit tasteless and unconditionally enabled
your new error handling for all scsi drivers.  That solved a rather
serious problem for us and the subset of scsi drivers we actually
exercise seems to be doing fine.  Maybe we could add a hook into
scsi_times_out() that does the unconditional thing if users opt-in by
setting CONFIG_I_MIGHT_BREAK_STUFF or so.

We also have a scsi error injector now that should allow testing your
patches on any hardware, including kvm.  On a trial run the system
eventually rebooted (likely through a hung task timeout, haven't
checked) with the old error handling and lost a single drive with your
new code.  Awesome!

So my vote is for ignoring the remaining bits and pushing forward.
Your code is too useful to wait another half year.

And we should brush up the error injector and send it out as well.
Testing your code on cheap hardware would be nice.

Jörn

--
He who knows others is wise.
He who knows himself is enlightened.
-- Lao Tsu
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 1/4] scsi: move initialization of scmd->eh_entry
  2013-06-06  9:43 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
@ 2013-06-06 16:14   ` Jörn Engel
  0 siblings, 0 replies; 16+ messages in thread
From: Jörn Engel @ 2013-06-06 16:14 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: James Bottomley, linux-scsi, Ewan Milne, James Smart,
	Ren Mingxin, Roland Dreier, Bryn Reeves, Christoph Hellwig

On Thu, 6 June 2013 11:43:52 +0200, Hannes Reinecke wrote:
> 
> The 'eh_entry' list might be used even before scsi_softirq_done()
> is called. Hence we should rather initialize it together with
> the other eh-related variables.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>

Tested-by: Joern Engel <joern@logfs.org>

Jörn

--
It is better to die of hunger having lived without grief and fear,
than to live with a troubled spirit amid abundance.
-- Epictetus
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/4] scsi: move initialization of scmd->eh_entry
  2013-06-06  9:43 [PATCH 0/4] New SCSI command " Hannes Reinecke
@ 2013-06-06  9:43 ` Hannes Reinecke
  2013-06-06 16:14   ` Jörn Engel
  0 siblings, 1 reply; 16+ messages in thread
From: Hannes Reinecke @ 2013-06-06  9:43 UTC (permalink / raw)
  To: James Bottomley
  Cc: linux-scsi, Joern Engel, Ewan Milne, James Smart, Ren Mingxin,
	Roland Dreier, Bryn Reeves, Christoph Hellwig, Hannes Reinecke

The 'eh_entry' list might be used even before scsi_softirq_done()
is called. Hence we should rather initialize it together with
the other eh-related variables.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 drivers/scsi/scsi_lib.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 86d5220..c1774de 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -317,6 +317,8 @@ static void scsi_init_cmd_errh(struct scsi_cmnd *cmd)
 	memset(cmd->sense_buffer, 0, SCSI_SENSE_BUFFERSIZE);
 	if (cmd->cmd_len == 0)
 		cmd->cmd_len = scsi_command_size(cmd->cmnd);
+	INIT_LIST_HEAD(&cmd->eh_entry);
+	cmd->eh_eflags = 0;
 }
 
 void scsi_device_unbusy(struct scsi_device *sdev)
@@ -1482,8 +1484,6 @@ static void scsi_softirq_done(struct request *rq)
 	unsigned long wait_for = (cmd->allowed + 1) * rq->timeout;
 	int disposition;
 
-	INIT_LIST_HEAD(&cmd->eh_entry);
-
 	atomic_inc(&cmd->device->iodone_cnt);
 	if (cmd->result)
 		atomic_inc(&cmd->device->ioerr_cnt);
-- 
1.7.12.4


^ permalink raw reply related	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-06-06 17:43 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-24  9:50 [PATCH 0/4] New FC timeout handler Hannes Reinecke
2013-05-24  9:50 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
2013-05-24 16:57   ` Jörn Engel
2013-05-25  8:47     ` Christoph Hellwig
2013-05-24  9:50 ` [PATCH 2/4] blk-timeout: add BLK_EH_SCHEDULED return code Hannes Reinecke
2013-05-24  9:50 ` [PATCH 3/4] scsi: export functions for new fc timeout handler Hannes Reinecke
2013-05-24  9:50 ` [PATCH 4/4] scsi_transport_fc: FC " Hannes Reinecke
2013-05-25  5:08   ` Christoph Hellwig
2013-05-25  9:38     ` Hannes Reinecke
2013-05-25  8:42       ` Christoph Hellwig
2013-05-25  9:55         ` Hannes Reinecke
2013-06-05 21:55           ` Jörn Engel
2013-05-30  8:37 ` [PATCH 0/4] New " Ren Mingxin
2013-05-30 12:01   ` Hannes Reinecke
2013-06-06  9:43 [PATCH 0/4] New SCSI command " Hannes Reinecke
2013-06-06  9:43 ` [PATCH 1/4] scsi: move initialization of scmd->eh_entry Hannes Reinecke
2013-06-06 16:14   ` Jörn Engel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.