All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Reduce error recovery time by reducing use of TURs
@ 2011-05-19 18:41 David Jeffery
  2011-05-24 16:53 ` James Bottomley
                   ` (2 more replies)
  0 siblings, 3 replies; 4+ messages in thread
From: David Jeffery @ 2011-05-19 18:41 UTC (permalink / raw)
  To: linux-scsi

In error recovery, most scsi error recovery stages will send a TUR command
for every bad command when a driver's error handler reports success.  When
several bad commands to the same device, this results in a device
being probed multiple times.

This becomes very problematic if the device or connection is in a state
where the device still doesn't respond to commands even after a recovery
function returns success.  The error handler must wait for the test
commands to time out.  The time waiting for the redundant commands can
drastically lengthen error recovery.

This patch alters the scsi mid-layer's error routines to send test commands
once per device instead of once per bad command.  This can drastically
lower error recovery time.

Signed-of-by: David Jeffery <djeffery@redhat.com>
---

This is a resend of a patch with some minor whitespace cleanups which I'd
like to have considered for inclusion.

As an extreme example of the problem with the current error recovery,
use scsi_debug configured to drop all commands sent to a device.  It
takes 10 commands 5.5 minutes from starting error recovery to the returning
of errors without the patch.  After the patch, error recovery only
takes 1 minute.  This is do to all error stages only having to wait for the
test commands to time out once per device instead of once for each bad
command.

The patch adds a helper function, scsi_eh_test_devices(), to handle a list
of commands to devices which need to be checked.  If a check for a device is
successful, all commands to a device are completed to the done queue.  If a
check to a device fails, all commands to the failing device are returned to
the work queue for additional error recovery.

scsi_eh_abort_cmds(), scsi_eh_target_reset(), scsi_eh_bus_reset(), and
scsi_eh_host_reset() were all converted to use this helper function.
scsi_eh_bus_device_reset() was not changed as it already has the behavior
of probing once per device.

One other behavior change with this patch is the probing commands occur after
all driver error handlers of a given type are called. e.g.  Instead of abort,
probe, abort, probe... it's now abort, abort, abort, probe... etc.

 scsi_error.c |   84 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 64 insertions(+), 20 deletions(-)

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index 633c239..5339e89 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -50,6 +50,8 @@
 #define BUS_RESET_SETTLE_TIME   (10)
 #define HOST_RESET_SETTLE_TIME  (10)
 
+static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
+
 /* called with shost->host_lock held */
 void scsi_eh_wakeup(struct Scsi_Host *shost)
 {
@@ -941,6 +943,45 @@ retry_tur:
 }
 
 /**
+ * scsi_eh_test_devices - check if devices are responding from error recovery.
+ * @cmd_list:	scsi commands in error recovery.
+ * @work_q:     queue for commands which still need more error recovery
+ * @done_q:     queue for commands which are finished
+ * @try_stu:    boolean on if a STU command should be tried in addition to TUR.
+ *
+ * Decription:
+ *    Tests if devices are in a working state.  Commands to devices now in
+ *    a working state are sent to the done_q while commands to devices which
+ *    are still failing to respond are returned to the work_q for more
+ *    processing.
+ **/
+static int scsi_eh_test_devices(struct list_head *cmd_list, struct list_head *work_q, struct list_head *done_q, int try_stu)
+{
+	struct scsi_cmnd *scmd, *next;
+	struct scsi_device *sdev;
+	int finish_cmds;
+
+	while (!list_empty(cmd_list)) {
+		scmd = list_entry(cmd_list->next, struct scsi_cmnd, eh_entry);
+		sdev = scmd->device;
+
+		finish_cmds = !scsi_device_online(scmd->device) || 
+			(try_stu && !scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
+			!scsi_eh_tur(scmd);
+			
+		list_for_each_entry_safe(scmd, next, cmd_list, eh_entry)
+			if (scmd->device == sdev) {
+				if (finish_cmds)
+					scsi_eh_finish_cmd(scmd, done_q);
+				else
+					list_move_tail(&scmd->eh_entry, work_q);
+			}
+	}
+	return list_empty(work_q);
+}
+
+
+/**
  * scsi_eh_abort_cmds - abort pending commands.
  * @work_q:	&list_head for pending commands.
  * @done_q:	&list_head for processed commands.
@@ -956,6 +997,7 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
 			      struct list_head *done_q)
 {
 	struct scsi_cmnd *scmd, *next;
+	LIST_HEAD(check_list);
 	int rtn;
 
 	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
@@ -967,11 +1009,10 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
 		rtn = scsi_try_to_abort_cmd(scmd->device->host->hostt, scmd);
 		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
-			if (!scsi_device_online(scmd->device) ||
-			    rtn == FAST_IO_FAIL ||
-			    !scsi_eh_tur(scmd)) {
+			if (rtn == FAST_IO_FAIL)
 				scsi_eh_finish_cmd(scmd, done_q);
-			}
+			else
+				list_move_tail(&scmd->eh_entry, &check_list);
 		} else
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: aborting"
 							  " cmd failed:"
@@ -980,7 +1021,7 @@ static int scsi_eh_abort_cmds(struct list_head *work_q,
 							  scmd));
 	}
 
-	return list_empty(work_q);
+	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
 }
 
 /**
@@ -1131,6 +1172,7 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
 				struct list_head *done_q)
 {
 	LIST_HEAD(tmp_list);
+	LIST_HEAD(check_list);
 
 	list_splice_init(work_q, &tmp_list);
 
@@ -1155,9 +1197,9 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
 			if (scmd_id(scmd) != id)
 				continue;
 
-			if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
-			    && (!scsi_device_online(scmd->device) ||
-				 rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
+			if (rtn == SUCCESS)
+				list_move_tail(&scmd->eh_entry, &check_list);
+			else if (rtn == FAST_IO_FAIL)
 				scsi_eh_finish_cmd(scmd, done_q);
 			else
 				/* push back on work queue for further processing */
@@ -1165,7 +1207,7 @@ static int scsi_eh_target_reset(struct Scsi_Host *shost,
 		}
 	}
 
-	return list_empty(work_q);
+	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
 }
 
 /**
@@ -1179,6 +1221,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
 			     struct list_head *done_q)
 {
 	struct scsi_cmnd *scmd, *chan_scmd, *next;
+	LIST_HEAD(check_list);
 	unsigned int channel;
 	int rtn;
 
@@ -1210,12 +1253,14 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
 		rtn = scsi_try_bus_reset(chan_scmd);
 		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
 			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-				if (channel == scmd_channel(scmd))
-					if (!scsi_device_online(scmd->device) ||
-					    rtn == FAST_IO_FAIL ||
-					    !scsi_eh_tur(scmd))
+				if (channel == scmd_channel(scmd)) {
+					if (rtn == FAST_IO_FAIL)
 						scsi_eh_finish_cmd(scmd,
 								   done_q);
+					else
+						list_move_tail(&scmd->eh_entry,
+							       &check_list);
+				}
 			}
 		} else {
 			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: BRST"
@@ -1224,7 +1269,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host *shost,
 							  channel));
 		}
 	}
-	return list_empty(work_q);
+	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
 }
 
 /**
@@ -1236,6 +1281,7 @@ static int scsi_eh_host_reset(struct list_head *work_q,
 			      struct list_head *done_q)
 {
 	struct scsi_cmnd *scmd, *next;
+	LIST_HEAD(check_list);
 	int rtn;
 
 	if (!list_empty(work_q)) {
@@ -1246,12 +1292,10 @@ static int scsi_eh_host_reset(struct list_head *work_q,
 						  , current->comm));
 
 		rtn = scsi_try_host_reset(scmd);
-		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
+		if (rtn == SUCCESS) {
+			list_splice_init(work_q, &check_list);
+		} else if(rtn == FAST_IO_FAIL) {
 			list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
-				if (!scsi_device_online(scmd->device) ||
-				    rtn == FAST_IO_FAIL ||
-				    (!scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
-				    !scsi_eh_tur(scmd))
 					scsi_eh_finish_cmd(scmd, done_q);
 			}
 		} else {
@@ -1260,7 +1304,7 @@ static int scsi_eh_host_reset(struct list_head *work_q,
 							  current->comm));
 		}
 	}
-	return list_empty(work_q);
+	return scsi_eh_test_devices(&check_list, work_q, done_q, 1);
 }
 
 /**

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] Reduce error recovery time by reducing use of TURs
  2011-05-19 18:41 [PATCH] Reduce error recovery time by reducing use of TURs David Jeffery
@ 2011-05-24 16:53 ` James Bottomley
  2011-05-26  5:14 ` Desai, Kashyap
  2011-05-26  8:07 ` Ankit Jain
  2 siblings, 0 replies; 4+ messages in thread
From: James Bottomley @ 2011-05-24 16:53 UTC (permalink / raw)
  To: David Jeffery; +Cc: linux-scsi

On Thu, 2011-05-19 at 14:41 -0400, David Jeffery wrote:
> This is a resend of a patch with some minor whitespace cleanups which I'd
> like to have considered for inclusion.

Not enough, though:

WARNING: line over 80 characters
#53: FILE: drivers/scsi/scsi_error.c:964:
+static int scsi_eh_test_devices(struct list_head *cmd_list, struct
list_head *work_q, struct list_head *done_q, int try_stu)

ERROR: trailing whitespace
#63: FILE: drivers/scsi/scsi_error.c:974:
+^I^Ifinish_cmds = !scsi_device_online(scmd->device) || $

WARNING: line over 80 characters
#64: FILE: drivers/scsi/scsi_error.c:975:
+                       (try_stu && !scsi_eh_try_stu(scmd) && !
scsi_eh_tur(scmd)) ||

ERROR: trailing whitespace
#66: FILE: drivers/scsi/scsi_error.c:977:
+^I^I^I$

ERROR: space required before the open parenthesis '('
#196: FILE: drivers/scsi/scsi_error.c:1303:
+               } else if(rtn == FAST_IO_FAIL) {

total: 3 errors, 2 warnings, 172 lines checked

checkpatch.pl is your friend ...

This is pretty trivial, so I fixed it up manually (I usually only care
about errors, but in this case the warning about the function template
and long line needed fixing to match the file).

James



^ permalink raw reply	[flat|nested] 4+ messages in thread

* RE: [PATCH] Reduce error recovery time by reducing use of TURs
  2011-05-19 18:41 [PATCH] Reduce error recovery time by reducing use of TURs David Jeffery
  2011-05-24 16:53 ` James Bottomley
@ 2011-05-26  5:14 ` Desai, Kashyap
  2011-05-26  8:07 ` Ankit Jain
  2 siblings, 0 replies; 4+ messages in thread
From: Desai, Kashyap @ 2011-05-26  5:14 UTC (permalink / raw)
  To: David Jeffery, linux-scsi



> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of David Jeffery
> Sent: Friday, May 20, 2011 12:11 AM
> To: linux-scsi@vger.kernel.org
> Subject: [PATCH] Reduce error recovery time by reducing use of TURs
> 
> In error recovery, most scsi error recovery stages will send a TUR
> command
> for every bad command when a driver's error handler reports success.
> When
> several bad commands to the same device, this results in a device
> being probed multiple times.
> 
> This becomes very problematic if the device or connection is in a state
> where the device still doesn't respond to commands even after a
> recovery
> function returns success.  The error handler must wait for the test
> commands to time out.  The time waiting for the redundant commands can
> drastically lengthen error recovery.
> 
> This patch alters the scsi mid-layer's error routines to send test
> commands
> once per device instead of once per bad command.  This can drastically
> lower error recovery time.
> 
> Signed-of-by: David Jeffery <djeffery@redhat.com>
> ---
> 
> This is a resend of a patch with some minor whitespace cleanups which
> I'd
> like to have considered for inclusion.
> 
> As an extreme example of the problem with the current error recovery,
> use scsi_debug configured to drop all commands sent to a device.  It
> takes 10 commands 5.5 minutes from starting error recovery to the
> returning
> of errors without the patch.  After the patch, error recovery only
> takes 1 minute.  This is do to all error stages only having to wait for
> the
> test commands to time out once per device instead of once for each bad
> command.
> 
> The patch adds a helper function, scsi_eh_test_devices(), to handle a
> list
> of commands to devices which need to be checked.  If a check for a
> device is
> successful, all commands to a device are completed to the done queue.
> If a
> check to a device fails, all commands to the failing device are
> returned to
> the work queue for additional error recovery.
> 
> scsi_eh_abort_cmds(), scsi_eh_target_reset(), scsi_eh_bus_reset(), and
> scsi_eh_host_reset() were all converted to use this helper function.
> scsi_eh_bus_device_reset() was not changed as it already has the
> behavior
> of probing once per device.
> 
> One other behavior change with this patch is the probing commands occur
> after
> all driver error handlers of a given type are called. e.g.  Instead of
> abort,
> probe, abort, probe... it's now abort, abort, abort, probe... etc.

I agree with above change. _BUT_ don't we restrict this new way of handling error recovery only for abort command.
I mean we can leave Target reset, bus reset and Host reset as it is.

> 
>  scsi_error.c |   84 ++++++++++++++++++++++++++++++++++++++++++++------
> ---------
>  1 file changed, 64 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
> index 633c239..5339e89 100644
> --- a/drivers/scsi/scsi_error.c
> +++ b/drivers/scsi/scsi_error.c
> @@ -50,6 +50,8 @@
>  #define BUS_RESET_SETTLE_TIME   (10)
>  #define HOST_RESET_SETTLE_TIME  (10)
> 
> +static int scsi_eh_try_stu(struct scsi_cmnd *scmd);
> +
>  /* called with shost->host_lock held */
>  void scsi_eh_wakeup(struct Scsi_Host *shost)
>  {
> @@ -941,6 +943,45 @@ retry_tur:
>  }
> 
>  /**
> + * scsi_eh_test_devices - check if devices are responding from error
> recovery.
> + * @cmd_list:	scsi commands in error recovery.
> + * @work_q:     queue for commands which still need more error
> recovery
> + * @done_q:     queue for commands which are finished
> + * @try_stu:    boolean on if a STU command should be tried in
> addition to TUR.
> + *
> + * Decription:
> + *    Tests if devices are in a working state.  Commands to devices
> now in
> + *    a working state are sent to the done_q while commands to devices
> which
> + *    are still failing to respond are returned to the work_q for more
> + *    processing.
> + **/
> +static int scsi_eh_test_devices(struct list_head *cmd_list, struct
> list_head *work_q, struct list_head *done_q, int try_stu)
> +{
> +	struct scsi_cmnd *scmd, *next;
> +	struct scsi_device *sdev;
> +	int finish_cmds;
> +
> +	while (!list_empty(cmd_list)) {
> +		scmd = list_entry(cmd_list->next, struct scsi_cmnd,
> eh_entry);
> +		sdev = scmd->device;
> +
> +		finish_cmds = !scsi_device_online(scmd->device) ||
> +			(try_stu && !scsi_eh_try_stu(scmd) &&
> !scsi_eh_tur(scmd)) ||
> +			!scsi_eh_tur(scmd);
> +
> +		list_for_each_entry_safe(scmd, next, cmd_list, eh_entry)
> +			if (scmd->device == sdev) {
> +				if (finish_cmds)
> +					scsi_eh_finish_cmd(scmd, done_q);
> +				else
> +					list_move_tail(&scmd->eh_entry, work_q);
> +			}
> +	}
> +	return list_empty(work_q);
> +}
> +
> +
> +/**
>   * scsi_eh_abort_cmds - abort pending commands.
>   * @work_q:	&list_head for pending commands.
>   * @done_q:	&list_head for processed commands.
> @@ -956,6 +997,7 @@ static int scsi_eh_abort_cmds(struct list_head
> *work_q,
>  			      struct list_head *done_q)
>  {
>  	struct scsi_cmnd *scmd, *next;
> +	LIST_HEAD(check_list);
>  	int rtn;
> 
>  	list_for_each_entry_safe(scmd, next, work_q, eh_entry) {
> @@ -967,11 +1009,10 @@ static int scsi_eh_abort_cmds(struct list_head
> *work_q,
>  		rtn = scsi_try_to_abort_cmd(scmd->device->host->hostt,
> scmd);
>  		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
>  			scmd->eh_eflags &= ~SCSI_EH_CANCEL_CMD;
> -			if (!scsi_device_online(scmd->device) ||
> -			    rtn == FAST_IO_FAIL ||
> -			    !scsi_eh_tur(scmd)) {
> +			if (rtn == FAST_IO_FAIL)
>  				scsi_eh_finish_cmd(scmd, done_q);
> -			}
> +			else
> +				list_move_tail(&scmd->eh_entry, &check_list);
>  		} else
>  			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: aborting"
>  							  " cmd failed:"
> @@ -980,7 +1021,7 @@ static int scsi_eh_abort_cmds(struct list_head
> *work_q,
>  							  scmd));
>  	}
> 
> -	return list_empty(work_q);
> +	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
>  }
> 
>  /**
> @@ -1131,6 +1172,7 @@ static int scsi_eh_target_reset(struct Scsi_Host
> *shost,
>  				struct list_head *done_q)
>  {
>  	LIST_HEAD(tmp_list);
> +	LIST_HEAD(check_list);
> 
>  	list_splice_init(work_q, &tmp_list);
> 
> @@ -1155,9 +1197,9 @@ static int scsi_eh_target_reset(struct Scsi_Host
> *shost,
>  			if (scmd_id(scmd) != id)
>  				continue;
See above check will make sure only one TUR per device will be send.  There will not be redundant TUR in this case same as Abort call.

> 
> -			if ((rtn == SUCCESS || rtn == FAST_IO_FAIL)
> -			    && (!scsi_device_online(scmd->device) ||
> -				 rtn == FAST_IO_FAIL || !scsi_eh_tur(scmd)))
> +			if (rtn == SUCCESS)
> +				list_move_tail(&scmd->eh_entry, &check_list);
> +			else if (rtn == FAST_IO_FAIL)
>  				scsi_eh_finish_cmd(scmd, done_q);
>  			else
>  				/* push back on work queue for further
> processing */
> @@ -1165,7 +1207,7 @@ static int scsi_eh_target_reset(struct Scsi_Host
> *shost,
>  		}
>  	}
> 
> -	return list_empty(work_q);
> +	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);

We can avoid this change..(considering fact that there is no redundant TUR same as task abort..!
Similarly same comment for bus reset and host reset call.

>  }
> 
>  /**
> @@ -1179,6 +1221,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host
> *shost,
>  			     struct list_head *done_q)
>  {
>  	struct scsi_cmnd *scmd, *chan_scmd, *next;
> +	LIST_HEAD(check_list);
>  	unsigned int channel;
>  	int rtn;
> 
> @@ -1210,12 +1253,14 @@ static int scsi_eh_bus_reset(struct Scsi_Host
> *shost,
>  		rtn = scsi_try_bus_reset(chan_scmd);
>  		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
>  			list_for_each_entry_safe(scmd, next, work_q,
> eh_entry) {
> -				if (channel == scmd_channel(scmd))
> -					if (!scsi_device_online(scmd->device) ||
> -					    rtn == FAST_IO_FAIL ||
> -					    !scsi_eh_tur(scmd))
> +				if (channel == scmd_channel(scmd)) {
> +					if (rtn == FAST_IO_FAIL)
>  						scsi_eh_finish_cmd(scmd,
>  								   done_q);
> +					else
> +						list_move_tail(&scmd->eh_entry,
> +							       &check_list);
> +				}
>  			}
>  		} else {
>  			SCSI_LOG_ERROR_RECOVERY(3, printk("%s: BRST"
> @@ -1224,7 +1269,7 @@ static int scsi_eh_bus_reset(struct Scsi_Host
> *shost,
>  							  channel));
>  		}
>  	}
> -	return list_empty(work_q);
> +	return scsi_eh_test_devices(&check_list, work_q, done_q, 0);
>  }
> 
>  /**
> @@ -1236,6 +1281,7 @@ static int scsi_eh_host_reset(struct list_head
> *work_q,
>  			      struct list_head *done_q)
>  {
>  	struct scsi_cmnd *scmd, *next;
> +	LIST_HEAD(check_list);
>  	int rtn;
> 
>  	if (!list_empty(work_q)) {
> @@ -1246,12 +1292,10 @@ static int scsi_eh_host_reset(struct list_head
> *work_q,
>  						  , current->comm));
> 
>  		rtn = scsi_try_host_reset(scmd);
> -		if (rtn == SUCCESS || rtn == FAST_IO_FAIL) {
> +		if (rtn == SUCCESS) {
> +			list_splice_init(work_q, &check_list);
> +		} else if(rtn == FAST_IO_FAIL) {
>  			list_for_each_entry_safe(scmd, next, work_q,
> eh_entry) {
> -				if (!scsi_device_online(scmd->device) ||
> -				    rtn == FAST_IO_FAIL ||
> -				    (!scsi_eh_try_stu(scmd) &&
> !scsi_eh_tur(scmd)) ||
> -				    !scsi_eh_tur(scmd))
>  					scsi_eh_finish_cmd(scmd, done_q);
>  			}
>  		} else {
> @@ -1260,7 +1304,7 @@ static int scsi_eh_host_reset(struct list_head
> *work_q,
>  							  current->comm));
>  		}
>  	}
> -	return list_empty(work_q);
> +	return scsi_eh_test_devices(&check_list, work_q, done_q, 1);
>  }
> 
>  /**
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi"
> in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Reduce error recovery time by reducing use of TURs
  2011-05-19 18:41 [PATCH] Reduce error recovery time by reducing use of TURs David Jeffery
  2011-05-24 16:53 ` James Bottomley
  2011-05-26  5:14 ` Desai, Kashyap
@ 2011-05-26  8:07 ` Ankit Jain
  2 siblings, 0 replies; 4+ messages in thread
From: Ankit Jain @ 2011-05-26  8:07 UTC (permalink / raw)
  To: David Jeffery; +Cc: linux-scsi

On 05/20/2011 12:11 AM, David Jeffery wrote:
>...
>  
>  /**
> + * scsi_eh_test_devices - check if devices are responding from error recovery.
> + * @cmd_list:	scsi commands in error recovery.
> + * @work_q:     queue for commands which still need more error recovery
> + * @done_q:     queue for commands which are finished
> + * @try_stu:    boolean on if a STU command should be tried in addition to TUR.
> + *
> + * Decription:
> + *    Tests if devices are in a working state.  Commands to devices now in
> + *    a working state are sent to the done_q while commands to devices which
> + *    are still failing to respond are returned to the work_q for more
> + *    processing.
> + **/
> +static int scsi_eh_test_devices(struct list_head *cmd_list, struct list_head *work_q, struct list_head *done_q, int try_stu)
> +{
> +	struct scsi_cmnd *scmd, *next;
> +	struct scsi_device *sdev;
> +	int finish_cmds;
> +
> +	while (!list_empty(cmd_list)) {
> +		scmd = list_entry(cmd_list->next, struct scsi_cmnd, eh_entry);
> +		sdev = scmd->device;
> +
> +		finish_cmds = !scsi_device_online(scmd->device) || 
> +			(try_stu && !scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd)) ||
> +			!scsi_eh_tur(scmd);

If "(try_stu && !scsi_eh_try_stu(scmd) && !scsi_eh_tur(scmd))" is false,
then this could be sending the TUR command twice. I'm very new to this,
so am not sure what the cost/implication of doing that might be, but it
seems like, that wasn't the intention.

Thanks,
-- 
Ankit Jain
SUSE Labs

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-05-26  8:05 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-19 18:41 [PATCH] Reduce error recovery time by reducing use of TURs David Jeffery
2011-05-24 16:53 ` James Bottomley
2011-05-26  5:14 ` Desai, Kashyap
2011-05-26  8:07 ` Ankit Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.