* [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
@ 2012-12-11  8:23 Hannes Reinecke
  2012-12-11 12:46 ` Martin Peschke
  2013-03-07 19:19 ` Mike Christie
  0 siblings, 2 replies; 13+ messages in thread
From: Hannes Reinecke @ 2012-12-11  8:23 UTC (permalink / raw)
  To: linux-scsi
  Cc: Hannes Reinecke, Mike Christie, James Smart, Andrew Vasquez,
	Chad Dupuis, Krishna C Gudipati, James Bottomley

'Bus reset' is not really applicable to FibreChannel, as
the concept of a bus doesn't really apply. All FC drivers
simulate a 'bus reset' by sending a target reset to each
attached remote port, causing error handling to spill
over to unaffected devices.
In addition, 'Target reset' has been removed from SAM
since SAM-3.

Instead, SAM-5 proposes a REMOVE I_T NEXUS TMF,
which just removes the I_T nexus, thereby avoiding
any spill-over to unaffected ports.

This patch implements fc_eh_it_nexus_loss_handler(),
which attempts to reset the I_T nexus to the remote
port.

For I_T nexus reset we first check if the port
is already blocked, then call a new LLDD-provided
'eh_it_nexus_loss' callback to allow the LLDD
to clean up any outstanding resources or abort I/O.
If the callback succeeds the dev_loss_tmo
mechanism is called with '-1' fast fail timeout,
which causes fast_io_fail_tmo to be skipped.
Otherwise the dev_loss_tmo mechanism is called
with a '0' fast fail timeout, causing any
outstanding I/O to be aborted immediately.
The port is then set to 'blocked' to indicate that
no further I/O should be issued to this port.
Finally the standard dev_loss_tmo mechanism will
eventually clear up any outstanding resources.

fc_eh_it_nexus_loss_handler() is invoked as the
eh_target_reset_handler() callback and the
eh_bus_reset_handler() is removed.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: James Smart <james.smart@emulex.com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: Chad Dupuis <chad.dupuis@qlogic.com>
Cc: Krishna C Gudipati <kgudipat@brocade.com>
Cc: James Bottomley <jbottomley@parallels.com>
---
 drivers/scsi/bfa/bfad_im.c       |    6 ++-
 drivers/scsi/lpfc/lpfc_scsi.c    |    8 ++--
 drivers/scsi/qla2xxx/qla_os.c    |    4 +-
 drivers/scsi/scsi_transport_fc.c |   63 +++++++++++++++++++++++++++++++++++---
 include/scsi/scsi_transport_fc.h |    2 +
 5 files changed, 70 insertions(+), 13 deletions(-)

diff --git a/drivers/scsi/bfa/bfad_im.c b/drivers/scsi/bfa/bfad_im.c
index 8f92732..fd1fc4a 100644
--- a/drivers/scsi/bfa/bfad_im.c
+++ b/drivers/scsi/bfa/bfad_im.c
@@ -793,7 +793,8 @@ struct scsi_host_template bfad_im_scsi_host_template = {
 	.queuecommand = bfad_im_queuecommand,
 	.eh_abort_handler = bfad_im_abort_handler,
 	.eh_device_reset_handler = bfad_im_reset_lun_handler,
-	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
+	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
+	.eh_bus_reset_handler = NULL,
 
 	.slave_alloc = bfad_im_slave_alloc,
 	.slave_configure = bfad_im_slave_configure,
@@ -815,7 +816,8 @@ struct scsi_host_template bfad_im_vport_template = {
 	.queuecommand = bfad_im_queuecommand,
 	.eh_abort_handler = bfad_im_abort_handler,
 	.eh_device_reset_handler = bfad_im_reset_lun_handler,
-	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
+	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
+	.eh_bus_reset_handler = NULL,
 
 	.slave_alloc = bfad_im_slave_alloc,
 	.slave_configure = bfad_im_slave_configure,
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 60e5a17..c4e2788 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -5135,8 +5135,8 @@ struct scsi_host_template lpfc_template = {
 	.queuecommand		= lpfc_queuecommand,
 	.eh_abort_handler	= lpfc_abort_handler,
 	.eh_device_reset_handler = lpfc_device_reset_handler,
-	.eh_target_reset_handler = lpfc_target_reset_handler,
-	.eh_bus_reset_handler	= lpfc_bus_reset_handler,
+	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
+	.eh_bus_reset_handler	= NULL,
 	.eh_host_reset_handler  = lpfc_host_reset_handler,
 	.slave_alloc		= lpfc_slave_alloc,
 	.slave_configure	= lpfc_slave_configure,
@@ -5159,8 +5159,8 @@ struct scsi_host_template lpfc_vport_template = {
 	.queuecommand		= lpfc_queuecommand,
 	.eh_abort_handler	= lpfc_abort_handler,
 	.eh_device_reset_handler = lpfc_device_reset_handler,
-	.eh_target_reset_handler = lpfc_target_reset_handler,
-	.eh_bus_reset_handler	= lpfc_bus_reset_handler,
+	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
+	.eh_bus_reset_handler	= NULL,
 	.slave_alloc		= lpfc_slave_alloc,
 	.slave_configure	= lpfc_slave_configure,
 	.slave_destroy		= lpfc_slave_destroy,
diff --git a/drivers/scsi/qla2xxx/qla_os.c b/drivers/scsi/qla2xxx/qla_os.c
index 3a1661c..c59e681 100644
--- a/drivers/scsi/qla2xxx/qla_os.c
+++ b/drivers/scsi/qla2xxx/qla_os.c
@@ -245,8 +245,8 @@ struct scsi_host_template qla2xxx_driver_template = {
 
 	.eh_abort_handler	= qla2xxx_eh_abort,
 	.eh_device_reset_handler = qla2xxx_eh_device_reset,
-	.eh_target_reset_handler = qla2xxx_eh_target_reset,
-	.eh_bus_reset_handler	= qla2xxx_eh_bus_reset,
+	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
+	.eh_bus_reset_handler	= NULL,
 	.eh_host_reset_handler	= qla2xxx_eh_host_reset,
 
 	.slave_configure	= qla2xxx_slave_configure,
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/scsi_transport_fc.c
index e894ca7..da647d3 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -2971,7 +2971,7 @@ EXPORT_SYMBOL(fc_remote_port_add);
  *	This routine assumes no locks are held on entry.
  */
 void
-fc_remote_port_delete(struct fc_rport  *rport)
+__fc_remote_port_delete(struct fc_rport *rport, int fast_io_fail_tmo)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
 	unsigned long timeout = rport->dev_loss_tmo;
@@ -3018,14 +3018,19 @@ fc_remote_port_delete(struct fc_rport  *rport)
 	scsi_target_block(&rport->dev);
 
 	/* see if we need to kill io faster than waiting for device loss */
-	if ((rport->fast_io_fail_tmo != -1) &&
-	    (rport->fast_io_fail_tmo < timeout))
+	if ((fast_io_fail_tmo != -1) && (fast_io_fail_tmo < timeout))
 		fc_queue_devloss_work(shost, &rport->fail_io_work,
-					rport->fast_io_fail_tmo * HZ);
+					fast_io_fail_tmo * HZ);
 
 	/* cap the length the devices can be blocked until they are deleted */
 	fc_queue_devloss_work(shost, &rport->dev_loss_work, timeout * HZ);
 }
+
+void
+fc_remote_port_delete(struct fc_rport  *rport)
+{
+	__fc_remote_port_delete(rport, rport->fast_io_fail_tmo);
+}
 EXPORT_SYMBOL(fc_remote_port_delete);
 
 /**
@@ -3266,8 +3271,8 @@ fc_timeout_fail_rport_io(struct work_struct *work)
 	if (rport->port_state != FC_PORTSTATE_BLOCKED)
 		return;
 
-	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
 	fc_terminate_rport_io(rport);
+	rport->flags |= FC_RPORT_FAST_FAIL_TIMEDOUT;
 }
 
 /**
@@ -3332,6 +3337,54 @@ int fc_block_scsi_eh(struct scsi_cmnd *cmnd)
 EXPORT_SYMBOL(fc_block_scsi_eh);
 
 /**
+ * fc_eh_it_nexus_loss_handler - Invoke REMOVE I_T NEXUS TMF
+ * @cmnd: SCSI command that scsi_eh is trying to recover
+ *
+ * This routine can be called from a FC LLD scsi_eh callback. It
+ * attempts to perform a REMOVE I_T NEXUS task management
+ * function by failing all outstanding commands and invoking
+ * dev_loss_tmo() on the affected port.
+ *
+ * Returns: SUCCESS if all commands on the remote port have been
+ *	    terminated or the port is in PORTSTATE_ONLINE again
+ *	    FAST_IO_FAIL if the fast_io_fail_tmo fired and there
+ *	    is still I/O in flight
+ *	    FAILED otherwise.
+ */
+int
+fc_eh_it_nexus_loss_handler(struct scsi_cmnd *cmnd)
+{
+	struct fc_internal *i = to_fc_internal(cmnd->device->host->transportt);
+	struct scsi_target *starget = scsi_target(cmnd->device);
+	struct fc_rport *rport = starget_to_rport(starget);
+	int ret;
+
+	ret = fc_block_scsi_eh(cmnd);
+	if (i->f->eh_it_nexus_loss)
+		ret = i->f->eh_it_nexus_loss(cmnd);
+
+	/* FAST_IO_FAIL indicates the port is already blocked */
+	if (ret == FAST_IO_FAIL)
+		return ret;
+	if (ret == SUCCESS)
+		/* All outstanding I/O has been aborted */
+		__fc_remote_port_delete(rport, -1);
+	else {
+		/* Failed to abort outstanding I/O, trigger FAST_IO_FAIL */
+		__fc_remote_port_delete(rport, 0);
+		ret = fc_block_scsi_eh(cmnd);
+	}
+	if (ret != FAST_IO_FAIL) {
+		if (rport->port_state == FC_PORTSTATE_ONLINE)
+			ret = SUCCESS;
+		else
+			ret = FAILED;
+	}
+	return ret;
+}
+EXPORT_SYMBOL(fc_eh_it_nexus_loss_handler);
+
+/**
  * fc_vport_setup - allocates and creates a FC virtual port.
  * @shost:	scsi host the virtual port is connected to.
  * @channel:	Channel on shost port connected to.
diff --git a/include/scsi/scsi_transport_fc.h b/include/scsi/scsi_transport_fc.h
index b797e8f..17e2968 100644
--- a/include/scsi/scsi_transport_fc.h
+++ b/include/scsi/scsi_transport_fc.h
@@ -684,6 +684,7 @@ struct fc_function_template {
 
 	void    (*dev_loss_tmo_callbk)(struct fc_rport *);
 	void	(*terminate_rport_io)(struct fc_rport *);
+	int	(*eh_it_nexus_loss)(struct scsi_cmnd *);
 
 	void	(*set_vport_symbolic_name)(struct fc_vport *);
 	int  	(*vport_create)(struct fc_vport *, bool);
@@ -851,5 +852,6 @@ struct fc_vport *fc_vport_create(struct Scsi_Host *shost, int channel,
 		struct fc_vport_identifiers *);
 int fc_vport_terminate(struct fc_vport *vport);
 int fc_block_scsi_eh(struct scsi_cmnd *cmnd);
+int fc_eh_it_nexus_loss_handler(struct scsi_cmnd *cmnd);
 
 #endif /* SCSI_TRANSPORT_FC_H */
-- 
1.7.4.2


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-11  8:23 [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
@ 2012-12-11 12:46 ` Martin Peschke
  2012-12-11 14:06   ` Hannes Reinecke
  2013-03-07 19:19 ` Mike Christie
  1 sibling, 1 reply; 13+ messages in thread
From: Martin Peschke @ 2012-12-11 12:46 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: linux-scsi

Hello Hannes,

> fc_eh_it_nexus_loss_handler() is invoked as the
> eh_target_reset_handler() callback and the
> eh_bus_reset_handler() is removed.

lpfc_target_reset_handler(), which is replaced by your patch, used to
issue a TARGET_RESET task management function over FCP in the
eh_target_reset_handler() callback. What's wrong with that?

Martin



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-11 12:46 ` Martin Peschke
@ 2012-12-11 14:06   ` Hannes Reinecke
  0 siblings, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2012-12-11 14:06 UTC (permalink / raw)
  To: Martin Peschke; +Cc: linux-scsi

On 12/11/2012 01:46 PM, Martin Peschke wrote:
> Hello Hannes,
>
>> fc_eh_it_nexus_loss_handler() is invoked as the
>> eh_target_reset_handler() callback and the
>> eh_bus_reset_handler() is removed.
>
> lpfc_target_reset_handler(), which is replaced by your patch, used to
> issue a TARGET_RESET task management function over FCP in the
> eh_target_reset_handler() callback. What's wrong with that?
>
Nothing per se.
Only that the TARGET_RESET TMF has been removed from SAM-3/FCP-3 
onwards, so there might not be any functionality behind it.
But drivers can supply the functionality via ->eh_it_nexus_loss 
callback.

I didn't want to touch the existing eh_target_reset_handler myself
as I'm not familiar with the firmware specifics.
That is being left as an exercise to the reader :-)

The main point here is that we're emulating REMOVE I_T NEXUS by
setting the port state to BLOCKED and invoking dev_loss_tmo.
This will prevent any further I/O from being sent down.
With the original handler the port state wasn't modified,
which led to excessive recovery times when no RSCN was received.
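
To make the control flow of the proposed handler easier to follow, here
is a userspace model of the return-value logic in
fc_eh_it_nexus_loss_handler() as posted. It is only a sketch: the enum
values, struct names and model_it_nexus_loss() are made up for
illustration and stand in for the kernel's return codes and calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel return codes; the values are arbitrary. */
enum eh_result { EH_NONE, EH_SUCCESS, EH_FAILED, EH_FAST_IO_FAIL };

/* Hypothetical description of one pass through the handler. */
struct nexus_loss_inputs {
	enum eh_result first_block;	/* first fc_block_scsi_eh() result */
	bool has_callback;		/* LLDD provides ->eh_it_nexus_loss */
	enum eh_result callback;	/* callback result, if present */
	enum eh_result second_block;	/* fc_block_scsi_eh() after delete */
	bool port_online;		/* port is ONLINE at the final check */
};

struct nexus_loss_outcome {
	enum eh_result ret;		/* value handed back to scsi_eh */
	bool port_deleted;		/* __fc_remote_port_delete() was called */
	int fast_io_fail_tmo;		/* timeout passed, valid if port_deleted */
};

static struct nexus_loss_outcome
model_it_nexus_loss(const struct nexus_loss_inputs *in)
{
	struct nexus_loss_outcome out;

	assert(in);
	out.ret = in->first_block;
	out.port_deleted = false;
	out.fast_io_fail_tmo = 0;

	/* The callback result, if any, overrides the first block result. */
	if (in->has_callback)
		out.ret = in->callback;

	/* Port already blocked and fast_io_fail fired: nothing more to do. */
	if (out.ret == EH_FAST_IO_FAIL)
		return out;

	out.port_deleted = true;
	if (out.ret == EH_SUCCESS) {
		/* All I/O aborted: skip the fast_io_fail stage. */
		out.fast_io_fail_tmo = -1;
	} else {
		/* Abort failed or no callback: fail I/O immediately. */
		out.fast_io_fail_tmo = 0;
		out.ret = in->second_block;
	}

	/* Final verdict depends on whether the port came back. */
	if (out.ret != EH_FAST_IO_FAIL)
		out.ret = in->port_online ? EH_SUCCESS : EH_FAILED;
	return out;
}
```

In this model a successful callback still ends in EH_FAILED unless the
port is online again at the final check, which mirrors the code in the
patch as posted.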

And yes, I had several bug reports now where the HBA did not receive 
RSCNs, either due to a switch malfunction or due to an error injection.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2012-12-11  8:23 [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
  2012-12-11 12:46 ` Martin Peschke
@ 2013-03-07 19:19 ` Mike Christie
  2013-03-07 20:13   ` Jeremy Linton
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Christie @ 2013-03-07 19:19 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-scsi, James Smart, Andrew Vasquez, Chad Dupuis,
	Krishna C Gudipati, James Bottomley

Sorry for the late reply.

On 12/11/2012 02:23 AM, Hannes Reinecke wrote:
> @@ -793,7 +793,8 @@ struct scsi_host_template bfad_im_scsi_host_template = {
>  	.queuecommand = bfad_im_queuecommand,
>  	.eh_abort_handler = bfad_im_abort_handler,
>  	.eh_device_reset_handler = bfad_im_reset_lun_handler,
> -	.eh_bus_reset_handler = bfad_im_reset_bus_handler,
> +	.eh_target_reset_handler = fc_eh_it_nexus_loss_handler,
> +	.eh_bus_reset_handler = NULL,

Don't need to set to NULL in the final patch, and don't forget to send a
patch to remove all the code we do not need anymore :)


> +fc_eh_it_nexus_loss_handler(struct scsi_cmnd *cmnd)
> +{
> +	struct fc_internal *i = to_fc_internal(cmnd->device->host->transportt);
> +	struct scsi_target *starget = scsi_target(cmnd->device);
> +	struct fc_rport *rport = starget_to_rport(starget);
> +	int ret;
> +
> +	ret = fc_block_scsi_eh(cmnd);
> +	if (i->f->eh_it_nexus_loss)
> +		ret = i->f->eh_it_nexus_loss(cmnd);
> +
> +	/* FAST_IO_FAIL indicates the port is already blocked */
> +	if (ret == FAST_IO_FAIL)
> +		return ret;
> +	if (ret == SUCCESS)
> +		/* All outstanding I/O has been aborted */
> +		__fc_remote_port_delete(rport, -1);
> +	else {
> +		/* Failed to abort outstanding I/O, trigger FAST_IO_FAIL */
> +		__fc_remote_port_delete(rport, 0);

I think it looks ok from a high level, but I am not sure how the drivers
are working here.

What happens for lpfc? It seems __fc_remote_port_delete ends up calling
the fast io fail code right away and that sets
FC_RPORT_FAST_FAIL_TIMEDOUT. We will then call lpfc_terminate_rport_io
which only will send aborts for the commands. We will then call
fc_block_scsi_eh above and that returns FAST_IO_FAIL and we will pass
that back up to the scsi eh right away.
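
For reference, the timer selection I am describing can be modelled in
userspace like this. It is only a sketch: plan_fail_io() and the enum
are hypothetical names, and only the condition itself is taken from the
__fc_remote_port_delete() hunk in the patch.

```c
#include <assert.h>

/* Hypothetical model, not kernel code: what the patched
 * __fc_remote_port_delete() would queue for a given timeout. */
enum fail_io_plan {
	FAIL_IO_SKIPPED,	/* only dev_loss_work is queued */
	FAIL_IO_IMMEDIATE,	/* fail_io_work queued with zero delay */
	FAIL_IO_DELAYED,	/* fail_io_work queued after the timeout */
};

static enum fail_io_plan plan_fail_io(int fast_io_fail_tmo,
				      unsigned long dev_loss_tmo)
{
	/* -1 is the "disabled" value; anything lower is not modelled. */
	assert(fast_io_fail_tmo >= -1);

	/* Same test as the patch: skip fast fail when it is disabled or
	 * would not fire before dev_loss_tmo anyway. */
	if (fast_io_fail_tmo == -1 ||
	    (unsigned long)fast_io_fail_tmo >= dev_loss_tmo)
		return FAIL_IO_SKIPPED;

	return fast_io_fail_tmo == 0 ? FAIL_IO_IMMEDIATE : FAIL_IO_DELAYED;
}
```

So passing 0 from the handler selects the immediate case whenever
dev_loss_tmo is non-zero, and passing -1 skips the fast fail stage.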

But it seems lpfc_terminate_rport_io does not wait for the abort
responses and clean up the affected scsi_cmnds, and it does not seem to
do something to prevent lpfc from touching affected scsi_cmnds, does it
(I could not find the code)? If lpfc ends up touching a scsi_cmnd after
we have returned FAST_IO_FAIL from this function then both lpfc and some
other code could be using the same scsi_cmnd struct.


For qla2xxx, it seems qla2x00_terminate_rport_io aborts commands, but it
looks like there is a small race where if some other thread was actually
completing the command already, then that thread could be touching the
scsi command, but this function could return and the scsi eh could end
up giving the command to some other driver or retrying while the other
thread was still touching it.

It also seems like there is a race: since
qla2x00_terminate_rport_io also calls the logout functions for the port,
if that path was fast enough, could it lead to
fc_remote_port_delete getting called by qla2xxx while
fc_eh_it_nexus_loss_handler's call to __fc_remote_port_delete was still
running?


> +		ret = fc_block_scsi_eh(cmnd);
> +	}
> +	if (ret != FAST_IO_FAIL) {
> +		if (rport->port_state == FC_PORTSTATE_ONLINE)
> +			ret = SUCCESS;
> +		else
> +			ret = FAILED;
> +	}
> +	return ret;
> +}

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-07 19:19 ` Mike Christie
@ 2013-03-07 20:13   ` Jeremy Linton
  2013-03-07 20:20     ` Mike Christie
  2013-03-07 21:44     ` Douglas Gilbert
  0 siblings, 2 replies; 13+ messages in thread
From: Jeremy Linton @ 2013-03-07 20:13 UTC (permalink / raw)
  To: Mike Christie
  Cc: Hannes Reinecke, linux-scsi, James Smart, Andrew Vasquez,
	Chad Dupuis, Krishna C Gudipati, James Bottomley

On 3/7/2013 1:19 PM, Mike Christie wrote:
> What happens for lpfc? It seems __fc_remote_port_delete ends up calling the
> fast io fail code right away and that sets FC_RPORT_FAST_FAIL_TIMEDOUT. We
> will then call lpfc_terminate_rport_io which only will send aborts for the
> commands. We will then call fc_block_scsi_eh above and that returns
> FAST_IO_FAIL and we will pass that back up to the scsi eh right away.

	
	For lpfc, you never get to the code. Or rather when I was testing it, I
couldn't find any way to propagate an error beyond the initial
lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().

	That call pretty much always returns success independent of the remote
device because the firmware acks the context clear aborts, resulting in the
outstanding iocb count being zero (independent of both the mid layer status
and the actual device state).
	
	Result: all the code beyond the device reset handler never gets called.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-07 20:13   ` Jeremy Linton
@ 2013-03-07 20:20     ` Mike Christie
  2013-03-07 20:24       ` Mike Christie
  2013-03-07 20:35       ` Jeremy Linton
  2013-03-07 21:44     ` Douglas Gilbert
  1 sibling, 2 replies; 13+ messages in thread
From: Mike Christie @ 2013-03-07 20:20 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Hannes Reinecke, linux-scsi, James Smart, Andrew Vasquez,
	Chad Dupuis, Krishna C Gudipati, James Bottomley

On 03/07/2013 02:13 PM, Jeremy Linton wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 3/7/2013 1:19 PM, Mike Christie wrote:
>> What happens for lpfc? It seems __fc_remote_port_delete ends up calling the
>> fast io fail code right away and that sets FC_RPORT_FAST_FAIL_TIMEDOUT. We
>> will then call lpfc_terminate_rport_io which only will send aborts for the
>> commands. We will then call fc_block_scsi_eh above and that returns
>> FAST_IO_FAIL and we will pass that back up to the scsi eh right away.
> 
> 	
> 	For lpfc, you never get to the code. Or rather when I was testing it, I
> couldn't find any way to propagate an error beyond the initial
> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
> 
> 	That call pretty much always returns success independent of the remote
> device because the firmware acks the context clear aborts, resulting in the
> outstanding iocb count being zero (independent of both the mid layer status
> and the actual device state).
> 	

Your lpfc patch fixes that right?


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-07 20:20     ` Mike Christie
@ 2013-03-07 20:24       ` Mike Christie
  2013-03-07 20:35       ` Jeremy Linton
  1 sibling, 0 replies; 13+ messages in thread
From: Mike Christie @ 2013-03-07 20:24 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Hannes Reinecke, linux-scsi, James Smart, Andrew Vasquez,
	Chad Dupuis, Krishna C Gudipati, James Bottomley

On 03/07/2013 02:20 PM, Mike Christie wrote:
> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> On 3/7/2013 1:19 PM, Mike Christie wrote:
>>> What happens for lpfc? It seems __fc_remote_port_delete ends up calling the
>>> fast io fail code right away and that sets FC_RPORT_FAST_FAIL_TIMEDOUT. We
>>> will then call lpfc_terminate_rport_io which only will send aborts for the
>>> commands. We will then call fc_block_scsi_eh above and that returns
>>> FAST_IO_FAIL and we will pass that back up to the scsi eh right away.
>>
>> 	
>> 	For lpfc, you never get to the code. Or rather when I was testing it, I
>> couldn't find any way to propagate an error beyond the initial
>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>>
>> 	That call pretty much always returns success independent of the remote
>> device because the firmware acks the context clear aborts, resulting in the
>> outstanding iocb count being zero (independent of both the mid layer status
>> and the actual device state).
>> 	
> 
> Your lpfc patch fixes that right?
> 

Never mind. Found your patch. It looks like it does fix that problem.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-07 20:20     ` Mike Christie
  2013-03-07 20:24       ` Mike Christie
@ 2013-03-07 20:35       ` Jeremy Linton
  2013-03-11 17:05         ` Hannes Reinecke
  1 sibling, 1 reply; 13+ messages in thread
From: Jeremy Linton @ 2013-03-07 20:35 UTC (permalink / raw)
  To: Mike Christie
  Cc: Hannes Reinecke, linux-scsi, James Smart, Andrew Vasquez,
	Chad Dupuis, Krishna C Gudipati, James Bottomley

On 3/7/2013 2:20 PM, Mike Christie wrote:
> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
>> 	For lpfc, you never get to the code. Or rather when I was testing it, I
>> couldn't find any way to propagate an error beyond the initial
>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>>
>> 	That call pretty much always returns success independent of the remote
>> device because the firmware acks the context clear aborts, resulting in the
>> outstanding iocb count being zero (independent of both the mid layer status
>> and the actual device state).
>> 	
> 
> Your lpfc patch fixes that right?


	Yes. It allows the device reset to fail if the device doesn't respond to the
task mgmt request, or rejects it, etc.

	It doesn't unjam the commands that get aborted by the flush_io_context() call.
Those have to depend on their timeouts. That is another patch...




	



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-07 20:13   ` Jeremy Linton
  2013-03-07 20:20     ` Mike Christie
@ 2013-03-07 21:44     ` Douglas Gilbert
  1 sibling, 0 replies; 13+ messages in thread
From: Douglas Gilbert @ 2013-03-07 21:44 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Mike Christie, Hannes Reinecke, linux-scsi, James Smart,
	Andrew Vasquez, Chad Dupuis, Krishna C Gudipati, James Bottomley

On 13-03-07 03:13 PM, Jeremy Linton wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 3/7/2013 1:19 PM, Mike Christie wrote:
>> What happens for lpfc? It seems __fc_remote_port_delete ends up calling the
>> fast io fail code right away and that sets FC_RPORT_FAST_FAIL_TIMEDOUT. We
>> will then call lpfc_terminate_rport_io which only will send aborts for the
>> commands. We will then call fc_block_scsi_eh above and that returns
>> FAST_IO_FAIL and we will pass that back up to the scsi eh right away.
>
> 	
> 	For lpfc, you never get to the code. Or rather when I was testing it, I
> couldn't find any way to propagate an error beyond the initial
> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>
> 	That call pretty much always returns success independent of the remote
> device because the firmware acks the context clear aborts, resulting in the
> outstanding iocb count being zero (independent of both the mid layer status
> and the actual device state).
> 	
> 	Result: all the code beyond the device reset handler never gets called.

Unsurprisingly, I found pretty well the same thing with
megaraid and mpt2sas (SAS) drivers. A big thumbs up from
the drivers if a LU reset was sent when there was
no way through the expander (due to zoning) to the LU (disk)
in question. Further, when that LU (disk) was viewed from
another initiator, no UA condition had been set; more
evidence that the LU reset did not get through.

"Fire and forget" task management functions ...

Doug Gilbert




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-07 20:35       ` Jeremy Linton
@ 2013-03-11 17:05         ` Hannes Reinecke
  2013-03-11 18:04           ` James Smart
  0 siblings, 1 reply; 13+ messages in thread
From: Hannes Reinecke @ 2013-03-11 17:05 UTC (permalink / raw)
  To: Jeremy Linton
  Cc: Mike Christie, linux-scsi, James Smart, Andrew Vasquez,
	Chad Dupuis, Robert Elliot

On 03/07/2013 09:35 PM, Jeremy Linton wrote:
> On 3/7/2013 2:20 PM, Mike Christie wrote:
>> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
>>> 	For lpfc, you never get to the code. Or rather when I was testing it, I
>>> couldn't find any way to propagate an error beyond the initial
>>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>>>
>>> 	That call pretty much always returns success independent of the remote
>>> device because the firmware acks the context clear aborts, resulting in the
>>> outstanding iocb count being zero (independent of both the mid layer status
>>> and the actual device state).
>>> 	
>>
>> Your lpfc patch fixes that right?
>
> 	Yes. It allows the device reset to fail if the device doesn't respond to the
> task mgmt request, or rejects it, etc.
>
> 	It doesn't unjam the commands that get aborted by the flush_io_context() call.
> Those have to depend on their timeouts. That is another patch...
>
>

It's actually worse than that.
lpfc_terminate_rport_io() calls lpfc_sli_abort_iocb(), which has this:


  		if (lpfc_is_link_up(phba))
			abtsiocb->iocb.ulpCommand = CMD_ABORT_XRI_CN;
		else
			abtsiocb->iocb.ulpCommand = CMD_CLOSE_XRI_CN;

		/* Setup callback routine and issue the command. */
		abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
		ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
					      abtsiocb, 0);
		if (ret_val == IOCB_ERROR) {
			lpfc_sli_release_iocbq(phba, abtsiocb);
			errcnt++;
			continue;
		}


I.e. we're calling into firmware and waiting for an async event telling us 
that the command has been aborted (ideally).
What I would like is some kind of synchronous call here, which would
guarantee us that we won't run into use-after-free issues.

Also 'lpfc_is_link_up' is clearly deficient here as the link itself most 
likely is up, it's the I_T Nexus which is not.

James, is it safe to use 'CMD_CLOSE_XRI_CN' even when the link is up?

Which makes me wonder, how _exactly_ is I_T nexus reset supposed to 
work? After all, we're trying to tell the target port that we cannot 
talk to it anymore, right?
Which has some hurdles, conceptually ...
So from my POV I_T nexus reset can only be implemented on the 
_initiator_ side, disregarding any target implementation.
(which would be pointless anyway).

Hmm. Probably have to ask T10 for clarification. Robert, any insights?

Cheers,

Hannes

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-11 17:05         ` Hannes Reinecke
@ 2013-03-11 18:04           ` James Smart
  2013-03-11 18:32             ` Vijay Mohan Guvva
  2013-03-12 15:59             ` Hannes Reinecke
  0 siblings, 2 replies; 13+ messages in thread
From: James Smart @ 2013-03-11 18:04 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Jeremy Linton, Mike Christie, linux-scsi, Andrew Vasquez,
	Chad Dupuis, Robert Elliot, Smart, James


On 3/11/2013 1:05 PM, Hannes Reinecke wrote:
> On 03/07/2013 09:35 PM, Jeremy Linton wrote:
>> On 3/7/2013 2:20 PM, Mike Christie wrote:
>>> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
>>>>     For lpfc, you never get to the code. Or rather when I was 
>>>> testing it, I
>>>> couldn't find any way to propagate an error beyond the initial
>>>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>>>>
>>>>     That call pretty much always returns success independent of 
>>>> the remote
>>>> device because the firmware acks the context clear aborts, 
>>>> resulting in the
>>>> outstanding iocb count being zero (independent of both the mid 
>>>> layer status
>>>> and the actual device state).
>>>>
>>>
>>> Your lpfc patch fixes that right?
>>
>>     Yes. It allows the device reset to fail if the device doesn't 
>> respond to the
>> task mgmt request, or rejects it, etc.
>>
>>     It doesn't unjam the commands that get aborted by the 
>> flush_io_context() call.
>> Those have to depend on their timeouts. That is another patch...
>>
>>
>
> It's actually worse than that.
> lpfc_terminate_rport_io() calls lpfc_sli_abort_iocb(), which has this:
>
>
>          if (lpfc_is_link_up(phba))
>             abtsiocb->iocb.ulpCommand = CMD_ABORT_XRI_CN;
>         else
>             abtsiocb->iocb.ulpCommand = CMD_CLOSE_XRI_CN;
>
>         /* Setup callback routine and issue the command. */
>         abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
>         ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
>                           abtsiocb, 0);
>         if (ret_val == IOCB_ERROR) {
>             lpfc_sli_release_iocbq(phba, abtsiocb);
>             errcnt++;
>             continue;
>         }
>
>
> Ie we're calling into firmware and waiting for an async event telling 
> us that the command has been aborted (ideally).
> What I would like is some kind of synchronous call here, which would
> guarantee us that we won't run into use-after-free issues.
>
> Also 'lpfc_is_link_up' is clearly deficient here as the link itself 
> most likely is up, it's the I_T Nexus which is not.
>
> James, is it safe to use 'CMD_CLOSE_XRI_CN' even when the link is up?

No, it's not safe.  The ABORT, which sends an ABTS, is mandated so that 
the other end and ourselves maintain proper (unique) exchange id 
state.   CLOSE sends no link traffic - but can only be used if the login 
is broken (e.g. there's a different mechanism that communicated 
termination of exchange states).   I don't believe I can trust the logic 
in the OS about frames lying in wait in the fabric (maybe sent earlier, 
delayed at a switch, delivered after os thinks nexus is gone), so driver 
needs to terminate them properly.
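
As a rough illustration of that rule, the choice could key off the login
state rather than the link state alone. This is a sketch only:
pick_xri_cmd(), the enum and the login_valid flag are hypothetical and
are not existing lpfc code; the enum values merely stand in for
CMD_ABORT_XRI_CN and CMD_CLOSE_XRI_CN.

```c
#include <stdbool.h>

/* Stand-ins for CMD_ABORT_XRI_CN / CMD_CLOSE_XRI_CN. */
enum xri_cmd { XRI_ABORT, XRI_CLOSE };

/*
 * Hypothetical helper: pick how to terminate an exchange.
 * CLOSE sends nothing on the wire, so it is only chosen when the
 * login is already gone (a link-down is treated as implying that).
 * Otherwise ABORT is used so both ends agree on exchange id state.
 */
static enum xri_cmd pick_xri_cmd(bool link_up, bool login_valid)
{
	if (!link_up || !login_valid)
		return XRI_CLOSE;
	return XRI_ABORT;
}
```

With the link up and the login intact this picks ABORT, which is the
case the I_T nexus reset path would hit today.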


>
> Which makes me wonder, how _exactly_ is I_T nexus reset supposed to 
> work? After all, we're trying to tell the target port that we cannot 
> talk to it anymore, right?
> Which has some hurdles, conceptually ...
> So from my POV I_T nexus reset can only be implemented on the 
> _initiator_ side, disregarding any target implementation.
> (which would be pointless anyway).
>
> Hmm. Probably have to ask T10 for clarification. Robert, any insights?


The I_T nexus reset should be a FC transport implicit logout call to the 
LLDD.  E.g. this becomes a transport-specific action on what it means to 
break the I_T nexus, which for FC, is to terminate the login.   This 
logout call allows the driver to do all the implicit work to kill 
exchange contexts and allows it to adjust the state of the target in 
its FC discovery engine.  Question is - should the driver re-login ?   
Typically, this would be driven by a RSCN, which I'm guessing for this 
scenario, would not be occurring. If you knew it would, you could let 
the driver respond to the RSCN and re-login later.   If there's no RSCN, 
then I would assume we put a heartbeat into the transport to retry login 
(to a WWPN/WWNN basis - remembered from the I_T nexus reset) with the 
LLDD - a new interface as well - call it "establish I_T_nexus".
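
A rough userspace sketch of that flow, assuming a hypothetical
"establish" hook supplied by the LLDD and a periodic heartbeat driven by
the transport. None of these names exist in scsi_transport_fc; the
fake_establish() helper only simulates a remote port that accepts the
third login attempt.

```c
#include <stdbool.h>
#include <stdint.h>

enum nexus_state { NEXUS_UP, NEXUS_DOWN };

/* Hypothetical transport-side record of one I_T nexus. */
struct nexus {
	uint64_t wwpn;			/* remembered across the reset */
	enum nexus_state state;
	unsigned int attempts;		/* login retries since the reset */
	bool (*establish)(uint64_t wwpn); /* hypothetical LLDD hook */
};

/* I_T nexus reset: implicit logout, keep the WWPN for later retries. */
static void nexus_reset(struct nexus *n)
{
	n->state = NEXUS_DOWN;
	n->attempts = 0;
}

/* One heartbeat tick: retry the login unless the nexus is already up. */
static bool nexus_heartbeat(struct nexus *n)
{
	if (n->state == NEXUS_UP)
		return true;
	n->attempts++;
	if (n->establish(n->wwpn))
		n->state = NEXUS_UP;
	return n->state == NEXUS_UP;
}

/* Demonstration stand-in for the LLDD: succeeds on the third call. */
static int fake_calls;
static bool fake_establish(uint64_t wwpn)
{
	(void)wwpn;
	return ++fake_calls >= 3;
}
```

If an RSCN does arrive, the driver could bring the nexus up on its own
and the heartbeat would simply find it already established.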

In lpfc's case - the Logout would allow the driver to take the CLOSE_XRI
case, giving you the speed/asynchronicity you desire. Reuse of scsi job
structures still can't occur until the driver returns them via the
completion routines (as DMA related to them must be cancelled within the
card by the ABORT/CLOSE commands - even if we know there shouldn't be
anything to DMA).
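
The ownership rule can be sketched as a small state machine. This is a
userspace illustration with hypothetical names, not the lpfc or midlayer
structures: a job handed to the card stays unavailable after an abort is
requested and only becomes reusable once the completion routine runs.

```c
#include <stdbool.h>

enum job_state {
	JOB_FREE,	/* owned by the caller, safe to reuse */
	JOB_ACTIVE,	/* owned by the card, DMA may be in flight */
	JOB_ABORTING,	/* ABORT/CLOSE issued, completion still pending */
};

/* Hypothetical job structure; stands in for a scsi job. */
struct scsi_job {
	enum job_state state;
};

/* Hand the job to the card; refused unless the job is free. */
static bool job_start(struct scsi_job *j)
{
	if (j->state != JOB_FREE)
		return false;
	j->state = JOB_ACTIVE;
	return true;
}

/* Issue the abort. This does NOT make the job reusable. */
static void job_request_abort(struct scsi_job *j)
{
	if (j->state == JOB_ACTIVE)
		j->state = JOB_ABORTING;
}

/* Completion routine: the card is done, the job may be reused. */
static void job_abort_complete(struct scsi_job *j)
{
	if (j->state == JOB_ABORTING)
		j->state = JOB_FREE;
}
```

Any attempt to restart the job between job_request_abort() and
job_abort_complete() is rejected, which is the window where a
use-after-free would otherwise be possible.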

-- james s


>
> Cheers,
>
> Hannes
>
>



* RE: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-11 18:04           ` James Smart
@ 2013-03-11 18:32             ` Vijay Mohan Guvva
  2013-03-12 15:59             ` Hannes Reinecke
  1 sibling, 0 replies; 13+ messages in thread
From: Vijay Mohan Guvva @ 2013-03-11 18:32 UTC (permalink / raw)
  To: James.Smart, Hannes Reinecke
  Cc: Jeremy Linton, Mike Christie, linux-scsi, Andrew Vasquez,
	Chad Dupuis, Robert Elliot, Anil Gurumurthy

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of James Smart
> Sent: Monday, March 11, 2013 11:04 AM
> To: Hannes Reinecke
> Cc: Jeremy Linton; Mike Christie; linux-scsi@vger.kernel.org; Andrew
> Vasquez; Chad Dupuis; Robert Elliot; Smart, James
> Subject: Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
> 
> 
> On 3/11/2013 1:05 PM, Hannes Reinecke wrote:
> > On 03/07/2013 09:35 PM, Jeremy Linton wrote:
> >> On 3/7/2013 2:20 PM, Mike Christie wrote:
> >>> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
> >>>>     For lpfc, you never get to the code. Or rather when I was
> >>>> testing it, I couldn't find any way to propagate an error beyond
> >>>> the initial
> >>>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
> >>>>
> >>>>     That call pretty much always returns success independent of
> >>>> the remote device because the firmware acks the context clear
> >>>> aborts, resulting in the outstanding iocb count being zero
> >>>> (independent of both the mid layer status and the actual device
> >>>> state).
> >>>>
> >>>
> >>> Your lpfc patch fixes that right?
> >>
> >>     Yes. It allows the device reset to fail if the device doesn't
> >> respond to the task mgmt request, or rejects it, etc.
> >>
> >>     It doesn't unjam the commands that get aborted by the
> >> flush_io_context() call.
> >> Those have to depend on their timeouts. That is another patch...
> >>
> >>
> >
> > It's actually worse than that.
> > lpfc_terminate_rport_io() calls lpfc_sli_abort_iocb(), which has this:
> >
> >
> >          if (lpfc_is_link_up(phba))
> >             abtsiocb->iocb.ulpCommand = CMD_ABORT_XRI_CN;
> >         else
> >             abtsiocb->iocb.ulpCommand = CMD_CLOSE_XRI_CN;
> >
> >         /* Setup callback routine and issue the command. */
> >         abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
> >         ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
> >                           abtsiocb, 0);
> >         if (ret_val == IOCB_ERROR) {
> >             lpfc_sli_release_iocbq(phba, abtsiocb);
> >             errcnt++;
> >             continue;
> >         }
> >
> >
> > Ie we're calling into firmware and waiting for an async event telling
> > us that the command has been aborted (ideally).
> > What I would like is some kind of synchronous call here, which would
> > guarantee us that we won't run into use-after-free issues.
> >
> > Also 'lpfc_is_link_up' is clearly deficient here as the link itself
> > most likely is up, it's the I_T Nexus which is not.
> >
> > James, is it safe to use 'CMD_CLOSE_XRI_CN' even when the link is up?
> 
> No, it's not safe.  The ABORT, which sends an ABTS, is mandated so that the
> other end and ourselves maintain proper (unique) exchange id
> state.   CLOSE sends no link traffic - but can only be used if the login
> is broken (e.g. there's a different mechanism that communicated
> termination of exchange states).   I don't believe I can trust the logic
> in the OS about frames lying in wait in the fabric (maybe sent earlier,
> delayed at a switch, delivered after the OS thinks the nexus is gone), so
> the driver needs to terminate them properly.
> 
> 
> >
> > Which makes me wonder, how _exactly_ is I_T nexus reset supposed to
> > work? After all, we're trying to tell the target port that we cannot
> > talk to it anymore, right?
> > Which has some hurdles, conceptually ...
> > So from my POV I_T nexus reset can only be implemented on the
> > _initiator_ side, disregarding any target implementation.
> > (which would be pointless anyway).
> >
> > Hmm. Probably have to ask T10 for clarification. Robert, any insights?
> 
> 
> The I_T nexus reset should be an FC transport implicit logout call to the LLDD.
> I.e. this becomes a transport-specific action on what it means to
> break the I_T nexus, which for FC is to terminate the login.   This
> logout call allows the driver to do all the implicit work to kill exchange
> contexts and allows it to adjust the state of the target in
> its FC discovery engine.  Question is - should the driver re-login?
> Typically, this would be driven by an RSCN, which I'm guessing would not
> occur in this scenario. If you knew it would, you could let
> the driver respond to the RSCN and re-login later.   If there's no RSCN,
> then I would assume we put a heartbeat into the transport to retry login (on a
> WWPN/WWNN basis - remembered from the I_T nexus reset) with the LLDD
> - a new interface as well - call it "establish I_T_nexus".
> 
> In lpfc's case - the Logout would allow the driver to take the CLOSE_XRI case,
> giving you the speed/asynchronicity you desire. Reuse of scsi job structures
> still can't occur until the driver returns them via the completion routines (as
> DMA related to them must be cancelled within the card by the ABORT/CLOSE
> commands - even if we know there shouldn't be anything to DMA).
> 
> -- james s
> 
> 
> >
> > Cheers,
> >
> > Hannes
> >
> >

Adding BROCADE BFA FC SCSI DRIVER maintainer Anil to the CC.

Thanks,
Vijay




* Re: [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset
  2013-03-11 18:04           ` James Smart
  2013-03-11 18:32             ` Vijay Mohan Guvva
@ 2013-03-12 15:59             ` Hannes Reinecke
  1 sibling, 0 replies; 13+ messages in thread
From: Hannes Reinecke @ 2013-03-12 15:59 UTC (permalink / raw)
  To: James.Smart
  Cc: Jeremy Linton, Mike Christie, linux-scsi, Andrew Vasquez,
	Chad Dupuis, Robert Elliot

On 03/11/2013 07:04 PM, James Smart wrote:
>
> On 3/11/2013 1:05 PM, Hannes Reinecke wrote:
>> On 03/07/2013 09:35 PM, Jeremy Linton wrote:
>>> On 3/7/2013 2:20 PM, Mike Christie wrote:
>>>> On 03/07/2013 02:13 PM, Jeremy Linton wrote:
>>>>>     For lpfc, you never get to the code. Or rather when I was
>>>>> testing it, I
>>>>> couldn't find any way to propagate an error beyond the initial
>>>>> lpfc_reset_flush_io_context() call in lpfc_device_reset_handler().
>>>>>
>>>>>     That call pretty much always returns success independent
>>>>> of the remote
>>>>> device because the firmware acks the context clear aborts,
>>>>> resulting in the
>>>>> outstanding iocb count being zero (independent of both the mid
>>>>> layer status
>>>>> and the actual device state).
>>>>>
>>>>
>>>> Your lpfc patch fixes that right?
>>>
>>>     Yes. It allows the device reset to fail if the device doesn't
>>> respond to the
>>> task mgmt request, or rejects it, etc.
>>>
>>>     It doesn't unjam the commands that get aborted by the
>>> flush_io_context() call.
>>> Those have to depend on their timeouts. That is another patch...
>>>
>>>
>>
>> It's actually worse than that.
>> lpfc_terminate_rport_io() calls lpfc_sli_abort_iocb(), which has
>> this:
>>
>>
>>          if (lpfc_is_link_up(phba))
>>             abtsiocb->iocb.ulpCommand = CMD_ABORT_XRI_CN;
>>         else
>>             abtsiocb->iocb.ulpCommand = CMD_CLOSE_XRI_CN;
>>
>>         /* Setup callback routine and issue the command. */
>>         abtsiocb->iocb_cmpl = lpfc_sli_abort_fcp_cmpl;
>>         ret_val = lpfc_sli_issue_iocb(phba, pring->ringno,
>>                           abtsiocb, 0);
>>         if (ret_val == IOCB_ERROR) {
>>             lpfc_sli_release_iocbq(phba, abtsiocb);
>>             errcnt++;
>>             continue;
>>         }
>>
>>
>> Ie we're calling into firmware and waiting for an async event
>> telling us that the command has been aborted (ideally).
>> What I would like is some kind of synchronous call here, which would
>> guarantee us that we won't run into use-after-free issues.
>>
>> Also 'lpfc_is_link_up' is clearly deficient here as the link
>> itself most likely is up, it's the I_T Nexus which is not.
>>
>> James, is it safe to use 'CMD_CLOSE_XRI_CN' even when the link is up?
>
> No, it's not safe.  The ABORT, which sends an ABTS, is mandated so
> that the other end and ourselves maintain proper (unique) exchange
> id state.   CLOSE sends no link traffic - but can only be used if
> the login is broken (e.g. there's a different mechanism that
> communicated termination of exchange states).   I don't believe I
> can trust the logic in the OS about frames lying in wait in the
> fabric (maybe sent earlier, delayed at a switch, delivered after the
> OS thinks the nexus is gone), so the driver needs to terminate them
> properly.
>
True. Just as I thought.

>>
>> Which makes me wonder, how _exactly_ is I_T nexus reset supposed
>> to work? After all, we're trying to tell the target port that we
>> cannot talk to it anymore, right?
>> Which has some hurdles, conceptually ...
>> So from my POV I_T nexus reset can only be implemented on the
>> _initiator_ side, disregarding any target implementation.
>> (which would be pointless anyway).
>>
>> Hmm. Probably have to ask T10 for clarification. Robert, any
>> insights?
>
>
> The I_T nexus reset should be an FC transport implicit logout call to
> the LLDD.  I.e. this becomes a transport-specific action on what it
> means to break the I_T nexus, which for FC is to terminate the
> login.   This logout call allows the driver to do all the implicit
> work to kill exchange contexts and allows it to adjust the state of
> the target in its FC discovery engine.  Question is - should the
> driver re-login? Typically, this would be driven by an RSCN, which
> I'm guessing would not occur in this scenario. If you knew
> it would, you could let the driver respond to the RSCN and re-login
> later.   If there's no RSCN, then I would assume we put a heartbeat
> into the transport to retry login (on a WWPN/WWNN basis - remembered
> from the I_T nexus reset) with the LLDD - a new interface as well -
> call it "establish I_T_nexus".
>
Hmm. As I feared, my solution was a bit optimistic.
But good idea, using a 'logout' to trigger I_T nexus removal.
I wonder if we shouldn't attempt to log out for the fast_io_fail
case, too?
And for the timer, yeah, I guess we need something like this.

> In lpfc's case - the Logout would allow the driver to take the
> CLOSE_XRI case, giving you the speed/asynchronicity you desire.
> Reuse of scsi job structures still can't occur until the driver
> returns them via the completion routines (as DMA related to them
> must be cancelled within the card by the ABORT/CLOSE commands - even
> if we know there shouldn't be anything to DMA).
>
The problem here is that the _eh calls are _synchronous_ in nature.
Not that it works perfectly nowadays (cf. the discussion about TMF
results) but that's at least the theory.

Anyway, thanks for your insights.
They have been _very_ helpful.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@suse.de			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)


end of thread, other threads:[~2013-03-12 15:59 UTC | newest]

Thread overview: 13+ messages
2012-12-11  8:23 [PATCH v2][RFC] scsi_transport_fc: Implement I_T nexus reset Hannes Reinecke
2012-12-11 12:46 ` Martin Peschke
2012-12-11 14:06   ` Hannes Reinecke
2013-03-07 19:19 ` Mike Christie
2013-03-07 20:13   ` Jeremy Linton
2013-03-07 20:20     ` Mike Christie
2013-03-07 20:24       ` Mike Christie
2013-03-07 20:35       ` Jeremy Linton
2013-03-11 17:05         ` Hannes Reinecke
2013-03-11 18:04           ` James Smart
2013-03-11 18:32             ` Vijay Mohan Guvva
2013-03-12 15:59             ` Hannes Reinecke
2013-03-07 21:44     ` Douglas Gilbert
