All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
@ 2013-07-03 12:41 Bart Van Assche
  2013-07-03 12:53 ` [PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
                   ` (2 more replies)
  0 siblings, 3 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:41 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi

The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
   Add fast_io_fail_tmo and dev_loss_tmo parameters. These can be used
   either to speed up failover or to avoid device removal when e.g. using
   initiator side mirroring.
- Make the SRP initiator better suited for use on NUMA systems by
   making the HCA completion vector configurable.

Changes since the v2 of the "IB SRP initiator patches for kernel 3.11" 
patch series:
- Improved documentation of the newly added sysfs parameters.
- Limit fast_io_fail_tmo to SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
- Simplified the code for parsing values written into sysfs attributes.
- Fixed a potential deadlock in the code added in scsi_transport_srp
   (invoking cancel_delayed_work() with the rport mutex held for work
    that needs the rport mutex itself).
- Changed the default retry count back from 2 to 7 since there is not
   yet agreement about this change.
- Dropped the patch that silences failed SCSI commands and also the
   patch that fixes a race between srp_queuecommand() and
   srp_claim_req() since there is no agreement about these patches.

Changes since the v1 of the "IB SRP initiator patches for kernel 3.11" 
patch series:
- scsi_transport_srp: Allowed both fast_io_fail and dev_loss timeouts
   to be disabled.
- scsi_transport_srp, srp_reconnect_rport(): switched from
   scsi_block_requests() to scsi_target_block() for blocking SCSI command
   processing temporarily.
- scsi_transport_srp, srp_start_tl_fail_timers(): only block SCSI device
   command processing if the fast_io_fail timer is enabled.
- Changed srp_abort() such that upon transport offline the value
   FAST_IO_FAIL is returned instead of SUCCESS.
- Fixed a race condition in the "maintain single connection" patch: a
   new login after removal had started but before removal had finished
   still could create a duplicate connection. Fixed this by deferring
   removal from the target list until removal has finished.
- Modified the error message in the same patch for reporting that a
   duplicate connection has been rejected.
- Modified patch 2/15 such that all possible race conditions with
   srp_claim_req() are addressed.
- Documented the comp_vector and tl_retry_count login string parameters.
- Updated dev_loss_tmo and fast_io_fail_tmo documentation - mentioned
   "off" is a valid choice.

Changes compared to v5 of the "Make ib_srp better suited for H.A. 
purposes" patch series:
- Left out patches that are already upstream.
- Made it possible to set dev_loss_tmo to "off". This is useful in a
   setup using initiator side mirroring to avoid that new /dev/sd* names
   are reassigned after a failover or cable pull and reinsert.
- Added kernel module parameters to ib_srp for configuring default
   values of the fast_io_fail_tmo and dev_loss_tmo parameters.
- Added a patch from Dotan Barak that fixes a kernel oops during rmmod
   triggered by resource allocation failure at module load time.
- Avoid duplicate connections by refusing relogins instead of dropping
   duplicate connections, as proposed by Sebastian Riemer.
- Added a patch from Sebastian Riemer for failing SCSI commands
   silently.
- Added a patch from Vu Pham to make the transport layer (IB RC) retry
   count configurable.
- Made HCA completion vector configurable.

Changes since v4:
- Added a patch for removing SCSI devices upon a port down event

Changes since v3:
- Restored the dev_loss_tmo and fast_io_fail_tmo sysfs attributes.
- Included a patch to fix an ib_srp crash that could be triggered by
   cable pulling.

Changes since v2:
- Addressed the v2 review comments.
- Dropped the patches that have already been merged.
- Dropped the patches for integration with multipathd.
- Dropped the micro-optimization of the IB completion handlers.

The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* [PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 12:43   ` Bart Van Assche
  2013-07-03 12:44   ` [PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error Bart Van Assche
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:43 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	Dotan Barak, Eli Cohen

From: Dotan Barak <dotanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources. Trying
to clean the resource may lead to the following kernel panic:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
RIP: 0010:[<ffffffffa0132331>]  [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0)
Call Trace:
 [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core]
 [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp]
 [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

[bvanassche: Shortened patch description]
Signed-off-by: Dotan Barak <dotanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Reviewed-by: Eli Cohen <eli-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 7ccf328..368d160 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2507,6 +2507,8 @@ static void srp_remove_one(struct ib_device *device)
 	struct srp_target_port *target;
 
 	srp_dev = ib_get_client_data(device, &srp_client);
+	if (!srp_dev)
+		return;
 
 	list_for_each_entry_safe(host, tmp_host, &srp_dev->dev_list, list) {
 		device_unregister(&host->dev);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
  2013-07-03 12:43   ` [PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion Bart Van Assche
@ 2013-07-03 12:44   ` Bart Van Assche
  2013-07-03 12:45   ` [PATCH v3 03/13] IB/srp: Fail I/O fast if target offline Bart Van Assche
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:44 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

The SCSI error handler assumes that the transport layer is
operational if an eh_abort_handler() returns SUCCESS. Hence let
srp_abort() only return SUCCESS if sending the ABORT TASK task
management function succeeded. This patch avoids that the SCSI
error handler skips the srp_reset_host() call after a transport
layer error.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 368d160..0e0a5a2 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1744,18 +1744,22 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
+	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
 	if (!req || !srp_claim_req(target, req, scmnd))
 		return FAILED;
-	srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			  SRP_TSK_ABORT_TASK);
+	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
+			      SRP_TSK_ABORT_TASK) == 0)
+		ret = SUCCESS;
+	else
+		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
 
-	return SUCCESS;
+	return ret;
 }
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 03/13] IB/srp: Fail I/O fast if target offline
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
  2013-07-03 12:43   ` [PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion Bart Van Assche
  2013-07-03 12:44   ` [PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error Bart Van Assche
@ 2013-07-03 12:45   ` Bart Van Assche
  2013-07-03 12:50   ` [PATCH v3 04/13] IB/srp: Skip host settle delay Bart Van Assche
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:45 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

If reconnecting failed we know that no command completion will
be received anymore. Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Acked-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0e0a5a2..19279e5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1753,6 +1753,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
+	else if (target->transport_offline)
+		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 04/13] IB/srp: Skip host settle delay
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-07-03 12:45   ` [PATCH v3 03/13] IB/srp: Fail I/O fast if target offline Bart Van Assche
@ 2013-07-03 12:50   ` Bart Van Assche
  2013-07-03 12:51   ` [PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus Bart Van Assche
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:50 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

The SRP initiator implements host reset by reconnecting to the SRP
target. That means that communication with the target is possible
as soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 19279e5..2c82b90 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1952,6 +1952,7 @@ static struct scsi_host_template srp_template = {
 	.eh_abort_handler		= srp_abort,
 	.eh_device_reset_handler	= srp_reset_device,
 	.eh_host_reset_handler		= srp_reset_host,
+	.skip_settle_delay		= true,
 	.sg_tablesize			= SRP_DEF_SG_TABLESIZE,
 	.can_queue			= SRP_CMD_SQ_SIZE,
 	.this_id			= -1,
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2013-07-03 12:50   ` [PATCH v3 04/13] IB/srp: Skip host settle delay Bart Van Assche
@ 2013-07-03 12:51   ` Bart Van Assche
  2013-07-03 12:54   ` [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling Bart Van Assche
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:51 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

An SRP target is required to maintain a single connection between
initiator and target. This means that if the 'add_target' attribute
is used to create a second connection to a target that the first
connection will be logged out and that the SCSI error handler will
kick in. The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins. Note: if
reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Signed-off-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   44 +++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2c82b90..f046e32 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -542,11 +542,11 @@ static void srp_remove_work(struct work_struct *work)
 
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
+	srp_remove_target(target);
+
 	spin_lock(&target->srp_host->target_lock);
 	list_del(&target->list);
 	spin_unlock(&target->srp_host->target_lock);
-
-	srp_remove_target(target);
 }
 
 static void srp_rport_delete(struct srp_rport *rport)
@@ -2008,6 +2008,36 @@ static struct class srp_class = {
 	.dev_release = srp_release_dev
 };
 
+/**
+ * srp_conn_unique() - check whether the connection to a target is unique
+ */
+static bool srp_conn_unique(struct srp_host *host,
+			    struct srp_target_port *target)
+{
+	struct srp_target_port *t;
+	bool ret = false;
+
+	if (target->state == SRP_TARGET_REMOVED)
+		goto out;
+
+	ret = true;
+
+	spin_lock(&host->target_lock);
+	list_for_each_entry(t, &host->target_list, list) {
+		if (t != target &&
+		    target->id_ext == t->id_ext &&
+		    target->ioc_guid == t->ioc_guid &&
+		    target->initiator_ext == t->initiator_ext) {
+			ret = false;
+			break;
+		}
+	}
+	spin_unlock(&host->target_lock);
+
+out:
+	return ret;
+}
+
 /*
  * Target ports are added by writing
  *
@@ -2264,6 +2294,16 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err;
 
+	if (!srp_conn_unique(target->srp_host, target)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "Already connected to target port with id_ext=%016llx;ioc_guid=%016llx;initiator_ext=%016llx\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be64_to_cpu(target->initiator_ext));
+		ret = -EEXIST;
+		goto err;
+	}
+
 	if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
 				target->cmd_sg_cnt < target->sg_tablesize) {
 		pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer
  2013-07-03 12:41 [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11 Bart Van Assche
@ 2013-07-03 12:53 ` Bart Van Assche
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
  2013-07-03 13:38 ` [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11 Or Gerlitz
  2 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    3 +++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 drivers/scsi/scsi_transport_srp.c   |   18 ++++++++++++++++++
 include/scsi/scsi_transport_srp.h   |    2 ++
 4 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f046e32..f65701d 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -526,11 +526,13 @@ static void srp_remove_target(struct srp_target_port *target)
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
+	srp_rport_get(target->rport);
 	srp_remove_host(target->scsi_host);
 	scsi_remove_host(target->scsi_host);
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
 }
@@ -1982,6 +1984,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	}
 
 	rport->lld_data = target;
+	target->rport = rport;
 
 	spin_lock(&host->target_lock);
 	list_add_tail(&target->list, &host->target_list);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 66fbedd..1817ed5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -153,6 +153,7 @@ struct srp_target_port {
 	u16			io_class;
 	struct srp_host	       *srp_host;
 	struct Scsi_Host       *scsi_host;
+	struct srp_rport       *rport;
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f379c7f..f7ba94a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -185,6 +185,24 @@ static int srp_host_match(struct attribute_container *cont, struct device *dev)
 }
 
 /**
+ * srp_rport_get() - increment rport reference count
+ */
+void srp_rport_get(struct srp_rport *rport)
+{
+	get_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_get);
+
+/**
+ * srp_rport_put() - decrement rport reference count
+ */
+void srp_rport_put(struct srp_rport *rport)
+{
+	put_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_put);
+
+/**
  * srp_rport_add - add a SRP remote port to the device hierarchy
  * @shost:	scsi host the remote port is connected to.
  * @ids:	The port id for the remote port.
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index ff0f04a..5a2d2d1 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -38,6 +38,8 @@ extern struct scsi_transport_template *
 srp_attach_transport(struct srp_function_template *);
 extern void srp_release_transport(struct scsi_transport_template *);
 
+extern void srp_rport_get(struct srp_rport *rport);
+extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2013-07-03 12:51   ` [PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus Bart Van Assche
@ 2013-07-03 12:54   ` Bart Van Assche
       [not found]     ` <51D41F13.6060203-HInyCGIudOg@public.gmane.org>
  2013-07-03 12:55   ` [PATCH v3 08/13] IB/srp: Add srp_terminate_io() Bart Van Assche
                     ` (5 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:54 UTC (permalink / raw)
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma, linux-scsi, James Bottomley

Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer error handling
similar to the functionality already provided by the FC transport
layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: James Bottomley <JBottomley-MU7nAjRaF3makBO8gow8eQ@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-transport-srp |   38 +++
 drivers/scsi/scsi_transport_srp.c            |  468 +++++++++++++++++++++++++-
 include/scsi/scsi_transport_srp.h            |   62 +++-
 3 files changed, 565 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..52babb9 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,24 @@ Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
 Description:	Instructs an SRP initiator to disconnect from a target and to
 		remove all LUNs imported from that target.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal. Setting this attribute to "off"
+		will disable this behavior.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before failing I/O. Zero means
+		failing I/O immediately. Setting this attribute to "off" will
+		disable this behavior.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
@@ -12,8 +30,28 @@ Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
 Description:	16-byte local SRP port identifier in hexadecimal format. An
 		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	Number of seconds the SCSI layer will wait after a reconnect
+		attempt failed before retrying.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
 Date:		June 27, 2007
 KernelVersion:	2.6.24
 Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
 Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/state
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org, linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
+Description:	State of the transport layer used for communication with the
+		remote port. "running" if the transport layer is operational;
+		"blocked" if a transport layer error has been encountered but
+		the fail_io_fast_tmo timer has not yet fired; "fail-fast"
+		after the fail_io_fast_tmo timer has fired and before the
+		"dev_loss_tmo" timer has fired; "lost" after the
+		"dev_loss_tmo" timer has fired and before the port is finally
+		removed.
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f7ba94a..1b9ebd5 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,12 +24,15 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/delay.h>
 
 #include <scsi/scsi.h>
+#include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
 #include <scsi/scsi_transport_srp.h>
+#include "scsi_priv.h"
 #include "scsi_transport_srp_internal.h"
 
 struct srp_host_attrs {
@@ -38,7 +41,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 8
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -54,6 +57,26 @@ struct srp_internal {
 
 #define	dev_to_rport(d)	container_of(d, struct srp_rport, dev)
 #define transport_class_to_srp_rport(dev) dev_to_rport((dev)->parent)
+static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
+{
+	return dev_to_shost(r->dev.parent);
+}
+
+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If both a fast I/O fail and a device loss timeout have been configured then
+ * the fast I/O fail timeout must be below the device loss timeout. The fast
+ * I/O fail timeout must not exceed SCSI_DEVICE_BLOCK_MAX_TIMEOUT.
+ */
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
+		fast_io_fail_tmo < dev_loss_tmo) &&
+		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
+		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);
 
 static int srp_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
@@ -134,10 +157,422 @@ static ssize_t store_srp_rport_delete(struct device *dev,
 
 static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete);
 
+static ssize_t show_srp_rport_state(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	static const char *const state_name[] = {
+		[SRP_RPORT_RUNNING]	= "running",
+		[SRP_RPORT_BLOCKED]	= "blocked",
+		[SRP_RPORT_FAIL_FAST]	= "fail-fast",
+		[SRP_RPORT_LOST]	= "lost",
+	};
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	enum srp_rport_state state = rport->state;
+
+	return sprintf(buf, "%s\n",
+		       (unsigned)state < ARRAY_SIZE(state_name) ?
+		       state_name[state] : "???");
+}
+
+static DEVICE_ATTR(state, S_IRUGO, show_srp_rport_state, NULL);
+
+static ssize_t show_reconnect_delay(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->reconnect_delay);
+}
+
+static ssize_t store_reconnect_delay(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, const size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	int res, delay;
+
+	res = kstrtoint(buf, 0, &delay);
+	if (res)
+		goto out;
+
+	if (rport->reconnect_delay <= 0 && delay > 0 &&
+	    rport->state != SRP_RPORT_RUNNING) {
+		queue_delayed_work(system_long_wq, &rport->reconnect_work,
+				   delay * HZ);
+	} else if (delay <= 0) {
+		cancel_delayed_work(&rport->reconnect_work);
+	}
+	rport->reconnect_delay = delay;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(reconnect_delay, S_IRUGO | S_IWUSR, show_reconnect_delay,
+		   store_reconnect_delay);
+
+static ssize_t show_failed_reconnects(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->failed_reconnects);
+}
+
+static DEVICE_ATTR(failed_reconnects, S_IRUGO, show_failed_reconnects, NULL);
+
+static ssize_t show_srp_rport_fast_io_fail_tmo(struct device *dev,
+					       struct device_attribute *attr,
+					       char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->fast_io_fail_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->fast_io_fail_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_fast_io_fail_tmo(struct device *dev,
+						struct device_attribute *attr,
+						const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	int res;
+	int fast_io_fail_tmo;
+
+	if (strncmp(buf, "off", 3) != 0) {
+		res = kstrtoint(buf, 0, &fast_io_fail_tmo);
+		if (res)
+			goto out;
+	} else {
+		fast_io_fail_tmo = -1;
+	}
+	res = srp_tmo_valid(fast_io_fail_tmo, rport->dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->fast_io_fail_tmo = fast_io_fail_tmo;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_fast_io_fail_tmo,
+		   store_srp_rport_fast_io_fail_tmo);
+
+static ssize_t show_srp_rport_dev_loss_tmo(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->dev_loss_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->dev_loss_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_dev_loss_tmo(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	int res;
+	int dev_loss_tmo;
+
+	if (strncmp(buf, "off", 3) != 0) {
+		res = kstrtoint(buf, 0, &dev_loss_tmo);
+		if (res)
+			goto out;
+	} else {
+		dev_loss_tmo = -1;
+	}
+	res = srp_tmo_valid(rport->fast_io_fail_tmo, dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->dev_loss_tmo = dev_loss_tmo;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(dev_loss_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_dev_loss_tmo,
+		   store_srp_rport_dev_loss_tmo);
+
+static int srp_rport_set_state(struct srp_rport *rport,
+			       enum srp_rport_state new_state)
+{
+	enum srp_rport_state old_state = rport->state;
+
+	lockdep_assert_held(&rport->mutex);
+
+	switch (new_state) {
+	case SRP_RPORT_RUNNING:
+		switch (old_state) {
+		case SRP_RPORT_LOST:
+			goto invalid;
+		default:
+			break;
+		}
+		break;
+	case SRP_RPORT_BLOCKED:
+		switch (old_state) {
+		case SRP_RPORT_RUNNING:
+			break;
+		default:
+			goto invalid;
+		}
+		break;
+	case SRP_RPORT_FAIL_FAST:
+		switch (old_state) {
+		case SRP_RPORT_LOST:
+			goto invalid;
+		default:
+			break;
+		}
+		break;
+	case SRP_RPORT_LOST:
+		break;
+	}
+	rport->state = new_state;
+	return 0;
+
+invalid:
+	return -EINVAL;
+}
+
+/**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ */
+static int scsi_request_fn_active(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+	int request_fn_active = 0;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		spin_lock_irq(q->queue_lock);
+		request_fn_active += q->request_fn_active;
+		spin_unlock_irq(q->queue_lock);
+	}
+
+	return request_fn_active;
+}
+
+/**
+ * srp_reconnect_rport() - reconnect to an SRP target port
+ *
+ * Blocks SCSI command queueing before invoking reconnect() such that
+ * queuecommand() won't be invoked concurrently with reconnect(). This is
+ * important since a reconnect() implementation may reallocate resources
+ * needed by queuecommand(). Please note that this function neither waits
+ * until outstanding requests have finished nor tries to abort these. It is
+ * the responsibility of the reconnect() function to finish outstanding
+ * commands before reconnecting to the target port.
+ */
+int srp_reconnect_rport(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+	struct scsi_device *sdev;
+	int res;
+
+	pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
+
+	res = mutex_lock_interruptible(&rport->mutex);
+	if (res)
+		goto out;
+	scsi_target_block(&shost->shost_gendev);
+	while (scsi_request_fn_active(shost))
+		msleep(20);
+	res = i->f->reconnect(rport);
+	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
+		 dev_name(&shost->shost_gendev), rport->state, res);
+	if (res == 0) {
+		mutex_unlock(&rport->mutex);
+
+		cancel_delayed_work(&rport->fast_io_fail_work);
+		cancel_delayed_work(&rport->dev_loss_work);
+
+		mutex_lock(&rport->mutex);
+		rport->failed_reconnects = 0;
+		srp_rport_set_state(rport, SRP_RPORT_RUNNING);
+		scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
+		/*
+		 * It can occur that after fast_io_fail_tmo expired and before
+		 * dev_loss_tmo expired that the SCSI error handler has
+		 * offlined one or more devices. scsi_target_unblock() doesn't
+		 * change the state of these devices into running, so do that
+		 * explicitly.
+		 */
+		spin_lock_irq(shost->host_lock);
+		__shost_for_each_device(sdev, shost)
+			if (sdev->sdev_state == SDEV_OFFLINE)
+				sdev->sdev_state = SDEV_RUNNING;
+		spin_unlock_irq(shost->host_lock);
+	} else if (rport->state == SRP_RPORT_RUNNING) {
+		srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
+		scsi_target_unblock(&shost->shost_gendev,
+				    SDEV_TRANSPORT_OFFLINE);
+	}
+	mutex_unlock(&rport->mutex);
+
+out:
+	return res;
+}
+EXPORT_SYMBOL(srp_reconnect_rport);
+
+/**
+ * srp_reconnect_work() - reconnect and schedule a new attempt if necessary
+ */
+static void srp_reconnect_work(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, reconnect_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	int delay, res;
+
+	res = srp_reconnect_rport(rport);
+	if (res != 0) {
+		shost_printk(KERN_ERR, shost,
+			     "reconnect attempt %d failed (%d)\n",
+			     ++rport->failed_reconnects, res);
+		delay = rport->reconnect_delay *
+			min(100, max(1, rport->failed_reconnects - 10));
+		if (delay > 0)
+			queue_delayed_work(system_long_wq,
+					   &rport->reconnect_work, delay * HZ);
+	}
+}
+
+static void __rport_fast_io_fail_timedout(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i;
+
+	lockdep_assert_held(&rport->mutex);
+
+	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
+		return;
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+
+	/* Involve the LLD if possible to terminate all I/O on the rport. */
+	i = to_srp_internal(shost->transportt);
+	if (i->f->terminate_rport_io)
+		i->f->terminate_rport_io(rport);
+}
+
+/**
+ * rport_fast_io_fail_timedout() - fast I/O failure timeout handler
+ *
+ * Unblocks the SCSI host.
+ */
+static void rport_fast_io_fail_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, fast_io_fail_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+
+	pr_debug("fast_io_fail_tmo expired for %s.\n",
+		 dev_name(&shost->shost_gendev));
+
+	mutex_lock(&rport->mutex);
+	__rport_fast_io_fail_timedout(rport);
+	mutex_unlock(&rport->mutex);
+}
+
+/**
+ * rport_dev_loss_timedout() - device loss timeout handler
+ */
+static void rport_dev_loss_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, dev_loss_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+
+	pr_err("dev_loss_tmo expired for %s.\n",
+	       dev_name(&shost->shost_gendev));
+
+	mutex_lock(&rport->mutex);
+	WARN_ON(srp_rport_set_state(rport, SRP_RPORT_LOST) != 0);
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+	mutex_unlock(&rport->mutex);
+
+	i->f->rport_delete(rport);
+}
+
+/**
+ * srp_start_tl_fail_timers() - start the transport layer failure timers
+ *
+ * Start the transport layer fast I/O failure and device loss timers. Do not
+ * modify a timer that was already started.
+ */
+void srp_start_tl_fail_timers(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	int fast_io_fail_tmo, dev_loss_tmo, delay;
+
+	mutex_lock(&rport->mutex);
+	delay = rport->reconnect_delay;
+	fast_io_fail_tmo = rport->fast_io_fail_tmo;
+	dev_loss_tmo = rport->dev_loss_tmo;
+	pr_debug("%s current state: %d\n", dev_name(&shost->shost_gendev),
+		 rport->state);
+
+	if (delay > 0)
+		queue_delayed_work(system_long_wq, &rport->reconnect_work,
+				   1UL * delay * HZ);
+	if (fast_io_fail_tmo >= 0 &&
+	    srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
+		pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
+			 rport->state);
+		scsi_target_block(&shost->shost_gendev);
+		queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
+				   1UL * fast_io_fail_tmo * HZ);
+	}
+	if (dev_loss_tmo >= 0)
+		queue_delayed_work(system_long_wq, &rport->dev_loss_work,
+				   1UL * dev_loss_tmo * HZ);
+	mutex_unlock(&rport->mutex);
+}
+EXPORT_SYMBOL(srp_start_tl_fail_timers);
+
+/**
+ * srp_timed_out() - SRP transport intercept of the SCSI timeout EH
+ *
+ * If a timeout occurs while an rport is in the blocked state, ask the SCSI
+ * EH to continue waiting (BLK_EH_RESET_TIMER). Otherwise let the SCSI core
+ * handle the timeout (BLK_EH_NOT_HANDLED).
+ *
+ * Note: This function is called from soft-IRQ context and with the request
+ * queue lock held.
+ */
+static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
+{
+	struct scsi_device *sdev = scmd->device;
+
+	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
+	return scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER :
+		BLK_EH_NOT_HANDLED;
+}
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
 
+	cancel_delayed_work_sync(&rport->reconnect_work);
+	cancel_delayed_work_sync(&rport->fast_io_fail_work);
+	cancel_delayed_work_sync(&rport->dev_loss_work);
+
 	put_device(dev->parent);
 	kfree(rport);
 }
@@ -214,12 +649,15 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 {
 	struct srp_rport *rport;
 	struct device *parent = &shost->shost_gendev;
+	struct srp_internal *i = to_srp_internal(shost->transportt);
 	int id, ret;
 
 	rport = kzalloc(sizeof(*rport), GFP_KERNEL);
 	if (!rport)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&rport->mutex);
+
 	device_initialize(&rport->dev);
 
 	rport->dev.parent = get_device(parent);
@@ -228,6 +666,17 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	memcpy(rport->port_id, ids->port_id, sizeof(rport->port_id));
 	rport->roles = ids->roles;
 
+	if (i->f->reconnect)
+		rport->reconnect_delay = i->f->reconnect_delay ?
+			*i->f->reconnect_delay : 10;
+	INIT_DELAYED_WORK(&rport->reconnect_work, srp_reconnect_work);
+	rport->fast_io_fail_tmo = i->f->fast_io_fail_tmo ?
+		*i->f->fast_io_fail_tmo : 15;
+	rport->dev_loss_tmo = i->f->dev_loss_tmo ? *i->f->dev_loss_tmo : 600;
+	INIT_DELAYED_WORK(&rport->fast_io_fail_work,
+			  rport_fast_io_fail_timedout);
+	INIT_DELAYED_WORK(&rport->dev_loss_work, rport_dev_loss_timedout);
+
 	id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id);
 	dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id);
 
@@ -277,6 +726,12 @@ void srp_rport_del(struct srp_rport *rport)
 	transport_remove_device(dev);
 	device_del(dev);
 	transport_destroy_device(dev);
+
+	mutex_lock(&rport->mutex);
+	if (rport->state == SRP_RPORT_BLOCKED)
+		__rport_fast_io_fail_timedout(rport);
+	mutex_unlock(&rport->mutex);
+
 	put_device(dev);
 }
 EXPORT_SYMBOL_GPL(srp_rport_del);
@@ -328,6 +783,8 @@ srp_attach_transport(struct srp_function_template *ft)
 	if (!i)
 		return NULL;
 
+	i->t.eh_timed_out = srp_timed_out;
+
 	i->t.tsk_mgmt_response = srp_tsk_mgmt_response;
 	i->t.it_nexus_response = srp_it_nexus_response;
 
@@ -345,6 +802,15 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
+	if (ft->has_rport_state) {
+		i->rport_attrs[count++] = &dev_attr_state;
+		i->rport_attrs[count++] = &dev_attr_fast_io_fail_tmo;
+		i->rport_attrs[count++] = &dev_attr_dev_loss_tmo;
+	}
+	if (ft->reconnect) {
+		i->rport_attrs[count++] = &dev_attr_reconnect_delay;
+		i->rport_attrs[count++] = &dev_attr_failed_reconnects;
+	}
 	if (ft->rport_delete)
 		i->rport_attrs[count++] = &dev_attr_delete;
 	i->rport_attrs[count++] = NULL;
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 5a2d2d1..fbcc985 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -13,6 +13,26 @@ struct srp_rport_identifiers {
 	u8 roles;
 };
 
+/**
+ * enum srp_rport_state - SRP transport layer state
+ * @SRP_RPORT_RUNNING:   Transport layer operational.
+ * @SRP_RPORT_BLOCKED:   Transport layer not operational; fast I/O fail timer
+ *                       is running and I/O has been blocked.
+ * @SRP_RPORT_FAIL_FAST: Fast I/O fail timer has expired; fail I/O fast.
+ * @SRP_RPORT_LOST:      Device loss timer has expired; port is being removed.
+ */
+enum srp_rport_state {
+	SRP_RPORT_RUNNING,
+	SRP_RPORT_BLOCKED,
+	SRP_RPORT_FAIL_FAST,
+	SRP_RPORT_LOST,
+};
+
+/**
+ * struct srp_rport
+ * @mutex:   Protects against concurrent rport fast_io_fail / dev_loss_tmo /
+ *   reconnect activity.
+ */
 struct srp_rport {
 	/* for initiator and target drivers */
 
@@ -23,11 +43,27 @@ struct srp_rport {
 
 	/* for initiator drivers */
 
-	void *lld_data;	/* LLD private data */
+	void			*lld_data;	/* LLD private data */
+
+	struct mutex		mutex;
+	enum srp_rport_state	state;
+	int			reconnect_delay;
+	int			failed_reconnects;
+	struct delayed_work	reconnect_work;
+	int			fast_io_fail_tmo;
+	int			dev_loss_tmo;
+	struct delayed_work	fast_io_fail_work;
+	struct delayed_work	dev_loss_work;
 };
 
 struct srp_function_template {
 	/* for initiator drivers */
+	bool has_rport_state;
+	int *reconnect_delay;
+	int *fast_io_fail_tmo;
+	int *dev_loss_tmo;
+	int (*reconnect)(struct srp_rport *rport);
+	void (*terminate_rport_io)(struct srp_rport *rport);
 	void (*rport_delete)(struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
@@ -43,7 +79,29 @@ extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-
+extern int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo);
+extern int srp_reconnect_rport(struct srp_rport *rport);
+extern void srp_start_tl_fail_timers(struct srp_rport *rport);
 extern void srp_remove_host(struct Scsi_Host *);
 
+/**
+ * srp_chkready() - evaluate the transport layer state before I/O
+ *
+ * Returns a SCSI result code that can be returned by the LLD. The role of
+ * this function is similar to that of fc_remote_port_chkready().
+ */
+static inline int srp_chkready(struct srp_rport *rport)
+{
+	switch (rport->state) {
+	case SRP_RPORT_RUNNING:
+	case SRP_RPORT_BLOCKED:
+	default:
+		return 0;
+	case SRP_RPORT_FAIL_FAST:
+		return DID_TRANSPORT_FAILFAST << 16;
+	case SRP_RPORT_LOST:
+		return DID_NO_CONNECT << 16;
+	}
+}
+
 #endif
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (5 preceding siblings ...)
  2013-07-03 12:54   ` [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling Bart Van Assche
@ 2013-07-03 12:55   ` Bart Van Assche
       [not found]     ` <51D41F52.4000409-HInyCGIudOg@public.gmane.org>
  2013-07-03 12:56   ` [PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery Bart Van Assche
                     ` (4 subsequent siblings)
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:55 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

Finish all outstanding I/O requests after fast_io_fail_tmo expired,
which speeds up failover in a multipath setup. This patch is a
reworked version of a patch from Sebastian Riemer.

Reported-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index f65701d..8ba4e9c 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -686,17 +686,29 @@ static void srp_free_req(struct srp_target_port *target,
 	spin_unlock_irqrestore(&target->lock, flags);
 }
 
-static void srp_reset_req(struct srp_target_port *target, struct srp_request *req)
+static void srp_finish_req(struct srp_target_port *target,
+			   struct srp_request *req, int result)
 {
 	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
 
 	if (scmnd) {
 		srp_free_req(target, req, scmnd, 0);
-		scmnd->result = DID_RESET << 16;
+		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 	}
 }
 
+static void srp_terminate_io(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	int i;
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		struct srp_request *req = &target->req_ring[i];
+		srp_finish_req(target, req, DID_TRANSPORT_FAILFAST << 16);
+	}
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct Scsi_Host *shost = target->scsi_host;
@@ -723,8 +735,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
-		if (req->scmnd)
-			srp_reset_req(target, req);
+		srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	INIT_LIST_HEAD(&target->free_tx);
@@ -1782,7 +1793,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
 		if (req->scmnd && req->scmnd->device == scmnd->device)
-			srp_reset_req(target, req);
+			srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	return SUCCESS;
@@ -2594,6 +2605,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
 	.rport_delete		 = srp_rport_delete,
+	.terminate_rport_io	 = srp_terminate_io,
 };
 
 static int __init srp_init_module(void)
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (6 preceding siblings ...)
  2013-07-03 12:55   ` [PATCH v3 08/13] IB/srp: Add srp_terminate_io() Bart Van Assche
@ 2013-07-03 12:56   ` Bart Van Assche
  2013-07-03 12:57   ` [PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:56 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
functionality for the IB SRP initiator. Add kernel module
parameters that allow to specify default values for these
three parameters.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |  123 +++++++++++++++++++++++++----------
 drivers/infiniband/ulp/srp/ib_srp.h |    1 -
 2 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8ba4e9c..0f69ae1 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -86,6 +86,31 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 		 "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
 
+static int srp_reconnect_delay = 10;
+module_param_named(reconnect_delay, srp_reconnect_delay, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts");
+
+static struct kernel_param_ops srp_tmo_ops;
+
+static int srp_fast_io_fail_tmo = 15;
+module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(fast_io_fail_tmo,
+		 "Number of seconds between the observation of a transport"
+		 " layer error and failing all I/O. \"off\" means that this"
+		 " functionality is disabled.");
+
+static int srp_dev_loss_tmo = 600;
+module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_loss_tmo,
+		 "Maximum number of seconds that the SRP transport should"
+		 " insulate transport layer errors. After this time has been"
+		 " exceeded the SCSI target is removed. Should be"
+		 " between 1 and " __stringify(SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		 " if fast_io_fail_tmo has not been set. \"off\" means that"
+		 " this functionality is disabled.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
@@ -102,6 +127,44 @@ static struct ib_client srp_client = {
 
 static struct ib_sa_client srp_sa_client;
 
+static int srp_tmo_get(char *buffer, const struct kernel_param *kp)
+{
+	int tmo = *(int *)kp->arg;
+
+	if (tmo >= 0)
+		return sprintf(buffer, "%d", tmo);
+	else
+		return sprintf(buffer, "off");
+}
+
+static int srp_tmo_set(const char *val, const struct kernel_param *kp)
+{
+	int tmo, res;
+
+	if (strncmp(val, "off", 3) != 0) {
+		res = kstrtoint(val, 0, &tmo);
+		if (res)
+			goto out;
+	} else {
+		tmo = -1;
+	}
+	if (kp->arg == &srp_fast_io_fail_tmo)
+		res = srp_tmo_valid(tmo, srp_dev_loss_tmo);
+	else
+		res = srp_tmo_valid(srp_fast_io_fail_tmo, tmo);
+	if (res)
+		goto out;
+	*(int *)kp->arg = tmo;
+
+out:
+	return res;
+}
+
+static struct kernel_param_ops srp_tmo_ops = {
+	.get = srp_tmo_get,
+	.set = srp_tmo_set,
+};
+
 static inline struct srp_target_port *host_to_target(struct Scsi_Host *host)
 {
 	return (struct srp_target_port *) host->hostdata;
@@ -709,13 +772,20 @@ static void srp_terminate_io(struct srp_rport *rport)
 	}
 }
 
-static int srp_reconnect_target(struct srp_target_port *target)
+/*
+ * It is up to the caller to ensure that srp_rport_reconnect() calls are
+ * serialized and that no concurrent srp_queuecommand(), srp_abort(),
+ * srp_reset_device() or srp_reset_host() calls will occur while this function
+ * is in progress. One way to realize that is not to call this function
+ * directly but to call srp_reconnect_rport() instead since that last function
+ * serializes calls of this function via rport->mutex and also blocks
+ * srp_queuecommand() calls before invoking this function.
+ */
+static int srp_rport_reconnect(struct srp_rport *rport)
 {
-	struct Scsi_Host *shost = target->scsi_host;
+	struct srp_target_port *target = rport->lld_data;
 	int i, ret;
 
-	scsi_target_block(&shost->shost_gendev);
-
 	srp_disconnect_target(target);
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the target in
@@ -745,28 +815,9 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	if (ret == 0)
 		ret = srp_connect_target(target);
 
-	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
-			    SDEV_TRANSPORT_OFFLINE);
-	target->transport_offline = !!ret;
-
-	if (ret)
-		goto err;
-
-	shost_printk(KERN_INFO, target->scsi_host, PFX "reconnect succeeded\n");
-
-	return ret;
-
-err:
-	shost_printk(KERN_ERR, target->scsi_host,
-		     PFX "reconnect failed (%d), removing target port.\n", ret);
-
-	/*
-	 * We couldn't reconnect, so kill our target port off.
-	 * However, we have to defer the real removal because we
-	 * are in the context of the SCSI error handler now, which
-	 * will deadlock if we call scsi_remove_host().
-	 */
-	srp_queue_remove_work(target);
+	if (ret == 0)
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "reconnect succeeded\n");
 
 	return ret;
 }
@@ -1365,10 +1416,11 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	struct srp_cmd *cmd;
 	struct ib_device *dev;
 	unsigned long flags;
-	int len;
+	int len, result;
 
-	if (unlikely(target->transport_offline)) {
-		scmnd->result = DID_NO_CONNECT << 16;
+	result = srp_chkready(target->rport);
+	if (unlikely(result)) {
+		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 		return 0;
 	}
@@ -1766,7 +1818,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
-	else if (target->transport_offline)
+	else if (target->rport->state == SRP_RPORT_LOST)
 		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
@@ -1802,14 +1854,10 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 static int srp_reset_host(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
-	int ret = FAILED;
 
 	shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n");
 
-	if (!srp_reconnect_target(target))
-		ret = SUCCESS;
-
-	return ret;
+	return srp_reconnect_rport(target->rport) == 0 ? SUCCESS : FAILED;
 }
 
 static int srp_slave_configure(struct scsi_device *sdev)
@@ -2604,6 +2652,11 @@ static void srp_remove_one(struct ib_device *device)
 }
 
 static struct srp_function_template ib_srp_transport_functions = {
+	.has_rport_state	 = true,
+	.reconnect_delay	 = &srp_reconnect_delay,
+	.fast_io_fail_tmo	 = &srp_fast_io_fail_tmo,
+	.dev_loss_tmo		 = &srp_dev_loss_tmo,
+	.reconnect		 = srp_rport_reconnect,
 	.rport_delete		 = srp_rport_delete,
 	.terminate_rport_io	 = srp_terminate_io,
 };
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 1817ed5..fda82f7 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -140,7 +140,6 @@ struct srp_target_port {
 	unsigned int		cmd_sg_cnt;
 	unsigned int		indirect_size;
 	bool			allow_ext_sg;
-	bool			transport_offline;
 
 	/* Everything above this point is used in the hot path of
 	 * command processing. Try to keep them packed into cachelines.
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (7 preceding siblings ...)
  2013-07-03 12:56   ` [PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery Bart Van Assche
@ 2013-07-03 12:57   ` Bart Van Assche
  2013-07-03 12:58   ` [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable Bart Van Assche
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:57 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

Start the reconnect timer, fast_io_fail timer and dev_loss timers
if a transport layer error occurs.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   19 +++++++++++++++++++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0f69ae1..2557b7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -595,6 +595,7 @@ static void srp_remove_target(struct srp_target_port *target)
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
@@ -1364,6 +1365,21 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 			     PFX "Recv failed with error code %d\n", res);
 }
 
+/**
+ * srp_tl_err_work() - handle a transport layer error
+ *
+ * Note: This function may get invoked before the rport has been created,
+ * hence the target->rport test.
+ */
+static void srp_tl_err_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(work, struct srp_target_port, tl_err_work);
+	if (target->rport)
+		srp_start_tl_fail_timers(target->rport);
+}
+
 static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			      enum ib_wc_opcode wc_opcode,
 			      struct srp_target_port *target)
@@ -1373,6 +1389,7 @@ static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			     PFX "failed %s status %d\n",
 			     wc_opcode & IB_WC_RECV ? "receive" : "send",
 			     wc_status);
+		queue_work(system_long_wq, &target->tl_err_work);
 	}
 	target->qp_in_error = true;
 }
@@ -1735,6 +1752,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		if (ib_send_cm_drep(cm_id, NULL, 0))
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "Sending CM DREP failed\n");
+		queue_work(system_long_wq, &target->tl_err_work);
 		break;
 
 	case IB_CM_TIMEWAIT_EXIT:
@@ -2379,6 +2397,7 @@ static ssize_t srp_create_target(struct device *dev,
 			     sizeof (struct srp_indirect_buf) +
 			     target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
 	INIT_LIST_HEAD(&target->free_tx);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index fda82f7..e45d9d0 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -175,6 +175,7 @@ struct srp_target_port {
 	struct srp_iu	       *rx_ring[SRP_RQ_SIZE];
 	struct srp_request	req_ring[SRP_CMD_SQ_SIZE];
 
+	struct work_struct	tl_err_work;
 	struct work_struct	remove_work;
 
 	struct list_head	list;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (8 preceding siblings ...)
  2013-07-03 12:57   ` [PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
@ 2013-07-03 12:58   ` Bart Van Assche
       [not found]     ` <51D41FFC.6070105-HInyCGIudOg@public.gmane.org>
  2013-07-03 12:59   ` [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable Bart Van Assche
  2013-07-03 13:00   ` [PATCH v3 13/13] IB/srp: Bump driver version and release date Bart Van Assche
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:58 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

Several InfiniBand HCA's allow to configure the completion vector
per queue pair. This allows to spread the workload created by IB
completion interrupts over multiple MSI-X vectors and hence over
multiple CPU cores. In other words, configuring the completion
vector properly not only allows to reduce latency on an initiator
connected to multiple SRP targets but also allows to improve
throughput.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |    7 +++++++
 drivers/infiniband/ulp/srp/ib_srp.c          |   26 ++++++++++++++++++++++++--
 drivers/infiniband/ulp/srp/ib_srp.h          |    1 +
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 481aae9..5c53d28 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -54,6 +54,13 @@ Description:	Interface for making ib_srp connect to a new target.
 		  ib_srp. Specifying a value that exceeds cmd_sg_entries is
 		  only safe with partial memory descriptor list support enabled
 		  (allow_ext_sg=1).
+		* comp_vector, a number in the range 0..n-1 specifying the
+		  MSI-X completion vector. Some HCA's allocate multiple (n)
+		  MSI-X vectors per HCA port. If the IRQ affinity masks of
+		  these interrupts have been configured such that each MSI-X
+		  interrupt is handled by a different CPU then the comp_vector
+		  parameter can be used to spread the SRP completion workload
+		  over multiple CPU's.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:		January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 2557b7a..6c164f6 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -294,14 +294,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
 		return -ENOMEM;
 
 	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_recv_completion, NULL, target, SRP_RQ_SIZE, 0);
+			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
+			       target->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_send_completion, NULL, target, SRP_SQ_SIZE, 0);
+			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
+			       target->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -1976,6 +1978,14 @@ static ssize_t show_local_ib_device(struct device *dev,
 	return sprintf(buf, "%s\n", target->srp_host->srp_dev->dev->name);
 }
 
+static ssize_t show_comp_vector(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->comp_vector);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -2002,6 +2012,7 @@ static DEVICE_ATTR(req_lim,         S_IRUGO, show_req_lim,         NULL);
 static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,	   NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
+static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2016,6 +2027,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_zero_req_lim,
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
+	&dev_attr_comp_vector,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
 	NULL
@@ -2140,6 +2152,7 @@ enum {
 	SRP_OPT_CMD_SG_ENTRIES	= 1 << 9,
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
+	SRP_OPT_COMP_VECTOR	= 1 << 12,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
 				   SRP_OPT_IOC_GUID	|
 				   SRP_OPT_DGID		|
@@ -2160,6 +2173,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_CMD_SG_ENTRIES,	"cmd_sg_entries=%u"	},
 	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
 	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
+	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
 	{ SRP_OPT_ERR,			NULL 			}
 };
 
@@ -2315,6 +2329,14 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			target->sg_tablesize = token;
 			break;
 
+		case SRP_OPT_COMP_VECTOR:
+			if (match_int(args, &token) || token < 0) {
+				pr_warn("bad comp_vector parameter '%s'\n", p);
+				goto out;
+			}
+			target->comp_vector = token;
+			break;
+
 		default:
 			pr_warn("unknown parameter or missing value '%s' in target creation request\n",
 				p);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e45d9d0..cbc0b14 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -156,6 +156,7 @@ struct srp_target_port {
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
+	int			comp_vector;
 
 	struct ib_sa_path_rec	path;
 	__be16			orig_dgid[8];
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (9 preceding siblings ...)
  2013-07-03 12:58   ` [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable Bart Van Assche
@ 2013-07-03 12:59   ` Bart Van Assche
       [not found]     ` <51D4204E.7040301-HInyCGIudOg@public.gmane.org>
  2013-07-03 13:00   ` [PATCH v3 13/13] IB/srp: Bump driver version and release date Bart Van Assche
  11 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 12:59 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

Allow the InfiniBand RC retry count to be configured by the user
as an option in the target login string. Reducing this retry count
helps with reducing path failover time.

[bvanassche: Rewrote patch description / changed default retry count from 2 back to 7]
Signed-off-by: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |    2 ++
 drivers/infiniband/ulp/srp/ib_srp.c          |   24 +++++++++++++++++++++++-
 drivers/infiniband/ulp/srp/ib_srp.h          |    1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 5c53d28..18e9b27 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -61,6 +61,8 @@ Description:	Interface for making ib_srp connect to a new target.
 		  interrupt is handled by a different CPU then the comp_vector
 		  parameter can be used to spread the SRP completion workload
 		  over multiple CPU's.
+		* tl_retry_count, a number in the range 2..7 specifying the
+		  IB RC retry count.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:		January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 6c164f6..91b2d04 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -453,7 +453,7 @@ static int srp_send_req(struct srp_target_port *target)
 	req->param.responder_resources	      = 4;
 	req->param.remote_cm_response_timeout = 20;
 	req->param.local_cm_response_timeout  = 20;
-	req->param.retry_count 		      = 7;
+	req->param.retry_count                = target->tl_retry_count;
 	req->param.rnr_retry_count 	      = 7;
 	req->param.max_cm_retries 	      = 15;
 
@@ -1986,6 +1986,14 @@ static ssize_t show_comp_vector(struct device *dev,
 	return sprintf(buf, "%d\n", target->comp_vector);
 }
 
+static ssize_t show_tl_retry_count(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->tl_retry_count);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -2013,6 +2021,7 @@ static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,	   NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
 static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
+static DEVICE_ATTR(tl_retry_count,  S_IRUGO, show_tl_retry_count,  NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2028,6 +2037,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
 	&dev_attr_comp_vector,
+	&dev_attr_tl_retry_count,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
 	NULL
@@ -2153,6 +2163,7 @@ enum {
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
 	SRP_OPT_COMP_VECTOR	= 1 << 12,
+	SRP_OPT_TL_RETRY_COUNT	= 1 << 13,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
 				   SRP_OPT_IOC_GUID	|
 				   SRP_OPT_DGID		|
@@ -2174,6 +2185,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
 	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
 	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
+	{ SRP_OPT_TL_RETRY_COUNT,	"tl_retry_count=%u"	},
 	{ SRP_OPT_ERR,			NULL 			}
 };
 
@@ -2337,6 +2349,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			target->comp_vector = token;
 			break;
 
+		case SRP_OPT_TL_RETRY_COUNT:
+			if (match_int(args, &token) || token < 2 || token > 7) {
+				pr_warn("bad tl_retry_count parameter '%s' (must be a number between 2 and 7)\n",
+					p);
+				goto out;
+			}
+			target->tl_retry_count = token;
+			break;
+
 		default:
 			pr_warn("unknown parameter or missing value '%s' in target creation request\n",
 				p);
@@ -2391,6 +2412,7 @@ static ssize_t srp_create_target(struct device *dev,
 	target->cmd_sg_cnt	= cmd_sg_entries;
 	target->sg_tablesize	= indirect_sg_entries ? : cmd_sg_entries;
 	target->allow_ext_sg	= allow_ext_sg;
+	target->tl_retry_count	= 7;
 
 	ret = srp_parse_options(buf, target);
 	if (ret)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index cbc0b14..446b045 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -157,6 +157,7 @@ struct srp_target_port {
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
 	int			comp_vector;
+	int			tl_retry_count;
 
 	struct ib_sa_path_rec	path;
 	__be16			orig_dgid[8];
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* [PATCH v3 13/13] IB/srp: Bump driver version and release date
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
                     ` (10 preceding siblings ...)
  2013-07-03 12:59   ` [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable Bart Van Assche
@ 2013-07-03 13:00   ` Bart Van Assche
  11 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 13:00 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

Signed-off-by: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 91b2d04..fa38bc3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -53,8 +53,8 @@
 
 #define DRV_NAME	"ib_srp"
 #define PFX		DRV_NAME ": "
-#define DRV_VERSION	"0.2"
-#define DRV_RELDATE	"November 1, 2005"
+#define DRV_VERSION	"1.0"
+#define DRV_RELDATE	"July 1, 2013"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
  2013-07-03 12:41 [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11 Bart Van Assche
  2013-07-03 12:53 ` [PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
       [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 13:38 ` Or Gerlitz
  2013-07-03 14:38   ` Bart Van Assche
  2 siblings, 1 reply; 41+ messages in thread
From: Or Gerlitz @ 2013-07-03 13:38 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma, linux-scsi

On 03/07/2013 15:41, Bart Van Assche wrote:


[...]

Bart,

> The individual patches in this series are as follows:
> 0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
> 0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
> 0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
> 0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
> 0005-IB-srp-Skip-host-settle-delay.patch
> 0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
> 0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
> 0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
> 0009-IB-srp-Add-srp_terminate_io.patch
> 0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
> 0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
> 0012-IB-srp-Fail-SCSI-commands-silently.patch
> 0013-IB-srp-Make-HCA-completion-vector-configurable.patch
> 0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
> 0015-IB-srp-Bump-driver-version-and-release-date.patch

Some of these patches were already picked by Roland (SB), I would 
suggest that you
post V4 and drop the ones which were accepted.

e8ca413 IB/srp: Bump driver version and release date
4b5e5f4 IB/srp: Make HCA completion vector configurable
96fc248 IB/srp: Maintain a single connection per I_T nexus
99e1c13 IB/srp: Fail I/O fast if target offline
2742c1d IB/srp: Skip host settle delay
086f44f IB/srp: Avoid skipping srp_reset_host() after a transport error
1fe0cb8 IB/srp: Fix remove_one crash due to resource exhaustion

Also, Would help if you use the --cover-letter of git format-patch and
the resulted cover letter  (patch 0/N) as it has standard content which
you can enhance and place your additions.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
       [not found]     ` <51D41F52.4000409-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 14:08       ` David Dillow
       [not found]         ` <1372860491.24238.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: David Dillow @ 2013-07-03 14:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote:
> Finish all outstanding I/O requests after fast_io_fail_tmo expired,
> which speeds up failover in a multipath setup. This patch is a
> reworked version of a patch from Sebastian Riemer.
> 
> Reported-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>

I don't believe I ack'd this; I don't want the callers doing the result
shift, do it in srp_finish_req().


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable
       [not found]     ` <51D4204E.7040301-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 14:30       ` David Dillow
  0 siblings, 0 replies; 41+ messages in thread
From: David Dillow @ 2013-07-03 14:30 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

On Wed, 2013-07-03 at 14:59 +0200, Bart Van Assche wrote:
> Allow the InfiniBand RC retry count to be configured by the user
> as an option in the target login string. Reducing this retry count
> helps with reducing path failover time.
> 
> [bvanassche: Rewrote patch description / changed default retry count from 2 back to 7]

Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11
  2013-07-03 13:38 ` [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11 Or Gerlitz
@ 2013-07-03 14:38   ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 14:38 UTC (permalink / raw)
  To: Or Gerlitz
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma, linux-scsi

On 07/03/13 15:38, Or Gerlitz wrote:
> Some of these patches were already picked by Roland (SB), I would
> suggest that you post V4 and drop the ones which were accepted.

One of the patches that is already in Roland's tree and that was in v1 
of this series has been split into two patches in v2 and v3 of this 
series. So I'd like to hear from Roland what he prefers himself - that I 
drop the patches that are already in his tree or that Roland updates his 
tree with the most recently posted patches.

Bart.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
       [not found]         ` <1372860491.24238.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
@ 2013-07-03 14:45           ` Bart Van Assche
       [not found]             ` <51D43915.9000007-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 14:45 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

On 07/03/13 16:08, David Dillow wrote:
> On Wed, 2013-07-03 at 14:55 +0200, Bart Van Assche wrote:
>> Finish all outstanding I/O requests after fast_io_fail_tmo expired,
>> which speeds up failover in a multipath setup. This patch is a
>> reworked version of a patch from Sebastian Riemer.
>>
>> Reported-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
>> Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
>
> I don't believe I ack'd this; I don't want the callers doing the result
> shift, do it in srp_finish_req().

My apologies. You are correct, this patch was not yet acknowledged by you.

Regarding the shift itself: is it really that important whether the 
caller or callee performs that shift ? Having it in the caller has the 
advantage that the compiler can optimize the shift operation out because 
the number that is being shifted left is a constant. And if later on it 
would be necessary to set more fields of the SCSI result in a caller of 
srp_finish_req() then that will be possible without having to modify the 
srp_finish_req() function itself.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
       [not found]             ` <51D43915.9000007-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 14:57               ` David Dillow
       [not found]                 ` <1372863441.24238.26.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: David Dillow @ 2013-07-03 14:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote:
> Having it in the caller has the 
> advantage that the compiler can optimize the shift operation out because 
> the number that is being shifted left is a constant.

srp_finish_req() is likely to be inlined, so the compiler will be able
to make this optimization. Regardless, this is so far in the noise that
it looses to readability. 

>  And if later on it 
> would be necessary to set more fields of the SCSI result in a caller of 
> srp_finish_req() then that will be possible without having to modify the 
> srp_finish_req() function itself.

Other than REQ_QUIET, what do you think would need to be added? I think
we can cross that bridge when we get there, as I don't think REQ_QUIET
should not be set in the LLDs.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 08/13] IB/srp: Add srp_terminate_io()
       [not found]                 ` <1372863441.24238.26.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
@ 2013-07-03 15:13                   ` David Dillow
  0 siblings, 0 replies; 41+ messages in thread
From: David Dillow @ 2013-07-03 15:13 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma

On Wed, 2013-07-03 at 10:57 -0400, David Dillow wrote:
> On Wed, 2013-07-03 at 16:45 +0200, Bart Van Assche wrote:
> > Having it in the caller has the 
> > advantage that the compiler can optimize the shift operation out because 
> > the number that is being shifted left is a constant.
> 
> srp_finish_req() is likely to be inlined, so the compiler will be able
> to make this optimization. Regardless, this is so far in the noise that
> it looses to readability. 

Eh, just leave it alone. As much as I don't like it, it does look to be
fairly common among the LLDs and other transport code.

Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]     ` <51D41F13.6060203-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 15:14       ` David Dillow
  2013-07-03 16:00         ` Bart Van Assche
  0 siblings, 1 reply; 41+ messages in thread
From: David Dillow @ 2013-07-03 15:14 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> +{
> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> +		fast_io_fail_tmo < dev_loss_tmo) &&
> +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> +}
> +EXPORT_SYMBOL_GPL(srp_tmo_valid);

This would have been more readable:

int srp_tmo_valid(int fast_io_fail_tmp, int dev_loss_tmo)
{
	/* Fast IO fail must be off, or no greater than the max timeout */
	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
		return -EINVAL;

	/* Device timeout must be off, or fit into jiffies */
	if (dev_loss_tmo >= LONG_MAX / HZ)
		return -EINVAL;

	/* Fast IO must trigger before device loss, or one of the
	 * timeouts must be disabled.
	 */
	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
		return 0;
	if (fast_io_fail < dev_loss_tmo)
		return 0;

	return -EINVAL;	 
}

Though, now that I've unpacked it -- I don't think it is OK for
dev_loss_tmo to be off, but fast IO to be on? That drops another
conditional.

Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
fail_io_fast_tmo is off; I agree with your reasoning about leaving it
unlimited if fast fail is on, but does that still hold if it is off?



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
  2013-07-03 15:14       ` David Dillow
@ 2013-07-03 16:00         ` Bart Van Assche
       [not found]           ` <51D44A86.5050000-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 16:00 UTC (permalink / raw)
  To: David Dillow
  Cc: Bart Van Assche, Roland Dreier, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma, linux-scsi, James Bottomley

On 07/03/13 17:14, David Dillow wrote:
> On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>> +{
>> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
>> +		fast_io_fail_tmo < dev_loss_tmo) &&
>> +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
>> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
>> +}
>> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
>
> This would have been more readable:
>
> int srp_tmo_valid(int fast_io_fail_tmp, int dev_loss_tmo)
> {
> 	/* Fast IO fail must be off, or no greater than the max timeout */
> 	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
> 		return -EINVAL;
>
> 	/* Device timeout must be off, or fit into jiffies */
> 	if (dev_loss_tmo >= LONG_MAX / HZ)
> 		return -EINVAL;
>
> 	/* Fast IO must trigger before device loss, or one of the
> 	 * timeouts must be disabled.
> 	 */
> 	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
> 		return 0;
> 	if (fast_io_fail < dev_loss_tmo)
> 		return 0;
>
> 	return -EINVAL;	
> }

Isn't that a matter of personal taste which of the above two is more 
clear ? It might also depend on the number of mathematics courses in 
someones educational background :-)

> Though, now that I've unpacked it -- I don't think it is OK for
> dev_loss_tmo to be off, but fast IO to be on? That drops another
> conditional.

The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine 
in my tests. An I/O failure was detected shortly after the cable to the 
target was pulled. I/O resumed shortly after the cable to the target was 
reinserted.

> Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
> fail_io_fast_tmo is off; I agree with your reasoning about leaving it
> unlimited if fast fail is on, but does that still hold if it is off?

I think setting dev_loss_tmo to a large value only makes sense if the 
value of reconnect_delay is not too large. Setting both to a large value 
would result in slow recovery after a transport layer failure has been 
corrected.

Bart.


^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]           ` <51D44A86.5050000-HInyCGIudOg@public.gmane.org>
@ 2013-07-03 17:27             ` David Dillow
       [not found]               ` <1372872474.24238.43.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: David Dillow @ 2013-07-03 17:27 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
> On 07/03/13 17:14, David Dillow wrote:
> > On Wed, 2013-07-03 at 14:54 +0200, Bart Van Assche wrote:
> >> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> >> +{
> >> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> >> +		fast_io_fail_tmo < dev_loss_tmo) &&
> >> +		fast_io_fail_tmo <= SCSI_DEVICE_BLOCK_MAX_TIMEOUT &&
> >> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> >> +}
> >> +EXPORT_SYMBOL_GPL(srp_tmo_valid);
> >
> > This would have been more readable:
> >
> > int srp_tmo_valid(int fast_io_fail_tmp, int dev_loss_tmo)
> > {
> > 	/* Fast IO fail must be off, or no greater than the max timeout */
> > 	if (fast_io_fail_tmo > SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
> > 		return -EINVAL;
> >
> > 	/* Device timeout must be off, or fit into jiffies */
> > 	if (dev_loss_tmo >= LONG_MAX / HZ)
> > 		return -EINVAL;
> >
> > 	/* Fast IO must trigger before device loss, or one of the
> > 	 * timeouts must be disabled.
> > 	 */
> > 	if (fast_io_fail_tmo < 0 || dev_loss_tmo < 0)
> > 		return 0;
> > 	if (fast_io_fail < dev_loss_tmo)
> > 		return 0;
> >
> > 	return -EINVAL;	
> > }
> 
> Isn't that a matter of personal taste which of the above two is more 
> clear ?

No, it is quite common in Linux for complicated conditionals to be
broken up into helper functions, and Vu found logic bugs in previous
iterations. After unpacking it, I still found behavior that is
questionable. All of this strongly points to that block being too dense
for its own good.

> It might also depend on the number of mathematics courses in 
> someones educational background :-)

Or the number of logic courses, or their experience with Lisp. :)

> > Though, now that I've unpacked it -- I don't think it is OK for
> > dev_loss_tmo to be off, but fast IO to be on? That drops another
> > conditional.
> 
> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine 
> in my tests. An I/O failure was detected shortly after the cable to the 
> target was pulled. I/O resumed shortly after the cable to the target was 
> reinserted.

Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
< 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
scenario, and I'm asking if it makes sense for SRP to allow it.

But now that you mention reconnect_delay, what is the meaning of that
when it is negative? That's not in the documentation. And should it be
considered in srp_tmo_valid() -- are there values of reconnect_delay
that cause problems?

I'm starting to get a bit concerned about this patch -- can you, Vu, and
Sebastian comment on the testing you have done?

> > Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
> > fail_io_fast_tmo is off; I agree with your reasoning about leaving it
> > unlimited if fast fail is on, but does that still hold if it is off?
> 
> I think setting dev_loss_tmo to a large value only makes sense if the 
> value of reconnect_delay is not too large. Setting both to a large value 
> would result in slow recovery after a transport layer failure has been 
> corrected.

So you agree it should be capped? I can't tell from your response.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]               ` <1372872474.24238.43.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
@ 2013-07-03 18:24                 ` Bart Van Assche
  2013-07-03 18:57                   ` David Dillow
  2013-07-08 17:26                 ` Vu Pham
  1 sibling, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-03 18:24 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

On 07/03/13 19:27, David Dillow wrote:
> On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
>> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
>> in my tests. An I/O failure was detected shortly after the cable to the
>> target was pulled. I/O resumed shortly after the cable to the target was
>> reinserted.
>
> Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
> < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
> scenario, and I'm asking if it makes sense for SRP to allow it.
>
> But now that you mention reconnect_delay, what is the meaning of that
> when it is negative? That's not in the documentation. And should it be
> considered in srp_tmo_valid() -- are there values of reconnect_delay
> that cause problems?

None of the combinations that can be configured from user space can 
bring the kernel in trouble. If reconnect_delay <= 0 that means that the 
time-based reconnect mechanism is disabled.

> I'm starting to get a bit concerned about this patch -- can you, Vu, and
> Sebastian comment on the testing you have done?

All combinations of reconnect_delay, fast_io_fail_tmo and dev_loss_tmo 
that result in different behavior have been tested.

>>> Also, FC caps dev_loss_tmo at SCSI_DEVICE_BLOCK_MAX_TIMEOUT if
>>> fail_io_fast_tmo is off; I agree with your reasoning about leaving it
>>> unlimited if fast fail is on, but does that still hold if it is off?
>>
>> I think setting dev_loss_tmo to a large value only makes sense if the
>> value of reconnect_delay is not too large. Setting both to a large value
>> would result in slow recovery after a transport layer failure has been
>> corrected.
>
> So you agree it should be capped? I can't tell from your response.

Not all combinations of reconnect_delay / fail_io_fast_tmo / 
dev_loss_tmo result in useful behavior. It is up to the user to choose a 
meaningful combination.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
  2013-07-03 18:24                 ` Bart Van Assche
@ 2013-07-03 18:57                   ` David Dillow
       [not found]                     ` <1372877861.24238.64.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: David Dillow @ 2013-07-03 18:57 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote:
> On 07/03/13 19:27, David Dillow wrote:
> > On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
> >> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
> >> in my tests. An I/O failure was detected shortly after the cable to the
> >> target was pulled. I/O resumed shortly after the cable to the target was
> >> reinserted.
> >
> > Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
> > < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
> > scenario, and I'm asking if it makes sense for SRP to allow it.
> >
> > But now that you mention reconnect_delay, what is the meaning of that
> > when it is negative? That's not in the documentation. And should it be
> > considered in srp_tmo_valid() -- are there values of reconnect_delay
> > that cause problems?
> 
> None of the combinations that can be configured from user space can 
> bring the kernel in trouble. If reconnect_delay <= 0 that means that the 
> time-based reconnect mechanism is disabled.

Then it should use the same semantics as the other attributes, and have
the user store "off" to turn it off.

And I'm getting the strong sense that the answer to my question about
fast_io_fail_tmo >= 0 when dev_loss_tmo is that we should not allow that
combination, even if it doesn't break the kernel. If it doesn't make
sense, there is no reason to create an opportunity for user confusion.

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]                     ` <1372877861.24238.64.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
@ 2013-07-03 23:41                       ` Vu Pham
  2013-07-04  8:01                       ` Bart Van Assche
  1 sibling, 0 replies; 41+ messages in thread
From: Vu Pham @ 2013-07-03 23:41 UTC (permalink / raw)
  To: David Dillow
  Cc: Bart Van Assche, Roland Dreier, Sebastian Riemer, Jinpu Wang,
	linux-rdma, linux-scsi, James Bottomley

David Dillow wrote:
> On Wed, 2013-07-03 at 20:24 +0200, Bart Van Assche wrote:
>   
>> On 07/03/13 19:27, David Dillow wrote:
>>     
>>> On Wed, 2013-07-03 at 18:00 +0200, Bart Van Assche wrote:
>>>       
>>>> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine
>>>> in my tests. An I/O failure was detected shortly after the cable to the
>>>> target was pulled. I/O resumed shortly after the cable to the target was
>>>> reinserted.
>>>>         
>>> Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
>>> < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
>>> scenario, and I'm asking if it makes sense for SRP to allow it.
>>>
>>> But now that you mention reconnect_delay, what is the meaning of that
>>> when it is negative? That's not in the documentation. And should it be
>>> considered in srp_tmo_valid() -- are there values of reconnect_delay
>>> that cause problems?
>>>       
>> None of the combinations that can be configured from user space can 
>> bring the kernel in trouble. If reconnect_delay <= 0 that means that the 
>> time-based reconnect mechanism is disabled.
>>     
>
> Then it should use the same semantics as the other attributes, and have
> the user store "off" to turn it off.
>
> And I'm getting the strong sense that the answer to my question about
> fast_io_fail_tmo >= 0 when dev_loss_tmo is that we should not allow that
> combination, even if it doesn't break the kernel. If it doesn't make
> sense, there is no reason to create an opportunity for user confusion.
>   
Hello Dave,

when dev_loss_tmo expired, srp not only removes the rport but also 
removes the associated scsi_host.
One may wish to set fast_io_fail_tmo >=0 for I/Os to fail-over fast to 
other paths, and dev_loss_tmo off to keep the scsi_host around until the 
target coming back.

-vu
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]                     ` <1372877861.24238.64.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
  2013-07-03 23:41                       ` Vu Pham
@ 2013-07-04  8:01                       ` Bart Van Assche
       [not found]                         ` <51D52BD7.1090506-HInyCGIudOg@public.gmane.org>
  1 sibling, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-04  8:01 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

On 07/03/13 20:57, David Dillow wrote:
> And I'm getting the strong sense that the answer to my question about
> fast_io_fail_tmo >= 0 when dev_loss_tmo is that we should not allow that
> combination, even if it doesn't break the kernel. If it doesn't make
> sense, there is no reason to create an opportunity for user confusion.

Let's take a step back. I think we agree that the only combinations of 
timeout parameters that are useful are those combinations that guarantee 
that SCSI commands will be finished in a reasonable time and also that 
allow multipath to detect failed paths. The first requirement comes down 
to limiting the value fast_io_fail_tmo can be set to. The second 
requirement means that either reconnect_delay or fast_io_fail_tmo must 
be set (please note that a reconnect failure changes the state of all 
involved SCSI devices into SDEV_TRANSPORT_OFFLINE). So how about 
modifying srp_tmo_valid() as follows:
* Add an argument called "reconnect_delay".
* Add the following code in that function:
     if (reconnect_delay < 0 && fast_io_fail_tmo < 0 && dev_loss_tmo < 0)
         return -EINVAL;

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]                         ` <51D52BD7.1090506-HInyCGIudOg@public.gmane.org>
@ 2013-07-04  8:16                           ` Bart Van Assche
  2013-07-08 20:37                           ` David Dillow
  1 sibling, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-04  8:16 UTC (permalink / raw)
  Cc: David Dillow, Roland Dreier, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma, linux-scsi, James Bottomley

On 07/04/13 10:01, Bart Van Assche wrote:
> On 07/03/13 20:57, David Dillow wrote:
>> And I'm getting the strong sense that the answer to my question about
>> fast_io_fail_tmo >= 0 when dev_loss_tmo is that we should not allow that
>> combination, even if it doesn't break the kernel. If it doesn't make
>> sense, there is no reason to create an opportunity for user confusion.
>
> Let's take a step back. I think we agree that the only combinations of
> timeout parameters that are useful are those combinations that guarantee
> that SCSI commands will be finished in a reasonable time and also that
> allow multipath to detect failed paths. The first requirement comes down
> to limiting the value fast_io_fail_tmo can be set to. The second
> requirement means that either reconnect_delay or fast_io_fail_tmo must
> be set (please note that a reconnect failure changes the state of all
> involved SCSI devices into SDEV_TRANSPORT_OFFLINE). So how about
> modifying srp_tmo_valid() as follows:
> * Add an argument called "reconnect_delay".
> * Add the following code in that function:
>      if (reconnect_delay < 0 && fast_io_fail_tmo < 0 && dev_loss_tmo < 0)
>          return -EINVAL;

(replying to my own e-mail)

A small correction to what I wrote above: a reconnect failure only 
causes the SCSI device state to be changed into SDEV_TRANSPORT_OFFLINE 
if the fast_io_fail mechanism has been disabled.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]               ` <1372872474.24238.43.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
  2013-07-03 18:24                 ` Bart Van Assche
@ 2013-07-08 17:26                 ` Vu Pham
       [not found]                   ` <51DAF63D.9010906-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  1 sibling, 1 reply; 41+ messages in thread
From: Vu Pham @ 2013-07-08 17:26 UTC (permalink / raw)
  To: David Dillow
  Cc: Bart Van Assche, Roland Dreier, Sebastian Riemer, Jinpu Wang,
	linux-rdma, linux-scsi, James Bottomley

[-- Attachment #1: Type: text/plain, Size: 1306 bytes --]


>   
>>> Though, now that I've unpacked it -- I don't think it is OK for
>>> dev_loss_tmo to be off, but fast IO to be on? That drops another
>>> conditional.
>>>       
>> The combination of dev_loss_tmo off and reconnect_delay > 0 worked fine 
>> in my tests. An I/O failure was detected shortly after the cable to the 
>> target was pulled. I/O resumed shortly after the cable to the target was 
>> reinserted.
>>     
>
> Perhaps I don't understand your answer -- I'm asking about dev_loss_tmo
> < 0, and fast_io_fail_tmo >= 0. The other transports do not allow this
> scenario, and I'm asking if it makes sense for SRP to allow it.
>
> But now that you mention reconnect_delay, what is the meaning of that
> when it is negative? That's not in the documentation. And should it be
> considered in srp_tmo_valid() -- are there values of reconnect_delay
> that cause problems?
>
> I'm starting to get a bit concerned about this patch -- can you, Vu, and
> Sebastian comment on the testing you have done?
>
>   
Hello Bart,

After running cable pull test on two local IB links for several hrs, 
I/Os got stuck.
Further commands "multipath -ll" or "fdisk -l" got stuck and never return
Here are the stack dump for srp-x kernel threads.
I'll run with #DEBUG to get more debug info on scsi host & rport

-vu

[-- Attachment #2: srp_threads.txt.tgz --]
[-- Type: application/x-compressed, Size: 1317 bytes --]

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]                   ` <51DAF63D.9010906-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-07-08 18:42                     ` Bart Van Assche
  0 siblings, 0 replies; 41+ messages in thread
From: Bart Van Assche @ 2013-07-08 18:42 UTC (permalink / raw)
  To: Vu Pham
  Cc: David Dillow, Roland Dreier, Sebastian Riemer, Jinpu Wang,
	linux-rdma, linux-scsi, James Bottomley

On 07/08/13 19:26, Vu Pham wrote:
> After running cable pull test on two local IB links for several hrs, 
> I/Os got stuck.
> Further commands "multipath -ll" or "fdisk -l" got stuck and never return
> Here are the stack dump for srp-x kernel threads.
> I'll run with #DEBUG to get more debug info on scsi host & rport

Hello Vu,

I had a quick look at the stack dump that was attached to your e-mail.
It shows that scsi_execute_req() hangs in blk_execute_rq(). It would be
appreciated if you could continue your tests with the kernel patch below
applied on top of v3 of this patch series. This patch should avoid that
a transport layer error that occurs after device removal has started can
cause the SCSI device state to change into "blocked". This patch also
causes such TL errors to fail I/O quickly (scsi_host_alloc() zero-
initializes the memory it allocates so no explicit initialization of the
"deleted" variable is necessary).

Thanks,

Bart.

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 1b9ebd5..1bb7c63 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -522,6 +522,12 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
 	int fast_io_fail_tmo, dev_loss_tmo, delay;
 
 	mutex_lock(&rport->mutex);
+	if (rport->deleted) {
+		srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
+		scsi_target_unblock(&shost->shost_gendev,
+				    SDEV_TRANSPORT_OFFLINE);
+		goto unlock;
+	}
 	delay = rport->reconnect_delay;
 	fast_io_fail_tmo = rport->fast_io_fail_tmo;
 	dev_loss_tmo = rport->dev_loss_tmo;
@@ -542,6 +548,7 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
 	if (dev_loss_tmo >= 0)
 		queue_delayed_work(system_long_wq, &rport->dev_loss_work,
 				   1UL * dev_loss_tmo * HZ);
+unlock:
 	mutex_unlock(&rport->mutex);
 }
 EXPORT_SYMBOL(srp_start_tl_fail_timers);
@@ -730,6 +737,7 @@ void srp_rport_del(struct srp_rport *rport)
 	mutex_lock(&rport->mutex);
 	if (rport->state == SRP_RPORT_BLOCKED)
 		__rport_fast_io_fail_timedout(rport);
+	rport->deleted = true;
 	mutex_unlock(&rport->mutex);
 
 	put_device(dev);
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index fbcc985..a4addcf 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -54,6 +54,7 @@ struct srp_rport {
 	int			dev_loss_tmo;
 	struct delayed_work	fast_io_fail_work;
 	struct delayed_work	dev_loss_work;
+	bool			deleted;
 };
 
 struct srp_function_template {

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling
       [not found]                         ` <51D52BD7.1090506-HInyCGIudOg@public.gmane.org>
  2013-07-04  8:16                           ` Bart Van Assche
@ 2013-07-08 20:37                           ` David Dillow
  1 sibling, 0 replies; 41+ messages in thread
From: David Dillow @ 2013-07-08 20:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, Jinpu Wang, linux-rdma,
	linux-scsi, James Bottomley

On Thu, 2013-07-04 at 10:01 +0200, Bart Van Assche wrote:
> On 07/03/13 20:57, David Dillow wrote:
> > And I'm getting the strong sense that the answer to my question about
> > fast_io_fail_tmo >= 0 when dev_loss_tmo is that we should not allow that
> > combination, even if it doesn't break the kernel. If it doesn't make
> > sense, there is no reason to create an opportunity for user confusion.
> 
> Let's take a step back. I think we agree that the only combinations of 
> timeout parameters that are useful are those combinations that guarantee 
> that SCSI commands will be finished in a reasonable time and also that 
> allow multipath to detect failed paths. The first requirement comes down 
> to limiting the value fast_io_fail_tmo can be set to. The second 
> requirement means that either reconnect_delay or fast_io_fail_tmo must 
> be set (please note that a reconnect failure changes the state of all 
> involved SCSI devices into SDEV_TRANSPORT_OFFLINE). So how about 
> modifying srp_tmo_valid() as follows:
> * Add an argument called "reconnect_delay".
> * Add the following code in that function:
>      if (reconnect_delay < 0 && fast_io_fail_tmo < 0 && dev_loss_tmo < 0)
>          return -EINVAL;

I think this sounds reasonable; I need to make sure I understand the new
behaviors of the code to be sure.

I'm also concerned about Vu's bug report at this late stage; I'll be
watching for its resolution -- hopefully in time for inclusion.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]     ` <51D41FFC.6070105-HInyCGIudOg@public.gmane.org>
@ 2013-07-14  9:43       ` Sagi Grimberg
       [not found]         ` <51E272A4.5030707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Sagi Grimberg @ 2013-07-14  9:43 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 7/3/2013 3:58 PM, Bart Van Assche wrote:
> Several InfiniBand HCA's allow to configure the completion vector
> per queue pair. This allows to spread the workload created by IB
> completion interrupts over multiple MSI-X vectors and hence over
> multiple CPU cores. In other words, configuring the completion
> vector properly not only allows to reduce latency on an initiator
> connected to multiple SRP targets but also allows to improve
> throughput.

Hey Bart,
Just wrote a small patch to allow srp_daemon spread connection across 
HCA's completion vectors.
But re-thinking on this, is it really a good idea to give the user 
control over completion
vectors for CQs he doesn't really owns. This way the user must retrieve 
the maximum completion
vectors from the ib_device and consider this when adding a connection 
and In addition will need to set proper IRQ affinity.

Perhaps the driver can manage this on it's own without involving the 
user, take the mlx4_en driver for
example, it spreads it's CQs across HCAs completion vectors without 
involving the user. the user that
opens a socket has no influence of the underlying cq<->comp-vector 
assignment.

The only use-case I can think of is where the user will want to use only 
a subset of the completion-vectors
if the user will want to reserve some completion-vectors for native IB 
applications but I don't know
how common it is.

Other from that, I think it is always better to spread the CQs across 
HCA completion-vectors, so perhaps the driver
just assign connection CQs across comp-vecs without getting args from 
the user, but simply iterate over comp_vectors.

What do you think?

> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
> Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> ---
>   Documentation/ABI/stable/sysfs-driver-ib_srp |    7 +++++++
>   drivers/infiniband/ulp/srp/ib_srp.c          |   26 ++++++++++++++++++++++++--
>   drivers/infiniband/ulp/srp/ib_srp.h          |    1 +
>   3 files changed, 32 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
> index 481aae9..5c53d28 100644
> --- a/Documentation/ABI/stable/sysfs-driver-ib_srp
> +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
> @@ -54,6 +54,13 @@ Description:	Interface for making ib_srp connect to a new target.
>   		  ib_srp. Specifying a value that exceeds cmd_sg_entries is
>   		  only safe with partial memory descriptor list support enabled
>   		  (allow_ext_sg=1).
> +		* comp_vector, a number in the range 0..n-1 specifying the
> +		  MSI-X completion vector. Some HCA's allocate multiple (n)
> +		  MSI-X vectors per HCA port. If the IRQ affinity masks of
> +		  these interrupts have been configured such that each MSI-X
> +		  interrupt is handled by a different CPU then the comp_vector
> +		  parameter can be used to spread the SRP completion workload
> +		  over multiple CPU's.
>   
>   What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
>   Date:		January 2, 2006
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 2557b7a..6c164f6 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -294,14 +294,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
>   		return -ENOMEM;
>   
>   	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
> -			       srp_recv_completion, NULL, target, SRP_RQ_SIZE, 0);
> +			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
> +			       target->comp_vector);
>   	if (IS_ERR(recv_cq)) {
>   		ret = PTR_ERR(recv_cq);
>   		goto err;
>   	}
>   
>   	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
> -			       srp_send_completion, NULL, target, SRP_SQ_SIZE, 0);
> +			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
> +			       target->comp_vector);
>   	if (IS_ERR(send_cq)) {
>   		ret = PTR_ERR(send_cq);
>   		goto err_recv_cq;
> @@ -1976,6 +1978,14 @@ static ssize_t show_local_ib_device(struct device *dev,
>   	return sprintf(buf, "%s\n", target->srp_host->srp_dev->dev->name);
>   }
>   
> +static ssize_t show_comp_vector(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct srp_target_port *target = host_to_target(class_to_shost(dev));
> +
> +	return sprintf(buf, "%d\n", target->comp_vector);
> +}
> +
>   static ssize_t show_cmd_sg_entries(struct device *dev,
>   				   struct device_attribute *attr, char *buf)
>   {
> @@ -2002,6 +2012,7 @@ static DEVICE_ATTR(req_lim,         S_IRUGO, show_req_lim,         NULL);
>   static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,	   NULL);
>   static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
>   static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
> +static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
>   static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
>   static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
>   
> @@ -2016,6 +2027,7 @@ static struct device_attribute *srp_host_attrs[] = {
>   	&dev_attr_zero_req_lim,
>   	&dev_attr_local_ib_port,
>   	&dev_attr_local_ib_device,
> +	&dev_attr_comp_vector,
>   	&dev_attr_cmd_sg_entries,
>   	&dev_attr_allow_ext_sg,
>   	NULL
> @@ -2140,6 +2152,7 @@ enum {
>   	SRP_OPT_CMD_SG_ENTRIES	= 1 << 9,
>   	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
>   	SRP_OPT_SG_TABLESIZE	= 1 << 11,
> +	SRP_OPT_COMP_VECTOR	= 1 << 12,
>   	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
>   				   SRP_OPT_IOC_GUID	|
>   				   SRP_OPT_DGID		|
> @@ -2160,6 +2173,7 @@ static const match_table_t srp_opt_tokens = {
>   	{ SRP_OPT_CMD_SG_ENTRIES,	"cmd_sg_entries=%u"	},
>   	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
>   	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
> +	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
>   	{ SRP_OPT_ERR,			NULL 			}
>   };
>   
> @@ -2315,6 +2329,14 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
>   			target->sg_tablesize = token;
>   			break;
>   
> +		case SRP_OPT_COMP_VECTOR:
> +			if (match_int(args, &token) || token < 0) {
> +				pr_warn("bad comp_vector parameter '%s'\n", p);
> +				goto out;
> +			}
> +			target->comp_vector = token;
> +			break;
> +
>   		default:
>   			pr_warn("unknown parameter or missing value '%s' in target creation request\n",
>   				p);
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index e45d9d0..cbc0b14 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -156,6 +156,7 @@ struct srp_target_port {
>   	char			target_name[32];
>   	unsigned int		scsi_id;
>   	unsigned int		sg_tablesize;
> +	int			comp_vector;
>   
>   	struct ib_sa_path_rec	path;
>   	__be16			orig_dgid[8];

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]         ` <51E272A4.5030707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-07-15 11:06           ` Bart Van Assche
       [not found]             ` <51E3D79D.9070808-HInyCGIudOg@public.gmane.org>
  2013-07-16 15:11           ` Bart Van Assche
  1 sibling, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-15 11:06 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 14/07/2013 3:43, Sagi Grimberg wrote:
> On 7/3/2013 3:58 PM, Bart Van Assche wrote:
>> Several InfiniBand HCA's allow to configure the completion vector
>> per queue pair. This allows to spread the workload created by IB
>> completion interrupts over multiple MSI-X vectors and hence over
>> multiple CPU cores. In other words, configuring the completion
>> vector properly not only allows to reduce latency on an initiator
>> connected to multiple SRP targets but also allows to improve
>> throughput.
>
> Hey Bart,
> Just wrote a small patch to allow srp_daemon spread connection across
> HCA's completion vectors.
> But re-thinking on this, is it really a good idea to give the user
> control over completion
> vectors for CQs he doesn't really owns. This way the user must retrieve
> the maximum completion
> vectors from the ib_device and consider this when adding a connection
> and In addition will need to set proper IRQ affinity.
>
> Perhaps the driver can manage this on it's own without involving the
> user, take the mlx4_en driver for
> example, it spreads it's CQs across HCAs completion vectors without
> involving the user. the user that
> opens a socket has no influence of the underlying cq<->comp-vector
> assignment.
>
> The only use-case I can think of is where the user will want to use only
> a subset of the completion-vectors
> if the user will want to reserve some completion-vectors for native IB
> applications but I don't know
> how common it is.
>
> Other from that, I think it is always better to spread the CQs across
> HCA completion-vectors, so perhaps the driver
> just assign connection CQs across comp-vecs without getting args from
> the user, but simply iterate over comp_vectors.
>
> What do you think?

Hello Sagi,

Sorry but I do not think it is a good idea to let srp_daemon assign the 
completion vector. While this might work well on single-socket systems 
this will result in suboptimal results on NUMA systems. For certain 
workloads on NUMA systems, and when a NUMA initiator system is connected 
to multiple target systems, the optimal configuration is to make sure 
that all processing that is associated with a single SCSI host occurs on 
the same NUMA node. This means configuring the completion vector value 
such that IB interrupts are generated on the same NUMA node where the 
associated SCSI host and applications are running.

More in general, performance tuning on NUMA systems requires system-wide 
knowledge of all applications that are running and also of which 
interrupt is processed by which NUMA node. So choosing a proper value 
for the completion vector is only possible once the system topology and 
the IRQ affinity masks are known. I don't think we should build 
knowledge of all this in srp_daemon.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]             ` <51E3D79D.9070808-HInyCGIudOg@public.gmane.org>
@ 2013-07-15 13:29               ` Sagi Grimberg
       [not found]                 ` <51E3F931.9080903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Sagi Grimberg @ 2013-07-15 13:29 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 7/15/2013 2:06 PM, Bart Van Assche wrote:
> On 14/07/2013 3:43, Sagi Grimberg wrote:
>> On 7/3/2013 3:58 PM, Bart Van Assche wrote:
>>> Several InfiniBand HCA's allow to configure the completion vector
>>> per queue pair. This allows to spread the workload created by IB
>>> completion interrupts over multiple MSI-X vectors and hence over
>>> multiple CPU cores. In other words, configuring the completion
>>> vector properly not only allows to reduce latency on an initiator
>>> connected to multiple SRP targets but also allows to improve
>>> throughput.
>>
>> Hey Bart,
>> Just wrote a small patch to allow srp_daemon spread connection across
>> HCA's completion vectors.
>> But re-thinking on this, is it really a good idea to give the user
>> control over completion
>> vectors for CQs he doesn't really owns. This way the user must retrieve
>> the maximum completion
>> vectors from the ib_device and consider this when adding a connection
>> and In addition will need to set proper IRQ affinity.
>>
>> Perhaps the driver can manage this on it's own without involving the
>> user, take the mlx4_en driver for
>> example, it spreads it's CQs across HCAs completion vectors without
>> involving the user. the user that
>> opens a socket has no influence of the underlying cq<->comp-vector
>> assignment.
>>
>> The only use-case I can think of is where the user will want to use only
>> a subset of the completion-vectors
>> if the user will want to reserve some completion-vectors for native IB
>> applications but I don't know
>> how common it is.
>>
>> Other from that, I think it is always better to spread the CQs across
>> HCA completion-vectors, so perhaps the driver
>> just assign connection CQs across comp-vecs without getting args from
>> the user, but simply iterate over comp_vectors.
>>
>> What do you think?
>
> Hello Sagi,
>
> Sorry but I do not think it is a good idea to let srp_daemon assign 
> the completion vector. While this might work well on single-socket 
> systems this will result in suboptimal results on NUMA systems. For 
> certain workloads on NUMA systems, and when a NUMA initiator system is 
> connected to multiple target systems, the optimal configuration is to 
> make sure that all processing that is associated with a single SCSI 
> host occurs on the same NUMA node. This means configuring the 
> completion vector value such that IB interrupts are generated on the 
> same NUMA node where the associated SCSI host and applications are 
> running.
>
> More in general, performance tuning on NUMA systems requires 
> system-wide knowledge of all applications that are running and also of 
> which interrupt is processed by which NUMA node. So choosing a proper 
> value for the completion vector is only possible once the system 
> topology and the IRQ affinity masks are known. I don't think we should 
> build knowledge of all this in srp_daemon.
>
> Bart.
>

Hey Bart,

Thanks for your quick attention for my question.
srp_daemon is a package designated for the costumer to automatically 
detect targets in the IB fabric. From our expeirience here in Mellanox, 
costumers/users like automatic "plug&play" tools.
They are reluctant to build their own scriptology to enhance performance 
and settle with srp_daemon which is preferred over use of ibsrpdm and 
manual adding new targets.
Regardless, the completion vectors assignment is meaningless without 
setting proper IRQ affinity, so in the worst case where the user didn't 
set his IRQ affinity,
this assignment will perform like the default completion vector 
assignment as all IRQs are directed without any masking i.e. core 0.

 From my expiriments in NUMA systems, optimal performance is gained 
where all IRQs are directed to half of the cores on the NUMA node close 
to the HCA, and all traffic generators share the other half of the cores 
on the same NUMA node. So based on that knowledge, I thought that 
srp_daemon/srp driver will assign it's CQs across the HCAs completion 
vectors, and the user is encouraged to set the IRQ affinity as described 
above to gain optimal performance.
Adding connections over the far NUMA node don't seem to benefit 
performance too much...

As I mentioned, a use-case I see that may raise a problem here, is if 
the user would like to maintain multiple SRP connections and reserve 
some completion vectors for other IB applications on the system.
in this case the user will be able to disable srp_daemon/srp driver 
completion vectors assignment.

So, this was just an idea, and easy implementation that would 
potentionaly give the user semi-automatic performance optimized 
configuration...

-Sagi
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]                 ` <51E3F931.9080903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-07-15 18:23                   ` Bart Van Assche
       [not found]                     ` <51E43E22.2060502-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-15 18:23 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 15/07/2013 7:29, Sagi Grimberg wrote:
> srp_daemon is a package designated for the customer to automatically
> detect targets in the IB fabric. From our experience here in Mellanox,
> customers/users like automatic "plug&play" tools.
> They are reluctant to build their own scriptology to enhance performance
> and settle with srp_daemon which is preferred over use of ibsrpdm and
> manual adding new targets.
> Regardless, the completion vectors assignment is meaningless without
> setting proper IRQ affinity, so in the worst case where the user didn't
> set his IRQ affinity,
> this assignment will perform like the default completion vector
> assignment as all IRQs are directed without any masking i.e. core 0.
>
> From my experiments in NUMA systems, optimal performance is gained
> where all IRQs are directed to half of the cores on the NUMA node close
> to the HCA, and all traffic generators share the other half of the cores
> on the same NUMA node. So based on that knowledge, I thought that
> srp_daemon/srp driver will assign it's CQs across the HCAs completion
> vectors, and the user is encouraged to set the IRQ affinity as described
> above to gain optimal performance.
> Adding connections over the far NUMA node don't seem to benefit
> performance too much...
>
> As I mentioned, a use-case I see that may raise a problem here, is if
> the user would like to maintain multiple SRP connections and reserve
> some completion vectors for other IB applications on the system.
> in this case the user will be able to disable srp_daemon/srp driver
> completion vectors assignment.
>
> So, this was just an idea, and easy implementation that would
> potentially give the user semi-automatic performance optimized
> configuration...

Hello Sagi,

I agree with you that it would help a lot if completion vector 
assignment could be automated such that end users do not have to care 
about assigning completion vector numbers. The challenge is to find an 
approach that is general enough such that it works for all possible use 
cases. One possible approach is to let a tool that has knowledge about 
the application fill in completion vector numbers in srp_daemon.conf and 
let srp_daemon use the values generated by this tool. That approach 
would avoid that srp_daemon has to have any knowledge about the 
application but would still allow srp_daemon to assign the completion 
vector numbers.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]                     ` <51E43E22.2060502-HInyCGIudOg@public.gmane.org>
@ 2013-07-16 10:11                       ` Sagi Grimberg
       [not found]                         ` <51E51C56.50906-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Sagi Grimberg @ 2013-07-16 10:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 7/15/2013 9:23 PM, Bart Van Assche wrote:
> On 15/07/2013 7:29, Sagi Grimberg wrote:
>> srp_daemon is a package designated for the customer to automatically
>> detect targets in the IB fabric. From our experience here in Mellanox,
>> customers/users like automatic "plug&play" tools.
>> They are reluctant to build their own scriptology to enhance performance
>> and settle with srp_daemon which is preferred over use of ibsrpdm and
>> manual adding new targets.
>> Regardless, the completion vectors assignment is meaningless without
>> setting proper IRQ affinity, so in the worst case where the user didn't
>> set his IRQ affinity,
>> this assignment will perform like the default completion vector
>> assignment as all IRQs are directed without any masking i.e. core 0.
>>
>> From my experiments in NUMA systems, optimal performance is gained
>> where all IRQs are directed to half of the cores on the NUMA node close
>> to the HCA, and all traffic generators share the other half of the cores
>> on the same NUMA node. So based on that knowledge, I thought that
>> srp_daemon/srp driver will assign it's CQs across the HCAs completion
>> vectors, and the user is encouraged to set the IRQ affinity as described
>> above to gain optimal performance.
>> Adding connections over the far NUMA node don't seem to benefit
>> performance too much...
>>
>> As I mentioned, a use-case I see that may raise a problem here, is if
>> the user would like to maintain multiple SRP connections and reserve
>> some completion vectors for other IB applications on the system.
>> in this case the user will be able to disable srp_daemon/srp driver
>> completion vectors assignment.
>>
>> So, this was just an idea, and easy implementation that would
>> potentially give the user semi-automatic performance optimized
>> configuration...
>
> Hello Sagi,
>
> I agree with you that it would help a lot if completion vector 
> assignment could be automated such that end users do not have to care 
> about assigning completion vector numbers. The challenge is to find an 
> approach that is general enough such that it works for all possible 
> use cases. One possible approach is to let a tool that has knowledge 
> about the application fill in completion vector numbers in 
> srp_daemon.conf and let srp_daemon use the values generated by this 
> tool. That approach would avoid that srp_daemon has to have any 
> knowledge about the application but would still allow srp_daemon to 
> assign the completion vector numbers.
>
> Bart.

Hey Bart,
This sounds like a nice Idea, but there an inherent problem about 
applications coming and going while the connections are static (somewhat),
how can you control pinning an arbitrary application running (over SRP 
devices of-course) at certain point of time.

So will you agree at least to give target->comp_vector a default of 
IB_CQ_VECTOR_LEAST_ATTACHED?
 From my point of view, a user that don't have a slightest clue about 
completion vectors and performance optimization, this is somewhat better 
than doing nothing...

-Sagi
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]                         ` <51E51C56.50906-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2013-07-16 10:58                           ` Bart Van Assche
       [not found]                             ` <51E5275F.2070009-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-16 10:58 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 16/07/2013 4:11, Sagi Grimberg wrote:
> This sounds like a nice Idea, but there an inherent problem about
> applications coming and going while the connections are static (somewhat),
> how can you control pinning an arbitrary application running (over SRP
> devices of-course) at certain point of time.
>
> So will you agree at least to give target->comp_vector a default of
> IB_CQ_VECTOR_LEAST_ATTACHED?
>  From my point of view, a user that don't have a slightest clue about
> completion vectors and performance optimization, this is somewhat better
> than doing nothing...

Hello Sagi,

That sounds like an interesting proposal to me. But did the patch that 
adds the IB_CQ_VECTOR_LEAST_ATTACHED feature ever get accepted in the 
upstream Linux kernel ? I have tried to find that symbol in Linux kernel 
v3.11-rc1 but couldn't find it. Maybe I have overlooked something ?

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]                             ` <51E5275F.2070009-HInyCGIudOg@public.gmane.org>
@ 2013-07-16 12:41                               ` Sagi Grimberg
  0 siblings, 0 replies; 41+ messages in thread
From: Sagi Grimberg @ 2013-07-16 12:41 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 7/16/2013 1:58 PM, Bart Van Assche wrote:
> On 16/07/2013 4:11, Sagi Grimberg wrote:
>> This sounds like a nice Idea, but there an inherent problem about
>> applications coming and going while the connections are static 
>> (somewhat),
>> how can you control pinning an arbitrary application running (over SRP
>> devices of-course) at certain point of time.
>>
>> So will you agree at least to give target->comp_vector a default of
>> IB_CQ_VECTOR_LEAST_ATTACHED?
>>  From my point of view, a user that don't have a slightest clue about
>> completion vectors and performance optimization, this is somewhat better
>> than doing nothing...
>
> Hello Sagi,
>
> That sounds like an interesting proposal to me. But did the patch that 
> adds the IB_CQ_VECTOR_LEAST_ATTACHED feature ever get accepted in the 
> upstream Linux kernel ? I have tried to find that symbol in Linux 
> kernel v3.11-rc1 but couldn't find it. Maybe I have overlooked 
> something ?
>
> Bart.
>

Oh you're right!

I'll ask Vu, from git blame on old OFED I see that He wrote the code...
Perhaps this should be added as well.

-Sagi
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]         ` <51E272A4.5030707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2013-07-15 11:06           ` Bart Van Assche
@ 2013-07-16 15:11           ` Bart Van Assche
       [not found]             ` <51E56296.2000403-HInyCGIudOg@public.gmane.org>
  1 sibling, 1 reply; 41+ messages in thread
From: Bart Van Assche @ 2013-07-16 15:11 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 14/07/2013 3:43, Sagi Grimberg wrote:
> Just wrote a small patch to allow srp_daemon spread connection across
> HCA's completion vectors.

Hello Sagi,

How about the following approach:
- Add support for reading the completion vector from srp_daemon.conf,
   similar to how several other parameters are already read from that
   file.
- If the completion vector parameter has not been set in
   srp_daemon.conf, let srp_daemon assign a completion vector such that
   IB interrupts for different SRP hosts use different completion
   vectors.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

* Re: [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable
       [not found]             ` <51E56296.2000403-HInyCGIudOg@public.gmane.org>
@ 2013-07-17  9:27               ` Sagi Grimberg
  0 siblings, 0 replies; 41+ messages in thread
From: Sagi Grimberg @ 2013-07-17  9:27 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	Jinpu Wang, linux-rdma

On 7/16/2013 6:11 PM, Bart Van Assche wrote:
> On 14/07/2013 3:43, Sagi Grimberg wrote:
>> Just wrote a small patch to allow srp_daemon spread connection across
>> HCA's completion vectors.
>
> Hello Sagi,
>
> How about the following approach:
> - Add support for reading the completion vector from srp_daemon.conf,
>   similar to how several other parameters are already read from that
>   file.

Here We need to take into consideration that we are changing the 
functionality of srp_daemon.conf.
Now instead of simply allowing/dis-allowing targets of specific 
attributes, we are also defining configuration attributes of allowed 
targets.
This might be uncomfortable for the user to explicitly write N target 
strings in srp_daemon.conf just for completion vectors assignment.

Perhaps srp_daemon.conf can contain a list (comma separated) of reserved 
completion vectors for srp_daemon to spread CQs among them.
If this line won't exist - srp_daemon will spread assignment on all HCAs 
completion vectors.

> - If the completion vector parameter has not been set in
>   srp_daemon.conf, let srp_daemon assign a completion vector such that
>   IB interrupts for different SRP hosts use different completion
>   vectors.
>
> Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 41+ messages in thread

end of thread, other threads:[~2013-07-17  9:27 UTC | newest]

Thread overview: 41+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-03 12:41 [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11 Bart Van Assche
2013-07-03 12:53 ` [PATCH v3 06/13] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
     [not found] ` <51D41C03.4020607-HInyCGIudOg@public.gmane.org>
2013-07-03 12:43   ` [PATCH v3 01/13] IB/srp: Fix remove_one crash due to resource exhaustion Bart Van Assche
2013-07-03 12:44   ` [PATCH v3 02/13] IB/srp: Avoid that srp_reset_host() is skipped after a TL error Bart Van Assche
2013-07-03 12:45   ` [PATCH v3 03/13] IB/srp: Fail I/O fast if target offline Bart Van Assche
2013-07-03 12:50   ` [PATCH v3 04/13] IB/srp: Skip host settle delay Bart Van Assche
2013-07-03 12:51   ` [PATCH v3 05/13] IB/srp: Maintain a single connection per I_T nexus Bart Van Assche
2013-07-03 12:54   ` [PATCH v3 07/13] scsi_transport_srp: Add transport layer error handling Bart Van Assche
     [not found]     ` <51D41F13.6060203-HInyCGIudOg@public.gmane.org>
2013-07-03 15:14       ` David Dillow
2013-07-03 16:00         ` Bart Van Assche
     [not found]           ` <51D44A86.5050000-HInyCGIudOg@public.gmane.org>
2013-07-03 17:27             ` David Dillow
     [not found]               ` <1372872474.24238.43.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2013-07-03 18:24                 ` Bart Van Assche
2013-07-03 18:57                   ` David Dillow
     [not found]                     ` <1372877861.24238.64.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2013-07-03 23:41                       ` Vu Pham
2013-07-04  8:01                       ` Bart Van Assche
     [not found]                         ` <51D52BD7.1090506-HInyCGIudOg@public.gmane.org>
2013-07-04  8:16                           ` Bart Van Assche
2013-07-08 20:37                           ` David Dillow
2013-07-08 17:26                 ` Vu Pham
     [not found]                   ` <51DAF63D.9010906-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-07-08 18:42                     ` Bart Van Assche
2013-07-03 12:55   ` [PATCH v3 08/13] IB/srp: Add srp_terminate_io() Bart Van Assche
     [not found]     ` <51D41F52.4000409-HInyCGIudOg@public.gmane.org>
2013-07-03 14:08       ` David Dillow
     [not found]         ` <1372860491.24238.0.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2013-07-03 14:45           ` Bart Van Assche
     [not found]             ` <51D43915.9000007-HInyCGIudOg@public.gmane.org>
2013-07-03 14:57               ` David Dillow
     [not found]                 ` <1372863441.24238.26.camel-zHLflQxYYDO4Hhoo1DtQwJ9G+ZOsUmrO@public.gmane.org>
2013-07-03 15:13                   ` David Dillow
2013-07-03 12:56   ` [PATCH v3 09/13] IB/srp: Use SRP transport layer error recovery Bart Van Assche
2013-07-03 12:57   ` [PATCH v3 10/13] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
2013-07-03 12:58   ` [PATCH v3 11/13] IB/srp: Make HCA completion vector configurable Bart Van Assche
     [not found]     ` <51D41FFC.6070105-HInyCGIudOg@public.gmane.org>
2013-07-14  9:43       ` Sagi Grimberg
     [not found]         ` <51E272A4.5030707-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-07-15 11:06           ` Bart Van Assche
     [not found]             ` <51E3D79D.9070808-HInyCGIudOg@public.gmane.org>
2013-07-15 13:29               ` Sagi Grimberg
     [not found]                 ` <51E3F931.9080903-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-07-15 18:23                   ` Bart Van Assche
     [not found]                     ` <51E43E22.2060502-HInyCGIudOg@public.gmane.org>
2013-07-16 10:11                       ` Sagi Grimberg
     [not found]                         ` <51E51C56.50906-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2013-07-16 10:58                           ` Bart Van Assche
     [not found]                             ` <51E5275F.2070009-HInyCGIudOg@public.gmane.org>
2013-07-16 12:41                               ` Sagi Grimberg
2013-07-16 15:11           ` Bart Van Assche
     [not found]             ` <51E56296.2000403-HInyCGIudOg@public.gmane.org>
2013-07-17  9:27               ` Sagi Grimberg
2013-07-03 12:59   ` [PATCH v3 12/13] IB/srp: Make transport layer retry count configurable Bart Van Assche
     [not found]     ` <51D4204E.7040301-HInyCGIudOg@public.gmane.org>
2013-07-03 14:30       ` David Dillow
2013-07-03 13:00   ` [PATCH v3 13/13] IB/srp: Bump driver version and release date Bart Van Assche
2013-07-03 13:38 ` [PATCH v3 0/13] IB SRP initiator patches for kernel 3.11 Or Gerlitz
2013-07-03 14:38   ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.