All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/15] IB SRP initiator patches for kernel 3.11
@ 2013-06-28 12:45 Bart Van Assche
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:45 UTC (permalink / raw)
  To: Roland Dreier, David Dillow, Vu Pham, Sebastian Riemer,
	linux-rdma, linux-scsi

The purpose of this InfiniBand SRP initiator patch series is as follows:
- Make the SRP initiator driver better suited for use in a H.A. setup.
   Speed up failover by reducing the IB RC retry count and by notifying
   multipathd faster about transport layer failures by adding
   fast_io_fail_tmo and dev_loss_tmo parameters.
- Make the SRP initiator better suited for use on NUMA systems by
   making the HCA completion vector configurable.

Changes since the v1 of the "IB SRP initiator patches for kernel 3.11" 
patch series:
- scsi_transport_srp: Allowed both fast_io_fail and dev_loss timeouts
   to be disabled.
- scsi_transport_srp, srp_reconnect_rport(): switched from
   scsi_block_requests() to scsi_target_block() for blocking SCSI command
   processing temporarily.
- scsi_transport_srp, srp_start_tl_fail_timers(): only block SCSI device
   command processing if the fast_io_fail timer is enabled.
- Changed srp_abort() such that upon transport offline the value
   FAST_IO_FAIL is returned instead of SUCCESS.
- Fixed a race condition in the "maintain single connection" patch: a
   new login after removal had started but before removal had finished
   still could create a duplicate connection. Fixed this by deferring
   removal from the target list until removal has finished.
- Modified the error message in the same patch for reporting that a
   duplicate connection has been rejected.
- Modified patch 2/15 such that all possible race conditions with
   srp_claim_req() are addressed.
- Documented the comp_vector and tl_retry_count login string parameters.
- Updated dev_loss_tmo and fast_io_fail_tmo documentation - mentioned
   "off" is a valid choice.

Changes compared to v5 of the "Make ib_srp better suited for H.A. 
purposes" patch series:
- Left out patches that are already upstream.
- Made it possible to set dev_loss_tmo to "off". This is useful in a
   setup using initiator side mirroring to avoid that new /dev/sd* names
   are reassigned after a failover or cable pull and reinsert.
- Added kernel module parameters to ib_srp for configuring default
   values of the fast_io_fail_tmo and dev_loss_tmo parameters.
- Added a patch from Dotan Barak that fixes a kernel oops during rmmod
   triggered by resource allocation failure at module load time.
- Avoid duplicate connections by refusing relogins instead of dropping
   duplicate connections, as proposed by Sebastian Riemer.
- Added a patch from Sebastian Riemer for failing SCSI commands
   silently.
- Added a patch from Vu Pham to make the transport layer (IB RC) retry
   count configurable.
- Made HCA completion vector configurable.

Changes since v4:
- Added a patch for removing SCSI devices upon a port down event

Changes since v3:
- Restored the dev_loss_tmo and fast_io_fail_tmo sysfs attributes.
- Included a patch to fix an ib_srp crash that could be triggered by
   cable pulling.

Changes since v2:
- Addressed the v2 review comments.
- Dropped the patches that have already been merged.
- Dropped the patches for integration with multipathd.
- Dropped the micro-optimization of the IB completion handlers.

The individual patches in this series are as follows:
0001-IB-srp-Fix-remove_one-crash-due-to-resource-exhausti.patch
0002-IB-srp-Fix-race-between-srp_queuecommand-and-srp_cla.patch
0003-IB-srp-Avoid-that-srp_reset_host-is-skipped-after-a-.patch
0004-IB-srp-Fail-I-O-fast-if-target-offline.patch
0005-IB-srp-Skip-host-settle-delay.patch
0006-IB-srp-Maintain-a-single-connection-per-I_T-nexus.patch
0007-IB-srp-Keep-rport-as-long-as-the-IB-transport-layer.patch
0008-scsi_transport_srp-Add-transport-layer-error-handlin.patch
0009-IB-srp-Add-srp_terminate_io.patch
0010-IB-srp-Use-SRP-transport-layer-error-recovery.patch
0011-IB-srp-Start-timers-if-a-transport-layer-error-occur.patch
0012-IB-srp-Fail-SCSI-commands-silently.patch
0013-IB-srp-Make-HCA-completion-vector-configurable.patch
0014-IB-srp-Make-transport-layer-retry-count-configurable.patch
0015-IB-srp-Bump-driver-version-and-release-date.patch
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2 01/15] IB/srp: Fix remove_one crash due to resource exhaustion
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
@ 2013-06-28 12:46   ` Bart Van Assche
  2013-06-28 12:48   ` [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req() Bart Van Assche
                     ` (11 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:46 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

From: Dotan Barak <dotanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>

If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources. Trying
to clean the resource may lead to the following kernel panic:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
RIP: 0010:[<ffffffffa0132331>]  [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0)
Call Trace:
 [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core]
 [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp]
 [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260
 [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

[bvanassche: Shortened patch description]
Signed-off-by: Dotan Barak <dotanb-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
Reviewed-by: Eli Cohen <eli-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 7ccf328..368d160 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -2507,6 +2507,8 @@ static void srp_remove_one(struct ib_device *device)
 	struct srp_target_port *target;
 
 	srp_dev = ib_get_client_data(device, &srp_client);
+	if (!srp_dev)
+		return;
 
 	list_for_each_entry_safe(host, tmp_host, &srp_dev->dev_list, list) {
 		device_unregister(&host->dev);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:46   ` [PATCH v2 01/15] IB/srp: Fix remove_one crash due to resource exhaustion Bart Van Assche
@ 2013-06-28 12:48   ` Bart Van Assche
       [not found]     ` <51CD8604.5010801-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:49   ` [PATCH v2 03/15] IB/srp: Avoid that srp_reset_host() is skipped after a TL error Bart Van Assche
                     ` (10 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:48 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

Avoid that srp_claim_command() can claim a command while
srp_queuecommand() is still busy queueing the same command.
Found this via source reading.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 368d160..035167b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1047,13 +1047,11 @@ map_complete:
 static void srp_put_tx_iu(struct srp_target_port *target, struct srp_iu *iu,
 			  enum srp_iu_type iu_type)
 {
-	unsigned long flags;
+	lockdep_assert_held(&target->lock);
 
-	spin_lock_irqsave(&target->lock, flags);
 	list_add(&iu->list, &target->free_tx);
 	if (iu_type != SRP_IU_RSP)
 		++target->req_lim;
-	spin_unlock_irqrestore(&target->lock, flags);
 }
 
 /*
@@ -1210,7 +1208,10 @@ static int srp_response_common(struct srp_target_port *target, s32 req_delta,
 	if (err) {
 		shost_printk(KERN_ERR, target->scsi_host, PFX
 			     "unable to post response: %d\n", err);
+
+		spin_lock_irqsave(&target->lock, flags);
 		srp_put_tx_iu(target, iu, SRP_IU_RSP);
+		spin_unlock_irqrestore(&target->lock, flags);
 	}
 
 	return err;
@@ -1367,7 +1368,6 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 
 	req = list_first_entry(&target->free_reqs, struct srp_request, list);
 	list_del(&req->list);
-	spin_unlock_irqrestore(&target->lock, flags);
 
 	dev = target->srp_host->srp_dev->dev;
 	ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_iu_len,
@@ -1401,6 +1401,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 		shost_printk(KERN_ERR, target->scsi_host, PFX "Send failed\n");
 		goto err_unmap;
 	}
+	spin_unlock_irqrestore(&target->lock, flags);
 
 	return 0;
 
@@ -1409,8 +1410,6 @@ err_unmap:
 
 err_iu:
 	srp_put_tx_iu(target, iu, SRP_IU_CMD);
-
-	spin_lock_irqsave(&target->lock, flags);
 	list_add(&req->list, &target->free_reqs);
 
 err_unlock:
@@ -1729,7 +1728,10 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
 	ib_dma_sync_single_for_device(dev, iu->dma, sizeof *tsk_mgmt,
 				      DMA_TO_DEVICE);
 	if (srp_post_send(target, iu, sizeof *tsk_mgmt)) {
+		spin_lock_irq(&target->lock);
 		srp_put_tx_iu(target, iu, SRP_IU_TSK_MGMT);
+		spin_unlock_irq(&target->lock);
+
 		return -1;
 	}
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 03/15] IB/srp: Avoid that srp_reset_host() is skipped after a TL error
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:46   ` [PATCH v2 01/15] IB/srp: Fix remove_one crash due to resource exhaustion Bart Van Assche
  2013-06-28 12:48   ` [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req() Bart Van Assche
@ 2013-06-28 12:49   ` Bart Van Assche
       [not found]     ` <51CD8644.5080600-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:49   ` [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline Bart Van Assche
                     ` (9 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:49 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

The SCSI error handler assumes that the transport layer is
operational if an eh_abort_handler() returns SUCCESS. Hence let
srp_abort() only return SUCCESS if sending the ABORT TASK task
management function succeeded. This patch avoids that the SCSI
error handler skips the srp_reset_host() call after a transport
layer error.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 035167b..8c95262 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1746,18 +1746,22 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
+	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
 	if (!req || !srp_claim_req(target, req, scmnd))
 		return FAILED;
-	srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
-			  SRP_TSK_ABORT_TASK);
+	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
+			      SRP_TSK_ABORT_TASK) == 0)
+		ret = SUCCESS;
+	else
+		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
 
-	return SUCCESS;
+	return ret;
 }
 
 static int srp_reset_device(struct scsi_cmnd *scmnd)
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2013-06-28 12:49   ` [PATCH v2 03/15] IB/srp: Avoid that srp_reset_host() is skipped after a TL error Bart Van Assche
@ 2013-06-28 12:49   ` Bart Van Assche
       [not found]     ` <51CD8676.6080205-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:50   ` [PATCH v2 05/15] IB/srp: Skip host settle delay Bart Van Assche
                     ` (8 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:49 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

If reconnecting failed we know that no command completion will
be received anymore. Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8c95262..5c91521 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
+	else if (target->transport_offline)
+		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 05/15] IB/srp: Skip host settle delay
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2013-06-28 12:49   ` [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline Bart Van Assche
@ 2013-06-28 12:50   ` Bart Van Assche
  2013-06-28 12:51   ` [PATCH v2 06/15] IB/srp: Maintain a single connection per I_T nexus Bart Van Assche
                     ` (7 subsequent siblings)
  12 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:50 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

The SRP initiator implements host reset by reconnecting to the SRP
target. That means that communication with the target is possible
as soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5c91521..01212c9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1954,6 +1954,7 @@ static struct scsi_host_template srp_template = {
 	.eh_abort_handler		= srp_abort,
 	.eh_device_reset_handler	= srp_reset_device,
 	.eh_host_reset_handler		= srp_reset_host,
+	.skip_settle_delay		= true,
 	.sg_tablesize			= SRP_DEF_SG_TABLESIZE,
 	.can_queue			= SRP_CMD_SQ_SIZE,
 	.this_id			= -1,
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 06/15] IB/srp: Maintain a single connection per I_T nexus
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2013-06-28 12:50   ` [PATCH v2 05/15] IB/srp: Skip host settle delay Bart Van Assche
@ 2013-06-28 12:51   ` Bart Van Assche
       [not found]     ` <51CD86CE.8080804-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:52   ` [PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
                     ` (6 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:51 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi

An SRP target is required to maintain a single connection between
initiator and target. This means that if the 'add_target' attribute
is used to create a second connection to a target that the first
connection will be logged out and that the SCSI error handler will
kick in. The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins. Note: if
reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Signed-off-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   44 +++++++++++++++++++++++++++++++++--
 1 file changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 01212c9..4eba1f7 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -542,11 +542,11 @@ static void srp_remove_work(struct work_struct *work)
 
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
+	srp_remove_target(target);
+
 	spin_lock(&target->srp_host->target_lock);
 	list_del(&target->list);
 	spin_unlock(&target->srp_host->target_lock);
-
-	srp_remove_target(target);
 }
 
 static void srp_rport_delete(struct srp_rport *rport)
@@ -2010,6 +2010,36 @@ static struct class srp_class = {
 	.dev_release = srp_release_dev
 };
 
+/**
+ * srp_conn_unique() - check whether the connection to a target is unique
+ */
+static bool srp_conn_unique(struct srp_host *host,
+			    struct srp_target_port *target)
+{
+	struct srp_target_port *t;
+	bool ret = false;
+
+	if (target->state == SRP_TARGET_REMOVED)
+		goto out;
+
+	ret = true;
+
+	spin_lock(&host->target_lock);
+	list_for_each_entry(t, &host->target_list, list) {
+		if (t != target &&
+		    target->id_ext == t->id_ext &&
+		    target->ioc_guid == t->ioc_guid &&
+		    target->initiator_ext == t->initiator_ext) {
+			ret = false;
+			break;
+		}
+	}
+	spin_unlock(&host->target_lock);
+
+out:
+	return ret;
+}
+
 /*
  * Target ports are added by writing
  *
@@ -2266,6 +2296,16 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err;
 
+	if (!srp_conn_unique(target->srp_host, target)) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "Already connected to target port with id_ext=%016llx;ioc_guid=%016llx;initiator_ext=%016llx\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be64_to_cpu(target->initiator_ext));
+		ret = -EEXIST;
+		goto err;
+	}
+
 	if (!host->srp_dev->fmr_pool && !target->allow_ext_sg &&
 				target->cmd_sg_cnt < target->sg_tablesize) {
 		pr_warn("No FMR pool and no external indirect descriptors, limiting sg_tablesize to cmd_sg_cnt\n");
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (5 preceding siblings ...)
  2013-06-28 12:51   ` [PATCH v2 06/15] IB/srp: Maintain a single connection per I_T nexus Bart Van Assche
@ 2013-06-28 12:52   ` Bart Van Assche
  2013-06-30 21:06     ` David Dillow
  2013-06-28 12:54   ` [PATCH v2 09/15] IB/srp: Add srp_terminate_io() Bart Van Assche
                     ` (5 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:52 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback may get invoked after
srp_remove_host() has finished.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: James Bottomley <JBottomley-MU7nAjRaF3makBO8gow8eQ@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    3 +++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 drivers/scsi/scsi_transport_srp.c   |   18 ++++++++++++++++++
 include/scsi/scsi_transport_srp.h   |    2 ++
 4 files changed, 24 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 4eba1f7..cb86be5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -526,11 +526,13 @@ static void srp_remove_target(struct srp_target_port *target)
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
+	srp_rport_get(target->rport);
 	srp_remove_host(target->scsi_host);
 	scsi_remove_host(target->scsi_host);
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
 }
@@ -1984,6 +1986,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	}
 
 	rport->lld_data = target;
+	target->rport = rport;
 
 	spin_lock(&host->target_lock);
 	list_add_tail(&target->list, &host->target_list);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 66fbedd..1817ed5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -153,6 +153,7 @@ struct srp_target_port {
 	u16			io_class;
 	struct srp_host	       *srp_host;
 	struct Scsi_Host       *scsi_host;
+	struct srp_rport       *rport;
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f379c7f..f7ba94a 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -185,6 +185,24 @@ static int srp_host_match(struct attribute_container *cont, struct device *dev)
 }
 
 /**
+ * srp_rport_get() - increment rport reference count
+ */
+void srp_rport_get(struct srp_rport *rport)
+{
+	get_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_get);
+
+/**
+ * srp_rport_put() - decrement rport reference count
+ */
+void srp_rport_put(struct srp_rport *rport)
+{
+	put_device(&rport->dev);
+}
+EXPORT_SYMBOL(srp_rport_put);
+
+/**
  * srp_rport_add - add a SRP remote port to the device hierarchy
  * @shost:	scsi host the remote port is connected to.
  * @ids:	The port id for the remote port.
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index ff0f04a..5a2d2d1 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -38,6 +38,8 @@ extern struct scsi_transport_template *
 srp_attach_transport(struct srp_function_template *);
 extern void srp_release_transport(struct scsi_transport_template *);
 
+extern void srp_rport_get(struct srp_rport *rport);
+extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling
  2013-06-28 12:45 [PATCH v2 0/15] IB SRP initiator patches for kernel 3.11 Bart Van Assche
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
@ 2013-06-28 12:53 ` Bart Van Assche
  2013-06-30 21:05   ` David Dillow
  2013-06-28 12:56 ` [PATCH v2 12/15] IB/srp: Fail SCSI commands silently Bart Van Assche
  2 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:53 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

Add the necessary functions in the SRP transport module to allow
an SRP initiator driver to implement transport layer error handling
similar to the functionality already provided by the FC transport
layer. This includes:
- Support for implementing fast_io_fail_tmo, the time that should
  elapse after having detected a transport layer problem and
  before failing I/O.
- Support for implementing dev_loss_tmo, the time that should
  elapse after having detected a transport layer problem and
  before removing a remote port.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: James Bottomley <JBottomley@Parallels.com>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
---
 Documentation/ABI/stable/sysfs-transport-srp |   37 ++
 drivers/scsi/scsi_transport_srp.c            |  477 +++++++++++++++++++++++++-
 include/scsi/scsi_transport_srp.h            |   62 +++-
 3 files changed, 573 insertions(+), 3 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-transport-srp b/Documentation/ABI/stable/sysfs-transport-srp
index b36fb0d..6a4d651 100644
--- a/Documentation/ABI/stable/sysfs-transport-srp
+++ b/Documentation/ABI/stable/sysfs-transport-srp
@@ -5,6 +5,24 @@ Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
 Description:	Instructs an SRP initiator to disconnect from a target and to
 		remove all LUNs imported from that target.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/dev_loss_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before removing a target port.
+		Zero means immediate removal. Setting this attribute to "off"
+		will disable this behavior.
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/fast_io_fail_tmo
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a transport
+		layer error has been observed before failing I/O. Zero means
+		immediate removal. Setting this attribute to "off" will
+		disable this behavior.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/port_id
 Date:		June 27, 2007
 KernelVersion:	2.6.24
@@ -12,8 +30,27 @@ Contact:	linux-scsi@vger.kernel.org
 Description:	16-byte local SRP port identifier in hexadecimal format. An
 		example: 4c:49:4e:55:58:20:56:49:4f:00:00:00:00:00:00:00.
 
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/reconnect_delay
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	Number of seconds the SCSI layer will wait after a reconnect
+		attempt failed before retrying.
+
 What:		/sys/class/srp_remote_ports/port-<h>:<n>/roles
 Date:		June 27, 2007
 KernelVersion:	2.6.24
 Contact:	linux-scsi@vger.kernel.org
 Description:	Role of the remote port. Either "SRP Initiator" or "SRP Target".
+
+What:		/sys/class/srp_remote_ports/port-<h>:<n>/state
+Date:		October 1, 2013
+KernelVersion:	3.11
+Contact:	linux-scsi@vger.kernel.org, linux-rdma@vger.kernel.org
+Description:	State of the transport layer to the remote port. "running" if
+		the transport layer is operational; "blocked" if a transport
+		layer error has been encountered but the fail_io_fast_tmo
+		timer has not yet fired; "fail-fast" after the
+		fail_io_fast_tmo timer has fired and before the "dev_loss_tmo"
+		timer has fired; "lost" after the "dev_loss_tmo" timer has
+		fired and before the port is finally removed.
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index f7ba94a..44b27dd 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -24,11 +24,13 @@
 #include <linux/err.h>
 #include <linux/slab.h>
 #include <linux/string.h>
+#include <linux/delay.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_device.h>
 #include <scsi/scsi_host.h>
 #include <scsi/scsi_transport.h>
+#include <scsi/scsi_cmnd.h>
 #include <scsi/scsi_transport_srp.h>
 #include "scsi_transport_srp_internal.h"
 
@@ -38,7 +40,7 @@ struct srp_host_attrs {
 #define to_srp_host_attrs(host)	((struct srp_host_attrs *)(host)->shost_data)
 
 #define SRP_HOST_ATTRS 0
-#define SRP_RPORT_ATTRS 3
+#define SRP_RPORT_ATTRS 8
 
 struct srp_internal {
 	struct scsi_transport_template t;
@@ -54,6 +56,25 @@ struct srp_internal {
 
 #define	dev_to_rport(d)	container_of(d, struct srp_rport, dev)
 #define transport_class_to_srp_rport(dev) dev_to_rport((dev)->parent)
+static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
+{
+	return dev_to_shost(r->dev.parent);
+}
+
+/**
+ * srp_tmo_valid() - check timeout combination validity
+ *
+ * If both a fast I/O fail and a device loss timeout have been configured then
+ * the fast I/O fail timeout must be below the device loss timeout.
+ */
+int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
+{
+	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
+		fast_io_fail_tmo < dev_loss_tmo) &&
+		fast_io_fail_tmo < LONG_MAX / HZ &&
+		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
+}
+EXPORT_SYMBOL_GPL(srp_tmo_valid);
 
 static int srp_host_setup(struct transport_container *tc, struct device *dev,
 			  struct device *cdev)
@@ -134,10 +155,433 @@ static ssize_t store_srp_rport_delete(struct device *dev,
 
 static DEVICE_ATTR(delete, S_IWUSR, NULL, store_srp_rport_delete);
 
+static ssize_t show_srp_rport_state(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	static const char *const state_name[] = {
+		[SRP_RPORT_RUNNING]	= "running",
+		[SRP_RPORT_BLOCKED]	= "blocked",
+		[SRP_RPORT_FAIL_FAST]	= "fail-fast",
+		[SRP_RPORT_LOST]	= "lost",
+	};
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	enum srp_rport_state state = rport->state;
+
+	return sprintf(buf, "%s\n",
+		       (unsigned)state < ARRAY_SIZE(state_name) ?
+		       state_name[state] : "???");
+}
+
+static DEVICE_ATTR(state, S_IRUGO, show_srp_rport_state, NULL);
+
+static ssize_t show_reconnect_delay(struct device *dev,
+				    struct device_attribute *attr, char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->reconnect_delay);
+}
+
+static ssize_t store_reconnect_delay(struct device *dev,
+				     struct device_attribute *attr,
+				     const char *buf, const size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res, delay;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	res = kstrtoint(ch, 0, &delay);
+	if (res)
+		goto out;
+
+	res = mutex_lock_interruptible(&rport->mutex);
+	if (res)
+		goto out;
+	if (rport->reconnect_delay <= 0 && delay > 0 &&
+	    rport->state != SRP_RPORT_RUNNING) {
+		queue_delayed_work(system_long_wq, &rport->reconnect_work,
+				   delay * HZ);
+	} else if (delay <= 0) {
+		cancel_delayed_work(&rport->reconnect_work);
+	}
+	rport->reconnect_delay = delay;
+	mutex_unlock(&rport->mutex);
+
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(reconnect_delay, S_IRUGO | S_IWUSR, show_reconnect_delay,
+		   store_reconnect_delay);
+
+static ssize_t show_failed_reconnects(struct device *dev,
+				      struct device_attribute *attr, char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	return sprintf(buf, "%d\n", rport->failed_reconnects);
+}
+
+static DEVICE_ATTR(failed_reconnects, S_IRUGO, show_failed_reconnects, NULL);
+
+static ssize_t show_srp_rport_fast_io_fail_tmo(struct device *dev,
+					       struct device_attribute *attr,
+					       char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->fast_io_fail_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->fast_io_fail_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_fast_io_fail_tmo(struct device *dev,
+						struct device_attribute *attr,
+						const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16], *p;
+	int res;
+	int fast_io_fail_tmo;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	p = strchr(ch, '\n');
+	if (p)
+		*p = '\0';
+
+	if (strcmp(ch, "off") != 0) {
+		res = kstrtoint(ch, 0, &fast_io_fail_tmo);
+		if (res)
+			goto out;
+	} else {
+		fast_io_fail_tmo = -1;
+	}
+	res = srp_tmo_valid(fast_io_fail_tmo, rport->dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->fast_io_fail_tmo = fast_io_fail_tmo;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(fast_io_fail_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_fast_io_fail_tmo,
+		   store_srp_rport_fast_io_fail_tmo);
+
+static ssize_t show_srp_rport_dev_loss_tmo(struct device *dev,
+					   struct device_attribute *attr,
+					   char *buf)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+
+	if (rport->dev_loss_tmo >= 0)
+		return sprintf(buf, "%d\n", rport->dev_loss_tmo);
+	else
+		return sprintf(buf, "off\n");
+}
+
+static ssize_t store_srp_rport_dev_loss_tmo(struct device *dev,
+					    struct device_attribute *attr,
+					    const char *buf, size_t count)
+{
+	struct srp_rport *rport = transport_class_to_srp_rport(dev);
+	char ch[16];
+	int res;
+	int dev_loss_tmo;
+
+	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
+	if (strcmp(ch, "off") != 0) {
+		res = kstrtoint(ch, 0, &dev_loss_tmo);
+		if (res)
+			goto out;
+	} else {
+		dev_loss_tmo = -1;
+	}
+	res = srp_tmo_valid(rport->fast_io_fail_tmo, dev_loss_tmo);
+	if (res)
+		goto out;
+	rport->dev_loss_tmo = dev_loss_tmo;
+	res = count;
+
+out:
+	return res;
+}
+
+static DEVICE_ATTR(dev_loss_tmo, S_IRUGO | S_IWUSR,
+		   show_srp_rport_dev_loss_tmo,
+		   store_srp_rport_dev_loss_tmo);
+
+static int srp_rport_set_state(struct srp_rport *rport,
+			       enum srp_rport_state new_state)
+{
+	enum srp_rport_state old_state = rport->state;
+
+	lockdep_assert_held(&rport->mutex);
+
+	switch (new_state) {
+	case SRP_RPORT_RUNNING:
+		switch (old_state) {
+		case SRP_RPORT_LOST:
+			goto invalid;
+		default:
+			break;
+		}
+		break;
+	case SRP_RPORT_BLOCKED:
+		switch (old_state) {
+		case SRP_RPORT_RUNNING:
+			break;
+		default:
+			goto invalid;
+		}
+		break;
+	case SRP_RPORT_FAIL_FAST:
+		switch (old_state) {
+		case SRP_RPORT_LOST:
+			goto invalid;
+		default:
+			break;
+		}
+		break;
+	case SRP_RPORT_LOST:
+		break;
+	}
+	rport->state = new_state;
+	return 0;
+
+invalid:
+	return -EINVAL;
+}
+
+/**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ */
+static int scsi_request_fn_active(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+	int request_fn_active = 0;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		spin_lock_irq(q->queue_lock);
+		request_fn_active += q->request_fn_active;
+		spin_unlock_irq(q->queue_lock);
+	}
+
+	return request_fn_active;
+}
+
+/**
+ * srp_reconnect_rport() - reconnect to an SRP target port
+ *
+ * Blocks SCSI command queueing before invoking reconnect() such that
+ * queuecommand() won't be invoked concurrently with reconnect(). This is
+ * important since a reconnect() implementation may reallocate resources
+ * needed by queuecommand(). Please note that this function neither waits
+ * until outstanding requests have finished nor tries to abort these. It is
+ * the responsibility of the reconnect() function to finish outstanding
+ * commands before reconnecting to the target port.
+ */
+int srp_reconnect_rport(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+	struct scsi_device *sdev;
+	int res;
+
+	pr_debug("SCSI host %s\n", dev_name(&shost->shost_gendev));
+
+	res = mutex_lock_interruptible(&rport->mutex);
+	if (res)
+		goto out;
+	scsi_target_block(&shost->shost_gendev);
+	while (scsi_request_fn_active(shost))
+		msleep(20);
+	res = i->f->reconnect(rport);
+	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
+		 dev_name(&shost->shost_gendev), rport->state, res);
+	if (res == 0) {
+		cancel_delayed_work(&rport->fast_io_fail_work);
+		cancel_delayed_work(&rport->dev_loss_work);
+		rport->failed_reconnects = 0;
+		srp_rport_set_state(rport, SRP_RPORT_RUNNING);
+		scsi_target_unblock(&shost->shost_gendev, SDEV_RUNNING);
+		/*
+		 * It can occur that after fast_io_fail_tmo expired and before
+		 * dev_loss_tmo expired that the SCSI error handler has
+		 * offlined one or more devices. scsi_target_unblock() doesn't
+		 * change the state of these devices into running, so do that
+		 * explicitly.
+		 */
+		spin_lock_irq(shost->host_lock);
+		__shost_for_each_device(sdev, shost)
+			if (sdev->sdev_state == SDEV_OFFLINE)
+				sdev->sdev_state = SDEV_RUNNING;
+		spin_unlock_irq(shost->host_lock);
+	} else if (rport->state == SRP_RPORT_RUNNING) {
+		srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST);
+		scsi_target_unblock(&shost->shost_gendev,
+				    SDEV_TRANSPORT_OFFLINE);
+	}
+	mutex_unlock(&rport->mutex);
+
+out:
+	return res;
+}
+EXPORT_SYMBOL(srp_reconnect_rport);
+
+/**
+ * srp_reconnect_work() - reconnect and schedule a new attempt if necessary
+ */
+static void srp_reconnect_work(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, reconnect_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	int delay, res;
+
+	res = srp_reconnect_rport(rport);
+	if (res != 0) {
+		shost_printk(KERN_ERR, shost,
+			     "reconnect attempt %d failed (%d)\n",
+			     ++rport->failed_reconnects, res);
+		delay = rport->reconnect_delay *
+			min(100, max(1, rport->failed_reconnects - 10));
+		if (delay > 0)
+			queue_delayed_work(system_long_wq,
+					   &rport->reconnect_work, delay * HZ);
+	}
+}
+
+static void __rport_fast_io_fail_timedout(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i;
+
+	lockdep_assert_held(&rport->mutex);
+
+	if (srp_rport_set_state(rport, SRP_RPORT_FAIL_FAST))
+		return;
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+
+	/* Involve the LLD if possible to terminate all I/O on the rport. */
+	i = to_srp_internal(shost->transportt);
+	if (i->f->terminate_rport_io)
+		i->f->terminate_rport_io(rport);
+}
+
+/**
+ * rport_fast_io_fail_timedout() - fast I/O failure timeout handler
+ *
+ * Unblocks the SCSI host.
+ */
+static void rport_fast_io_fail_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, fast_io_fail_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+
+	pr_debug("fast_io_fail_tmo expired for %s.\n",
+		 dev_name(&shost->shost_gendev));
+
+	mutex_lock(&rport->mutex);
+	__rport_fast_io_fail_timedout(rport);
+	mutex_unlock(&rport->mutex);
+}
+
+/**
+ * rport_dev_loss_timedout() - device loss timeout handler
+ */
+static void rport_dev_loss_timedout(struct work_struct *work)
+{
+	struct srp_rport *rport = container_of(to_delayed_work(work),
+					struct srp_rport, dev_loss_work);
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	struct srp_internal *i = to_srp_internal(shost->transportt);
+
+	pr_err("dev_loss_tmo expired for %s.\n",
+	       dev_name(&shost->shost_gendev));
+
+	mutex_lock(&rport->mutex);
+	WARN_ON(srp_rport_set_state(rport, SRP_RPORT_LOST) != 0);
+	scsi_target_unblock(rport->dev.parent, SDEV_TRANSPORT_OFFLINE);
+	mutex_unlock(&rport->mutex);
+
+	i->f->rport_delete(rport);
+}
+
+/**
+ * srp_start_tl_fail_timers() - start the transport layer failure timers
+ *
+ * Start the transport layer fast I/O failure and device loss timers. Do not
+ * modify a timer that was already started.
+ */
+void srp_start_tl_fail_timers(struct srp_rport *rport)
+{
+	struct Scsi_Host *shost = rport_to_shost(rport);
+	int fast_io_fail_tmo, dev_loss_tmo, delay;
+
+	mutex_lock(&rport->mutex);
+	delay = rport->reconnect_delay;
+	fast_io_fail_tmo = rport->fast_io_fail_tmo;
+	dev_loss_tmo = rport->dev_loss_tmo;
+	pr_debug("%s current state: %d\n", dev_name(&shost->shost_gendev),
+		 rport->state);
+
+	if (fast_io_fail_tmo >= 0 &&
+	    srp_rport_set_state(rport, SRP_RPORT_BLOCKED) == 0) {
+		pr_debug("%s new state: %d\n", dev_name(&shost->shost_gendev),
+			 rport->state);
+		scsi_target_block(&shost->shost_gendev);
+		queue_delayed_work(system_long_wq, &rport->fast_io_fail_work,
+				   1UL * fast_io_fail_tmo * HZ);
+	}
+	if (dev_loss_tmo >= 0)
+		queue_delayed_work(system_long_wq, &rport->dev_loss_work,
+				   1UL * dev_loss_tmo * HZ);
+	if (delay > 0)
+		queue_delayed_work(system_long_wq, &rport->reconnect_work,
+				   1UL * delay * HZ);
+	mutex_unlock(&rport->mutex);
+}
+EXPORT_SYMBOL(srp_start_tl_fail_timers);
+
+/**
+ * srp_timed_out() - SRP transport intercept of the SCSI timeout EH
+ *
+ * If a timeout occurs while an rport is in the blocked state, ask the SCSI
+ * EH to continue waiting (BLK_EH_RESET_TIMER). Otherwise let the SCSI core
+ * handle the timeout (BLK_EH_NOT_HANDLED).
+ *
+ * Note: This function is called from soft-IRQ context and with the request
+ * queue lock held.
+ */
+static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
+{
+	struct scsi_device *sdev = scmd->device;
+
+	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
+	return scsi_device_blocked(sdev) ? BLK_EH_RESET_TIMER :
+		BLK_EH_NOT_HANDLED;
+}
+
 static void srp_rport_release(struct device *dev)
 {
 	struct srp_rport *rport = dev_to_rport(dev);
 
+	cancel_delayed_work_sync(&rport->reconnect_work);
+	cancel_delayed_work_sync(&rport->fast_io_fail_work);
+	cancel_delayed_work_sync(&rport->dev_loss_work);
+
 	put_device(dev->parent);
 	kfree(rport);
 }
@@ -214,12 +658,15 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 {
 	struct srp_rport *rport;
 	struct device *parent = &shost->shost_gendev;
+	struct srp_internal *i = to_srp_internal(shost->transportt);
 	int id, ret;
 
 	rport = kzalloc(sizeof(*rport), GFP_KERNEL);
 	if (!rport)
 		return ERR_PTR(-ENOMEM);
 
+	mutex_init(&rport->mutex);
+
 	device_initialize(&rport->dev);
 
 	rport->dev.parent = get_device(parent);
@@ -228,6 +675,17 @@ struct srp_rport *srp_rport_add(struct Scsi_Host *shost,
 	memcpy(rport->port_id, ids->port_id, sizeof(rport->port_id));
 	rport->roles = ids->roles;
 
+	if (i->f->reconnect)
+		rport->reconnect_delay = i->f->reconnect_delay ?
+			*i->f->reconnect_delay : 10;
+	INIT_DELAYED_WORK(&rport->reconnect_work, srp_reconnect_work);
+	rport->fast_io_fail_tmo = i->f->fast_io_fail_tmo ?
+		*i->f->fast_io_fail_tmo : 15;
+	rport->dev_loss_tmo = i->f->dev_loss_tmo ? *i->f->dev_loss_tmo : 600;
+	INIT_DELAYED_WORK(&rport->fast_io_fail_work,
+			  rport_fast_io_fail_timedout);
+	INIT_DELAYED_WORK(&rport->dev_loss_work, rport_dev_loss_timedout);
+
 	id = atomic_inc_return(&to_srp_host_attrs(shost)->next_port_id);
 	dev_set_name(&rport->dev, "port-%d:%d", shost->host_no, id);
 
@@ -277,6 +735,12 @@ void srp_rport_del(struct srp_rport *rport)
 	transport_remove_device(dev);
 	device_del(dev);
 	transport_destroy_device(dev);
+
+	mutex_lock(&rport->mutex);
+	if (rport->state == SRP_RPORT_BLOCKED)
+		__rport_fast_io_fail_timedout(rport);
+	mutex_unlock(&rport->mutex);
+
 	put_device(dev);
 }
 EXPORT_SYMBOL_GPL(srp_rport_del);
@@ -328,6 +792,8 @@ srp_attach_transport(struct srp_function_template *ft)
 	if (!i)
 		return NULL;
 
+	i->t.eh_timed_out = srp_timed_out;
+
 	i->t.tsk_mgmt_response = srp_tsk_mgmt_response;
 	i->t.it_nexus_response = srp_it_nexus_response;
 
@@ -345,6 +811,15 @@ srp_attach_transport(struct srp_function_template *ft)
 	count = 0;
 	i->rport_attrs[count++] = &dev_attr_port_id;
 	i->rport_attrs[count++] = &dev_attr_roles;
+	if (ft->has_rport_state) {
+		i->rport_attrs[count++] = &dev_attr_state;
+		i->rport_attrs[count++] = &dev_attr_fast_io_fail_tmo;
+		i->rport_attrs[count++] = &dev_attr_dev_loss_tmo;
+	}
+	if (ft->reconnect) {
+		i->rport_attrs[count++] = &dev_attr_reconnect_delay;
+		i->rport_attrs[count++] = &dev_attr_failed_reconnects;
+	}
 	if (ft->rport_delete)
 		i->rport_attrs[count++] = &dev_attr_delete;
 	i->rport_attrs[count++] = NULL;
diff --git a/include/scsi/scsi_transport_srp.h b/include/scsi/scsi_transport_srp.h
index 5a2d2d1..fbcc985 100644
--- a/include/scsi/scsi_transport_srp.h
+++ b/include/scsi/scsi_transport_srp.h
@@ -13,6 +13,26 @@ struct srp_rport_identifiers {
 	u8 roles;
 };
 
+/**
+ * enum srp_rport_state - SRP transport layer state
+ * @SRP_RPORT_RUNNING:   Transport layer operational.
+ * @SRP_RPORT_BLOCKED:   Transport layer not operational; fast I/O fail timer
+ *                       is running and I/O has been blocked.
+ * @SRP_RPORT_FAIL_FAST: Fast I/O fail timer has expired; fail I/O fast.
+ * @SRP_RPORT_LOST:      Device loss timer has expired; port is being removed.
+ */
+enum srp_rport_state {
+	SRP_RPORT_RUNNING,
+	SRP_RPORT_BLOCKED,
+	SRP_RPORT_FAIL_FAST,
+	SRP_RPORT_LOST,
+};
+
+/**
+ * struct srp_rport
+ * @mutex:   Protects against concurrent rport fast_io_fail / dev_loss_tmo /
+ *   reconnect activity.
+ */
 struct srp_rport {
 	/* for initiator and target drivers */
 
@@ -23,11 +43,27 @@ struct srp_rport {
 
 	/* for initiator drivers */
 
-	void *lld_data;	/* LLD private data */
+	void			*lld_data;	/* LLD private data */
+
+	struct mutex		mutex;
+	enum srp_rport_state	state;
+	int			reconnect_delay;
+	int			failed_reconnects;
+	struct delayed_work	reconnect_work;
+	int			fast_io_fail_tmo;
+	int			dev_loss_tmo;
+	struct delayed_work	fast_io_fail_work;
+	struct delayed_work	dev_loss_work;
 };
 
 struct srp_function_template {
 	/* for initiator drivers */
+	bool has_rport_state;
+	int *reconnect_delay;
+	int *fast_io_fail_tmo;
+	int *dev_loss_tmo;
+	int (*reconnect)(struct srp_rport *rport);
+	void (*terminate_rport_io)(struct srp_rport *rport);
 	void (*rport_delete)(struct srp_rport *rport);
 	/* for target drivers */
 	int (* tsk_mgmt_response)(struct Scsi_Host *, u64, u64, int);
@@ -43,7 +79,29 @@ extern void srp_rport_put(struct srp_rport *rport);
 extern struct srp_rport *srp_rport_add(struct Scsi_Host *,
 				       struct srp_rport_identifiers *);
 extern void srp_rport_del(struct srp_rport *);
-
+extern int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo);
+extern int srp_reconnect_rport(struct srp_rport *rport);
+extern void srp_start_tl_fail_timers(struct srp_rport *rport);
 extern void srp_remove_host(struct Scsi_Host *);
 
+/**
+ * srp_chkready() - evaluate the transport layer state before I/O
+ *
+ * Returns a SCSI result code that can be returned by the LLD. The role of
+ * this function is similar to that of fc_remote_port_chkready().
+ */
+static inline int srp_chkready(struct srp_rport *rport)
+{
+	switch (rport->state) {
+	case SRP_RPORT_RUNNING:
+	case SRP_RPORT_BLOCKED:
+	default:
+		return 0;
+	case SRP_RPORT_FAIL_FAST:
+		return DID_TRANSPORT_FAILFAST << 16;
+	case SRP_RPORT_LOST:
+		return DID_NO_CONNECT << 16;
+	}
+}
+
 #endif
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 09/15] IB/srp: Add srp_terminate_io()
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (6 preceding siblings ...)
  2013-06-28 12:52   ` [PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
@ 2013-06-28 12:54   ` Bart Van Assche
       [not found]     ` <51CD877E.80606-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:55   ` [PATCH v2 10/15] IB/srp: Use SRP transport layer error recovery Bart Van Assche
                     ` (4 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:54 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

Finish all outstanding I/O requests after fast_io_fail_tmo expired,
which speeds up failover in a multipath setup. This patch is a
reworked version of a patch from Sebastian Riemer.

Reported-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cb86be5..16c3612 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -686,17 +686,29 @@ static void srp_free_req(struct srp_target_port *target,
 	spin_unlock_irqrestore(&target->lock, flags);
 }
 
-static void srp_reset_req(struct srp_target_port *target, struct srp_request *req)
+static void srp_finish_req(struct srp_target_port *target,
+			   struct srp_request *req, int result)
 {
 	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
 
 	if (scmnd) {
 		srp_free_req(target, req, scmnd, 0);
-		scmnd->result = DID_RESET << 16;
+		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 	}
 }
 
+static void srp_terminate_io(struct srp_rport *rport)
+{
+	struct srp_target_port *target = rport->lld_data;
+	int i;
+
+	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
+		struct srp_request *req = &target->req_ring[i];
+		srp_finish_req(target, req, DID_TRANSPORT_FAILFAST << 16);
+	}
+}
+
 static int srp_reconnect_target(struct srp_target_port *target)
 {
 	struct Scsi_Host *shost = target->scsi_host;
@@ -723,8 +735,7 @@ static int srp_reconnect_target(struct srp_target_port *target)
 
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
-		if (req->scmnd)
-			srp_reset_req(target, req);
+		srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	INIT_LIST_HEAD(&target->free_tx);
@@ -1784,7 +1795,7 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 	for (i = 0; i < SRP_CMD_SQ_SIZE; ++i) {
 		struct srp_request *req = &target->req_ring[i];
 		if (req->scmnd && req->scmnd->device == scmnd->device)
-			srp_reset_req(target, req);
+			srp_finish_req(target, req, DID_RESET << 16);
 	}
 
 	return SUCCESS;
@@ -2596,6 +2607,7 @@ static void srp_remove_one(struct ib_device *device)
 
 static struct srp_function_template ib_srp_transport_functions = {
 	.rport_delete		 = srp_rport_delete,
+	.terminate_rport_io	 = srp_terminate_io,
 };
 
 static int __init srp_init_module(void)
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 10/15] IB/srp: Use SRP transport layer error recovery
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (7 preceding siblings ...)
  2013-06-28 12:54   ` [PATCH v2 09/15] IB/srp: Add srp_terminate_io() Bart Van Assche
@ 2013-06-28 12:55   ` Bart Van Assche
       [not found]     ` <51CD87A9.2090702-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:55   ` [PATCH v2 11/15] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
                     ` (3 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:55 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
functionality for the IB SRP initiator. Add kernel module
parameters that allow to specify default values for these
three parameters.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |  123 +++++++++++++++++++++++++----------
 drivers/infiniband/ulp/srp/ib_srp.h |    1 -
 2 files changed, 88 insertions(+), 36 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 16c3612..197b310 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -86,6 +86,31 @@ module_param(topspin_workarounds, int, 0444);
 MODULE_PARM_DESC(topspin_workarounds,
 		 "Enable workarounds for Topspin/Cisco SRP target bugs if != 0");
 
+static int srp_reconnect_delay = 10;
+module_param_named(reconnect_delay, srp_reconnect_delay, int, S_IRUGO|S_IWUSR);
+MODULE_PARM_DESC(reconnect_delay, "Time between successive reconnect attempts");
+
+static struct kernel_param_ops srp_tmo_ops;
+
+static int srp_fast_io_fail_tmo = 15;
+module_param_cb(fast_io_fail_tmo, &srp_tmo_ops, &srp_fast_io_fail_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(fast_io_fail_tmo,
+		 "Number of seconds between the observation of a transport"
+		 " layer error and failing all I/O. \"off\" means that this"
+		 " functionality is disabled.");
+
+static int srp_dev_loss_tmo = 600;
+module_param_cb(dev_loss_tmo, &srp_tmo_ops, &srp_dev_loss_tmo,
+		S_IRUGO | S_IWUSR);
+MODULE_PARM_DESC(dev_loss_tmo,
+		 "Maximum number of seconds that the SRP transport should"
+		 " insulate transport layer errors. After this time has been"
+		 " exceeded the SCSI target is removed. Should be"
+		 " between 1 and " __stringify(SCSI_DEVICE_BLOCK_MAX_TIMEOUT)
+		 " if fast_io_fail_tmo has not been set. \"off\" means that"
+		 " this functionality is disabled.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
@@ -102,6 +127,44 @@ static struct ib_client srp_client = {
 
 static struct ib_sa_client srp_sa_client;
 
+static int srp_tmo_get(char *buffer, const struct kernel_param *kp)
+{
+	int tmo = *(int *)kp->arg;
+
+	if (tmo >= 0)
+		return sprintf(buffer, "%d", tmo);
+	else
+		return sprintf(buffer, "off");
+}
+
+static int srp_tmo_set(const char *val, const struct kernel_param *kp)
+{
+	int tmo, res;
+
+	if (strncmp(val, "off", 3) != 0) {
+		res = kstrtoint(val, 0, &tmo);
+		if (res)
+			goto out;
+	} else {
+		tmo = -1;
+	}
+	if (kp->arg == &srp_fast_io_fail_tmo)
+		res = srp_tmo_valid(tmo, srp_dev_loss_tmo);
+	else
+		res = srp_tmo_valid(srp_fast_io_fail_tmo, tmo);
+	if (res)
+		goto out;
+	*(int *)kp->arg = tmo;
+
+out:
+	return res;
+}
+
+static struct kernel_param_ops srp_tmo_ops = {
+	.get = srp_tmo_get,
+	.set = srp_tmo_set,
+};
+
 static inline struct srp_target_port *host_to_target(struct Scsi_Host *host)
 {
 	return (struct srp_target_port *) host->hostdata;
@@ -709,13 +772,20 @@ static void srp_terminate_io(struct srp_rport *rport)
 	}
 }
 
-static int srp_reconnect_target(struct srp_target_port *target)
+/*
+ * It is up to the caller to ensure that srp_rport_reconnect() calls are
+ * serialized and that no concurrent srp_queuecommand(), srp_abort(),
+ * srp_reset_device() or srp_reset_host() calls will occur while this function
+ * is in progress. One way to realize that is not to call this function
+ * directly but to call srp_reconnect_rport() instead since that last function
+ * serializes calls of this function via rport->mutex and also blocks
+ * srp_queuecommand() calls before invoking this function.
+ */
+static int srp_rport_reconnect(struct srp_rport *rport)
 {
-	struct Scsi_Host *shost = target->scsi_host;
+	struct srp_target_port *target = rport->lld_data;
 	int i, ret;
 
-	scsi_target_block(&shost->shost_gendev);
-
 	srp_disconnect_target(target);
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the target in
@@ -745,28 +815,9 @@ static int srp_reconnect_target(struct srp_target_port *target)
 	if (ret == 0)
 		ret = srp_connect_target(target);
 
-	scsi_target_unblock(&shost->shost_gendev, ret == 0 ? SDEV_RUNNING :
-			    SDEV_TRANSPORT_OFFLINE);
-	target->transport_offline = !!ret;
-
-	if (ret)
-		goto err;
-
-	shost_printk(KERN_INFO, target->scsi_host, PFX "reconnect succeeded\n");
-
-	return ret;
-
-err:
-	shost_printk(KERN_ERR, target->scsi_host,
-		     PFX "reconnect failed (%d), removing target port.\n", ret);
-
-	/*
-	 * We couldn't reconnect, so kill our target port off.
-	 * However, we have to defer the real removal because we
-	 * are in the context of the SCSI error handler now, which
-	 * will deadlock if we call scsi_remove_host().
-	 */
-	srp_queue_remove_work(target);
+	if (ret == 0)
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "reconnect succeeded\n");
 
 	return ret;
 }
@@ -1366,10 +1417,11 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	struct srp_cmd *cmd;
 	struct ib_device *dev;
 	unsigned long flags;
-	int len;
+	int len, result;
 
-	if (unlikely(target->transport_offline)) {
-		scmnd->result = DID_NO_CONNECT << 16;
+	result = srp_chkready(target->rport);
+	if (unlikely(result)) {
+		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 		return 0;
 	}
@@ -1768,7 +1820,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
-	else if (target->transport_offline)
+	else if (target->rport->state == SRP_RPORT_LOST)
 		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
@@ -1804,14 +1856,10 @@ static int srp_reset_device(struct scsi_cmnd *scmnd)
 static int srp_reset_host(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
-	int ret = FAILED;
 
 	shost_printk(KERN_ERR, target->scsi_host, PFX "SRP reset_host called\n");
 
-	if (!srp_reconnect_target(target))
-		ret = SUCCESS;
-
-	return ret;
+	return srp_reconnect_rport(target->rport) == 0 ? SUCCESS : FAILED;
 }
 
 static int srp_slave_configure(struct scsi_device *sdev)
@@ -2606,6 +2654,11 @@ static void srp_remove_one(struct ib_device *device)
 }
 
 static struct srp_function_template ib_srp_transport_functions = {
+	.has_rport_state	 = true,
+	.reconnect_delay	 = &srp_reconnect_delay,
+	.fast_io_fail_tmo	 = &srp_fast_io_fail_tmo,
+	.dev_loss_tmo		 = &srp_dev_loss_tmo,
+	.reconnect		 = srp_rport_reconnect,
 	.rport_delete		 = srp_rport_delete,
 	.terminate_rport_io	 = srp_terminate_io,
 };
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 1817ed5..fda82f7 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -140,7 +140,6 @@ struct srp_target_port {
 	unsigned int		cmd_sg_cnt;
 	unsigned int		indirect_size;
 	bool			allow_ext_sg;
-	bool			transport_offline;
 
 	/* Everything above this point is used in the hot path of
 	 * command processing. Try to keep them packed into cachelines.
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 11/15] IB/srp: Start timers if a transport layer error occurs
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (8 preceding siblings ...)
  2013-06-28 12:55   ` [PATCH v2 10/15] IB/srp: Use SRP transport layer error recovery Bart Van Assche
@ 2013-06-28 12:55   ` Bart Van Assche
       [not found]     ` <51CD87D7.3050300-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:57   ` [PATCH v2 13/15] IB/srp: Make HCA completion vector configurable Bart Van Assche
                     ` (2 subsequent siblings)
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:55 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

Start the reconnect timer, fast_io_fail timer and dev_loss timers
if a transport layer error occurs.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |   19 +++++++++++++++++++
 drivers/infiniband/ulp/srp/ib_srp.h |    1 +
 2 files changed, 20 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 197b310..a8cc427 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -595,6 +595,7 @@ static void srp_remove_target(struct srp_target_port *target)
 	srp_disconnect_target(target);
 	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
+	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
 	srp_free_req_data(target);
 	scsi_host_put(target->scsi_host);
@@ -1365,6 +1366,21 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 			     PFX "Recv failed with error code %d\n", res);
 }
 
+/**
+ * srp_tl_err_work() - handle a transport layer error
+ *
+ * Note: This function may get invoked before the rport has been created,
+ * hence the target->rport test.
+ */
+static void srp_tl_err_work(struct work_struct *work)
+{
+	struct srp_target_port *target;
+
+	target = container_of(work, struct srp_target_port, tl_err_work);
+	if (target->rport)
+		srp_start_tl_fail_timers(target->rport);
+}
+
 static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			      enum ib_wc_opcode wc_opcode,
 			      struct srp_target_port *target)
@@ -1374,6 +1390,7 @@ static void srp_handle_qp_err(enum ib_wc_status wc_status,
 			     PFX "failed %s status %d\n",
 			     wc_opcode & IB_WC_RECV ? "receive" : "send",
 			     wc_status);
+		queue_work(system_long_wq, &target->tl_err_work);
 	}
 	target->qp_in_error = true;
 }
@@ -1734,6 +1751,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		if (ib_send_cm_drep(cm_id, NULL, 0))
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "Sending CM DREP failed\n");
+		queue_work(system_long_wq, &target->tl_err_work);
 		break;
 
 	case IB_CM_TIMEWAIT_EXIT:
@@ -2381,6 +2399,7 @@ static ssize_t srp_create_target(struct device *dev,
 			     sizeof (struct srp_indirect_buf) +
 			     target->cmd_sg_cnt * sizeof (struct srp_direct_buf);
 
+	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
 	INIT_LIST_HEAD(&target->free_tx);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index fda82f7..e45d9d0 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -175,6 +175,7 @@ struct srp_target_port {
 	struct srp_iu	       *rx_ring[SRP_RQ_SIZE];
 	struct srp_request	req_ring[SRP_CMD_SQ_SIZE];
 
+	struct work_struct	tl_err_work;
 	struct work_struct	remove_work;
 
 	struct list_head	list;
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 12/15] IB/srp: Fail SCSI commands silently
  2013-06-28 12:45 [PATCH v2 0/15] IB SRP initiator patches for kernel 3.11 Bart Van Assche
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:53 ` [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling Bart Van Assche
@ 2013-06-28 12:56 ` Bart Van Assche
       [not found]   ` <51CD8812.20107-HInyCGIudOg@public.gmane.org>
  2 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:56 UTC (permalink / raw)
  To: Roland Dreier
  Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi

From: Sebastian Riemer <sebastian.riemer@profitbricks.com>

Avoid that path failover in a multipath setup causes the SCSI layer
to generate kernel messages about SCSI command failures. This patch
speeds up SRP initiator operation significantly when monitoring
kernel messages over a serial port.

[bvanassche: Changed patch description]
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: David Dillow <dillowda@ornl.gov>
Cc: Vu Pham <vu@mellanox.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index a8cc427..e77176e 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -757,6 +757,7 @@ static void srp_finish_req(struct srp_target_port *target,
 
 	if (scmnd) {
 		srp_free_req(target, req, scmnd, 0);
+		scmnd->request->cmd_flags |= REQ_QUIET;
 		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 	}
@@ -1438,6 +1439,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 
 	result = srp_chkready(target->rport);
 	if (unlikely(result)) {
+		scmnd->request->cmd_flags |= REQ_QUIET;
 		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 		return 0;
@@ -1843,6 +1845,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	else
 		ret = FAILED;
 	srp_free_req(target, req, scmnd, 0);
+	scmnd->request->cmd_flags |= REQ_QUIET;
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
 
-- 
1.7.10.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 13/15] IB/srp: Make HCA completion vector configurable
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (9 preceding siblings ...)
  2013-06-28 12:55   ` [PATCH v2 11/15] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
@ 2013-06-28 12:57   ` Bart Van Assche
       [not found]     ` <51CD8846.4070400-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:58   ` [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable Bart Van Assche
  2013-06-28 12:59   ` [PATCH v2 15/15] IB/srp: Bump driver version and release date Bart Van Assche
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:57 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

Several InfiniBand HCA's allow to configure the completion vector
per queue pair. This allows to spread the workload created by IB
completion interrupts over multiple MSI-X vectors and hence over
multiple CPU cores. In other words, configuring the completion
vector properly not only allows to reduce latency on an initiator
connected to multiple SRP targets but also allows to improve
throughput.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |    7 +++++++
 drivers/infiniband/ulp/srp/ib_srp.c          |   26 ++++++++++++++++++++++++--
 drivers/infiniband/ulp/srp/ib_srp.h          |    1 +
 3 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 481aae9..5c53d28 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -54,6 +54,13 @@ Description:	Interface for making ib_srp connect to a new target.
 		  ib_srp. Specifying a value that exceeds cmd_sg_entries is
 		  only safe with partial memory descriptor list support enabled
 		  (allow_ext_sg=1).
+		* comp_vector, a number in the range 0..n-1 specifying the
+		  MSI-X completion vector. Some HCA's allocate multiple (n)
+		  MSI-X vectors per HCA port. If the IRQ affinity masks of
+		  these interrupts have been configured such that each MSI-X
+		  interrupt is handled by a different CPU then the comp_vector
+		  parameter can be used to spread the SRP completion workload
+		  over multiple CPU's.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:		January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index e77176e..93cb101 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -294,14 +294,16 @@ static int srp_create_target_ib(struct srp_target_port *target)
 		return -ENOMEM;
 
 	recv_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_recv_completion, NULL, target, SRP_RQ_SIZE, 0);
+			       srp_recv_completion, NULL, target, SRP_RQ_SIZE,
+			       target->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
 	send_cq = ib_create_cq(target->srp_host->srp_dev->dev,
-			       srp_send_completion, NULL, target, SRP_SQ_SIZE, 0);
+			       srp_send_completion, NULL, target, SRP_SQ_SIZE,
+			       target->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -1981,6 +1983,14 @@ static ssize_t show_local_ib_device(struct device *dev,
 	return sprintf(buf, "%s\n", target->srp_host->srp_dev->dev->name);
 }
 
+static ssize_t show_comp_vector(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->comp_vector);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -2007,6 +2017,7 @@ static DEVICE_ATTR(req_lim,         S_IRUGO, show_req_lim,         NULL);
 static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,	   NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
+static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2021,6 +2032,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_zero_req_lim,
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
+	&dev_attr_comp_vector,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
 	NULL
@@ -2145,6 +2157,7 @@ enum {
 	SRP_OPT_CMD_SG_ENTRIES	= 1 << 9,
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
+	SRP_OPT_COMP_VECTOR	= 1 << 12,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
 				   SRP_OPT_IOC_GUID	|
 				   SRP_OPT_DGID		|
@@ -2165,6 +2178,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_CMD_SG_ENTRIES,	"cmd_sg_entries=%u"	},
 	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
 	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
+	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
 	{ SRP_OPT_ERR,			NULL 			}
 };
 
@@ -2320,6 +2334,14 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			target->sg_tablesize = token;
 			break;
 
+		case SRP_OPT_COMP_VECTOR:
+			if (match_int(args, &token) || token < 0) {
+				pr_warn("bad comp_vector parameter '%s'\n", p);
+				goto out;
+			}
+			target->comp_vector = token;
+			break;
+
 		default:
 			pr_warn("unknown parameter or missing value '%s' in target creation request\n",
 				p);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e45d9d0..cbc0b14 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -156,6 +156,7 @@ struct srp_target_port {
 	char			target_name[32];
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
+	int			comp_vector;
 
 	struct ib_sa_path_rec	path;
 	__be16			orig_dgid[8];
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (10 preceding siblings ...)
  2013-06-28 12:57   ` [PATCH v2 13/15] IB/srp: Make HCA completion vector configurable Bart Van Assche
@ 2013-06-28 12:58   ` Bart Van Assche
       [not found]     ` <51CD8876.9020307-HInyCGIudOg@public.gmane.org>
  2013-06-28 12:59   ` [PATCH v2 15/15] IB/srp: Bump driver version and release date Bart Van Assche
  12 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:58 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

From: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Allow the InfiniBand RC retry count to be configured by the user
as an option in the target login string. The transport layer
timeout in nanoseconds is computed as follows from the retry count:

rc_timeout = rc_retry_count * 4 * 4096 * (1 << qp->timeout)

The default value for tl_retry_count is changed from 7 into 2.
Hence with a qp->timedout value of 19 this patch reduces the
default transport layer timeout from about 60s to about 17s. The
purpose of this patch is to reduce the time needed for SCSI error
handling significantly and at the same time to avoid activating
the SCSI error handler on an IB path with a regular BER or due to
brief IB network congestion.

[bvanassche: Rewrote patch description]
Signed-off-by: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |    2 ++
 drivers/infiniband/ulp/srp/ib_srp.c          |   24 +++++++++++++++++++++++-
 drivers/infiniband/ulp/srp/ib_srp.h          |    1 +
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index 5c53d28..18e9b27 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -61,6 +61,8 @@ Description:	Interface for making ib_srp connect to a new target.
 		  interrupt is handled by a different CPU then the comp_vector
 		  parameter can be used to spread the SRP completion workload
 		  over multiple CPU's.
+		* tl_retry_count, a number in the range 2..7 specifying the
+		  IB RC retry count.
 
 What:		/sys/class/infiniband_srp/srp-<hca>-<port_number>/ibdev
 Date:		January 2, 2006
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 93cb101..58a6af9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -453,7 +453,7 @@ static int srp_send_req(struct srp_target_port *target)
 	req->param.responder_resources	      = 4;
 	req->param.remote_cm_response_timeout = 20;
 	req->param.local_cm_response_timeout  = 20;
-	req->param.retry_count 		      = 7;
+	req->param.retry_count                = target->tl_retry_count;
 	req->param.rnr_retry_count 	      = 7;
 	req->param.max_cm_retries 	      = 15;
 
@@ -1991,6 +1991,14 @@ static ssize_t show_comp_vector(struct device *dev,
 	return sprintf(buf, "%d\n", target->comp_vector);
 }
 
+static ssize_t show_tl_retry_count(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->tl_retry_count);
+}
+
 static ssize_t show_cmd_sg_entries(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
@@ -2018,6 +2026,7 @@ static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,	   NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
 static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
+static DEVICE_ATTR(tl_retry_count,  S_IRUGO, show_tl_retry_count,  NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
 static DEVICE_ATTR(allow_ext_sg,    S_IRUGO, show_allow_ext_sg,    NULL);
 
@@ -2033,6 +2042,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
 	&dev_attr_comp_vector,
+	&dev_attr_tl_retry_count,
 	&dev_attr_cmd_sg_entries,
 	&dev_attr_allow_ext_sg,
 	NULL
@@ -2158,6 +2168,7 @@ enum {
 	SRP_OPT_ALLOW_EXT_SG	= 1 << 10,
 	SRP_OPT_SG_TABLESIZE	= 1 << 11,
 	SRP_OPT_COMP_VECTOR	= 1 << 12,
+	SRP_OPT_TL_RETRY_COUNT	= 1 << 13,
 	SRP_OPT_ALL		= (SRP_OPT_ID_EXT	|
 				   SRP_OPT_IOC_GUID	|
 				   SRP_OPT_DGID		|
@@ -2179,6 +2190,7 @@ static const match_table_t srp_opt_tokens = {
 	{ SRP_OPT_ALLOW_EXT_SG,		"allow_ext_sg=%u"	},
 	{ SRP_OPT_SG_TABLESIZE,		"sg_tablesize=%u"	},
 	{ SRP_OPT_COMP_VECTOR,		"comp_vector=%u"	},
+	{ SRP_OPT_TL_RETRY_COUNT,	"tl_retry_count=%u"	},
 	{ SRP_OPT_ERR,			NULL 			}
 };
 
@@ -2342,6 +2354,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			target->comp_vector = token;
 			break;
 
+		case SRP_OPT_TL_RETRY_COUNT:
+			if (match_int(args, &token) || token < 2 || token > 7) {
+				pr_warn("bad tl_retry_count parameter '%s' (must be a number between 2 and 7)\n",
+					p);
+				goto out;
+			}
+			target->tl_retry_count = token;
+			break;
+
 		default:
 			pr_warn("unknown parameter or missing value '%s' in target creation request\n",
 				p);
@@ -2396,6 +2417,7 @@ static ssize_t srp_create_target(struct device *dev,
 	target->cmd_sg_cnt	= cmd_sg_entries;
 	target->sg_tablesize	= indirect_sg_entries ? : cmd_sg_entries;
 	target->allow_ext_sg	= allow_ext_sg;
+	target->tl_retry_count	= 2;
 
 	ret = srp_parse_options(buf, target);
 	if (ret)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index cbc0b14..446b045 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -157,6 +157,7 @@ struct srp_target_port {
 	unsigned int		scsi_id;
 	unsigned int		sg_tablesize;
 	int			comp_vector;
+	int			tl_retry_count;
 
 	struct ib_sa_path_rec	path;
 	__be16			orig_dgid[8];
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH v2 15/15] IB/srp: Bump driver version and release date
       [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
                     ` (11 preceding siblings ...)
  2013-06-28 12:58   ` [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable Bart Van Assche
@ 2013-06-28 12:59   ` Bart Van Assche
  12 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 12:59 UTC (permalink / raw)
  To: Roland Dreier; +Cc: David Dillow, Vu Pham, Sebastian Riemer, linux-rdma

From: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Signed-off-by: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 58a6af9..4fefbde 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -53,8 +53,8 @@
 
 #define DRV_NAME	"ib_srp"
 #define PFX		DRV_NAME ": "
-#define DRV_VERSION	"0.2"
-#define DRV_RELDATE	"November 1, 2005"
+#define DRV_VERSION	"1.0"
+#define DRV_RELDATE	"July 1, 2013"
 
 MODULE_AUTHOR("Roland Dreier");
 MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()
       [not found]     ` <51CD8604.5010801-HInyCGIudOg@public.gmane.org>
@ 2013-06-28 14:42       ` Sebastian Riemer
       [not found]         ` <51CDA0CD.6060504-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  2013-06-30 19:59       ` David Dillow
  1 sibling, 1 reply; 47+ messages in thread
From: Sebastian Riemer @ 2013-06-28 14:42 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 28.06.2013 14:48, Bart Van Assche wrote:
> Avoid that srp_claim_command() can claim a command while
> srp_queuecommand() is still busy queueing the same command.
> Found this via source reading.

Nice, that's much less re-acquiring of the target lock in error case in
srp_queuecommand().
But if we have to change that many locations for srp_put_tx_iu() anyway,
wouldn't it make sense to rename it into __srp_put_tx_iu() as well?

Then we can also put a little description to it and it looks familiar
compared to __srp_get_tx_iu().

The description could look like follows:
/*
 * Return an IU and possible credit to the free pool
 *
 * Must be called with target->lock held to protect free_tx.
 */

I'm not sure if we still need that lockdep_assert_held() then. There is
no other location with lock debugging in ib_srp.

Cheers,
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()
       [not found]         ` <51CDA0CD.6060504-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-06-28 14:51           ` Bart Van Assche
       [not found]             ` <51CDA2E5.2010704-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-06-28 14:51 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 06/28/13 16:42, Sebastian Riemer wrote:
> On 28.06.2013 14:48, Bart Van Assche wrote:
>> Avoid that srp_claim_command() can claim a command while
>> srp_queuecommand() is still busy queueing the same command.
>> Found this via source reading.
>
> Nice, that's much less re-acquiring of the target lock in error case in
> srp_queuecommand().
> But if we have to change that many locations for srp_put_tx_iu() anyway,
> wouldn't it make sense to rename it into __srp_put_tx_iu() as well?
>
> Then we can also put a little description to it and it looks familiar
> compared to __srp_get_tx_iu().
>
> The description could look like follows:
> /*
>   * Return an IU and possible credit to the free pool
>   *
>   * Must be called with target->lock held to protect free_tx.
>   */
>
> I'm not sure if we still need that lockdep_assert_held() then. There is
> no other location with lock debugging in ib_srp.

Hello Sebastian,

I don't have a strong opinion about either of these two topics.

If a function like __srp_get_tx_iu() would be introduced that would 
allow to drop only two spin_lock/spin_unlock call pairs. So introducing 
that function would probably add more lines of code than adding the 
spin_lock/spin_unlock pairs. Hence my choice not to introduce 
__srp_get_tx_iu().

Regarding the lockdep_assert_held() statement: the reason I introduced 
it instead of adding a comment above the function telling which locking 
is required is because a lockdep_assert_held() statement is verified at 
runtime on a system with a kernel in which lockdep support has been enabled.

Bart.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()
       [not found]             ` <51CDA2E5.2010704-HInyCGIudOg@public.gmane.org>
@ 2013-06-28 15:08               ` Sebastian Riemer
  0 siblings, 0 replies; 47+ messages in thread
From: Sebastian Riemer @ 2013-06-28 15:08 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 28.06.2013 16:51, Bart Van Assche wrote:
>> Nice, that's much less re-acquiring of the target lock in error case in
>> srp_queuecommand().
>> But if we have to change that many locations for srp_put_tx_iu() anyway,
>> wouldn't it make sense to rename it into __srp_put_tx_iu() as well?
>>
>> Then we can also put a little description to it and it looks familiar
>> compared to __srp_get_tx_iu().
>>
>> The description could look like follows:
>> /*
>>   * Return an IU and possible credit to the free pool
>>   *
>>   * Must be called with target->lock held to protect free_tx.
>>   */
>>
>> I'm not sure if we still need that lockdep_assert_held() then. There is
>> no other location with lock debugging in ib_srp.
> 
> Hello Sebastian,
> 
> I don't have a strong opinion about either of these two topics.
> 
> If a function like __srp_get_tx_iu() would be introduced that would
> allow to drop only two spin_lock/spin_unlock call pairs. So introducing
> that function would probably add more lines of code than adding the
> spin_lock/spin_unlock pairs. Hence my choice not to introduce
> __srp_get_tx_iu().
> 
> Regarding the lockdep_assert_held() statement: the reason I introduced
> it instead of adding a comment above the function telling which locking
> is required is because a lockdep_assert_held() statement is verified at
> runtime on a system with a kernel in which lockdep support has been
> enabled.

Hi Bart,

I just meant a rename into "__srp_put_tx_iu()" to show that locking is
required and not introducing a further wrapper function.

The other function of that kind "__srp_get_tx_iu()" also doesn't have a
wrapper function "srp_get_tx_iu()".

For me it doesn't make much difference how it is marked that locking is
required. I just wanted to point out that this method is new to ib_srp.

Cheers,
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()
       [not found]     ` <51CD8604.5010801-HInyCGIudOg@public.gmane.org>
  2013-06-28 14:42       ` Sebastian Riemer
@ 2013-06-30 19:59       ` David Dillow
       [not found]         ` <1372622347.12468.9.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  1 sibling, 1 reply; 47+ messages in thread
From: David Dillow @ 2013-06-30 19:59 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:48 +0200, Bart Van Assche wrote:
> Avoid that srp_claim_command() can claim a command while

I think you meant srp_claim_req() ?

> srp_queuecommand() is still busy queueing the same command.
> Found this via source reading.

The main issue seems to be a broken, possibly malicious, target? I'm not
sure I see another path to get into this race, as the scsi layer should
protect us from queue submission and error handling at the same time,
correct?

I'm not keen on giving up the command submission concurrency we worked
so hard to get -- you completely serialize command submission again. If
we really need to protect against this race -- and we probably should,
to guard against broken targets -- I think we can achieve the
concurrency and safety by improving the receive and error handling sides
-- especially if we can push it into error handling, as that's already a
slow path.

Let's push this patch to the back of the pile for any respin; I don't
think the others rely on it (haven't gotten there yet) and would rather
not hold them up for this.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 03/15] IB/srp: Avoid that srp_reset_host() is skipped after a TL error
       [not found]     ` <51CD8644.5080600-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 20:00       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 20:00 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:49 +0200, Bart Van Assche wrote:
> The SCSI error handler assumes that the transport layer is
> operational if an eh_abort_handler() returns SUCCESS. Hence let
> srp_abort() only return SUCCESS if sending the ABORT TASK task
> management function succeeded. This patch avoids that the SCSI
> error handler skips the srp_reset_host() call after a transport
> layer error.

Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]     ` <51CD8676.6080205-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 20:02       ` David Dillow
  2013-07-01  9:07       ` Sebastian Riemer
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 20:02 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:49 +0200, Bart Van Assche wrote:
> If reconnecting failed we know that no command completion will
> be received anymore. Hence let the SCSI error handler fail such
> commands immediately.

Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 06/15] IB/srp: Maintain a single connection per I_T nexus
       [not found]     ` <51CD86CE.8080804-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 20:10       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 20:10 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi

On Fri, 2013-06-28 at 14:51 +0200, Bart Van Assche wrote:
> An SRP target is required to maintain a single connection between
> initiator and target. This means that if the 'add_target' attribute
> is used to create a second connection to a target that the first
> connection will be logged out and that the SCSI error handler will
> kick in. The SCSI error handler will cause the SRP initiator to
> reconnect, which will cause I/O over the second connection to fail.
> Avoid such ping-pong behavior by disabling relogins. Note: if
> reconnecting manually is necessary, that is possible by deleting
> and recreating an rport via sysfs.

Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling
  2013-06-28 12:53 ` [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling Bart Van Assche
@ 2013-06-30 21:05   ` David Dillow
       [not found]     ` <1372626334.12468.34.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:05 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

On Fri, 2013-06-28 at 14:53 +0200, Bart Van Assche wrote:
> Add the necessary functions in the SRP transport module to allow
> an SRP initiator driver to implement transport layer error handling
> similar to the functionality already provided by the FC transport
> layer. This includes:
> - Support for implementing fast_io_fail_tmo, the time that should
>   elapse after having detected a transport layer problem and
>   before failing I/O.
> - Support for implementing dev_loss_tmo, the time that should
>   elapse after having detected a transport layer problem and
>   before removing a remote port.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Roland Dreier <roland@purestorage.com>
> Cc: James Bottomley <JBottomley@Parallels.com>
> Cc: David Dillow <dillowda@ornl.gov>
> Cc: Vu Pham <vu@mellanox.com>
> Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
> ---
>  Documentation/ABI/stable/sysfs-transport-srp |   37 ++
>  drivers/scsi/scsi_transport_srp.c            |  477 +++++++++++++++++++++++++-
>  include/scsi/scsi_transport_srp.h            |   62 +++-
>  3 files changed, 573 insertions(+), 3 deletions(-)

> +/**
> + * srp_tmo_valid() - check timeout combination validity
> + *
> + * If both a fast I/O fail and a device loss timeout have been configured then
> + * the fast I/O fail timeout must be below the device loss timeout.
> + */
> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> +{
> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> +		fast_io_fail_tmo < dev_loss_tmo) &&
> +		fast_io_fail_tmo < LONG_MAX / HZ &&
> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> +}

There's really no need to fit it in one line to save space -- it makes
it much easier to read if you unpack it and return -EINVAL for the
problem cases.

They should also be capped by SCSI_DEVICE_BLOCK_MAX_TIMEOUT instead of
LONG_MAX / HZ, I think.


> +static ssize_t store_reconnect_delay(struct device *dev,
> +				     struct device_attribute *attr,
> +				     const char *buf, const size_t count)
> +{
> +	struct srp_rport *rport = transport_class_to_srp_rport(dev);
> +	char ch[16];
> +	int res, delay;
> +
> +	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
> +	res = kstrtoint(ch, 0, &delay);
> +	if (res)
> +		goto out;

I don't see the need for the sprintf? kstrtoint() will give you -ERANGE
if the value doesn't fit in an int...

Ah, NULL terminated string... except sysfs guarantees this for us for
the store method -- see fill_write_buffer() in fs/sysfs.c.

> +static ssize_t store_srp_rport_fast_io_fail_tmo(struct device *dev,
> +						struct device_attribute *attr,
> +						const char *buf, size_t count)
> +{
> +	struct srp_rport *rport = transport_class_to_srp_rport(dev);
> +	char ch[16], *p;
> +	int res;
> +	int fast_io_fail_tmo;
> +
> +	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
> +	p = strchr(ch, '\n');
> +	if (p)
> +		*p = '\0';

Again, no need for the sprintf if you don't modify the buffer? Instead
of using strchr() to make the strcmp() work with newlines, just do

if (!strcmp(buf, "off") || !strcmp(buf, "off\n")) {
	fast_io_fail_tmo = 1;
} else {
	res = kstrtoint(buf, 0, &fast_io_fail_tmo);
...

instead?

Same comment applies for store_srp_rport_dev_loss_tmo().

I haven't fully gone over the logic just yet, so I'm relying a bit on
Hannes and Mike's discussion from the last round. This looks to be
pretty reasonably sorted, though, based on the level of review I've been
able to do so far...



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer
  2013-06-28 12:52   ` [PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
@ 2013-06-30 21:06     ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

On Fri, 2013-06-28 at 14:52 +0200, Bart Van Assche wrote:
> Keep the rport data structure around after srp_remove_host() has
> finished until cleanup of the IB transport layer has finished
> completely. This is necessary because later patches use the rport
> pointer inside the queuecommand callback. Without this patch
> accessing the rport from inside a queuecommand callback is racy
> because srp_remove_host() must be invoked before scsi_remove_host()
> and because the queuecommand callback may get invoked after
> srp_remove_host() has finished.

Assuming the patches that require this are eventually accepted,

Acked-by: David Dillow <dillowda@ornl.gov>


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 09/15] IB/srp: Add srp_terminate_io()
       [not found]     ` <51CD877E.80606-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 21:10       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:10 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:54 +0200, Bart Van Assche wrote:
> Finish all outstanding I/O requests after fast_io_fail_tmo expired,
> which speeds up failover in a multipath setup. This patch is a
> reworked version of a patch from Sebastian Riemer.

> -static void srp_reset_req(struct srp_target_port *target, struct srp_request *req)
> +static void srp_finish_req(struct srp_target_port *target,
> +			   struct srp_request *req, int result)
>  {
>  	struct scsi_cmnd *scmnd = srp_claim_req(target, req, NULL);
>  
>  	if (scmnd) {
>  		srp_free_req(target, req, scmnd, 0);
> -		scmnd->result = DID_RESET << 16;
> +		scmnd->result = result;

Rather than have all the callers do the shift, just have
srp_finish_req() do it.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 10/15] IB/srp: Use SRP transport layer error recovery
       [not found]     ` <51CD87A9.2090702-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 21:20       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:20 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:55 +0200, Bart Van Assche wrote:
> Enable reconnect_delay, fast_io_fail_tmo and dev_loss_tmo
> functionality for the IB SRP initiator. Add kernel module
> parameters that allow to specify default values for these
> three parameters.

> +static int srp_tmo_set(const char *val, const struct kernel_param *kp)
> +{
> +	int tmo, res;
> +
> +	if (strncmp(val, "off", 3) != 0) {

Trivial nit: this allows 'offended' to turn off the functionality.
Perhaps use two strcmp() to deal with the possible newline? Of course, I
think everyone else follows this pattern so, as I said trvial nit.

Assuming we accept patch 8, 
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 11/15] IB/srp: Start timers if a transport layer error occurs
       [not found]     ` <51CD87D7.3050300-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 21:21       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:21 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:55 +0200, Bart Van Assche wrote:
> Start the reconnect timer, fast_io_fail timer and dev_loss timers
> if a transport layer error occurs.

Assuming 8 is applied,
Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 12/15] IB/srp: Fail SCSI commands silently
       [not found]   ` <51CD8812.20107-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 21:25     ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi

On Fri, 2013-06-28 at 14:56 +0200, Bart Van Assche wrote:
> From: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> 
> Avoid that path failover in a multipath setup causes the SCSI layer
> to generate kernel messages about SCSI command failures. This patch
> speeds up SRP initiator operation significantly when monitoring
> kernel messages over a serial port.

I'm not sure this should be fixed in the drivers -- this seems like an
issue that should be handled/configured in the core based on the host
result code.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 13/15] IB/srp: Make HCA completion vector configurable
       [not found]     ` <51CD8846.4070400-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 21:26       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:26 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:57 +0200, Bart Van Assche wrote:
> Several InfiniBand HCA's allow to configure the completion vector
> per queue pair. This allows to spread the workload created by IB
> completion interrupts over multiple MSI-X vectors and hence over
> multiple CPU cores. In other words, configuring the completion
> vector properly not only allows to reduce latency on an initiator
> connected to multiple SRP targets but also allows to improve
> throughput.

Acked-by: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found]     ` <51CD8876.9020307-HInyCGIudOg@public.gmane.org>
@ 2013-06-30 21:48       ` David Dillow
       [not found]         ` <1372628891.12468.52.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: David Dillow @ 2013-06-30 21:48 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Fri, 2013-06-28 at 14:58 +0200, Bart Van Assche wrote:
> From: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> 
> Allow the InfiniBand RC retry count to be configured by the user
> as an option in the target login string. The transport layer
> timeout in nanoseconds is computed as follows from the retry count:
> 
> rc_timeout = rc_retry_count * 4 * 4096 * (1 << qp->timeout)
> 
> The default value for tl_retry_count is changed from 7 into 2.
> Hence with a qp->timedout value of 19 this patch reduces the
> default transport layer timeout from about 60s to about 17s. The
> purpose of this patch is to reduce the time needed for SCSI error
> handling significantly and at the same time to avoid activating
> the SCSI error handler on an IB path with a regular BER or due to
> brief IB network congestion.

I keep vacillating between preserving the default of 7 and opting for
easier/optimized configuration for the common case. It my internal
argument over this today, I wondered about changing the QP timeout
instead -- doesn't that achieve your goals of allowing for errors and
network congestion while optimizing for a reasonable fabric? Going from
19 to 17 drops the timeout by about the same amount, while allowing for
more errors.

I agree that one or both of the items should be configurable, but I'm
still worried about changing the defaults, given the feed back from
those that want to use IB over the WAN.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling
       [not found]     ` <1372626334.12468.34.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2013-07-01  7:01       ` Bart Van Assche
       [not found]         ` <51D12941.3050105-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01  7:01 UTC (permalink / raw)
  To: David Dillow
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

On 06/30/13 23:05, David Dillow wrote:
> On Fri, 2013-06-28 at 14:53 +0200, Bart Van Assche wrote:
>> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
>> +{
>> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
>> +		fast_io_fail_tmo < dev_loss_tmo) &&
>> +		fast_io_fail_tmo < LONG_MAX / HZ &&
>> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
>> +}
>
> They should also be capped by SCSI_DEVICE_BLOCK_MAX_TIMEOUT instead of
> LONG_MAX / HZ, I think.

The fast_io_fail_tmo should indeed be capped by that value. However, I'm 
not sure about dev_loss_tmo. I think there are several use cases (e.g. 
initiator-side mirroring) where it's useful to set dev_loss_tmo to a 
larger value than ten minutes.

>> +static ssize_t store_srp_rport_fast_io_fail_tmo(struct device *dev,
>> +						struct device_attribute *attr,
>> +						const char *buf, size_t count)
>> +{
>> +	struct srp_rport *rport = transport_class_to_srp_rport(dev);
>> +	char ch[16], *p;
>> +	int res;
>> +	int fast_io_fail_tmo;
>> +
>> +	sprintf(ch, "%.*s", min_t(int, sizeof(ch) - 1, count), buf);
>> +	p = strchr(ch, '\n');
>> +	if (p)
>> +		*p = '\0';
>
> Again, no need for the sprintf if you don't modify the buffer? Instead
> of using strchr() to make the strcmp() work with newlines, just do
>
> if (!strcmp(buf, "off") || !strcmp(buf, "off\n")) {
> 	fast_io_fail_tmo = 1;
> } else {
> 	res = kstrtoint(buf, 0, &fast_io_fail_tmo);
> ...
>
> instead?
>
> Same comment applies for store_srp_rport_dev_loss_tmo().

OK, will remove the temporary char arrays.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req()
       [not found]         ` <1372622347.12468.9.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2013-07-01  7:10           ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01  7:10 UTC (permalink / raw)
  To: David Dillow; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On 06/30/13 21:59, David Dillow wrote:
> Let's push this patch to the back of the pile for any respin; I don't
> think the others rely on it (haven't gotten there yet) and would rather
> not hold them up for this.

I'll drop this patch for now. It's a race I have not yet been able to 
trigger anyway.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found]         ` <1372628891.12468.52.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2013-07-01  8:18           ` Bart Van Assche
       [not found]             ` <51D13B52.5060803-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01  8:18 UTC (permalink / raw)
  To: David Dillow; +Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On 06/30/13 23:48, David Dillow wrote:
> On Fri, 2013-06-28 at 14:58 +0200, Bart Van Assche wrote:
>> From: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>
>> Allow the InfiniBand RC retry count to be configured by the user
>> as an option in the target login string. The transport layer
>> timeout in nanoseconds is computed as follows from the retry count:
>>
>> rc_timeout = rc_retry_count * 4 * 4096 * (1 << qp->timeout)
>>
>> The default value for tl_retry_count is changed from 7 into 2.
>> Hence with a qp->timedout value of 19 this patch reduces the
>> default transport layer timeout from about 60s to about 17s. The
>> purpose of this patch is to reduce the time needed for SCSI error
>> handling significantly and at the same time to avoid activating
>> the SCSI error handler on an IB path with a regular BER or due to
>> brief IB network congestion.
>
> I keep vacillating between preserving the default of 7 and opting for
> easier/optimized configuration for the common case. It my internal
> argument over this today, I wondered about changing the QP timeout
> instead -- doesn't that achieve your goals of allowing for errors and
> network congestion while optimizing for a reasonable fabric? Going from
> 19 to 17 drops the timeout by about the same amount, while allowing for
> more errors.
>
> I agree that one or both of the items should be configurable, but I'm
> still worried about changing the defaults, given the feed back from
> those that want to use IB over the WAN.

The InfiniBand specification mentions the following about differential 
receiver inputs (C6-11.2.1): "A BER of 10^-12 shall be achieved when
connected to the worst case transmitter through any compliant channel". 
The maximum packet size for an InfiniBand packet is about 4 KB (see also 
section 7.7.8 in the spec). This means that with an 8b/10b encoding the 
chance to lose a packet over a single link due to bit errors is about 
4*10^-8. So the chance to lose a packet over a network consisting of n 
links with retry count r is about (n*4*10^-8)^r. With r=2 that results 
already in a really low value, even with multiple links. Since lowering 
the QP timeout might make congestion worse my preference is to lower the 
retry count.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]     ` <51CD8676.6080205-HInyCGIudOg@public.gmane.org>
  2013-06-30 20:02       ` David Dillow
@ 2013-07-01  9:07       ` Sebastian Riemer
       [not found]         ` <51D146EE.6010209-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  2013-07-01  9:25       ` Sebastian Riemer
  2013-07-02  8:30       ` Sebastian Riemer
  3 siblings, 1 reply; 47+ messages in thread
From: Sebastian Riemer @ 2013-07-01  9:07 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 28.06.2013 14:49, Bart Van Assche wrote:
> If reconnecting failed we know that no command completion will
> be received anymore. Hence let the SCSI error handler fail such
> commands immediately.
> 
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
> Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 8c95262..5c91521 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>  	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>  			      SRP_TSK_ABORT_TASK) == 0)
>  		ret = SUCCESS;
> +	else if (target->transport_offline)
> +		ret = FAST_IO_FAIL;
>  	else
>  		ret = FAILED;
>  	srp_free_req(target, req, scmnd, 0);
> 

This doesn't give us much speed advantage IMHO. The check for
target->transport_offline should be before calling srp_send_tsk_mgmt().

This way it would also match the patch description better.

Cheers,
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]     ` <51CD8676.6080205-HInyCGIudOg@public.gmane.org>
  2013-06-30 20:02       ` David Dillow
  2013-07-01  9:07       ` Sebastian Riemer
@ 2013-07-01  9:25       ` Sebastian Riemer
       [not found]         ` <51D14AF1.4000803-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  2013-07-02  8:30       ` Sebastian Riemer
  3 siblings, 1 reply; 47+ messages in thread
From: Sebastian Riemer @ 2013-07-01  9:25 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 28.06.2013 14:49, Bart Van Assche wrote:
> If reconnecting failed we know that no command completion will
> be received anymore. Hence let the SCSI error handler fail such
> commands immediately.
> 
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
> Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 8c95262..5c91521 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>  	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>  			      SRP_TSK_ABORT_TASK) == 0)
>  		ret = SUCCESS;
> +	else if (target->transport_offline)
> +		ret = FAST_IO_FAIL;
>  	else
>  		ret = FAILED;
>  	srp_free_req(target, req, scmnd, 0);
> 

I'm also missing the concept for srp_reset_device(). There is a very
common case that the SCSI error handling and the transport layer error
handling run in parallel: Congestion.

In congestion some LUNs are blocked while others can still transmit. A
little bit later the QP timeout triggers in the middle of the SCSI error
handling in srp_abort(), srp_reset_device() or less likely in
srp_reset_host().

Cheers,
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling
       [not found]         ` <51D12941.3050105-HInyCGIudOg@public.gmane.org>
@ 2013-07-01 11:19           ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-07-01 11:19 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma, linux-scsi,
	James Bottomley

On Mon, 2013-07-01 at 09:01 +0200, Bart Van Assche wrote:
> On 06/30/13 23:05, David Dillow wrote:
> > On Fri, 2013-06-28 at 14:53 +0200, Bart Van Assche wrote:
> >> +int srp_tmo_valid(int fast_io_fail_tmo, int dev_loss_tmo)
> >> +{
> >> +	return (fast_io_fail_tmo < 0 || dev_loss_tmo < 0 ||
> >> +		fast_io_fail_tmo < dev_loss_tmo) &&
> >> +		fast_io_fail_tmo < LONG_MAX / HZ &&
> >> +		dev_loss_tmo < LONG_MAX / HZ ? 0 : -EINVAL;
> >> +}
> >
> > They should also be capped by SCSI_DEVICE_BLOCK_MAX_TIMEOUT instead of
> > LONG_MAX / HZ, I think.
> 
> The fast_io_fail_tmo should indeed be capped by that value. However, I'm 
> not sure about dev_loss_tmo. I think there are several use cases (e.g. 
> initiator-side mirroring) where it's useful to set dev_loss_tmo to a 
> larger value than ten minutes.

Ah, yes, very good point.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found]             ` <51D13B52.5060803-HInyCGIudOg@public.gmane.org>
@ 2013-07-01 11:26               ` David Dillow
       [not found]                 ` <1372677965.12468.57.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: David Dillow @ 2013-07-01 11:26 UTC (permalink / raw)
  To: Bart Van Assche, Jason Gunthorpe
  Cc: Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Mon, 2013-07-01 at 10:18 +0200, Bart Van Assche wrote:
> On 06/30/13 23:48, David Dillow wrote:
> > On Fri, 2013-06-28 at 14:58 +0200, Bart Van Assche wrote:
> >> From: Vu Pham <vuhuong-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> >>
> >> Allow the InfiniBand RC retry count to be configured by the user
> >> as an option in the target login string. The transport layer
> >> timeout in nanoseconds is computed as follows from the retry count:
> >>
> >> rc_timeout = rc_retry_count * 4 * 4096 * (1 << qp->timeout)
> >>
> >> The default value for tl_retry_count is changed from 7 into 2.
> >> Hence with a qp->timedout value of 19 this patch reduces the
> >> default transport layer timeout from about 60s to about 17s. The
> >> purpose of this patch is to reduce the time needed for SCSI error
> >> handling significantly and at the same time to avoid activating
> >> the SCSI error handler on an IB path with a regular BER or due to
> >> brief IB network congestion.
> >
> > I keep vacillating between preserving the default of 7 and opting for
> > easier/optimized configuration for the common case. It my internal
> > argument over this today, I wondered about changing the QP timeout
> > instead -- doesn't that achieve your goals of allowing for errors and
> > network congestion while optimizing for a reasonable fabric? Going from
> > 19 to 17 drops the timeout by about the same amount, while allowing for
> > more errors.
> >
> > I agree that one or both of the items should be configurable, but I'm
> > still worried about changing the defaults, given the feed back from
> > those that want to use IB over the WAN.
> 
> The InfiniBand specification mentions the following about differential 
> receiver inputs (C6-11.2.1): "A BER of 10^-12 shall be achieved when
> connected to the worst case transmitter through any compliant channel".

There are a _lot_ of networks out there with non-compliant channels,
then. The spec says one thing, but the reality matters. Also, perhaps my
math is wrong, but that seems to be one error every 18 seconds or so at
FDR speeds?

> The maximum packet size for an InfiniBand packet is about 4 KB (see also 
> section 7.7.8 in the spec). This means that with an 8b/10b encoding the 
> chance to lose a packet over a single link due to bit errors is about 
> 4*10^-8. So the chance to lose a packet over a network consisting of n 
> links with retry count r is about (n*4*10^-8)^r. With r=2 that results 
> already in a really low value, even with multiple links. Since lowering 
> the QP timeout might make congestion worse my preference is to lower the 
> retry count.

You assume independent failures, which is suspect -- many times these
are data-dependent, or so I tend to think. Jason, do you have any
insight on this (overall) topic you could share?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]         ` <51D146EE.6010209-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-07-01 11:33           ` Bart Van Assche
       [not found]             ` <51D16918.60600-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01 11:33 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 07/01/13 11:07, Sebastian Riemer wrote:
> On 28.06.2013 14:49, Bart Van Assche wrote:
>> If reconnecting failed we know that no command completion will
>> be received anymore. Hence let the SCSI error handler fail such
>> commands immediately.
>>
>> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
>> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
>> Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>   drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 8c95262..5c91521 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>   	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>   			      SRP_TSK_ABORT_TASK) == 0)
>>   		ret = SUCCESS;
>> +	else if (target->transport_offline)
>> +		ret = FAST_IO_FAIL;
>>   	else
>>   		ret = FAILED;
>>   	srp_free_req(target, req, scmnd, 0);
>>
>
> This doesn't give us much speed advantage IMHO. The check for
> target->transport_offline should be before calling srp_send_tsk_mgmt().
>
> This way it would also match the patch description better.

Hello Sebastian,

Had you perhaps overlooked the following code at the start of 
srp_send_tsk_mgmt() ?

	if (!target->connected || target->qp_in_error)
		return -1;

Given this I don't think it matters whether the transport_offline check 
occurs before or after the srp_send_tsk_mgmt() call.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]         ` <51D14AF1.4000803-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-07-01 11:38           ` Bart Van Assche
       [not found]             ` <51D16A39.4050709-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01 11:38 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 07/01/13 11:25, Sebastian Riemer wrote:
> On 28.06.2013 14:49, Bart Van Assche wrote:
>> If reconnecting failed we know that no command completion will
>> be received anymore. Hence let the SCSI error handler fail such
>> commands immediately.
>>
>> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
>> Cc: Roland Dreier <roland-BHEL68pLQRGGvPXPguhicg@public.gmane.org>
>> Cc: David Dillow <dillowda-1Heg1YXhbW8@public.gmane.org>
>> Cc: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>> Cc: Vu Pham <vu-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> ---
>>   drivers/infiniband/ulp/srp/ib_srp.c |    2 ++
>>   1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 8c95262..5c91521 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>   	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>   			      SRP_TSK_ABORT_TASK) == 0)
>>   		ret = SUCCESS;
>> +	else if (target->transport_offline)
>> +		ret = FAST_IO_FAIL;
>>   	else
>>   		ret = FAILED;
>>   	srp_free_req(target, req, scmnd, 0);
>
> I'm also missing the concept for srp_reset_device(). There is a very
> common case that the SCSI error handling and the transport layer error
> handling run in parallel: Congestion.

Can you explain this comment further, and also how this comment relates 
to patch 04/15 ?

> In congestion some LUNs are blocked while others can still transmit. A
> little bit later the QP timeout triggers in the middle of the SCSI error
> handling in srp_abort(), srp_reset_device() or less likely in
> srp_reset_host().

I am aware this can result in concurrent srp_reconnect_rport() calls. 
However, such concurrent calls are serialized via rport->mutex.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found]                 ` <1372677965.12468.57.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
@ 2013-07-01 11:44                   ` Bart Van Assche
  2013-07-02 19:18                   ` Jason Gunthorpe
  1 sibling, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01 11:44 UTC (permalink / raw)
  To: David Dillow
  Cc: Jason Gunthorpe, Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On 07/01/13 13:26, David Dillow wrote:
> On Mon, 2013-07-01 at 10:18 +0200, Bart Van Assche wrote:
>> The InfiniBand specification mentions the following about differential
>> receiver inputs (C6-11.2.1): "A BER of 10^-12 shall be achieved when
>> connected to the worst case transmitter through any compliant channel".
>
> There are a _lot_ of networks out there with non-compliant channels,
> then. The spec says one thing, but the reality matters. Also, perhaps my
> math is wrong, but that seems to be one error every 18 seconds or so at
> FDR speeds?

If data is transferred at line rate on an FDR network, I agree that 
about one error is expected about every 18 seconds.

Sorry but it's not clear to me what makes you think that there are a lot 
of IB networks with non-compliant channels ?

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]             ` <51D16918.60600-HInyCGIudOg@public.gmane.org>
@ 2013-07-01 11:53               ` Sebastian Riemer
  0 siblings, 0 replies; 47+ messages in thread
From: Sebastian Riemer @ 2013-07-01 11:53 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 01.07.2013 13:33, Bart Van Assche wrote:
>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>>       if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>>                     SRP_TSK_ABORT_TASK) == 0)
>>>           ret = SUCCESS;
>>> +    else if (target->transport_offline)
>>> +        ret = FAST_IO_FAIL;
>>>       else
>>>           ret = FAILED;
>>>       srp_free_req(target, req, scmnd, 0);
>>>
>>
>> This doesn't give us much speed advantage IMHO. The check for
>> target->transport_offline should be before calling srp_send_tsk_mgmt().
>>
>> This way it would also match the patch description better.
> 
> Hello Sebastian,
> 
> Had you perhaps overlooked the following code at the start of
> srp_send_tsk_mgmt() ?
> 
>     if (!target->connected || target->qp_in_error)
>         return -1;
> 
> Given this I don't think it matters whether the transport_offline check
> occurs before or after the srp_send_tsk_mgmt() call.

Hi Bart,

okay, right. So you get an error due to the connected and qp_in_error
state first. Yes, I've overlooked that. Thanks!

Cheers,
Sebastian

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]             ` <51D16A39.4050709-HInyCGIudOg@public.gmane.org>
@ 2013-07-01 12:31               ` Sebastian Riemer
       [not found]                 ` <51D176B5.90609-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
  0 siblings, 1 reply; 47+ messages in thread
From: Sebastian Riemer @ 2013-07-01 12:31 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 01.07.2013 13:38, Bart Van Assche wrote:
>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>>       if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>>                     SRP_TSK_ABORT_TASK) == 0)
>>>           ret = SUCCESS;
>>> +    else if (target->transport_offline)
>>> +        ret = FAST_IO_FAIL;
>>>       else
>>>           ret = FAILED;
>>>       srp_free_req(target, req, scmnd, 0);
>>
>> I'm also missing the concept for srp_reset_device(). There is a very
>> common case that the SCSI error handling and the transport layer error
>> handling run in parallel: Congestion.
> 
> Can you explain this comment further, and also how this comment relates
> to patch 04/15 ?

Sorry, found it. Even if only one srp_reset_device() fails, then
srp_reset_host() is called anyway. So there this check + returning
FAST_IO_FAIL doesn't make so much sense.

>> In congestion some LUNs are blocked while others can still transmit. A
>> little bit later the QP timeout triggers in the middle of the SCSI error
>> handling in srp_abort(), srp_reset_device() or less likely in
>> srp_reset_host().
> 
> I am aware this can result in concurrent srp_reconnect_rport() calls.
> However, such concurrent calls are serialized via rport->mutex.

I put my comment regarding this to patch 10.

Cheers,
Sebastian
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]                 ` <51D176B5.90609-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
@ 2013-07-01 12:57                   ` Bart Van Assche
  0 siblings, 0 replies; 47+ messages in thread
From: Bart Van Assche @ 2013-07-01 12:57 UTC (permalink / raw)
  To: Sebastian Riemer; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 07/01/13 14:31, Sebastian Riemer wrote:
> On 01.07.2013 13:38, Bart Van Assche wrote:
>>>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>>>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>>>> @@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>>>>        if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
>>>>                      SRP_TSK_ABORT_TASK) == 0)
>>>>            ret = SUCCESS;
>>>> +    else if (target->transport_offline)
>>>> +        ret = FAST_IO_FAIL;
>>>>        else
>>>>            ret = FAILED;
>>>>        srp_free_req(target, req, scmnd, 0);
>>>
>>> I'm also missing the concept for srp_reset_device(). There is a very
>>> common case that the SCSI error handling and the transport layer error
>>> handling run in parallel: Congestion.
>>
>> Can you explain this comment further, and also how this comment relates
>> to patch 04/15 ?
>
> Sorry, found it. Even if only one srp_reset_device() fails, then
> srp_reset_host() is called anyway. So there this check + returning
> FAST_IO_FAIL doesn't make so much sense.

Hello Sebastian,

I agree that if one or more srp_abort() calls return FAILED that 
srp_reset_host() will be called anyway. However, this patch helps if the 
first call of srp_abort() occurs after a reconnect failure, which is 
likely with the default reconnnect_delay and SCSI timeout settings. In 
that case all srp_abort() calls will return FAST_IO_FAIL, 
scsi_eh_abort_cmds() will terminate the pending commands and hence the 
host reset will be skipped.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline
       [not found]     ` <51CD8676.6080205-HInyCGIudOg@public.gmane.org>
                         ` (2 preceding siblings ...)
  2013-07-01  9:25       ` Sebastian Riemer
@ 2013-07-02  8:30       ` Sebastian Riemer
  3 siblings, 0 replies; 47+ messages in thread
From: Sebastian Riemer @ 2013-07-02  8:30 UTC (permalink / raw)
  To: Bart Van Assche; +Cc: Roland Dreier, David Dillow, Vu Pham, linux-rdma

On 28.06.2013 14:49, Bart Van Assche wrote:
> If reconnecting failed we know that no command completion will
> be received anymore. Hence let the SCSI error handler fail such
> commands immediately.

Acked-by: Sebastian Riemer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found]                 ` <1372677965.12468.57.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
  2013-07-01 11:44                   ` Bart Van Assche
@ 2013-07-02 19:18                   ` Jason Gunthorpe
       [not found]                     ` <20130702191842.GD14625-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
  1 sibling, 1 reply; 47+ messages in thread
From: Jason Gunthorpe @ 2013-07-02 19:18 UTC (permalink / raw)
  To: David Dillow
  Cc: Bart Van Assche, Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Mon, Jul 01, 2013 at 07:26:05AM -0400, David Dillow wrote:

> > The InfiniBand specification mentions the following about differential 
> > receiver inputs (C6-11.2.1): "A BER of 10^-12 shall be achieved when
> > connected to the worst case transmitter through any compliant channel".

This test condition is under 'stressed conditions' for the higher
rates, which doesn't directly translate into an error every 18
seconds under normal conditions.

A properly functioning IB link should not be seeing link errors with
any frequency (eg we expect/observe no errors on single links over
month long periods).

Naturally, a large number of IB links, will on aggregate, have error
rates significantly higher..

.. and this whole area is very challenging and installations with
defective cables/defective end points/non-compliance/etc are not
uncommon, so some sites see higher error rates. :(

However, IIRC, one large installation I know of, had the achieved
target of nearly no errors on any link during typical operation.

In a practical sense, IB just doesn't work with packet loss. You need
to have low error rate signalling or your network doesn't work/won't
perform.

> > The maximum packet size for an InfiniBand packet is about 4 KB (see also 
> > section 7.7.8 in the spec). This means that with an 8b/10b encoding the 
> > chance to lose a packet over a single link due to bit errors is about 
> > 4*10^-8. So the chance to lose a packet over a network consisting of n 
> > links with retry count r is about (n*4*10^-8)^r. With r=2 that results 
> > already in a really low value, even with multiple links. Since lowering 
> > the QP timeout might make congestion worse my preference is to lower the 
> > retry count.
> 
> You assume independent failures, which is suspect -- many times these
> are data-dependent, or so I tend to think. Jason, do you have any
> insight on this (overall) topic you could share?

All data transmitted on modern serial links is 'whitened'
somehow. This is does independently on a link-by-link basis either
with 8b/10b coding or with the 64b/66b scrambler. So the idea of a
high level 'magic packet' that causes data-dependent errors is not
statistically likely.

Errors in properly functioning modern serial links are pure-random,
from the perspective of SRP.

It is best to use all the information the SM provides when setting up
the path, however I don't think there is a best practice idea yet for
how to setup the retry count though..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable
       [not found]                     ` <20130702191842.GD14625-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
@ 2013-07-03 14:26                       ` David Dillow
  0 siblings, 0 replies; 47+ messages in thread
From: David Dillow @ 2013-07-03 14:26 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: Bart Van Assche, Roland Dreier, Vu Pham, Sebastian Riemer, linux-rdma

On Tue, 2013-07-02 at 13:18 -0600, Jason Gunthorpe wrote:
> On Mon, Jul 01, 2013 at 07:26:05AM -0400, David Dillow wrote:
> > You assume independent failures, which is suspect -- many times these
> > are data-dependent, or so I tend to think. Jason, do you have any
> > insight on this (overall) topic you could share?
> 
> All data transmitted on modern serial links is 'whitened'
> somehow. This is does independently on a link-by-link basis either
> with 8b/10b coding or with the 64b/66b scrambler. So the idea of a
> high level 'magic packet' that causes data-dependent errors is not
> statistically likely.

My thought was that if we hit a statistically unlikely pattern that
caused an issue, the retransmission is likely to also hit the issue
given the deterministic scrambling. But I didn't think about the fact
that the signal stream was being whitened.

> It is best to use all the information the SM provides when setting up
> the path, however I don't think there is a best practice idea yet for
> how to setup the retry count though..

Hmm, that would be a useful presentation for the workshop; I'll have to
see if I can get some people interested here.

Thanks for the information,
Dave
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2013-07-03 14:26 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-28 12:45 [PATCH v2 0/15] IB SRP initiator patches for kernel 3.11 Bart Van Assche
     [not found] ` <51CD856A.3010102-HInyCGIudOg@public.gmane.org>
2013-06-28 12:46   ` [PATCH v2 01/15] IB/srp: Fix remove_one crash due to resource exhaustion Bart Van Assche
2013-06-28 12:48   ` [PATCH v2 02/15] IB/srp: Fix race between srp_queuecommand() and srp_claim_req() Bart Van Assche
     [not found]     ` <51CD8604.5010801-HInyCGIudOg@public.gmane.org>
2013-06-28 14:42       ` Sebastian Riemer
     [not found]         ` <51CDA0CD.6060504-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-06-28 14:51           ` Bart Van Assche
     [not found]             ` <51CDA2E5.2010704-HInyCGIudOg@public.gmane.org>
2013-06-28 15:08               ` Sebastian Riemer
2013-06-30 19:59       ` David Dillow
     [not found]         ` <1372622347.12468.9.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2013-07-01  7:10           ` Bart Van Assche
2013-06-28 12:49   ` [PATCH v2 03/15] IB/srp: Avoid that srp_reset_host() is skipped after a TL error Bart Van Assche
     [not found]     ` <51CD8644.5080600-HInyCGIudOg@public.gmane.org>
2013-06-30 20:00       ` David Dillow
2013-06-28 12:49   ` [PATCH v2 04/15] IB/srp: Fail I/O fast if target offline Bart Van Assche
     [not found]     ` <51CD8676.6080205-HInyCGIudOg@public.gmane.org>
2013-06-30 20:02       ` David Dillow
2013-07-01  9:07       ` Sebastian Riemer
     [not found]         ` <51D146EE.6010209-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-07-01 11:33           ` Bart Van Assche
     [not found]             ` <51D16918.60600-HInyCGIudOg@public.gmane.org>
2013-07-01 11:53               ` Sebastian Riemer
2013-07-01  9:25       ` Sebastian Riemer
     [not found]         ` <51D14AF1.4000803-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-07-01 11:38           ` Bart Van Assche
     [not found]             ` <51D16A39.4050709-HInyCGIudOg@public.gmane.org>
2013-07-01 12:31               ` Sebastian Riemer
     [not found]                 ` <51D176B5.90609-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
2013-07-01 12:57                   ` Bart Van Assche
2013-07-02  8:30       ` Sebastian Riemer
2013-06-28 12:50   ` [PATCH v2 05/15] IB/srp: Skip host settle delay Bart Van Assche
2013-06-28 12:51   ` [PATCH v2 06/15] IB/srp: Maintain a single connection per I_T nexus Bart Van Assche
     [not found]     ` <51CD86CE.8080804-HInyCGIudOg@public.gmane.org>
2013-06-30 20:10       ` David Dillow
2013-06-28 12:52   ` [PATCH v2 07/15] IB/srp: Keep rport as long as the IB transport layer Bart Van Assche
2013-06-30 21:06     ` David Dillow
2013-06-28 12:54   ` [PATCH v2 09/15] IB/srp: Add srp_terminate_io() Bart Van Assche
     [not found]     ` <51CD877E.80606-HInyCGIudOg@public.gmane.org>
2013-06-30 21:10       ` David Dillow
2013-06-28 12:55   ` [PATCH v2 10/15] IB/srp: Use SRP transport layer error recovery Bart Van Assche
     [not found]     ` <51CD87A9.2090702-HInyCGIudOg@public.gmane.org>
2013-06-30 21:20       ` David Dillow
2013-06-28 12:55   ` [PATCH v2 11/15] IB/srp: Start timers if a transport layer error occurs Bart Van Assche
     [not found]     ` <51CD87D7.3050300-HInyCGIudOg@public.gmane.org>
2013-06-30 21:21       ` David Dillow
2013-06-28 12:57   ` [PATCH v2 13/15] IB/srp: Make HCA completion vector configurable Bart Van Assche
     [not found]     ` <51CD8846.4070400-HInyCGIudOg@public.gmane.org>
2013-06-30 21:26       ` David Dillow
2013-06-28 12:58   ` [PATCH v2 14/15] IB/srp: Make transport layer retry count configurable Bart Van Assche
     [not found]     ` <51CD8876.9020307-HInyCGIudOg@public.gmane.org>
2013-06-30 21:48       ` David Dillow
     [not found]         ` <1372628891.12468.52.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2013-07-01  8:18           ` Bart Van Assche
     [not found]             ` <51D13B52.5060803-HInyCGIudOg@public.gmane.org>
2013-07-01 11:26               ` David Dillow
     [not found]                 ` <1372677965.12468.57.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2013-07-01 11:44                   ` Bart Van Assche
2013-07-02 19:18                   ` Jason Gunthorpe
     [not found]                     ` <20130702191842.GD14625-ePGOBjL8dl3ta4EC/59zMFaTQe2KTcn/@public.gmane.org>
2013-07-03 14:26                       ` David Dillow
2013-06-28 12:59   ` [PATCH v2 15/15] IB/srp: Bump driver version and release date Bart Van Assche
2013-06-28 12:53 ` [PATCH v2 08/15] scsi_transport_srp: Add transport layer error handling Bart Van Assche
2013-06-30 21:05   ` David Dillow
     [not found]     ` <1372626334.12468.34.camel-a7a0dvSY7KqLUyTwlgNVppKKF0rrzTr+@public.gmane.org>
2013-07-01  7:01       ` Bart Van Assche
     [not found]         ` <51D12941.3050105-HInyCGIudOg@public.gmane.org>
2013-07-01 11:19           ` David Dillow
2013-06-28 12:56 ` [PATCH v2 12/15] IB/srp: Fail SCSI commands silently Bart Van Assche
     [not found]   ` <51CD8812.20107-HInyCGIudOg@public.gmane.org>
2013-06-30 21:25     ` David Dillow

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.