All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/12] IB/srp patches for kernel v4.2
@ 2015-04-30  8:56 Bart Van Assche
  2015-04-30  8:56 ` [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand() Bart Van Assche
                   ` (9 more replies)
  0 siblings, 10 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:56 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Hello Doug,

My own testing and code review of the SRP initiator driver during the 
past months resulted in a series of twelve patches with bug fixes, 
source code simplifications and improvements. Patches 1..5 are bug fixes 
and have been tagged for the stable kernel series. These patches have 
been tested against kernel v4.1-rc1. Please consider these patches for 
the v4.2 merge window.

The individual patches in this series are:
0001-scsi_transport_srp-Introduce-srp_wait_for_queuecomma.patch
0002-scsi_transport_srp-Fix-a-race-condition.patch
0003-IB-srp-Remove-an-extraneous-scsi_host_put-from-an-er.patch
0004-IB-srp-Fix-connection-state-tracking.patch
0005-IB-srp-Fix-reconnection-failure-handling.patch
0006-scsi_transport_srp-Reduce-failover-time.patch
0007-IB-srp-Remove-superfluous-casts.patch
0008-IB-srp-Rearrange-module-description.patch
0009-IB-srp-Remove-a-superfluous-check-from-srp_free_req_.patch
0010-IB-srp-Remove-ch-target-tests-from-the-reconnect-cod.patch
0011-IB-srp-Add-64-bit-LUN-support.patch
0012-IB-srp-Make-CM-timeout-dependent-on-subnet-timeout.patch

Thanks,

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
@ 2015-04-30  8:56 ` Bart Van Assche
       [not found]   ` <5541EE4A.30803-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  8:57 ` [PATCH 02/12] scsi_transport_srp: Fix a race condition Bart Van Assche
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:56 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Introduce the helper function srp_wait_for_queuecommand().
Move the definition of scsi_request_fn_active(). This patch
does not change any functionality. A second call to
scsi_wait_for_queuecommand() will be introduced in the next
patch.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.13
---
 drivers/scsi/scsi_transport_srp.c | 52 ++++++++++++++++++++++-----------------
 1 file changed, 29 insertions(+), 23 deletions(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index ae45bd9..6ce1c48 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -396,6 +396,34 @@ static void srp_reconnect_work(struct work_struct *work)
 	}
 }
 
+/**
+ * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
+ * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
+ */
+static int scsi_request_fn_active(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+	int request_fn_active = 0;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		spin_lock_irq(q->queue_lock);
+		request_fn_active += q->request_fn_active;
+		spin_unlock_irq(q->queue_lock);
+	}
+
+	return request_fn_active;
+}
+
+/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
+static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+	while (scsi_request_fn_active(shost))
+		msleep(20);
+}
+
 static void __rport_fail_io_fast(struct srp_rport *rport)
 {
 	struct Scsi_Host *shost = rport_to_shost(rport);
@@ -504,27 +532,6 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
 EXPORT_SYMBOL(srp_start_tl_fail_timers);
 
 /**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	struct request_queue *q;
-	int request_fn_active = 0;
-
-	shost_for_each_device(sdev, shost) {
-		q = sdev->request_queue;
-
-		spin_lock_irq(q->queue_lock);
-		request_fn_active += q->request_fn_active;
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	return request_fn_active;
-}
-
-/**
  * srp_reconnect_rport() - reconnect to an SRP target port
  * @rport: SRP target port.
  *
@@ -559,8 +566,7 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	if (res)
 		goto out;
 	scsi_target_block(&shost->shost_gendev);
-	while (scsi_request_fn_active(shost))
-		msleep(20);
+	srp_wait_for_queuecommand(shost);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 02/12] scsi_transport_srp: Fix a race condition
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
  2015-04-30  8:56 ` [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand() Bart Van Assche
@ 2015-04-30  8:57 ` Bart Van Assche
       [not found]   ` <5541EE66.7090608-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  8:58 ` [PATCH 04/12] IB/srp: Fix connection state tracking Bart Van Assche
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:57 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Avoid that srp_terminate_io() can get invoked while srp_queuecommand()
is in progress. This patch avoids that an I/O timeout can trigger the
following kernel warning:

WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:1447 srp_terminate_io+0xef/0x100 [ib_srp]()
Call Trace:
 [<ffffffff814c65a2>] dump_stack+0x4e/0x68
 [<ffffffff81051f71>] warn_slowpath_common+0x81/0xa0
 [<ffffffff8105204a>] warn_slowpath_null+0x1a/0x20
 [<ffffffffa075f51f>] srp_terminate_io+0xef/0x100 [ib_srp]
 [<ffffffffa07495da>] __rport_fail_io_fast+0xba/0xc0 [scsi_transport_srp]
 [<ffffffffa0749a90>] rport_fast_io_fail_timedout+0xe0/0xf0 [scsi_transport_srp]
 [<ffffffff8106e09b>] process_one_work+0x1db/0x780
 [<ffffffff8106e75b>] worker_thread+0x11b/0x450
 [<ffffffff81073c64>] kthread+0xe4/0x100
 [<ffffffff814cf26c>] ret_from_fork+0x7c/0xb0

See also patch "scsi_transport_srp: Add transport layer error
handling" (commit ID 29c17324803c).

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.13
---
 drivers/scsi/scsi_transport_srp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 6ce1c48..4a44337 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -437,8 +437,10 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
 
 	/* Involve the LLD if possible to terminate all I/O on the rport. */
 	i = to_srp_internal(shost->transportt);
-	if (i->f->terminate_rport_io)
+	if (i->f->terminate_rport_io) {
+		srp_wait_for_queuecommand(shost);
 		i->f->terminate_rport_io(rport);
+	}
 }
 
 /**
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 03/12] IB/srp: Remove an extraneous scsi_host_put() from an error path
       [not found] ` <5541EE21.3050809-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30  8:57   ` Bart Van Assche
  2015-04-30  9:44     ` Sagi Grimberg
  2015-04-30  9:02   ` [PATCH 11/12] IB/srp: Add 64-bit LUN support Bart Van Assche
  2015-04-30  9:02   ` [PATCH 12/12] IB/srp: Make CM timeout dependent on subnet timeout Bart Van Assche
  2 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:57 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Fix a scsi_get_host() / scsi_host_put() imbalance in the error
path of srp_create_target(). See also patch "IB/srp: Avoid that
I/O hangs due to a cable pull during LUN scanning" (commit ID
34aa654ecb8e).

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> #v3.19
---
 drivers/infiniband/ulp/srp/ib_srp.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 918814c..5ce6cfd 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -3172,11 +3172,11 @@ static ssize_t srp_create_target(struct device *dev,
 
 	ret = srp_parse_options(buf, target);
 	if (ret)
-		goto err;
+		goto out;
 
 	ret = scsi_init_shared_tag_map(target_host, target_host->can_queue);
 	if (ret)
-		goto err;
+		goto out;
 
 	target->req_ring_size = target->queue_size - SRP_TSK_MGMT_SQ_SIZE;
 
@@ -3187,7 +3187,7 @@ static ssize_t srp_create_target(struct device *dev,
 			     be64_to_cpu(target->ioc_guid),
 			     be64_to_cpu(target->initiator_ext));
 		ret = -EEXIST;
-		goto err;
+		goto out;
 	}
 
 	if (!srp_dev->has_fmr && !srp_dev->has_fr && !target->allow_ext_sg &&
@@ -3208,7 +3208,7 @@ static ssize_t srp_create_target(struct device *dev,
 	spin_lock_init(&target->lock);
 	ret = ib_query_gid(ibdev, host->port, 0, &target->sgid);
 	if (ret)
-		goto err;
+		goto out;
 
 	ret = -ENOMEM;
 	target->ch_count = max_t(unsigned, num_online_nodes(),
@@ -3219,7 +3219,7 @@ static ssize_t srp_create_target(struct device *dev,
 	target->ch = kcalloc(target->ch_count, sizeof(*target->ch),
 			     GFP_KERNEL);
 	if (!target->ch)
-		goto err;
+		goto out;
 
 	node_idx = 0;
 	for_each_online_node(node) {
@@ -3315,9 +3315,6 @@ err_disconnect:
 	}
 
 	kfree(target->ch);
-
-err:
-	scsi_host_put(target_host);
 	goto out;
 }
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
  2015-04-30  8:56 ` [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand() Bart Van Assche
  2015-04-30  8:57 ` [PATCH 02/12] scsi_transport_srp: Fix a race condition Bart Van Assche
@ 2015-04-30  8:58 ` Bart Van Assche
  2015-04-30  9:51   ` Sagi Grimberg
  2015-04-30 16:08   ` Doug Ledford
  2015-04-30  8:58 ` [PATCH 05/12] IB/srp: Fix reconnection failure handling Bart Van Assche
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:58 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Reception of a DREQ message only causes the state of a single
channel to change. Modify the SRP initiator such that channel
and target connection state are tracked separately. This patch
avoids that following false positive warning can be reported
by srp_destroy_qp():

WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xa6/0x120 [ib_srp]()
Call Trace:
[<ffffffff8106e10f>] warn_slowpath_common+0x7f/0xc0
[<ffffffff8106e16a>] warn_slowpath_null+0x1a/0x20
[<ffffffffa0440226>] srp_destroy_qp+0xa6/0x120 [ib_srp]
[<ffffffffa0440322>] srp_free_ch_ib+0x82/0x1e0 [ib_srp]
[<ffffffffa044408b>] srp_create_target+0x7ab/0x998 [ib_srp]
[<ffffffff81346f60>] dev_attr_store+0x20/0x30
[<ffffffff811dd90f>] sysfs_write_file+0xef/0x170
[<ffffffff8116d248>] vfs_write+0xc8/0x190
[<ffffffff8116d411>] sys_write+0x51/0x90

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
---
 drivers/infiniband/ulp/srp/ib_srp.c | 7 ++++---
 drivers/infiniband/ulp/srp/ib_srp.h | 1 +
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5ce6cfd..0eb07d3 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -465,14 +465,13 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
  */
 static void srp_destroy_qp(struct srp_rdma_ch *ch)
 {
-	struct srp_target_port *target = ch->target;
 	static struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
 	static struct ib_recv_wr wr = { .wr_id = SRP_LAST_WR_ID };
 	struct ib_recv_wr *bad_wr;
 	int ret;
 
 	/* Destroying a QP and reusing ch->done is only safe if not connected */
-	WARN_ON_ONCE(target->connected);
+	WARN_ON_ONCE(ch->connected);
 
 	ret = ib_modify_qp(ch->qp, &attr, IB_QP_STATE);
 	WARN_ONCE(ret, "ib_cm_init_qp_attr() returned %d\n", ret);
@@ -836,6 +835,7 @@ static void srp_disconnect_target(struct srp_target_port *target)
 
 		for (i = 0; i < target->ch_count; i++) {
 			ch = &target->ch[i];
+			ch->connected = false;
 			if (ch->cm_id && ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
 				shost_printk(KERN_DEBUG, target->scsi_host,
 					     PFX "Sending CM DREQ failed\n");
@@ -1017,6 +1017,7 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich)
 		switch (ch->status) {
 		case 0:
 			srp_change_conn_state(target, true);
+			ch->connected = true;
 			return 0;
 
 		case SRP_PORT_REDIRECT:
@@ -2367,7 +2368,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 	case IB_CM_DREQ_RECEIVED:
 		shost_printk(KERN_WARNING, target->scsi_host,
 			     PFX "DREQ received - connection closed\n");
-		srp_change_conn_state(target, false);
+		ch->connected = false;
 		if (ib_send_cm_drep(cm_id, NULL, 0))
 			shost_printk(KERN_ERR, target->scsi_host,
 				     PFX "Sending CM DREP failed\n");
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index a611556..95a4471 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -170,6 +170,7 @@ struct srp_rdma_ch {
 
 	struct completion	tsk_mgmt_done;
 	u8			tsk_mgmt_status;
+	bool			connected;
 };
 
 /**
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 05/12] IB/srp: Fix reconnection failure handling
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
                   ` (2 preceding siblings ...)
  2015-04-30  8:58 ` [PATCH 04/12] IB/srp: Fix connection state tracking Bart Van Assche
@ 2015-04-30  8:58 ` Bart Van Assche
  2015-04-30  8:59 ` [PATCH 06/12] scsi_transport_srp: Reduce failover time Bart Van Assche
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:58 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Although it is possible to let SRP I/O continue if a reconnect
results in a reduction of the number of channels, the current
code does not handle this scenario correctly. Instead of making
the reconnect code more complex, consider this as a reconnection
failure.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org> #v3.19
---
 drivers/infiniband/ulp/srp/ib_srp.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 0eb07d3..5d7d790 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1246,11 +1246,8 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	}
 	for (i = 0; i < target->ch_count; i++) {
 		ch = &target->ch[i];
-		if (ret || !ch->target) {
-			if (i > 1)
-				ret = 0;
+		if (ret)
 			break;
-		}
 		ret = srp_connect_ch(ch, multich);
 		multich = true;
 	}
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 06/12] scsi_transport_srp: Reduce failover time
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
                   ` (3 preceding siblings ...)
  2015-04-30  8:58 ` [PATCH 05/12] IB/srp: Fix reconnection failure handling Bart Van Assche
@ 2015-04-30  8:59 ` Bart Van Assche
  2015-04-30 10:13   ` Sagi Grimberg
  2015-04-30  9:00 ` [PATCH 07/12] IB/srp: Remove superfluous casts Bart Van Assche
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  8:59 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Unlike FC, there is no risk that SCSI devices will be offlined
by the error handler if it is started while a reconnect attempt
is ongoing. Hence report timeout errors as soon as possible
if a higher software layer (e.g. multipathd) needs to be informed
about a timeout. It is assumed that if both fast_io_fail_tmo < 0
and rport->dev_loss_tmo < 0 that this means that there is no
need to report timeouts quickly.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: James Bottomley <JBottomley@Odin.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/scsi/scsi_transport_srp.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index 4a44337..6667c2b 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -61,6 +61,11 @@ static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
 	return dev_to_shost(r->dev.parent);
 }
 
+static inline struct srp_rport *shost_to_rport(struct Scsi_Host *shost)
+{
+	return transport_class_to_srp_rport(&shost->shost_gendev);
+}
+
 /**
  * srp_tmo_valid() - check timeout combination validity
  * @reconnect_delay: Reconnect delay in seconds.
@@ -626,9 +631,11 @@ static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
 	struct scsi_device *sdev = scmd->device;
 	struct Scsi_Host *shost = sdev->host;
 	struct srp_internal *i = to_srp_internal(shost->transportt);
+	struct srp_rport *rport = shost_to_rport(shost);
 
 	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
-	return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
+	return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 &&
+		i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
 		BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
 }
 
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 07/12] IB/srp: Remove superfluous casts
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
                   ` (4 preceding siblings ...)
  2015-04-30  8:59 ` [PATCH 06/12] scsi_transport_srp: Reduce failover time Bart Van Assche
@ 2015-04-30  9:00 ` Bart Van Assche
  2015-04-30 10:13   ` Sagi Grimberg
  2015-04-30  9:00 ` [PATCH 08/12] IB/srp: Rearrange module description Bart Van Assche
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:00 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

A long time ago the data type int64_t was declared as long long
on x86 systems and as long on PPC systems. Today that data type
is declared as long long on all Linux architectures. This means
that the casts from uint64_t into unsigned long long are
superfluous.  Remove these superfluous casts.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5d7d790..8b6f166 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -780,7 +780,7 @@ static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
 		shost_printk(KERN_DEBUG, target->scsi_host,
 			     PFX "Topspin/Cisco initiator port ID workaround "
 			     "activated for target GUID %016llx\n",
-			     (unsigned long long) be64_to_cpu(target->ioc_guid));
+			     be64_to_cpu(target->ioc_guid));
 		memset(req->priv.initiator_port_id, 0, 8);
 		memcpy(req->priv.initiator_port_id + 8,
 		       &target->srp_host->srp_dev->dev->node_guid, 8);
@@ -2561,8 +2561,7 @@ static ssize_t show_id_ext(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "0x%016llx\n",
-		       (unsigned long long) be64_to_cpu(target->id_ext));
+	return sprintf(buf, "0x%016llx\n", be64_to_cpu(target->id_ext));
 }
 
 static ssize_t show_ioc_guid(struct device *dev, struct device_attribute *attr,
@@ -2570,8 +2569,7 @@ static ssize_t show_ioc_guid(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "0x%016llx\n",
-		       (unsigned long long) be64_to_cpu(target->ioc_guid));
+	return sprintf(buf, "0x%016llx\n", be64_to_cpu(target->ioc_guid));
 }
 
 static ssize_t show_service_id(struct device *dev,
@@ -2579,8 +2577,7 @@ static ssize_t show_service_id(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "0x%016llx\n",
-		       (unsigned long long) be64_to_cpu(target->service_id));
+	return sprintf(buf, "0x%016llx\n", be64_to_cpu(target->service_id));
 }
 
 static ssize_t show_pkey(struct device *dev, struct device_attribute *attr,
@@ -2771,7 +2768,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 
 	target->state = SRP_TARGET_SCANNING;
 	sprintf(target->target_name, "SRP.T10:%016llX",
-		 (unsigned long long) be64_to_cpu(target->id_ext));
+		be64_to_cpu(target->id_ext));
 
 	if (scsi_add_host(target->scsi_host, host->srp_dev->dev->dma_device))
 		return -ENODEV;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 08/12] IB/srp: Rearrange module description
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
                   ` (5 preceding siblings ...)
  2015-04-30  9:00 ` [PATCH 07/12] IB/srp: Remove superfluous casts Bart Van Assche
@ 2015-04-30  9:00 ` Bart Van Assche
       [not found]   ` <5541EF39.6040301-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  9:01 ` [PATCH 09/12] IB/srp: Remove a superfluous check from srp_free_req_data() Bart Van Assche
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:00 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Move the module version and release date into separate fields.
This makes the modinfo output easier to read.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 8b6f166..3727b90 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -59,9 +59,10 @@
 #define DRV_RELDATE	"July 1, 2013"
 
 MODULE_AUTHOR("Roland Dreier");
-MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
-		   "v" DRV_VERSION " (" DRV_RELDATE ")");
+MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator");
 MODULE_LICENSE("Dual BSD/GPL");
+MODULE_VERSION(DRV_VERSION);
+MODULE_INFO(release_date, DRV_RELDATE);
 
 static unsigned int srp_sg_tablesize;
 static unsigned int cmd_sg_entries;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 09/12] IB/srp: Remove a superfluous check from srp_free_req_data()
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
                   ` (6 preceding siblings ...)
  2015-04-30  9:00 ` [PATCH 08/12] IB/srp: Rearrange module description Bart Van Assche
@ 2015-04-30  9:01 ` Bart Van Assche
       [not found]   ` <5541EF4F.6050200-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  9:01 ` [PATCH 10/12] IB/srp: Remove !ch->target tests from the reconnect code Bart Van Assche
       [not found] ` <5541EE21.3050809-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:01 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

All srp_free_req_data() callers guarantee that ch->target != NULL
hence remove the ch->target test from srp_free_req_data().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 3727b90..a73eb1e5 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -853,7 +853,7 @@ static void srp_free_req_data(struct srp_target_port *target,
 	struct srp_request *req;
 	int i;
 
-	if (!ch->target || !ch->req_ring)
+	if (!ch->req_ring)
 		return;
 
 	for (i = 0; i < target->req_ring_size; ++i) {
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 10/12] IB/srp: Remove !ch->target tests from the reconnect code
  2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
                   ` (7 preceding siblings ...)
  2015-04-30  9:01 ` [PATCH 09/12] IB/srp: Remove a superfluous check from srp_free_req_data() Bart Van Assche
@ 2015-04-30  9:01 ` Bart Van Assche
  2015-04-30 10:19   ` Sagi Grimberg
       [not found] ` <5541EE21.3050809-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  9 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:01 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

Remove the !ch->target tests from the reconnect code. These
tests are not needed: upon entry of srp_rport_reconnect()
it is guaranteed that all ch->target pointers are non-NULL.
None of the functions srp_new_cm_id(), srp_finish_req(),
srp_create_ch_ib() nor srp_connect_ch() modifies this pointer.
srp_free_ch_ib() is never called concurrently with
srp_rport_reconnect().

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index a73eb1e5..400ef7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1216,14 +1216,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	 */
 	for (i = 0; i < target->ch_count; i++) {
 		ch = &target->ch[i];
-		if (!ch->target)
-			break;
 		ret += srp_new_cm_id(ch);
 	}
 	for (i = 0; i < target->ch_count; i++) {
 		ch = &target->ch[i];
-		if (!ch->target)
-			break;
 		for (j = 0; j < target->req_ring_size; ++j) {
 			struct srp_request *req = &ch->req_ring[j];
 
@@ -1232,8 +1228,6 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	}
 	for (i = 0; i < target->ch_count; i++) {
 		ch = &target->ch[i];
-		if (!ch->target)
-			break;
 		/*
 		 * Whether or not creating a new CM ID succeeded, create a new
 		 * QP. This guarantees that all completion callback function
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 11/12] IB/srp: Add 64-bit LUN support
       [not found] ` <5541EE21.3050809-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  8:57   ` [PATCH 03/12] IB/srp: Remove an extraneous scsi_host_put() from an error path Bart Van Assche
@ 2015-04-30  9:02   ` Bart Van Assche
  2015-04-30  9:02   ` [PATCH 12/12] IB/srp: Make CM timeout dependent on subnet timeout Bart Van Assche
  2 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:02 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, Yann Droneaud, Brian King,
	Nathan Fontenot, Tyrel Datwyler

The SCSI standard defines 64-bit values for LUNs. Large arrays
employing large or hierarchical LUN numbers become more and more
common. So update the SRP initiator to use 64-bit LUN numbers.
See also Hannes Reinecke, commit 9cb78c16f5da ("scsi: use 64-bit LUNs"),
June 2014.

The largest LUN number that has been tested is 0xd2003fff00000000.

Checked the following structure sizes with gdb:
* sizeof(struct srp_cmd) = 48
* sizeof(struct srp_tsk_mgmt) = 48
* sizeof(struct srp_aer_req) = 36

The ibmvscsi changes have been compile tested only (on a PPC system).

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Reviewed-by: Hannes Reinecke <hare-l3A5Bk7waGM@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Reviewed-by: Yann Droneaud <ydroneaud-RlY5vtjFyJ3QT0dZR+AlfA@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
Cc: Brian King <brking-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Nathan Fontenot <nfont-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
Cc: Tyrel Datwyler <tyreld-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
  drivers/infiniband/ulp/srp/ib_srp.c | 12 ++++++------
  drivers/infiniband/ulp/srp/ib_srp.h |  1 -
  drivers/scsi/ibmvscsi/ibmvscsi.c    |  6 +++---
  include/scsi/srp.h                  |  7 ++++---
  4 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c 
b/drivers/infiniband/ulp/srp/ib_srp.c
index 400ef7a..27d3a64 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1835,7 +1835,7 @@ static void srp_process_aer_req(struct srp_rdma_ch 
*ch,
  	s32 delta = be32_to_cpu(req->req_lim_delta);

  	shost_printk(KERN_ERR, target->scsi_host, PFX
-		     "ignoring AER for LUN %llu\n", be64_to_cpu(req->lun));
+		     "ignoring AER for LUN %llu\n", scsilun_to_int(&req->lun));

  	if (srp_response_common(ch, delta, &rsp, sizeof(rsp)))
  		shost_printk(KERN_ERR, target->scsi_host, PFX
@@ -2027,7 +2027,7 @@ static int srp_queuecommand(struct Scsi_Host 
*shost, struct scsi_cmnd *scmnd)
  	memset(cmd, 0, sizeof *cmd);

  	cmd->opcode = SRP_CMD;
-	cmd->lun    = cpu_to_be64((u64) scmnd->device->lun << 48);
+	int_to_scsilun(scmnd->device->lun, &cmd->lun);
  	cmd->tag    = tag;
  	memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);

@@ -2407,8 +2407,8 @@ srp_change_queue_depth(struct scsi_device *sdev, 
int qdepth)
  	return scsi_change_queue_depth(sdev, qdepth);
  }

-static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag,
-			     unsigned int lun, u8 func)
+static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag, u64 lun,
+			     u8 func)
  {
  	struct srp_target_port *target = ch->target;
  	struct srp_rport *rport = target->rport;
@@ -2442,7 +2442,7 @@ static int srp_send_tsk_mgmt(struct srp_rdma_ch 
*ch, u64 req_tag,
  	memset(tsk_mgmt, 0, sizeof *tsk_mgmt);

  	tsk_mgmt->opcode 	= SRP_TSK_MGMT;
-	tsk_mgmt->lun		= cpu_to_be64((u64) lun << 48);
+	int_to_scsilun(lun, &tsk_mgmt->lun);
  	tsk_mgmt->tag		= req_tag | SRP_TAG_TSK_MGMT;
  	tsk_mgmt->tsk_mgmt_func = func;
  	tsk_mgmt->task_tag	= req_tag;
@@ -3136,7 +3136,7 @@ static ssize_t srp_create_target(struct device *dev,
  	target_host->transportt  = ib_srp_transport_template;
  	target_host->max_channel = 0;
  	target_host->max_id      = 1;
-	target_host->max_lun     = SRP_MAX_LUN;
+	target_host->max_lun     = -1LL;
  	target_host->max_cmd_len = sizeof ((struct srp_cmd *) (void *) 0L)->cdb;

  	target = host_to_target(target_host);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h 
b/drivers/infiniband/ulp/srp/ib_srp.h
index 95a4471..ba036ab 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -54,7 +54,6 @@ enum {
  	SRP_DLID_REDIRECT	= 2,
  	SRP_STALE_CONN		= 3,

-	SRP_MAX_LUN		= 512,
  	SRP_DEF_SG_TABLESIZE	= 12,

  	SRP_DEFAULT_QUEUE_SIZE	= 1 << 6,
diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c 
b/drivers/scsi/ibmvscsi/ibmvscsi.c
index acea5d6..6a41c36 100644
--- a/drivers/scsi/ibmvscsi/ibmvscsi.c
+++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
@@ -1053,7 +1053,7 @@ static int ibmvscsi_queuecommand_lck(struct 
scsi_cmnd *cmnd,
  	memset(srp_cmd, 0x00, SRP_MAX_IU_LEN);
  	srp_cmd->opcode = SRP_CMD;
  	memcpy(srp_cmd->cdb, cmnd->cmnd, sizeof(srp_cmd->cdb));
-	srp_cmd->lun = cpu_to_be64(((u64)lun) << 48);
+	int_to_scsilun(lun, &srp_cmd->lun);

  	if (!map_data_for_srp_cmd(cmnd, evt_struct, srp_cmd, hostdata->dev)) {
  		if (!firmware_has_feature(FW_FEATURE_CMO))
@@ -1529,7 +1529,7 @@ static int ibmvscsi_eh_abort_handler(struct 
scsi_cmnd *cmd)
  		/* Set up an abort SRP command */
  		memset(tsk_mgmt, 0x00, sizeof(*tsk_mgmt));
  		tsk_mgmt->opcode = SRP_TSK_MGMT;
-		tsk_mgmt->lun = cpu_to_be64(((u64) lun) << 48);
+		int_to_scsilun(lun, &tsk_mgmt->lun);
  		tsk_mgmt->tsk_mgmt_func = SRP_TSK_ABORT_TASK;
  		tsk_mgmt->task_tag = (u64) found_evt;

@@ -1652,7 +1652,7 @@ static int ibmvscsi_eh_device_reset_handler(struct 
scsi_cmnd *cmd)
  		/* Set up a lun reset SRP command */
  		memset(tsk_mgmt, 0x00, sizeof(*tsk_mgmt));
  		tsk_mgmt->opcode = SRP_TSK_MGMT;
-		tsk_mgmt->lun = cpu_to_be64(((u64) lun) << 48);
+		int_to_scsilun(lun, &tsk_mgmt->lun);
  		tsk_mgmt->tsk_mgmt_func = SRP_TSK_LUN_RESET;

  		evt->sync_srp = &srp_rsp;
diff --git a/include/scsi/srp.h b/include/scsi/srp.h
index 1ae84db..5be834d 100644
--- a/include/scsi/srp.h
+++ b/include/scsi/srp.h
@@ -42,6 +42,7 @@
   */

  #include <linux/types.h>
+#include <scsi/scsi.h>

  enum {
  	SRP_LOGIN_REQ	= 0x00,
@@ -179,7 +180,7 @@ struct srp_tsk_mgmt {
  	u8	reserved1[6];
  	u64	tag;
  	u8	reserved2[4];
-	__be64	lun __attribute__((packed));
+	struct scsi_lun	lun;
  	u8	reserved3[2];
  	u8	tsk_mgmt_func;
  	u8	reserved4;
@@ -200,7 +201,7 @@ struct srp_cmd {
  	u8	data_in_desc_cnt;
  	u64	tag;
  	u8	reserved2[4];
-	__be64	lun __attribute__((packed));
+	struct scsi_lun	lun;
  	u8	reserved3;
  	u8	task_attr;
  	u8	reserved4;
@@ -265,7 +266,7 @@ struct srp_aer_req {
  	__be32	req_lim_delta;
  	u64	tag;
  	u32	reserved2;
-	__be64	lun;
+	struct scsi_lun	lun;
  	__be32	sense_data_len;
  	u32	reserved3;
  	u8	sense_data[0];
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* [PATCH 12/12] IB/srp: Make CM timeout dependent on subnet timeout
       [not found] ` <5541EE21.3050809-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  8:57   ` [PATCH 03/12] IB/srp: Remove an extraneous scsi_host_put() from an error path Bart Van Assche
  2015-04-30  9:02   ` [PATCH 11/12] IB/srp: Add 64-bit LUN support Bart Van Assche
@ 2015-04-30  9:02   ` Bart Van Assche
       [not found]     ` <5541EFB3.6030704-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30  9:02 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

In large networks the subnet timeout may have been set to a value
above 20. In small networks it can be safe to reduce the subnet
timeout below 20. The CM timeout should be proportional to the
subnet timeout. Hence make the CM timeout dependent on the subnet
timeout. Since the default subnet timeout used by OpenSM is 18
this patch does not change the CM timeout if the default OpenSM
subnet timeout is used.

Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 27d3a64..a18a2ae 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -704,6 +704,19 @@ static int srp_lookup_path(struct srp_rdma_ch *ch)
 	return ch->status;
 }
 
+static u8 srp_get_subnet_timeout(struct srp_host *host)
+{
+	struct ib_port_attr attr;
+	int ret;
+	u8 subnet_timeout = 18;
+
+	ret = ib_query_port(host->srp_dev->dev, host->port, &attr);
+	if (ret == 0)
+		subnet_timeout = attr.subnet_timeout;
+
+	return subnet_timeout;
+}
+
 static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
 {
 	struct srp_target_port *target = ch->target;
@@ -711,8 +724,11 @@ static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
 		struct ib_cm_req_param param;
 		struct srp_login_req   priv;
 	} *req = NULL;
+	u8 subnet_timeout;
 	int status;
 
+	subnet_timeout = srp_get_subnet_timeout(target->srp_host);
+
 	req = kzalloc(sizeof *req, GFP_KERNEL);
 	if (!req)
 		return -ENOMEM;
@@ -734,8 +750,8 @@ static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
 	 * module parameters if anyone cared about setting them.
 	 */
 	req->param.responder_resources	      = 4;
-	req->param.remote_cm_response_timeout = 20;
-	req->param.local_cm_response_timeout  = 20;
+	req->param.remote_cm_response_timeout = subnet_timeout + 2;
+	req->param.local_cm_response_timeout  = subnet_timeout + 2;
 	req->param.retry_count                = target->tl_retry_count;
 	req->param.rnr_retry_count 	      = 7;
 	req->param.max_cm_retries 	      = 15;
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
       [not found]   ` <5541EE4A.30803-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30  9:32     ` Sagi Grimberg
  2015-04-30  9:37     ` Christoph Hellwig
  1 sibling, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30  9:32 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 11:56 AM, Bart Van Assche wrote:
> Introduce the helper function srp_wait_for_queuecommand().
> Move the definition of scsi_request_fn_active(). This patch
> does not change any functionality. A second call to
> scsi_wait_for_queuecommand() will be introduced in the next
> patch.
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: James Bottomley <JBottomley-O3H1v1f1dlM@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> #v3.13
> ---
>   drivers/scsi/scsi_transport_srp.c | 52 ++++++++++++++++++++++-----------------
>   1 file changed, 29 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index ae45bd9..6ce1c48 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -396,6 +396,34 @@ static void srp_reconnect_work(struct work_struct *work)
>   	}
>   }
>
> +/**
> + * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
> + * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
> + */
> +static int scsi_request_fn_active(struct Scsi_Host *shost)
> +{
> +	struct scsi_device *sdev;
> +	struct request_queue *q;
> +	int request_fn_active = 0;
> +
> +	shost_for_each_device(sdev, shost) {
> +		q = sdev->request_queue;
> +
> +		spin_lock_irq(q->queue_lock);
> +		request_fn_active += q->request_fn_active;
> +		spin_unlock_irq(q->queue_lock);
> +	}
> +
> +	return request_fn_active;
> +}
> +
> +/* Wait until ongoing shost->hostt->queuecommand() calls have finished. */
> +static void srp_wait_for_queuecommand(struct Scsi_Host *shost)
> +{
> +	while (scsi_request_fn_active(shost))
> +		msleep(20);
> +}
> +
>   static void __rport_fail_io_fast(struct srp_rport *rport)
>   {
>   	struct Scsi_Host *shost = rport_to_shost(rport);
> @@ -504,27 +532,6 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
>   EXPORT_SYMBOL(srp_start_tl_fail_timers);
>
>   /**
> - * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
> - * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
> - */
> -static int scsi_request_fn_active(struct Scsi_Host *shost)
> -{
> -	struct scsi_device *sdev;
> -	struct request_queue *q;
> -	int request_fn_active = 0;
> -
> -	shost_for_each_device(sdev, shost) {
> -		q = sdev->request_queue;
> -
> -		spin_lock_irq(q->queue_lock);
> -		request_fn_active += q->request_fn_active;
> -		spin_unlock_irq(q->queue_lock);
> -	}
> -
> -	return request_fn_active;
> -}
> -
> -/**
>    * srp_reconnect_rport() - reconnect to an SRP target port
>    * @rport: SRP target port.
>    *
> @@ -559,8 +566,7 @@ int srp_reconnect_rport(struct srp_rport *rport)
>   	if (res)
>   		goto out;
>   	scsi_target_block(&shost->shost_gendev);
> -	while (scsi_request_fn_active(shost))
> -		msleep(20);
> +	srp_wait_for_queuecommand(shost);
>   	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
>   	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
>   		 dev_name(&shost->shost_gendev), rport->state, res);
>

Looks good.

Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
       [not found]   ` <5541EE4A.30803-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30  9:32     ` Sagi Grimberg
@ 2015-04-30  9:37     ` Christoph Hellwig
  2015-04-30 10:26       ` Bart Van Assche
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-04-30  9:37 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Thu, Apr 30, 2015 at 10:56:42AM +0200, Bart Van Assche wrote:
> Introduce the helper function srp_wait_for_queuecommand().
> Move the definition of scsi_request_fn_active(). This patch
> does not change any functionality. A second call to
> scsi_wait_for_queuecommand() will be introduced in the next
> patch.

Can we take a step back here?

This isn;t something driver should be doing, so we really should take
care of this in the SCSI midlayer.

Especially given that we don't maintain request_fn_active for blk-mq.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/12] scsi_transport_srp: Fix a race condition
       [not found]   ` <5541EE66.7090608-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30  9:44     ` Sagi Grimberg
       [not found]       ` <5541F96F.8090503-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30  9:44 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 11:57 AM, Bart Van Assche wrote:
> Avoid that srp_terminate_io() can get invoked while srp_queuecommand()
> is in progress. This patch avoids that an I/O timeout can trigger the
> following kernel warning:
>
> WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:1447 srp_terminate_io+0xef/0x100 [ib_srp]()
> Call Trace:
>   [<ffffffff814c65a2>] dump_stack+0x4e/0x68
>   [<ffffffff81051f71>] warn_slowpath_common+0x81/0xa0
>   [<ffffffff8105204a>] warn_slowpath_null+0x1a/0x20
>   [<ffffffffa075f51f>] srp_terminate_io+0xef/0x100 [ib_srp]
>   [<ffffffffa07495da>] __rport_fail_io_fast+0xba/0xc0 [scsi_transport_srp]
>   [<ffffffffa0749a90>] rport_fast_io_fail_timedout+0xe0/0xf0 [scsi_transport_srp]
>   [<ffffffff8106e09b>] process_one_work+0x1db/0x780
>   [<ffffffff8106e75b>] worker_thread+0x11b/0x450
>   [<ffffffff81073c64>] kthread+0xe4/0x100
>   [<ffffffff814cf26c>] ret_from_fork+0x7c/0xb0
>
> See also patch "scsi_transport_srp: Add transport layer error
> handling" (commit ID 29c17324803c).
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: James Bottomley <JBottomley-O3H1v1f1dlM@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> #v3.13
> ---
>   drivers/scsi/scsi_transport_srp.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index 6ce1c48..4a44337 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -437,8 +437,10 @@ static void __rport_fail_io_fast(struct srp_rport *rport)
>
>   	/* Involve the LLD if possible to terminate all I/O on the rport. */
>   	i = to_srp_internal(shost->transportt);
> -	if (i->f->terminate_rport_io)
> +	if (i->f->terminate_rport_io) {
> +		srp_wait_for_queuecommand(shost);
>   		i->f->terminate_rport_io(rport);
> +	}

Why not just terminate the inflight IO before unblocking the target?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 03/12] IB/srp: Remove an extraneous scsi_host_put() from an error path
  2015-04-30  8:57   ` [PATCH 03/12] IB/srp: Remove an extraneous scsi_host_put() from an error path Bart Van Assche
@ 2015-04-30  9:44     ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30  9:44 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 4/30/2015 11:57 AM, Bart Van Assche wrote:
> Fix a scsi_get_host() / scsi_host_put() imbalance in the error
> path of srp_create_target(). See also patch "IB/srp: Avoid that
> I/O hangs due to a cable pull during LUN scanning" (commit ID
> 34aa654ecb8e).
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> Cc: <stable@vger.kernel.org> #v3.19
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 13 +++++--------
>   1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 918814c..5ce6cfd 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -3172,11 +3172,11 @@ static ssize_t srp_create_target(struct device *dev,
>
>   	ret = srp_parse_options(buf, target);
>   	if (ret)
> -		goto err;
> +		goto out;
>
>   	ret = scsi_init_shared_tag_map(target_host, target_host->can_queue);
>   	if (ret)
> -		goto err;
> +		goto out;
>
>   	target->req_ring_size = target->queue_size - SRP_TSK_MGMT_SQ_SIZE;
>
> @@ -3187,7 +3187,7 @@ static ssize_t srp_create_target(struct device *dev,
>   			     be64_to_cpu(target->ioc_guid),
>   			     be64_to_cpu(target->initiator_ext));
>   		ret = -EEXIST;
> -		goto err;
> +		goto out;
>   	}
>
>   	if (!srp_dev->has_fmr && !srp_dev->has_fr && !target->allow_ext_sg &&
> @@ -3208,7 +3208,7 @@ static ssize_t srp_create_target(struct device *dev,
>   	spin_lock_init(&target->lock);
>   	ret = ib_query_gid(ibdev, host->port, 0, &target->sgid);
>   	if (ret)
> -		goto err;
> +		goto out;
>
>   	ret = -ENOMEM;
>   	target->ch_count = max_t(unsigned, num_online_nodes(),
> @@ -3219,7 +3219,7 @@ static ssize_t srp_create_target(struct device *dev,
>   	target->ch = kcalloc(target->ch_count, sizeof(*target->ch),
>   			     GFP_KERNEL);
>   	if (!target->ch)
> -		goto err;
> +		goto out;
>
>   	node_idx = 0;
>   	for_each_online_node(node) {
> @@ -3315,9 +3315,6 @@ err_disconnect:
>   	}
>
>   	kfree(target->ch);
> -
> -err:
> -	scsi_host_put(target_host);
>   	goto out;
>   }
>
>

Looks good.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-04-30  8:58 ` [PATCH 04/12] IB/srp: Fix connection state tracking Bart Van Assche
@ 2015-04-30  9:51   ` Sagi Grimberg
  2015-04-30 11:25     ` Bart Van Assche
  2015-04-30 16:08   ` Doug Ledford
  1 sibling, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30  9:51 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 4/30/2015 11:58 AM, Bart Van Assche wrote:
> Reception of a DREQ message only causes the state of a single
> channel to change. Modify the SRP initiator such that channel
> and target connection state are tracked separately. This patch
> avoids that following false positive warning can be reported
> by srp_destroy_qp():
>
> WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xa6/0x120 [ib_srp]()
> Call Trace:
> [<ffffffff8106e10f>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff8106e16a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffa0440226>] srp_destroy_qp+0xa6/0x120 [ib_srp]
> [<ffffffffa0440322>] srp_free_ch_ib+0x82/0x1e0 [ib_srp]
> [<ffffffffa044408b>] srp_create_target+0x7ab/0x998 [ib_srp]
> [<ffffffff81346f60>] dev_attr_store+0x20/0x30
> [<ffffffff811dd90f>] sysfs_write_file+0xef/0x170
> [<ffffffff8116d248>] vfs_write+0xc8/0x190
> [<ffffffff8116d411>] sys_write+0x51/0x90
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> Cc: <stable@vger.kernel.org> #v3.19
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 7 ++++---
>   drivers/infiniband/ulp/srp/ib_srp.h | 1 +
>   2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 5ce6cfd..0eb07d3 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -465,14 +465,13 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
>    */
>   static void srp_destroy_qp(struct srp_rdma_ch *ch)
>   {
> -	struct srp_target_port *target = ch->target;
>   	static struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
>   	static struct ib_recv_wr wr = { .wr_id = SRP_LAST_WR_ID };
>   	struct ib_recv_wr *bad_wr;
>   	int ret;
>
>   	/* Destroying a QP and reusing ch->done is only safe if not connected */
> -	WARN_ON_ONCE(target->connected);
> +	WARN_ON_ONCE(ch->connected);
>
>   	ret = ib_modify_qp(ch->qp, &attr, IB_QP_STATE);
>   	WARN_ONCE(ret, "ib_cm_init_qp_attr() returned %d\n", ret);
> @@ -836,6 +835,7 @@ static void srp_disconnect_target(struct srp_target_port *target)
>
>   		for (i = 0; i < target->ch_count; i++) {
>   			ch = &target->ch[i];
> +			ch->connected = false;
>   			if (ch->cm_id && ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
>   				shost_printk(KERN_DEBUG, target->scsi_host,
>   					     PFX "Sending CM DREQ failed\n");
> @@ -1017,6 +1017,7 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich)
>   		switch (ch->status) {
>   		case 0:
>   			srp_change_conn_state(target, true);
> +			ch->connected = true;
>   			return 0;
>
>   		case SRP_PORT_REDIRECT:
> @@ -2367,7 +2368,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>   	case IB_CM_DREQ_RECEIVED:
>   		shost_printk(KERN_WARNING, target->scsi_host,
>   			     PFX "DREQ received - connection closed\n");
> -		srp_change_conn_state(target, false);
> +		ch->connected = false;

Shouldn't this be protected by the channel lock (like the target)?

>   		if (ib_send_cm_drep(cm_id, NULL, 0))
>   			shost_printk(KERN_ERR, target->scsi_host,
>   				     PFX "Sending CM DREP failed\n");
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index a611556..95a4471 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>
>   	struct completion	tsk_mgmt_done;
>   	u8			tsk_mgmt_status;
> +	bool			connected;

I generally agree with this flag, but I'm a bit confused about the
semantics of two connected flags? Also, we check the target connected
flag on TMFs, although we are executing it on a channel (should we
check both?)

I'd say keep only the channel connected flag, the target logic needs to
be mandated by the state.

Sagi.

I think we need to keep only




^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/12] scsi_transport_srp: Reduce failover time
  2015-04-30  8:59 ` [PATCH 06/12] scsi_transport_srp: Reduce failover time Bart Van Assche
@ 2015-04-30 10:13   ` Sagi Grimberg
  2015-04-30 11:02     ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:13 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 4/30/2015 11:59 AM, Bart Van Assche wrote:
> Unlike FC, there is no risk that SCSI devices will be offlined
> by the error handler if it is started while a reconnect attempt
> is ongoing. Hence report timeout errors as soon as possible
> if a higher software layer (e.g. multipathd) needs to be informed
> about a timeout. It is assumed that if both fast_io_fail_tmo < 0
> and rport->dev_loss_tmo < 0 that this means that there is no
> need to report timeouts quickly.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: James Bottomley <JBottomley@Odin.com>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>   drivers/scsi/scsi_transport_srp.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
> index 4a44337..6667c2b 100644
> --- a/drivers/scsi/scsi_transport_srp.c
> +++ b/drivers/scsi/scsi_transport_srp.c
> @@ -61,6 +61,11 @@ static inline struct Scsi_Host *rport_to_shost(struct srp_rport *r)
>   	return dev_to_shost(r->dev.parent);
>   }
>
> +static inline struct srp_rport *shost_to_rport(struct Scsi_Host *shost)
> +{
> +	return transport_class_to_srp_rport(&shost->shost_gendev);
> +}
> +
>   /**
>    * srp_tmo_valid() - check timeout combination validity
>    * @reconnect_delay: Reconnect delay in seconds.
> @@ -626,9 +631,11 @@ static enum blk_eh_timer_return srp_timed_out(struct scsi_cmnd *scmd)
>   	struct scsi_device *sdev = scmd->device;
>   	struct Scsi_Host *shost = sdev->host;
>   	struct srp_internal *i = to_srp_internal(shost->transportt);
> +	struct srp_rport *rport = shost_to_rport(shost);
>
>   	pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
> -	return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
> +	return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 &&
> +		i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>   		BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
>   }
>
>

I'm not sure I understand the purpose of this patch? If the user
requested fast_io_fail_tmo of X, I would assume it wants srp to retry
for the command for X secs. Why should we fail it any earlier than that?

There is a situation where I've seen srp starting the fast_io_fail_tmo
later than expected because the QP retry exceeded error completion
arrives late for certain workloads. Was this the motivation for this
patch?

Sagi.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 07/12] IB/srp: Remove superfluous casts
  2015-04-30  9:00 ` [PATCH 07/12] IB/srp: Remove superfluous casts Bart Van Assche
@ 2015-04-30 10:13   ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:13 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 4/30/2015 12:00 PM, Bart Van Assche wrote:
> A long time ago the data type int64_t was declared as long long
> on x86 systems and as long on PPC systems. Today that data type
> is declared as long long on all Linux architectures. This means
> that the casts from uint64_t into unsigned long long are
> superfluous.  Remove these superfluous casts.
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 13 +++++--------
>   1 file changed, 5 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 5d7d790..8b6f166 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -780,7 +780,7 @@ static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
>   		shost_printk(KERN_DEBUG, target->scsi_host,
>   			     PFX "Topspin/Cisco initiator port ID workaround "
>   			     "activated for target GUID %016llx\n",
> -			     (unsigned long long) be64_to_cpu(target->ioc_guid));
> +			     be64_to_cpu(target->ioc_guid));
>   		memset(req->priv.initiator_port_id, 0, 8);
>   		memcpy(req->priv.initiator_port_id + 8,
>   		       &target->srp_host->srp_dev->dev->node_guid, 8);
> @@ -2561,8 +2561,7 @@ static ssize_t show_id_ext(struct device *dev, struct device_attribute *attr,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "0x%016llx\n",
> -		       (unsigned long long) be64_to_cpu(target->id_ext));
> +	return sprintf(buf, "0x%016llx\n", be64_to_cpu(target->id_ext));
>   }
>
>   static ssize_t show_ioc_guid(struct device *dev, struct device_attribute *attr,
> @@ -2570,8 +2569,7 @@ static ssize_t show_ioc_guid(struct device *dev, struct device_attribute *attr,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "0x%016llx\n",
> -		       (unsigned long long) be64_to_cpu(target->ioc_guid));
> +	return sprintf(buf, "0x%016llx\n", be64_to_cpu(target->ioc_guid));
>   }
>
>   static ssize_t show_service_id(struct device *dev,
> @@ -2579,8 +2577,7 @@ static ssize_t show_service_id(struct device *dev,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "0x%016llx\n",
> -		       (unsigned long long) be64_to_cpu(target->service_id));
> +	return sprintf(buf, "0x%016llx\n", be64_to_cpu(target->service_id));
>   }
>
>   static ssize_t show_pkey(struct device *dev, struct device_attribute *attr,
> @@ -2771,7 +2768,7 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
>
>   	target->state = SRP_TARGET_SCANNING;
>   	sprintf(target->target_name, "SRP.T10:%016llX",
> -		 (unsigned long long) be64_to_cpu(target->id_ext));
> +		be64_to_cpu(target->id_ext));
>
>   	if (scsi_add_host(target->scsi_host, host->srp_dev->dev->dma_device))
>   		return -ENODEV;
>

Looks good.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 08/12] IB/srp: Rearrange module description
       [not found]   ` <5541EF39.6040301-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30 10:15     ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:15 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 12:00 PM, Bart Van Assche wrote:
> Move the module version and release date into separate fields.
> This makes the modinfo output easier to read.
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 8b6f166..3727b90 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -59,9 +59,10 @@
>   #define DRV_RELDATE	"July 1, 2013"
>
>   MODULE_AUTHOR("Roland Dreier");
> -MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator "
> -		   "v" DRV_VERSION " (" DRV_RELDATE ")");
> +MODULE_DESCRIPTION("InfiniBand SCSI RDMA Protocol initiator");
>   MODULE_LICENSE("Dual BSD/GPL");
> +MODULE_VERSION(DRV_VERSION);
> +MODULE_INFO(release_date, DRV_RELDATE);
>
>   static unsigned int srp_sg_tablesize;
>   static unsigned int cmd_sg_entries;
>

Looks good.

Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 09/12] IB/srp: Remove a superfluous check from srp_free_req_data()
       [not found]   ` <5541EF4F.6050200-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30 10:18     ` Sagi Grimberg
  2015-04-30 10:37       ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:18 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 12:01 PM, Bart Van Assche wrote:
> All srp_free_req_data() callers guarantee that ch->target != NULL
> hence remove the ch->target test from srp_free_req_data().

ch->target is not referenced anywhere in this routine. So why does
this guarantee matter?

>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 3727b90..a73eb1e5 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -853,7 +853,7 @@ static void srp_free_req_data(struct srp_target_port *target,
>   	struct srp_request *req;
>   	int i;
>
> -	if (!ch->target || !ch->req_ring)
> +	if (!ch->req_ring)
>   		return;
>
>   	for (i = 0; i < target->req_ring_size; ++i) {
>

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 10/12] IB/srp: Remove !ch->target tests from the reconnect code
  2015-04-30  9:01 ` [PATCH 10/12] IB/srp: Remove !ch->target tests from the reconnect code Bart Van Assche
@ 2015-04-30 10:19   ` Sagi Grimberg
  0 siblings, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:19 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 4/30/2015 12:01 PM, Bart Van Assche wrote:
> Remove the !ch->target tests from the reconnect code. These
> tests are not needed: upon entry of srp_rport_reconnect()
> it is guaranteed that all ch->target pointers are non-NULL.
> None of the functions srp_new_cm_id(), srp_finish_req(),
> srp_create_ch_ib() nor srp_connect_ch() modifies this pointer.
> srp_free_ch_ib() is never called concurrently with
> srp_rport_reconnect().
>
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 6 ------
>   1 file changed, 6 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index a73eb1e5..400ef7a 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -1216,14 +1216,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>   	 */
>   	for (i = 0; i < target->ch_count; i++) {
>   		ch = &target->ch[i];
> -		if (!ch->target)
> -			break;
>   		ret += srp_new_cm_id(ch);
>   	}
>   	for (i = 0; i < target->ch_count; i++) {
>   		ch = &target->ch[i];
> -		if (!ch->target)
> -			break;
>   		for (j = 0; j < target->req_ring_size; ++j) {
>   			struct srp_request *req = &ch->req_ring[j];
>
> @@ -1232,8 +1228,6 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>   	}
>   	for (i = 0; i < target->ch_count; i++) {
>   		ch = &target->ch[i];
> -		if (!ch->target)
> -			break;
>   		/*
>   		 * Whether or not creating a new CM ID succeeded, create a new
>   		 * QP. This guarantees that all completion callback function
>

Looks good.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 02/12] scsi_transport_srp: Fix a race condition
       [not found]       ` <5541F96F.8090503-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-04-30 10:20         ` Bart Van Assche
  0 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 10:20 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 04/30/15 11:44, Sagi Grimberg wrote:
> On 4/30/2015 11:57 AM, Bart Van Assche wrote:
>> Avoid that srp_terminate_io() can get invoked while srp_queuecommand()
>> is in progress. This patch avoids that an I/O timeout can trigger the
>> following kernel warning:
>>
>> WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:1447
>> srp_terminate_io+0xef/0x100 [ib_srp]()
>> Call Trace:
>>   [<ffffffff814c65a2>] dump_stack+0x4e/0x68
>>   [<ffffffff81051f71>] warn_slowpath_common+0x81/0xa0
>>   [<ffffffff8105204a>] warn_slowpath_null+0x1a/0x20
>>   [<ffffffffa075f51f>] srp_terminate_io+0xef/0x100 [ib_srp]
>>   [<ffffffffa07495da>] __rport_fail_io_fast+0xba/0xc0
>> [scsi_transport_srp]
>>   [<ffffffffa0749a90>] rport_fast_io_fail_timedout+0xe0/0xf0
>> [scsi_transport_srp]
>>   [<ffffffff8106e09b>] process_one_work+0x1db/0x780
>>   [<ffffffff8106e75b>] worker_thread+0x11b/0x450
>>   [<ffffffff81073c64>] kthread+0xe4/0x100
>>   [<ffffffff814cf26c>] ret_from_fork+0x7c/0xb0
>>
>> See also patch "scsi_transport_srp: Add transport layer error
>> handling" (commit ID 29c17324803c).
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>> Cc: James Bottomley <JBottomley-O3H1v1f1dlM@public.gmane.org>
>> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>> Cc: <stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org> #v3.13
>> ---
>>   drivers/scsi/scsi_transport_srp.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_transport_srp.c
>> b/drivers/scsi/scsi_transport_srp.c
>> index 6ce1c48..4a44337 100644
>> --- a/drivers/scsi/scsi_transport_srp.c
>> +++ b/drivers/scsi/scsi_transport_srp.c
>> @@ -437,8 +437,10 @@ static void __rport_fail_io_fast(struct srp_rport
>> *rport)
>>
>>       /* Involve the LLD if possible to terminate all I/O on the
>> rport. */
>>       i = to_srp_internal(shost->transportt);
>> -    if (i->f->terminate_rport_io)
>> +    if (i->f->terminate_rport_io) {
>> +        srp_wait_for_queuecommand(shost);
>>           i->f->terminate_rport_io(rport);
>> +    }
>
> Why not just terminate the inflight IO before unblocking the target?

Sorry but I don't think that would prevent the described race condition. 
The call trace in the description of this patch illustrates that 
srp_queuecommand() can still be active even after the transport state 
has been changed into "offline". Hence if terminate_rport_io() would be 
invoked earlier the same race would still exist.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-04-30  9:37     ` Christoph Hellwig
@ 2015-04-30 10:26       ` Bart Van Assche
  2015-04-30 10:32         ` Sagi Grimberg
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 10:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, linux-rdma, linux-scsi

On 04/30/15 11:37, Christoph Hellwig wrote:
> On Thu, Apr 30, 2015 at 10:56:42AM +0200, Bart Van Assche wrote:
>> Introduce the helper function srp_wait_for_queuecommand().
>> Move the definition of scsi_request_fn_active(). This patch
>> does not change any functionality. A second call to
>> scsi_wait_for_queuecommand() will be introduced in the next
>> patch.
>
> Can we take a step back here?
>
> This isn;t something driver should be doing, so we really should take
> care of this in the SCSI midlayer.
>
> Especially given that we don't maintain request_fn_active for blk-mq.

Hello Christoph,

How about the following:
* Modify scsi_target_block() and scsi_target_unblock(...,
   SDEV_TRANSPORT_OFFLINE) such that these functions wait until
   active queuecommand() calls have finished.
* Ensure that this approach works not only for the traditional
   block layer but also for blk-mq.

With these changes it won't be necessary anymore to have something like 
srp_wait_for_queuecommand() in the SRP transport layer.

Bart.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/12] IB/srp: Make CM timeout dependent on subnet timeout
       [not found]     ` <5541EFB3.6030704-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30 10:27       ` Sagi Grimberg
  2015-04-30 10:45         ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:27 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 12:02 PM, Bart Van Assche wrote:
> In large networks the subnet timeout may have been set to a value
> above 20. In small networks it can be safe to reduce the subnet
> timeout below 20. The CM timeout should be proportional to the
> subnet timeout. Hence make the CM timeout dependent on the subnet
> timeout. Since the default subnet timeout used by OpenSM is 18
> this patch does not change the CM timeout if the default OpenSM
> subnet timeout is used.
>
> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 20 ++++++++++++++++++--
>   1 file changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 27d3a64..a18a2ae 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -704,6 +704,19 @@ static int srp_lookup_path(struct srp_rdma_ch *ch)
>   	return ch->status;
>   }
>
> +static u8 srp_get_subnet_timeout(struct srp_host *host)
> +{
> +	struct ib_port_attr attr;
> +	int ret;
> +	u8 subnet_timeout = 18;
> +
> +	ret = ib_query_port(host->srp_dev->dev, host->port, &attr);
> +	if (ret == 0)
> +		subnet_timeout = attr.subnet_timeout;
> +
> +	return subnet_timeout;
> +}
> +
>   static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
>   {
>   	struct srp_target_port *target = ch->target;
> @@ -711,8 +724,11 @@ static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
>   		struct ib_cm_req_param param;
>   		struct srp_login_req   priv;
>   	} *req = NULL;
> +	u8 subnet_timeout;
>   	int status;
>
> +	subnet_timeout = srp_get_subnet_timeout(target->srp_host);

Is this really something that srp initiator should look at?
I'd say that this needs to be a ib_cm code in case the input timeout
values are not set (e.g. zero).
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-04-30 10:26       ` Bart Van Assche
@ 2015-04-30 10:32         ` Sagi Grimberg
  2015-04-30 10:58           ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 10:32 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, linux-rdma, linux-scsi

On 4/30/2015 1:26 PM, Bart Van Assche wrote:
> On 04/30/15 11:37, Christoph Hellwig wrote:
>> On Thu, Apr 30, 2015 at 10:56:42AM +0200, Bart Van Assche wrote:
>>> Introduce the helper function srp_wait_for_queuecommand().
>>> Move the definition of scsi_request_fn_active(). This patch
>>> does not change any functionality. A second call to
>>> scsi_wait_for_queuecommand() will be introduced in the next
>>> patch.
>>
>> Can we take a step back here?
>>
>> This isn;t something driver should be doing, so we really should take
>> care of this in the SCSI midlayer.
>>
>> Especially given that we don't maintain request_fn_active for blk-mq.
>
> Hello Christoph,
>
> How about the following:
> * Modify scsi_target_block() and scsi_target_unblock(...,
>    SDEV_TRANSPORT_OFFLINE) such that these functions wait until
>    active queuecommand() calls have finished.

Won't this make scsi_target_unblock a sleeping function? It might
not fit all the contexts scsi_target_unblock is called.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 09/12] IB/srp: Remove a superfluous check from srp_free_req_data()
  2015-04-30 10:18     ` Sagi Grimberg
@ 2015-04-30 10:37       ` Bart Van Assche
  0 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 10:37 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 04/30/15 12:18, Sagi Grimberg wrote:
> On 4/30/2015 12:01 PM, Bart Van Assche wrote:
>> All srp_free_req_data() callers guarantee that ch->target != NULL
>> hence remove the ch->target test from srp_free_req_data().
>
> ch->target is not referenced anywhere in this routine. So why does
> this guarantee matter?

I can shorten the description of this patch if you want.

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 12/12] IB/srp: Make CM timeout dependent on subnet timeout
  2015-04-30 10:27       ` Sagi Grimberg
@ 2015-04-30 10:45         ` Bart Van Assche
  0 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 10:45 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi, Sean Hefty

On 04/30/15 12:27, Sagi Grimberg wrote:
> On 4/30/2015 12:02 PM, Bart Van Assche wrote:
>> In large networks the subnet timeout may have been set to a value
>> above 20. In small networks it can be safe to reduce the subnet
>> timeout below 20. The CM timeout should be proportional to the
>> subnet timeout. Hence make the CM timeout dependent on the subnet
>> timeout. Since the default subnet timeout used by OpenSM is 18
>> this patch does not change the CM timeout if the default OpenSM
>> subnet timeout is used.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: Sagi Grimberg <sagig@mellanox.com>
>> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>> ---
>>    drivers/infiniband/ulp/srp/ib_srp.c | 20 ++++++++++++++++++--
>>    1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index 27d3a64..a18a2ae 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -704,6 +704,19 @@ static int srp_lookup_path(struct srp_rdma_ch *ch)
>>    	return ch->status;
>>    }
>>
>> +static u8 srp_get_subnet_timeout(struct srp_host *host)
>> +{
>> +	struct ib_port_attr attr;
>> +	int ret;
>> +	u8 subnet_timeout = 18;
>> +
>> +	ret = ib_query_port(host->srp_dev->dev, host->port, &attr);
>> +	if (ret == 0)
>> +		subnet_timeout = attr.subnet_timeout;
>> +
>> +	return subnet_timeout;
>> +}
>> +
>>    static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
>>    {
>>    	struct srp_target_port *target = ch->target;
>> @@ -711,8 +724,11 @@ static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
>>    		struct ib_cm_req_param param;
>>    		struct srp_login_req   priv;
>>    	} *req = NULL;
>> +	u8 subnet_timeout;
>>    	int status;
>>
>> +	subnet_timeout = srp_get_subnet_timeout(target->srp_host);
>
> Is this really something that srp initiator should look at?
> I'd say that this needs to be a ib_cm code in case the input timeout
> values are not set (e.g. zero).

I'm open to suggestions for how to move this functionality into the IB 
CM. Today the IB CM interprets all timeout values as absolute timeouts 
instead of timeouts that are relative to the subnet timeout. If we want 
to move this functionality into the IB CM we need an approach that 
allows the IB CM to discern relative from absolute values. I think this 
is only possible by changing struct ib_cm_req_param. Sean, would you 
perhaps like to share your opinion about this ?

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-04-30 10:32         ` Sagi Grimberg
@ 2015-04-30 10:58           ` Bart Van Assche
       [not found]             ` <55420AEA.10108-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 10:58 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, linux-rdma, linux-scsi

On 04/30/15 12:33, Sagi Grimberg wrote:
> On 4/30/2015 1:26 PM, Bart Van Assche wrote:
>> On 04/30/15 11:37, Christoph Hellwig wrote:
>>> On Thu, Apr 30, 2015 at 10:56:42AM +0200, Bart Van Assche wrote:
>>>> Introduce the helper function srp_wait_for_queuecommand().
>>>> Move the definition of scsi_request_fn_active(). This patch
>>>> does not change any functionality. A second call to
>>>> scsi_wait_for_queuecommand() will be introduced in the next
>>>> patch.
>>>
>>> Can we take a step back here?
>>>
>>> This isn;t something driver should be doing, so we really should take
>>> care of this in the SCSI midlayer.
>>>
>>> Especially given that we don't maintain request_fn_active for blk-mq.
>>
>> Hello Christoph,
>>
>> How about the following:
>> * Modify scsi_target_block() and scsi_target_unblock(...,
>>     SDEV_TRANSPORT_OFFLINE) such that these functions wait until
>>     active queuecommand() calls have finished.
>
> Won't this make scsi_target_unblock a sleeping function? It might
> not fit all the contexts scsi_target_unblock is called.

The only callers in upstream code of scsi_target_block() and 
scsi_target_unblock() I am aware of are the FC, iSCSI and SRP transport 
layers and the drivers that use these transport layers. As far as I can 
see both functions are always called from thread context and without 
holding any spinlocks. A possible alternative to what I proposed in my 
previous e-mail could be to provide a new function that waits for active 
queuecommand() calls and that has to be called explicitly.

Bart.



^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/12] scsi_transport_srp: Reduce failover time
  2015-04-30 10:13   ` Sagi Grimberg
@ 2015-04-30 11:02     ` Bart Van Assche
       [not found]       ` <55420BAA.7060507-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 11:02 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 04/30/15 12:13, Sagi Grimberg wrote:
> On 4/30/2015 11:59 AM, Bart Van Assche wrote:
>> Unlike FC, there is no risk that SCSI devices will be offlined
>> by the error handler if it is started while a reconnect attempt
>> is ongoing. Hence report timeout errors as soon as possible
>> if a higher software layer (e.g. multipathd) needs to be informed
>> about a timeout. It is assumed that if both fast_io_fail_tmo < 0
>> and rport->dev_loss_tmo < 0 that this means that there is no
>> need to report timeouts quickly.
>>
>> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
>> Cc: James Bottomley <JBottomley@Odin.com>
>> Cc: Sagi Grimberg <sagig@mellanox.com>
>> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
>> ---
>>   drivers/scsi/scsi_transport_srp.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/scsi_transport_srp.c
>> b/drivers/scsi/scsi_transport_srp.c
>> index 4a44337..6667c2b 100644
>> --- a/drivers/scsi/scsi_transport_srp.c
>> +++ b/drivers/scsi/scsi_transport_srp.c
>> @@ -61,6 +61,11 @@ static inline struct Scsi_Host
>> *rport_to_shost(struct srp_rport *r)
>>       return dev_to_shost(r->dev.parent);
>>   }
>>
>> +static inline struct srp_rport *shost_to_rport(struct Scsi_Host *shost)
>> +{
>> +    return transport_class_to_srp_rport(&shost->shost_gendev);
>> +}
>> +
>>   /**
>>    * srp_tmo_valid() - check timeout combination validity
>>    * @reconnect_delay: Reconnect delay in seconds.
>> @@ -626,9 +631,11 @@ static enum blk_eh_timer_return
>> srp_timed_out(struct scsi_cmnd *scmd)
>>       struct scsi_device *sdev = scmd->device;
>>       struct Scsi_Host *shost = sdev->host;
>>       struct srp_internal *i = to_srp_internal(shost->transportt);
>> +    struct srp_rport *rport = shost_to_rport(shost);
>>
>>       pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
>> -    return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>> +    return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 &&
>> +        i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>>           BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
>>   }
>>
>>
>
> I'm not sure I understand the purpose of this patch? If the user
> requested fast_io_fail_tmo of X, I would assume it wants srp to retry
> for the command for X secs. Why should we fail it any earlier than that?
>
> There is a situation where I've seen srp starting the fast_io_fail_tmo
> later than expected because the QP retry exceeded error completion
> arrives late for certain workloads. Was this the motivation for this
> patch?

The purpose of the srp_timed_out() function is not about failing 
commands early but about postponing timeout errors. The above patch 
causes a SCSI timeout to be handled as soon as the command timer expires 
if rport->fast_io_fail_tmo >= 0 or rport->dev_loss_tmo >= 0.

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-04-30  9:51   ` Sagi Grimberg
@ 2015-04-30 11:25     ` Bart Van Assche
       [not found]       ` <5542111E.1080305-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-04-30 11:25 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 04/30/15 11:51, Sagi Grimberg wrote:
> On 4/30/2015 11:58 AM, Bart Van Assche wrote:
>> -        srp_change_conn_state(target, false);
>> +        ch->connected = false;
>
> Shouldn't this be protected by the channel lock (like the target)?

On all CPU architectures I'm familiar with changes of boolean variables 
are atomic so I think modifying that variable without locking is fine.

>> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>>
>>       struct completion    tsk_mgmt_done;
>>       u8            tsk_mgmt_status;
>> +    bool            connected;
>
> I generally agree with this flag, but I'm a bit confused about the
> semantics of two connected flags? Also, we check the target connected
> flag on TMFs, although we are executing it on a channel (should we
> check both?)
>
> I'd say keep only the channel connected flag, the target logic needs to
> be mandated by the state.

I think we need both flags. The ch->connected flag is used only in 
srp_destroy_qp() to verify whether a channel has been disconnected 
before it is destroyed. The target->connected flag provides an easy way 
to test the connected state in e.g. srp_disconnect_target(). And if it 
is attempted to send a task management function over a channel that has 
just been disconnected that should be fine since any channel disconnect 
causes tl_err_work to be queued. That last action will sooner or later 
result in a reconnect.

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
       [not found]             ` <55420AEA.10108-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30 14:13               ` Sagi Grimberg
  2015-04-30 17:25               ` Christoph Hellwig
  1 sibling, 0 replies; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 14:13 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 1:58 PM, Bart Van Assche wrote:
> On 04/30/15 12:33, Sagi Grimberg wrote:
>> On 4/30/2015 1:26 PM, Bart Van Assche wrote:
>>> On 04/30/15 11:37, Christoph Hellwig wrote:
>>>> On Thu, Apr 30, 2015 at 10:56:42AM +0200, Bart Van Assche wrote:
>>>>> Introduce the helper function srp_wait_for_queuecommand().
>>>>> Move the definition of scsi_request_fn_active(). This patch
>>>>> does not change any functionality. A second call to
>>>>> scsi_wait_for_queuecommand() will be introduced in the next
>>>>> patch.
>>>>
>>>> Can we take a step back here?
>>>>
>>>> This isn;t something driver should be doing, so we really should take
>>>> care of this in the SCSI midlayer.
>>>>
>>>> Especially given that we don't maintain request_fn_active for blk-mq.
>>>
>>> Hello Christoph,
>>>
>>> How about the following:
>>> * Modify scsi_target_block() and scsi_target_unblock(...,
>>>     SDEV_TRANSPORT_OFFLINE) such that these functions wait until
>>>     active queuecommand() calls have finished.
>>
>> Won't this make scsi_target_unblock a sleeping function? It might
>> not fit all the contexts scsi_target_unblock is called.
>
> The only callers in upstream code of scsi_target_block() and
> scsi_target_unblock() I am aware of are the FC, iSCSI and SRP transport
> layers and the drivers that use these transport layers. As far as I can
> see both functions are always called from thread context and without
> holding any spinlocks. A possible alternative to what I proposed in my
> previous e-mail could be to provide a new function that waits for active
> queuecommand() calls and that has to be called explicitly.
>

Maybe that might be better.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]       ` <5542111E.1080305-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30 15:00         ` Sagi Grimberg
       [not found]           ` <5542439D.1000107-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 15:00 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 2:25 PM, Bart Van Assche wrote:
> On 04/30/15 11:51, Sagi Grimberg wrote:
>> On 4/30/2015 11:58 AM, Bart Van Assche wrote:
>>> -        srp_change_conn_state(target, false);
>>> +        ch->connected = false;
>>
>> Shouldn't this be protected by the channel lock (like the target)?
>
> On all CPU architectures I'm familiar with changes of boolean variables
> are atomic so I think modifying that variable without locking is fine.
>
>>> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>>>
>>>       struct completion    tsk_mgmt_done;
>>>       u8            tsk_mgmt_status;
>>> +    bool            connected;
>>
>> I generally agree with this flag, but I'm a bit confused about the
>> semantics of two connected flags? Also, we check the target connected
>> flag on TMFs, although we are executing it on a channel (should we
>> check both?)
>>
>> I'd say keep only the channel connected flag, the target logic needs to
>> be mandated by the state.
>
> I think we need both flags. The ch->connected flag is used only in
> srp_destroy_qp() to verify whether a channel has been disconnected
> before it is destroyed. The target->connected flag provides an easy way
> to test the connected state in e.g. srp_disconnect_target().

We can just as easily check rport state. rport state is modified before
target-connected. I still think one is redundant.

> And if it
> is attempted to send a task management function over a channel that has
> just been disconnected that should be fine since any channel disconnect
> causes tl_err_work to be queued. That last action will sooner or later
> result in a reconnect.

Will it be fine to queue commands when the channel is in reconnecting
stage?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/12] scsi_transport_srp: Reduce failover time
       [not found]       ` <55420BAA.7060507-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-04-30 15:14         ` Sagi Grimberg
       [not found]           ` <554246E6.9020503-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-04-30 15:14 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 4/30/2015 2:02 PM, Bart Van Assche wrote:
> On 04/30/15 12:13, Sagi Grimberg wrote:
>> On 4/30/2015 11:59 AM, Bart Van Assche wrote:
>>> Unlike FC, there is no risk that SCSI devices will be offlined
>>> by the error handler if it is started while a reconnect attempt
>>> is ongoing. Hence report timeout errors as soon as possible
>>> if a higher software layer (e.g. multipathd) needs to be informed
>>> about a timeout. It is assumed that if both fast_io_fail_tmo < 0
>>> and rport->dev_loss_tmo < 0 that this means that there is no
>>> need to report timeouts quickly.
>>>
>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>> Cc: James Bottomley <JBottomley-O3H1v1f1dlM@public.gmane.org>
>>> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>>> ---
>>>   drivers/scsi/scsi_transport_srp.c | 9 ++++++++-
>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/scsi_transport_srp.c
>>> b/drivers/scsi/scsi_transport_srp.c
>>> index 4a44337..6667c2b 100644
>>> --- a/drivers/scsi/scsi_transport_srp.c
>>> +++ b/drivers/scsi/scsi_transport_srp.c
>>> @@ -61,6 +61,11 @@ static inline struct Scsi_Host
>>> *rport_to_shost(struct srp_rport *r)
>>>       return dev_to_shost(r->dev.parent);
>>>   }
>>>
>>> +static inline struct srp_rport *shost_to_rport(struct Scsi_Host *shost)
>>> +{
>>> +    return transport_class_to_srp_rport(&shost->shost_gendev);
>>> +}
>>> +
>>>   /**
>>>    * srp_tmo_valid() - check timeout combination validity
>>>    * @reconnect_delay: Reconnect delay in seconds.
>>> @@ -626,9 +631,11 @@ static enum blk_eh_timer_return
>>> srp_timed_out(struct scsi_cmnd *scmd)
>>>       struct scsi_device *sdev = scmd->device;
>>>       struct Scsi_Host *shost = sdev->host;
>>>       struct srp_internal *i = to_srp_internal(shost->transportt);
>>> +    struct srp_rport *rport = shost_to_rport(shost);
>>>
>>>       pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
>>> -    return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>>> +    return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 &&
>>> +        i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>>>           BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
>>>   }
>>>
>>>
>>
>> I'm not sure I understand the purpose of this patch? If the user
>> requested fast_io_fail_tmo of X, I would assume it wants srp to retry
>> for the command for X secs. Why should we fail it any earlier than that?
>>
>> There is a situation where I've seen srp starting the fast_io_fail_tmo
>> later than expected because the QP retry exceeded error completion
>> arrives late for certain workloads. Was this the motivation for this
>> patch?
>
> The purpose of the srp_timed_out() function is not about failing
> commands early but about postponing timeout errors. The above patch
> causes a SCSI timeout to be handled as soon as the command timer expires
> if rport->fast_io_fail_tmo >= 0 or rport->dev_loss_tmo >= 0.

This change is effective only when
time_to_wc_err + fast_io_fail_tmo < cmd_timeout.

If the user chose this configuration, I imagine the wanted behavior is
that the timeout errors won't be propagated as soon as the cmd timeout
expires no?

Today, srp return BLK_EH_RESET_TIMER until fast_io_fail_tmo kicks in,
and at that point, it unblocks the scsi target terminates all the
commands with TRANSPORT_FAIL_FAST. This will change the behavior to 
trigger scsi-eh (abort, reset_device) and reset_host will trigger
another reconnect which is redundant (we have reconnect work inflight).

Can you explain the motivation?
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-04-30  8:58 ` [PATCH 04/12] IB/srp: Fix connection state tracking Bart Van Assche
  2015-04-30  9:51   ` Sagi Grimberg
@ 2015-04-30 16:08   ` Doug Ledford
       [not found]     ` <1430410094.102408.71.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  1 sibling, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2015-04-30 16:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

[-- Attachment #1: Type: text/plain, Size: 4093 bytes --]

On Thu, 2015-04-30 at 10:58 +0200, Bart Van Assche wrote:
> Reception of a DREQ message only causes the state of a single
> channel to change. Modify the SRP initiator such that channel
> and target connection state are tracked separately. This patch
> avoids that following false positive warning can be reported
> by srp_destroy_qp():
> 
> WARNING: at drivers/infiniband/ulp/srp/ib_srp.c:617 srp_destroy_qp+0xa6/0x120 [ib_srp]()
> Call Trace:
> [<ffffffff8106e10f>] warn_slowpath_common+0x7f/0xc0
> [<ffffffff8106e16a>] warn_slowpath_null+0x1a/0x20
> [<ffffffffa0440226>] srp_destroy_qp+0xa6/0x120 [ib_srp]
> [<ffffffffa0440322>] srp_free_ch_ib+0x82/0x1e0 [ib_srp]
> [<ffffffffa044408b>] srp_create_target+0x7ab/0x998 [ib_srp]
> [<ffffffff81346f60>] dev_attr_store+0x20/0x30
> [<ffffffff811dd90f>] sysfs_write_file+0xef/0x170
> [<ffffffff8116d248>] vfs_write+0xc8/0x190
> [<ffffffff8116d411>] sys_write+0x51/0x90
> 
> Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> Cc: <stable@vger.kernel.org> #v3.19
> ---
>  drivers/infiniband/ulp/srp/ib_srp.c | 7 ++++---
>  drivers/infiniband/ulp/srp/ib_srp.h | 1 +
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 5ce6cfd..0eb07d3 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -465,14 +465,13 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
>   */
>  static void srp_destroy_qp(struct srp_rdma_ch *ch)
>  {
> -	struct srp_target_port *target = ch->target;
>  	static struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
>  	static struct ib_recv_wr wr = { .wr_id = SRP_LAST_WR_ID };
>  	struct ib_recv_wr *bad_wr;
>  	int ret;
>  
>  	/* Destroying a QP and reusing ch->done is only safe if not connected */
> -	WARN_ON_ONCE(target->connected);
> +	WARN_ON_ONCE(ch->connected);
>  
>  	ret = ib_modify_qp(ch->qp, &attr, IB_QP_STATE);
>  	WARN_ONCE(ret, "ib_cm_init_qp_attr() returned %d\n", ret);
> @@ -836,6 +835,7 @@ static void srp_disconnect_target(struct srp_target_port *target)
>  
>  		for (i = 0; i < target->ch_count; i++) {
>  			ch = &target->ch[i];
> +			ch->connected = false;
>  			if (ch->cm_id && ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
>  				shost_printk(KERN_DEBUG, target->scsi_host,
>  					     PFX "Sending CM DREQ failed\n");
> @@ -1017,6 +1017,7 @@ static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich)
>  		switch (ch->status) {
>  		case 0:
>  			srp_change_conn_state(target, true);
> +			ch->connected = true;
>  			return 0;
>  
>  		case SRP_PORT_REDIRECT:
> @@ -2367,7 +2368,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>  	case IB_CM_DREQ_RECEIVED:
>  		shost_printk(KERN_WARNING, target->scsi_host,
>  			     PFX "DREQ received - connection closed\n");
> -		srp_change_conn_state(target, false);
> +		ch->connected = false;

So, in this patch, you modify disconnect to set srp_change_conn_state()
to false for the target, then loop through all the channels sending
cm_dreq's, and on the receiving side, you modify the cm_dreq handler to
set each channel to false.  However, once you get to 0 channels open,
shouldn't you then set the target state to false too just to keep things
consistent?

>  		if (ib_send_cm_drep(cm_id, NULL, 0))
>  			shost_printk(KERN_ERR, target->scsi_host,
>  				     PFX "Sending CM DREP failed\n");
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index a611556..95a4471 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>  
>  	struct completion	tsk_mgmt_done;
>  	u8			tsk_mgmt_status;
> +	bool			connected;
>  };
>  
>  /**


-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
       [not found]             ` <55420AEA.10108-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  2015-04-30 14:13               ` Sagi Grimberg
@ 2015-04-30 17:25               ` Christoph Hellwig
  2015-05-06  9:59                 ` Bart Van Assche
  1 sibling, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-04-30 17:25 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, Christoph Hellwig, Doug Ledford, James Bottomley,
	Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On Thu, Apr 30, 2015 at 12:58:50PM +0200, Bart Van Assche wrote:
> The only callers in upstream code of scsi_target_block() and
> scsi_target_unblock() I am aware of are the FC, iSCSI and SRP transport
> layers and the drivers that use these transport layers. As far as I can see
> both functions are always called from thread context and without holding any
> spinlocks. A possible alternative to what I proposed in my previous e-mail
> could be to provide a new function that waits for active queuecommand()
> calls and that has to be called explicitly.

A separate helper sounds fine for now, although I suspect we'll
eventually migrate the call to it into scsi_target_block().
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]     ` <1430410094.102408.71.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-05  9:21       ` Bart Van Assche
       [not found]         ` <55488BAE.7070006-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-05  9:21 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 04/30/15 18:08, Doug Ledford wrote:
> On Thu, 2015-04-30 at 10:58 +0200, Bart Van Assche wrote:
>> @@ -2367,7 +2368,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>>   	case IB_CM_DREQ_RECEIVED:
>>   		shost_printk(KERN_WARNING, target->scsi_host,
>>   			     PFX "DREQ received - connection closed\n");
>> -		srp_change_conn_state(target, false);
>> +		ch->connected = false;
>
> So, in this patch, you modify disconnect to set srp_change_conn_state()
> to false for the target, then loop through all the channels sending
> cm_dreq's, and on the receiving side, you modify the cm_dreq handler to
> set each channel to false.  However, once you get to 0 channels open,
> shouldn't you then set the target state to false too just to keep things
> consistent?

Hello Doug,

What is not visible in this patch but only in the ib_srp.c source code 
is that the first received DREQ initiates a reconnect (the queue_work() 
call below):

	case IB_CM_DREQ_RECEIVED:
		shost_printk(KERN_WARNING, target->scsi_host,
			     PFX "DREQ received - connection closed\n");
		ch->connected = false;
		if (ib_send_cm_drep(cm_id, NULL, 0))
			shost_printk(KERN_ERR, target->scsi_host,
				     PFX "Sending CM DREP failed\n");
		queue_work(system_long_wq, &target->tl_err_work);
		break;

That should be sufficient to restore communication after a DREQ has been 
received.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]           ` <5542439D.1000107-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-05-05  9:31             ` Bart Van Assche
       [not found]               ` <55488E06.8040308-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-05  9:31 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 04/30/15 17:00, Sagi Grimberg wrote:
> On 4/30/2015 2:25 PM, Bart Van Assche wrote:
>> On 04/30/15 11:51, Sagi Grimberg wrote:
>>> On 4/30/2015 11:58 AM, Bart Van Assche wrote:
>>>> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>>>>
>>>>       struct completion    tsk_mgmt_done;
>>>>       u8            tsk_mgmt_status;
>>>> +    bool            connected;
>>>
>>> I generally agree with this flag, but I'm a bit confused about the
>>> semantics of two connected flags? Also, we check the target connected
>>> flag on TMFs, although we are executing it on a channel (should we
>>> check both?)
>>>
>>> I'd say keep only the channel connected flag, the target logic needs to
>>> be mandated by the state.
>>
>> I think we need both flags. The ch->connected flag is used only in
>> srp_destroy_qp() to verify whether a channel has been disconnected
>> before it is destroyed. The target->connected flag provides an easy way
>> to test the connected state in e.g. srp_disconnect_target().
>
> We can just as easily check rport state. rport state is modified before
> target-connected. I still think one is redundant.

Sorry but I disagree. If e.g. both the fast_io_fail and dev_loss_tmo 
timers have been disabled then the rport state is SRP_RPORT_RUNNING all 
the time. At least in that case it is not possible to derive the 
target->connected state from the rport state.

>> And if it
>> is attempted to send a task management function over a channel that has
>> just been disconnected that should be fine since any channel disconnect
>> causes tl_err_work to be queued. That last action will sooner or later
>> result in a reconnect.
>
> Will it be fine to queue commands when the channel is in reconnecting
> stage?

As one can see in srp_reconnect_rport() the associated SCSI devices are 
blocked as long as a reconnect is in progress. This means that 
srp_queuecommand() won't be called while a reconnect is in progress.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 06/12] scsi_transport_srp: Reduce failover time
       [not found]           ` <554246E6.9020503-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-05-05  9:38             ` Bart Van Assche
  0 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-05-05  9:38 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 04/30/15 17:14, Sagi Grimberg wrote:
> On 4/30/2015 2:02 PM, Bart Van Assche wrote:
>> On 04/30/15 12:13, Sagi Grimberg wrote:
>>> On 4/30/2015 11:59 AM, Bart Van Assche wrote:
>>>> Unlike FC, there is no risk that SCSI devices will be offlined
>>>> by the error handler if it is started while a reconnect attempt
>>>> is ongoing. Hence report timeout errors as soon as possible
>>>> if a higher software layer (e.g. multipathd) needs to be informed
>>>> about a timeout. It is assumed that if both fast_io_fail_tmo < 0
>>>> and rport->dev_loss_tmo < 0 that this means that there is no
>>>> need to report timeouts quickly.
>>>>
>>>> Signed-off-by: Bart Van Assche <bart.vanassche-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
>>>> Cc: James Bottomley <JBottomley-O3H1v1f1dlM@public.gmane.org>
>>>> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
>>>> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
>>>> ---
>>>>   drivers/scsi/scsi_transport_srp.c | 9 ++++++++-
>>>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/scsi/scsi_transport_srp.c
>>>> b/drivers/scsi/scsi_transport_srp.c
>>>> index 4a44337..6667c2b 100644
>>>> --- a/drivers/scsi/scsi_transport_srp.c
>>>> +++ b/drivers/scsi/scsi_transport_srp.c
>>>> @@ -61,6 +61,11 @@ static inline struct Scsi_Host
>>>> *rport_to_shost(struct srp_rport *r)
>>>>       return dev_to_shost(r->dev.parent);
>>>>   }
>>>>
>>>> +static inline struct srp_rport *shost_to_rport(struct Scsi_Host
>>>> *shost)
>>>> +{
>>>> +    return transport_class_to_srp_rport(&shost->shost_gendev);
>>>> +}
>>>> +
>>>>   /**
>>>>    * srp_tmo_valid() - check timeout combination validity
>>>>    * @reconnect_delay: Reconnect delay in seconds.
>>>> @@ -626,9 +631,11 @@ static enum blk_eh_timer_return
>>>> srp_timed_out(struct scsi_cmnd *scmd)
>>>>       struct scsi_device *sdev = scmd->device;
>>>>       struct Scsi_Host *shost = sdev->host;
>>>>       struct srp_internal *i = to_srp_internal(shost->transportt);
>>>> +    struct srp_rport *rport = shost_to_rport(shost);
>>>>
>>>>       pr_debug("timeout for sdev %s\n", dev_name(&sdev->sdev_gendev));
>>>> -    return i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>>>> +    return rport->fast_io_fail_tmo < 0 && rport->dev_loss_tmo < 0 &&
>>>> +        i->f->reset_timer_if_blocked && scsi_device_blocked(sdev) ?
>>>>           BLK_EH_RESET_TIMER : BLK_EH_NOT_HANDLED;
>>>>   }
>>>>
>>>>
>>>
>>> I'm not sure I understand the purpose of this patch? If the user
>>> requested fast_io_fail_tmo of X, I would assume it wants srp to retry
>>> for the command for X secs. Why should we fail it any earlier than that?
>>>
>>> There is a situation where I've seen srp starting the fast_io_fail_tmo
>>> later than expected because the QP retry exceeded error completion
>>> arrives late for certain workloads. Was this the motivation for this
>>> patch?
>>
>> The purpose of the srp_timed_out() function is not about failing
>> commands early but about postponing timeout errors. The above patch
>> causes a SCSI timeout to be handled as soon as the command timer expires
>> if rport->fast_io_fail_tmo >= 0 or rport->dev_loss_tmo >= 0.
>
> This change is effective only when
> time_to_wc_err + fast_io_fail_tmo < cmd_timeout.
>
> If the user chose this configuration, I imagine the wanted behavior is
> that the timeout errors won't be propagated as soon as the cmd timeout
> expires no?
>
> Today, srp return BLK_EH_RESET_TIMER until fast_io_fail_tmo kicks in,
> and at that point, it unblocks the scsi target terminates all the
> commands with TRANSPORT_FAIL_FAST. This will change the behavior to
> trigger scsi-eh (abort, reset_device) and reset_host will trigger
> another reconnect which is redundant (we have reconnect work inflight).
>
> Can you explain the motivation?

Sure. The only reason why the srp_timed_out() function has been added is 
to avoid that I/O times out when both the fast_io_fail and dev_loss 
mechanisms have been disabled and while a reconnect is ongoing. However, 
if multipathd is running we want timeout errors to be reported as soon 
as possible such that multipathd can switch to another path quickly if 
another working path is available.

Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]               ` <55488E06.8040308-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-05-05  9:45                 ` Sagi Grimberg
       [not found]                   ` <5548911F.8060505-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Sagi Grimberg @ 2015-05-05  9:45 UTC (permalink / raw)
  To: Bart Van Assche, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 5/5/2015 12:31 PM, Bart Van Assche wrote:
> On 04/30/15 17:00, Sagi Grimberg wrote:
>> On 4/30/2015 2:25 PM, Bart Van Assche wrote:
>>> On 04/30/15 11:51, Sagi Grimberg wrote:
>>>> On 4/30/2015 11:58 AM, Bart Van Assche wrote:
>>>>> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>>>>>
>>>>>       struct completion    tsk_mgmt_done;
>>>>>       u8            tsk_mgmt_status;
>>>>> +    bool            connected;
>>>>
>>>> I generally agree with this flag, but I'm a bit confused about the
>>>> semantics of two connected flags? Also, we check the target connected
>>>> flag on TMFs, although we are executing it on a channel (should we
>>>> check both?)
>>>>
>>>> I'd say keep only the channel connected flag, the target logic needs to
>>>> be mandated by the state.
>>>
>>> I think we need both flags. The ch->connected flag is used only in
>>> srp_destroy_qp() to verify whether a channel has been disconnected
>>> before it is destroyed. The target->connected flag provides an easy way
>>> to test the connected state in e.g. srp_disconnect_target().
>>
>> We can just as easily check rport state. rport state is modified before
>> target-connected. I still think one is redundant.
>
> Sorry but I disagree. If e.g. both the fast_io_fail and dev_loss_tmo
> timers have been disabled then the rport state is SRP_RPORT_RUNNING all
> the time. At least in that case it is not possible to derive the
> target->connected state from the rport state.

Can't you rely on ch->connected for that?

We can keep both, but I just think that the meaning of
target->connected is confusing now.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]                   ` <5548911F.8060505-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2015-05-05  9:59                     ` Bart Van Assche
  0 siblings, 0 replies; 59+ messages in thread
From: Bart Van Assche @ 2015-05-05  9:59 UTC (permalink / raw)
  To: Sagi Grimberg, Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

On 05/05/15 11:45, Sagi Grimberg wrote:
> On 5/5/2015 12:31 PM, Bart Van Assche wrote:
>> On 04/30/15 17:00, Sagi Grimberg wrote:
>>> On 4/30/2015 2:25 PM, Bart Van Assche wrote:
>>>> On 04/30/15 11:51, Sagi Grimberg wrote:
>>>>> On 4/30/2015 11:58 AM, Bart Van Assche wrote:
>>>>>> @@ -170,6 +170,7 @@ struct srp_rdma_ch {
>>>>>>
>>>>>>        struct completion    tsk_mgmt_done;
>>>>>>        u8            tsk_mgmt_status;
>>>>>> +    bool            connected;
>>>>>
>>>>> I generally agree with this flag, but I'm a bit confused about the
>>>>> semantics of two connected flags? Also, we check the target connected
>>>>> flag on TMFs, although we are executing it on a channel (should we
>>>>> check both?)
>>>>>
>>>>> I'd say keep only the channel connected flag, the target logic needs to
>>>>> be mandated by the state.
>>>>
>>>> I think we need both flags. The ch->connected flag is used only in
>>>> srp_destroy_qp() to verify whether a channel has been disconnected
>>>> before it is destroyed. The target->connected flag provides an easy way
>>>> to test the connected state in e.g. srp_disconnect_target().
>>>
>>> We can just as easily check rport state. rport state is modified before
>>> target-connected. I still think one is redundant.
>>
>> Sorry but I disagree. If e.g. both the fast_io_fail and dev_loss_tmo
>> timers have been disabled then the rport state is SRP_RPORT_RUNNING all
>> the time. At least in that case it is not possible to derive the
>> target->connected state from the rport state.
>
> Can't you rely on ch->connected for that?
>
> We can keep both, but I just think that the meaning of
> target->connected is confusing now.

Hello Sagi,

I will see whether I can remove the target->connected state variable.

Bart.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]         ` <55488BAE.7070006-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-05-05 14:10           ` Doug Ledford
  2015-05-05 14:26             ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2015-05-05 14:10 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2946 bytes --]

On Tue, 2015-05-05 at 11:21 +0200, Bart Van Assche wrote:
> On 04/30/15 18:08, Doug Ledford wrote:
> > On Thu, 2015-04-30 at 10:58 +0200, Bart Van Assche wrote:
> >> @@ -2367,7 +2368,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
> >>   	case IB_CM_DREQ_RECEIVED:
> >>   		shost_printk(KERN_WARNING, target->scsi_host,
> >>   			     PFX "DREQ received - connection closed\n");
> >> -		srp_change_conn_state(target, false);
> >> +		ch->connected = false;
> >
> > So, in this patch, you modify disconnect to set srp_change_conn_state()
> > to false for the target, then loop through all the channels sending
> > cm_dreq's, and on the receiving side, you modify the cm_dreq handler to
> > set each channel to false.  However, once you get to 0 channels open,
> > shouldn't you then set the target state to false too just to keep things
> > consistent?
> 
> Hello Doug,
> 
> What is not visible in this patch but only in the ib_srp.c source code 
> is that the first received DREQ initiates a reconnect (the queue_work() 
> call below):
> 
> 	case IB_CM_DREQ_RECEIVED:
> 		shost_printk(KERN_WARNING, target->scsi_host,
> 			     PFX "DREQ received - connection closed\n");
> 		ch->connected = false;
> 		if (ib_send_cm_drep(cm_id, NULL, 0))
> 			shost_printk(KERN_ERR, target->scsi_host,
> 				     PFX "Sending CM DREP failed\n");
> 		queue_work(system_long_wq, &target->tl_err_work);
> 		break;
> 
> That should be sufficient to restore communication after a DREQ has been 
> received.

Sure, but there is no guarantee that the wq is not busy with something
else, or that the reconnect attempt will succeed.  So, it would seem to
me that if you want to make sure your internal driver state is always
consistent, you should set the device connected state to 0 when there
are no connected channels any more.

However, while looking through the driver to research this, I noticed
something else that seems more important if you ask me.  With this patch
we now implement individual channel connection tracking.  However, in
srp_queuecommand() you pick the channel based on the tag, and the blk
layer has no idea of these disconnects, so the blk layer is free to
assign a tag/channel to a channel that's disconnected, and then as best
I can tell, you will simply try to post a work request to a channel
that's already disconnected, which I would expect to fail if we have
already disconnected this particular qp and not brought up a new one
yet.  So it seems to me there is a race condition between new incoming
SCSI commands and this disconnect/reconnect window, and that maybe we
should be sending these commands back to the mid layer for requeueing
when the channel the blk_mq tag points to is disconnected.  Or am I
missing something in there?


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-05-05 14:10           ` Doug Ledford
@ 2015-05-05 14:26             ` Bart Van Assche
  2015-05-05 15:10               ` Doug Ledford
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-05 14:26 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 05/05/15 16:10, Doug Ledford wrote:
> However, while looking through the driver to research this, I noticed
> something else that seems more important if you ask me.  With this patch
> we now implement individual channel connection tracking.  However, in
> srp_queuecommand() you pick the channel based on the tag, and the blk
> layer has no idea of these disconnects, so the blk layer is free to
> assign a tag/channel to a channel that's disconnected, and then as best
> I can tell, you will simply try to post a work request to a channel
> that's already disconnected, which I would expect to fail if we have
> already disconnected this particular qp and not brought up a new one
> yet.  So it seems to me there is a race condition between new incoming
> SCSI commands and this disconnect/reconnect window, and that maybe we
> should be sending these commands back to the mid layer for requeueing
> when the channel the blk_mq tag points to is disconnected.  Or am I
> missing something in there?

Hello Doug,

Around the time a cable disconnect or other link layer failure is 
detected by the SRP initiator or any other SCSI LLD it is unavoidable 
that one or more SCSI requests fail. It is up to a higher layer (e.g. 
dm-multipath + multipathd) to decide what to do with such requests, e.g. 
queue these requests and resend these over another path. The SRP 
initiator driver has been tested thoroughly with the multipath 
queue_if_no_path policy, with a fio job with I/O verification enabled 
running on top of a dm device while concurrently repeatedly simulating 
link layer failures (via ibportstate).

Bart.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-05-05 14:26             ` Bart Van Assche
@ 2015-05-05 15:10               ` Doug Ledford
  2015-05-05 15:27                 ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2015-05-05 15:10 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

[-- Attachment #1: Type: text/plain, Size: 2716 bytes --]

On Tue, 2015-05-05 at 16:26 +0200, Bart Van Assche wrote:
> On 05/05/15 16:10, Doug Ledford wrote:
> > However, while looking through the driver to research this, I noticed
> > something else that seems more important if you ask me.  With this patch
> > we now implement individual channel connection tracking.  However, in
> > srp_queuecommand() you pick the channel based on the tag, and the blk
> > layer has no idea of these disconnects, so the blk layer is free to
> > assign a tag/channel to a channel that's disconnected, and then as best
> > I can tell, you will simply try to post a work request to a channel
> > that's already disconnected, which I would expect to fail if we have
> > already disconnected this particular qp and not brought up a new one
> > yet.  So it seems to me there is a race condition between new incoming
> > SCSI commands and this disconnect/reconnect window, and that maybe we
> > should be sending these commands back to the mid layer for requeueing
> > when the channel the blk_mq tag points to is disconnected.  Or am I
> > missing something in there?
> 
> Hello Doug,
> 
> Around the time a cable disconnect or other link layer failure is 
> detected by the SRP initiator or any other SCSI LLD it is unavoidable 
> that one or more SCSI requests fail. It is up to a higher layer (e.g. 
> dm-multipath + multipathd) to decide what to do with such requests, e.g. 
> queue these requests and resend these over another path.

Sure, but that wasn't my point.  My point was that if you know the
channel is disconnected, then why don't you go immediately to the
correct action in queuecommand (where correct action could be requeue
waiting on reconnect or return with error, whatever is appropriate)?
Instead you attempt to post a command to a known disconnected queue
pair.

>  The SRP 
> initiator driver has been tested thoroughly with the multipath 
> queue_if_no_path policy, with a fio job with I/O verification enabled 
> running on top of a dm device while concurrently repeatedly simulating 
> link layer failures (via ibportstate).

Part of my questions here are because I don't know how the blk_mq
handles certain conditions.  However, your testing above only handles
one case: all channels get dropped.  As unlikely it may be, what if
resource constraints caused just one channel to be dropped out of the
bunch and the others stayed alive?  Then the blk_mq would see requests
on just one queue come back errored, while the others finished
successfully.  Does it drop that one queue out of rotation, or does it
fail over the entire connection?

-- 
Doug Ledford <dledford@redhat.com>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
  2015-05-05 15:10               ` Doug Ledford
@ 2015-05-05 15:27                 ` Bart Van Assche
       [not found]                   ` <5548E155.70007-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-05 15:27 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi

On 05/05/15 17:10, Doug Ledford wrote:
> On Tue, 2015-05-05 at 16:26 +0200, Bart Van Assche wrote:
>> On 05/05/15 16:10, Doug Ledford wrote:
>>> However, while looking through the driver to research this, I noticed
>>> something else that seems more important if you ask me.  With this patch
>>> we now implement individual channel connection tracking.  However, in
>>> srp_queuecommand() you pick the channel based on the tag, and the blk
>>> layer has no idea of these disconnects, so the blk layer is free to
>>> assign a tag/channel to a channel that's disconnected, and then as best
>>> I can tell, you will simply try to post a work request to a channel
>>> that's already disconnected, which I would expect to fail if we have
>>> already disconnected this particular qp and not brought up a new one
>>> yet.  So it seems to me there is a race condition between new incoming
>>> SCSI commands and this disconnect/reconnect window, and that maybe we
>>> should be sending these commands back to the mid layer for requeueing
>>> when the channel the blk_mq tag points to is disconnected.  Or am I
>>> missing something in there?
>>
>> Hello Doug,
>>
>> Around the time a cable disconnect or other link layer failure is
>> detected by the SRP initiator or any other SCSI LLD it is unavoidable
>> that one or more SCSI requests fail. It is up to a higher layer (e.g.
>> dm-multipath + multipathd) to decide what to do with such requests, e.g.
>> queue these requests and resend these over another path.
>
> Sure, but that wasn't my point.  My point was that if you know the
> channel is disconnected, then why don't you go immediately to the
> correct action in queuecommand (where correct action could be requeue
> waiting on reconnect or return with error, whatever is appropriate)?
> Instead you attempt to post a command to a known disconnected queue
> pair.
>
>> The SRP initiator driver has been tested thoroughly with the multipath
>> queue_if_no_path policy, with a fio job with I/O verification enabled
>> running on top of a dm device while concurrently repeatedly simulating
>> link layer failures (via ibportstate).
>
> Part of my questions here are because I don't know how the blk_mq
> handles certain conditions.  However, your testing above only handles
> one case: all channels get dropped.  As unlikely it may be, what if
> resource constraints caused just one channel to be dropped out of the
> bunch and the others stayed alive?  Then the blk_mq would see requests
> on just one queue come back errored, while the others finished
> successfully.  Does it drop that one queue out of rotation, or does it
> fail over the entire connection?

Hello Doug,

Sorry but I don't think that a SCSI LLD is the appropriate layer to 
choose between requeuing or failing a request. If multiple paths are 
available between an initiator system and a SAN and if one path fails 
only the multipath layer knows whether there are other working paths 
available. If a working path is still available then the request should 
be resent as soon as possible over another path. The multipath layer can 
only take such a decision after a SCSI LLD has failed a request.

If only one channel fails all other channels are disconnected and the 
transport layer error handling mechanism is started.

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]                   ` <5548E155.70007-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-05-05 16:10                     ` Doug Ledford
       [not found]                       ` <1430842201.2407.226.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Doug Ledford @ 2015-05-05 16:10 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 5300 bytes --]

On Tue, 2015-05-05 at 17:27 +0200, Bart Van Assche wrote:
> On 05/05/15 17:10, Doug Ledford wrote:
> > On Tue, 2015-05-05 at 16:26 +0200, Bart Van Assche wrote:
> >> On 05/05/15 16:10, Doug Ledford wrote:
> >>> However, while looking through the driver to research this, I noticed
> >>> something else that seems more important if you ask me.  With this patch
> >>> we now implement individual channel connection tracking.  However, in
> >>> srp_queuecommand() you pick the channel based on the tag, and the blk
> >>> layer has no idea of these disconnects, so the blk layer is free to
> >>> assign a tag/channel to a channel that's disconnected, and then as best
> >>> I can tell, you will simply try to post a work request to a channel
> >>> that's already disconnected, which I would expect to fail if we have
> >>> already disconnected this particular qp and not brought up a new one
> >>> yet.  So it seems to me there is a race condition between new incoming
> >>> SCSI commands and this disconnect/reconnect window, and that maybe we
> >>> should be sending these commands back to the mid layer for requeueing
> >>> when the channel the blk_mq tag points to is disconnected.  Or am I
> >>> missing something in there?
> >>
> >> Hello Doug,
> >>
> >> Around the time a cable disconnect or other link layer failure is
> >> detected by the SRP initiator or any other SCSI LLD it is unavoidable
> >> that one or more SCSI requests fail. It is up to a higher layer (e.g.
> >> dm-multipath + multipathd) to decide what to do with such requests, e.g.
> >> queue these requests and resend these over another path.
> >
> > Sure, but that wasn't my point.  My point was that if you know the
> > channel is disconnected, then why don't you go immediately to the
> > correct action in queuecommand (where correct action could be requeue
> > waiting on reconnect or return with error, whatever is appropriate)?
> > Instead you attempt to post a command to a known disconnected queue
> > pair.
> >
> >> The SRP initiator driver has been tested thoroughly with the multipath
> >> queue_if_no_path policy, with a fio job with I/O verification enabled
> >> running on top of a dm device while concurrently repeatedly simulating
> >> link layer failures (via ibportstate).
> >
> > Part of my questions here are because I don't know how the blk_mq
> > handles certain conditions.  However, your testing above only handles
> > one case: all channels get dropped.  As unlikely it may be, what if
> > resource constraints caused just one channel to be dropped out of the
> > bunch and the others stayed alive?  Then the blk_mq would see requests
> > on just one queue come back errored, while the others finished
> > successfully.  Does it drop that one queue out of rotation, or does it
> > fail over the entire connection?
> 
> Hello Doug,
> 
> Sorry but I don't think that a SCSI LLD is the appropriate layer to 
> choose between requeuing or failing a request.

Be that as it may, that doesn't change what I said about posting a
command to a known disconnected QP.  You could just fail immediately.
Something like:

if (!ch->connected) {
	scmnd->result = DID_NO_CONNECT;
	goto err;
}

right after getting the channel in queuecommand would work.  That would
save a couple spinlocks, several DMA mappings, a call into the low level
driver, and a few other things.  (And I only left requeue on the table
because I wasn't sure how the blk_mq dealt with just a single channel
being down versus all of them being down)

>  If multiple paths are 
> available between an initiator system and a SAN and if one path fails

Who says the path failed?  The path may be just fine.

> only the multipath layer knows whether there are other working paths 
> available. If a working path is still available then the request should 
> be resent as soon as possible over another path. The multipath layer can 
> only take such a decision after a SCSI LLD has failed a request.

Sure.  I totally get failing fast and unilaterally for multipath managed
devices.  That's all assuming multipath though.  There are uses without
that.

But my point in all of this is that if you have a single qp between
yourself and the target, then any error including a qp resource error ==
path error since you only have one path.  When you have a multi queue
device, that's no longer true.  A transient resource problem on one qp
does not mean a path event (at least not necessarily, although your
statement below converts a QP event into a path event by virtue
disconnecting and reconnecting all of the QPs).  My curiosity is now
moot given what you wrote about tearing everything down and reconnecting
(unless the error handling is modified to be more subtle in its
workings), but the original question in my mind was what happens at the
blk_mq level if you did have a single queue drop but not all of them and
you weren't using multipath.

> If only one channel fails all other channels are disconnected and the 
> transport layer error handling mechanism is started.

I missed that.  I assume it's done in srp_start_tl_fail_timers()?

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD



[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]                       ` <1430842201.2407.226.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2015-05-06  9:29                         ` Bart Van Assche
       [not found]                           ` <5549DEEC.9050501-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-06  9:29 UTC (permalink / raw)
  To: Doug Ledford
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

Hello Doug,

On 05/05/15 18:10, Doug Ledford wrote:
> Be that as it may, that doesn't change what I said about posting a
> command to a known disconnected QP.  You could just fail immediately.
> Something like:
>
> if (!ch->connected) {
> 	scmnd->result = DID_NO_CONNECT;
> 	goto err;
> }
>
> right after getting the channel in queuecommand would work.  That would
> save a couple spinlocks, several DMA mappings, a call into the low level
> driver, and a few other things.  (And I only left requeue on the table
> because I wasn't sure how the blk_mq dealt with just a single channel
> being down versus all of them being down)

What you wrote above looks correct to me. However, it is intentional 
that such a check is not present in srp_queuecommand(). The intention 
was to optimize the hot path of that driver as much as possible. Hence 
the choice to post a work request on the QP even after it has been 
disconnected and to let the HCA generate an error completion.

> But my point in all of this is that if you have a single qp between
> yourself and the target, then any error including a qp resource error ==
> path error since you only have one path.  When you have a multi queue
> device, that's no longer true.  A transient resource problem on one qp
> does not mean a path event (at least not necessarily, although your
> statement below converts a QP event into a path event by virtue
> disconnecting and reconnecting all of the QPs).  My curiosity is now
> moot given what you wrote about tearing everything down and reconnecting
> (unless the error handling is modified to be more subtle in its
> workings), but the original question in my mind was what happens at the
> blk_mq level if you did have a single queue drop but not all of them and
> you weren't using multipath.

If we want to support this without adding similar code to handle this in 
every SCSI LLD I think we need to change first how blk-mq and 
dm-multipath interact. Today dm-multipath is a layer on top of blk-mq. 
Supporting the above scenario properly is possible e.g. by integrating 
multipath support in the blk-mq layer. I think Hannes and Christoph have 
already started to work on this.

>> If only one channel fails all other channels are disconnected and the
>> transport layer error handling mechanism is started.
>
> I missed that.  I assume it's done in srp_start_tl_fail_timers()?

Yes, that's correct. Both QP errors and reception of a DREQ trigger a 
call of srp_tl_err_work(). That last function calls 
srp_start_tl_fail_timers() which starts the reconnection mechanism, at 
least if the reconnect_delay parameter has a positive value (> 0).

Bart.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-04-30 17:25               ` Christoph Hellwig
@ 2015-05-06  9:59                 ` Bart Van Assche
  2015-05-11  7:50                   ` hosts resets in SRP and the rest of the world, was: " Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-06  9:59 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, Jens Axboe, linux-scsi

On 04/30/15 19:25, Christoph Hellwig wrote:
> On Thu, Apr 30, 2015 at 12:58:50PM +0200, Bart Van Assche wrote:
>> The only callers in upstream code of scsi_target_block() and
>> scsi_target_unblock() I am aware of are the FC, iSCSI and SRP transport
>> layers and the drivers that use these transport layers. As far as I can see
>> both functions are always called from thread context and without holding any
>> spinlocks. A possible alternative to what I proposed in my previous e-mail
>> could be to provide a new function that waits for active queuecommand()
>> calls and that has to be called explicitly.
> 
> A separate helper sounds fine for now, although I suspect we'll
> eventually migrate the call to it into scsi_target_block().

(+Jens)

How about the patch below (lightly tested so far) ?

[PATCH] Introduce scsi_wait_for_queuecommand()

Move the functionality for waiting until active queuecommand()
calls have finished from the SRP transport layer into the SCSI
core. Add support for blk-mq in scsi_wait_for_queuecommand().
Introduce a request_fn_active counter in struct blk_mq_hw_ctx
to make it possible to implement this functionality for blk-mq.
---
 block/blk-mq.c                    | 19 ++++++++--------
 drivers/scsi/scsi_lib.c           | 48 +++++++++++++++++++++++++++++++++++++++
 drivers/scsi/scsi_transport_srp.c | 24 +-------------------
 include/linux/blk-mq.h            |  1 +
 include/scsi/scsi_device.h        |  1 +
 5 files changed, 60 insertions(+), 33 deletions(-)

diff --git a/block/blk-mq.c b/block/blk-mq.c
index ade8a2d..d0063e4 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -786,12 +786,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 * If we have previous entries on our dispatch list, grab them
 	 * and stuff them at the front for more fair dispatch.
 	 */
-	if (!list_empty_careful(&hctx->dispatch)) {
-		spin_lock(&hctx->lock);
-		if (!list_empty(&hctx->dispatch))
-			list_splice_init(&hctx->dispatch, &rq_list);
-		spin_unlock(&hctx->lock);
-	}
+	spin_lock(&hctx->lock);
+	hctx->request_fn_active++;
+	if (!list_empty(&hctx->dispatch))
+		list_splice_init(&hctx->dispatch, &rq_list);
+	spin_unlock(&hctx->lock);
 
 	/*
 	 * Start off with dptr being NULL, so we start the first request
@@ -851,11 +850,11 @@ static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
 	 * Any items that need requeuing? Stuff them into hctx->dispatch,
 	 * that is where we will continue on next queue run.
 	 */
-	if (!list_empty(&rq_list)) {
-		spin_lock(&hctx->lock);
+	spin_lock(&hctx->lock);
+	if (!list_empty(&rq_list))
 		list_splice(&rq_list, &hctx->dispatch);
-		spin_unlock(&hctx->lock);
-	}
+	hctx->request_fn_active--;
+	spin_unlock(&hctx->lock);
 }
 
 /*
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b1a2631..1466ef2 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -3073,6 +3073,54 @@ scsi_target_unblock(struct device *dev, enum scsi_device_state new_state)
 EXPORT_SYMBOL_GPL(scsi_target_unblock);
 
 /**
+ * scsi_request_fn_active() - number of ongoing scsi_request_fn() calls.
+ * @shost: SCSI host.
+ */
+static int scsi_request_fn_active(struct Scsi_Host *shost)
+{
+	struct scsi_device *sdev;
+	struct request_queue *q;
+	struct blk_mq_hw_ctx *hctx;
+	unsigned int i;
+	int request_fn_active = 0;
+
+	shost_for_each_device(sdev, shost) {
+		q = sdev->request_queue;
+
+		if (q->mq_ops) {
+			queue_for_each_hw_ctx(q, hctx, i) {
+				spin_lock(&hctx->lock);
+				request_fn_active += q->request_fn_active;
+				spin_unlock(&hctx->lock);
+			}
+		} else {
+			spin_lock_irq(q->queue_lock);
+			request_fn_active += q->request_fn_active;
+			spin_unlock_irq(q->queue_lock);
+		}
+	}
+
+	return request_fn_active;
+}
+
+/**
+ * scsi_wait_for_queuecommand() - wait until queuecommand() calls have finished
+ * @shost: SCSI host.
+ *
+ * Although functions like scsi_target_block() and scsi_target_unblock(shost,
+ * SDEV_TRANSPORT_OFFLINE) prevent new calls to the queuecommand() callback
+ * function these functions do not wait until ongoing queucommand() calls have
+ * finished. Hence this function that waits until any ongoing queucommand()
+ * calls have finished.
+ */
+void scsi_wait_for_queuecommand(struct Scsi_Host *shost)
+{
+	while (scsi_request_fn_active(shost))
+		msleep(20);
+}
+EXPORT_SYMBOL(scsi_wait_for_queuecommand);
+
+/**
  * scsi_kmap_atomic_sg - find and atomically map an sg-elemnt
  * @sgl:	scatter-gather list
  * @sg_count:	number of segments in sg
diff --git a/drivers/scsi/scsi_transport_srp.c b/drivers/scsi/scsi_transport_srp.c
index ae45bd9..aaf4581 100644
--- a/drivers/scsi/scsi_transport_srp.c
+++ b/drivers/scsi/scsi_transport_srp.c
@@ -504,27 +504,6 @@ void srp_start_tl_fail_timers(struct srp_rport *rport)
 EXPORT_SYMBOL(srp_start_tl_fail_timers);
 
 /**
- * scsi_request_fn_active() - number of kernel threads inside scsi_request_fn()
- * @shost: SCSI host for which to count the number of scsi_request_fn() callers.
- */
-static int scsi_request_fn_active(struct Scsi_Host *shost)
-{
-	struct scsi_device *sdev;
-	struct request_queue *q;
-	int request_fn_active = 0;
-
-	shost_for_each_device(sdev, shost) {
-		q = sdev->request_queue;
-
-		spin_lock_irq(q->queue_lock);
-		request_fn_active += q->request_fn_active;
-		spin_unlock_irq(q->queue_lock);
-	}
-
-	return request_fn_active;
-}
-
-/**
  * srp_reconnect_rport() - reconnect to an SRP target port
  * @rport: SRP target port.
  *
@@ -559,8 +538,7 @@ int srp_reconnect_rport(struct srp_rport *rport)
 	if (res)
 		goto out;
 	scsi_target_block(&shost->shost_gendev);
-	while (scsi_request_fn_active(shost))
-		msleep(20);
+	scsi_wait_for_queuecommand(shost);
 	res = rport->state != SRP_RPORT_LOST ? i->f->reconnect(rport) : -ENODEV;
 	pr_debug("%s (state %d): transport.reconnect() returned %d\n",
 		 dev_name(&shost->shost_gendev), rport->state, res);
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index 2056a99..732b746 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -22,6 +22,7 @@ struct blk_mq_hw_ctx {
 	struct {
 		spinlock_t		lock;
 		struct list_head	dispatch;
+		int			request_fn_active;
 	} ____cacheline_aligned_in_smp;
 
 	unsigned long		state;		/* BLK_MQ_S_* flags */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index a4c9336..5540d02 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -412,6 +412,7 @@ extern void scsi_scan_target(struct device *parent, unsigned int channel,
 extern void scsi_target_reap(struct scsi_target *);
 extern void scsi_target_block(struct device *);
 extern void scsi_target_unblock(struct device *, enum scsi_device_state);
+extern void scsi_wait_for_queuecommand(struct Scsi_Host *shost);
 extern void scsi_remove_target(struct device *);
 extern void int_to_scsilun(u64, struct scsi_lun *);
 extern u64 scsilun_to_int(struct scsi_lun *);
-- 
2.1.4




^ permalink raw reply related	[flat|nested] 59+ messages in thread

* Re: [PATCH 04/12] IB/srp: Fix connection state tracking
       [not found]                           ` <5549DEEC.9050501-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
@ 2015-05-07 13:44                             ` Doug Ledford
  0 siblings, 0 replies; 59+ messages in thread
From: Doug Ledford @ 2015-05-07 13:44 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: James Bottomley, Sagi Grimberg, Sebastian Parschauer, linux-rdma,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA

[-- Attachment #1: Type: text/plain, Size: 2787 bytes --]

On Wed, 2015-05-06 at 11:29 +0200, Bart Van Assche wrote:
> Hello Doug,
> 
> On 05/05/15 18:10, Doug Ledford wrote:
> > Be that as it may, that doesn't change what I said about posting a
> > command to a known disconnected QP.  You could just fail immediately.
> > Something like:
> >
> > if (!ch->connected) {
> > 	scmnd->result = DID_NO_CONNECT;
> > 	goto err;
> > }
> >
> > right after getting the channel in queuecommand would work.  That would
> > save a couple spinlocks, several DMA mappings, a call into the low level
> > driver, and a few other things.  (And I only left requeue on the table
> > because I wasn't sure how the blk_mq dealt with just a single channel
> > being down versus all of them being down)
> 
> What you wrote above looks correct to me. However, it is intentional 
> that such a check is not present in srp_queuecommand(). The intention 
> was to optimize the hot path of that driver as much as possible. Hence 
> the choice to post a work request on the QP even after it has been 
> disconnected and to let the HCA generate an error completion.

Given the amount of time it takes to do all of the dma mapping in that
function on the normal hot path, I suspect the above test, especially if
you added an unlikely annotation, would not even make a blip on your
performance numbers.  In any case, it's not something I would require in
the patch.

> > But my point in all of this is that if you have a single qp between
> > yourself and the target, then any error including a qp resource error ==
> > path error since you only have one path.  When you have a multi queue
> > device, that's no longer true.  A transient resource problem on one qp
> > does not mean a path event (at least not necessarily, although your
> > statement below converts a QP event into a path event by virtue
> > disconnecting and reconnecting all of the QPs).  My curiosity is now
> > moot given what you wrote about tearing everything down and reconnecting
> > (unless the error handling is modified to be more subtle in its
> > workings), but the original question in my mind was what happens at the
> > blk_mq level if you did have a single queue drop but not all of them and
> > you weren't using multipath.
> 
> If we want to support this without adding similar code to handle this in 
> every SCSI LLD I think we need to change first how blk-mq and 
> dm-multipath interact. Today dm-multipath is a layer on top of blk-mq. 
> Supporting the above scenario properly is possible e.g. by integrating 
> multipath support in the blk-mq layer. I think Hannes and Christoph have 
> already started to work on this.

Ok.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
              GPG KeyID: 0E572FDD


[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 59+ messages in thread

* hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-06  9:59                 ` Bart Van Assche
@ 2015-05-11  7:50                   ` Christoph Hellwig
  2015-05-11  8:54                     ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-05-11  7:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Sagi Grimberg, Doug Ledford, James Bottomley,
	Sagi Grimberg, Sebastian Parschauer, Jens Axboe, linux-scsi,
	Hannes Reinecke

Hi Bart,

I've looked at this and didn't really like the unconditional hctx lock
in the blk-mq path which might have nasty effects when just using a
single hctx.

So I'm taking another step back and try to understand what you're doign
here.

Let me try to recreate the issue:

 - we get a ->host_reset call for the SRP initiator, which then
   calls srp_reconnect_rport, at which point we still have outstanding
   commands on the wire, and we still allow concurrent I/O submission
 - srp_reconnect_rport then blocks new I/O, and tries to drain the
   peding requeuest from ->queuecommand.  It then calls into
   srp_rport_reconnect, which after some work also clears out all
   commands on the wire and the reconnects

Maybe it's time to move to what Hannes suggested in
events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf
slides 56+ at least for SRP as a start, that is:

 - once escalating to a LUN reset fail all commands for the LUN
   and block the the LUN for I/O and send a TMF abort
 - once scalatating to the host reset fail all I/O for the host
   and block the host (all LUNs) for I/O, and only then call
   the host reset action (reconnect in the SRP case)
   (or rather replace the current RP host reset with the
   I_T Nexus reset suggested by Hannes)

The advantage is that we can do the full drain much more easily
than just waiting for command leaving ->queuecommnd.  The other
advantage is that we can implement this with fairly small changes
in the scsi_error.c code trggered off a host or transport template
flag, without touching code in the block layer while at the same
time significantly simplifying the transport layer and drivers.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11  7:50                   ` hosts resets in SRP and the rest of the world, was: " Christoph Hellwig
@ 2015-05-11  8:54                     ` Bart Van Assche
  2015-05-11  9:31                       ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-11  8:54 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, Jens Axboe, linux-scsi, Hannes Reinecke

On 05/11/15 09:50, Christoph Hellwig wrote:
> Hi Bart,
>
> I've looked at this and didn't really like the unconditional hctx lock
> in the blk-mq path which might have nasty effects when just using a
> single hctx.
>
> So I'm taking another step back and try to understand what you're doign
> here.
>
> Let me try to recreate the issue:
>
>   - we get a ->host_reset call for the SRP initiator, which then
>     calls srp_reconnect_rport, at which point we still have outstanding
>     commands on the wire, and we still allow concurrent I/O submission
>   - srp_reconnect_rport then blocks new I/O, and tries to drain the
>     peding requeuest from ->queuecommand.  It then calls into
>     srp_rport_reconnect, which after some work also clears out all
>     commands on the wire and the reconnects
>
> Maybe it's time to move to what Hannes suggested in
> events.linuxfoundation.org/sites/events/files/slides/SCSI-EH.pdf
> slides 56+ at least for SRP as a start, that is:
>
>   - once escalating to a LUN reset fail all commands for the LUN
>     and block the the LUN for I/O and send a TMF abort
>   - once scalatating to the host reset fail all I/O for the host
>     and block the host (all LUNs) for I/O, and only then call
>     the host reset action (reconnect in the SRP case)
>     (or rather replace the current RP host reset with the
>     I_T Nexus reset suggested by Hannes)
>
> The advantage is that we can do the full drain much more easily
> than just waiting for command leaving ->queuecommnd.  The other
> advantage is that we can implement this with fairly small changes
> in the scsi_error.c code trggered off a host or transport template
> flag, without touching code in the block layer while at the same
> time significantly simplifying the transport layer and drivers.

Hello Christoph,

There are multiple events that can cause the SRP initiator driver to 
initiate a reconnect:
1. The SCSI core invoking eh_host_reset_handler().
2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
    transmit timeout or a transport layer disconnect reported by the
    IB/CM.

The reason I added (2) is to reduce the failover time in a H.A. setup. 
If e.g. a path fails it can take up to (2 * SCSI timeout) before all 
outstanding SCSI commands have timed out. The next step is that the SCSI 
error handler invokes a device reset. If a cable has been pulled the 
task management function issued by srp_reset_device() will time out. The 
next step is that srp_reset_host() will try to perform a reconnect. If a 
cable has been pulled this reconnect attempt will also time out. Because 
of how the retry count and timeout parameters for establishing a 
connection in the SRP initiator have been chosen it can take 
considerable time before a reconnect attempt times out and hence before 
srp_reset_host() reports a failure.

A common complaint about older versions of the SRP initiator was that 
failover took to long, namely several minutes instead of less than a 
minute. The reason why (2) had been introduced was to reduce the path 
failover time to less than a minute. As soon as the IB HCA and/or IB 
core have reported an error we know that a connection has to be 
reestablished. Waiting until the SCSI error handler has finished its 
escalation strategy only slows down failover and does not provide any 
benefits from the point of view an SRP initiator.

In summary, if it would be possible to modify the SCSI error handling 
strategy such that (2) can be dropped without increasing the SRP 
initiator failover time I definitely would like to hear about that. But 
I'm not sure that's possible.

Best regards,

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11  8:54                     ` Bart Van Assche
@ 2015-05-11  9:31                       ` Christoph Hellwig
  2015-05-11  9:58                         ` Bart Van Assche
  2015-05-11 10:58                         ` Bart Van Assche
  0 siblings, 2 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-05-11  9:31 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Sagi Grimberg, Doug Ledford, James Bottomley,
	Sagi Grimberg, Sebastian Parschauer, Jens Axboe, linux-scsi,
	Hannes Reinecke

On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
> Hello Christoph,
> 
> There are multiple events that can cause the SRP initiator driver to
> initiate a reconnect:
> 1. The SCSI core invoking eh_host_reset_handler().
> 2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
>    transmit timeout or a transport layer disconnect reported by the
>    IB/CM.

Right, I missed the srp_reconnect_work case.  But even with that I
think what I wrote above still stands.  srp_reconnect_work in that
case would just directly trigger the abort all commands and
reconnect operation.

The main point I was trying to make is that instead of having a sequence
of:

 1) block new queuecommand instances
 2) flush out pending queuecommand instances
 3) do part of the disconnect
 4) fail all in-flight commands
 5) reconnect

we should aim for:

 1) block new queuecommand instances
 2) fail all in-flight commands
 3) disconnect and reconnect

to avoid the need to keep track of pending queuecommand instances,
and instead re-use the existing infrastructure to fail all in-flight
commands, which we have the infrastructure for, and which we need
to do anyway.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11  9:31                       ` Christoph Hellwig
@ 2015-05-11  9:58                         ` Bart Van Assche
  2015-05-11 11:47                           ` Christoph Hellwig
  2015-05-11 10:58                         ` Bart Van Assche
  1 sibling, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-11  9:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, Jens Axboe, linux-scsi, Hannes Reinecke

On 05/11/15 11:31, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
>> There are multiple events that can cause the SRP initiator driver to
>> initiate a reconnect:
>> 1. The SCSI core invoking eh_host_reset_handler().
>> 2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
>>     transmit timeout or a transport layer disconnect reported by the
>>     IB/CM.
>
> Right, I missed the srp_reconnect_work case.  But even with that I
> think what I wrote above still stands.  srp_reconnect_work in that
> case would just directly trigger the abort all commands and
> reconnect operation.
>
> The main point I was trying to make is that instead of having a sequence
> of:
>
>   1) block new queuecommand instances
>   2) flush out pending queuecommand instances
>   3) do part of the disconnect
>   4) fail all in-flight commands
>   5) reconnect
>
> we should aim for:
>
>   1) block new queuecommand instances
>   2) fail all in-flight commands
>   3) disconnect and reconnect
>
> to avoid the need to keep track of pending queuecommand instances,
> and instead re-use the existing infrastructure to fail all in-flight
> commands, which we have the infrastructure for, and which we need
> to do anyway.

Hello Christoph,

Your proposal absolutely makes sense to me but unfortunately I do not 
have the time available now to implement it. Would it be acceptable if I 
rework scsi_wait_for_queuecommand() such that per-CPU counters are 
introduced in blk-mq instead of one counter per hctx ?

Thanks,

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11  9:31                       ` Christoph Hellwig
  2015-05-11  9:58                         ` Bart Van Assche
@ 2015-05-11 10:58                         ` Bart Van Assche
  2015-05-11 11:50                           ` Christoph Hellwig
  1 sibling, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-11 10:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, Jens Axboe, linux-scsi, Hannes Reinecke

On 05/11/15 11:31, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 10:54:30AM +0200, Bart Van Assche wrote:
>> Hello Christoph,
>>
>> There are multiple events that can cause the SRP initiator driver to
>> initiate a reconnect:
>> 1. The SCSI core invoking eh_host_reset_handler().
>> 2. An error reported by the IB HCA or by the IB core, e.g. an RDMA
>>     transmit timeout or a transport layer disconnect reported by the
>>     IB/CM.
>
> Right, I missed the srp_reconnect_work case.  But even with that I
> think what I wrote above still stands.  srp_reconnect_work in that
> case would just directly trigger the abort all commands and
> reconnect operation.
>
> The main point I was trying to make is that instead of having a sequence
> of:
>
>   1) block new queuecommand instances
>   2) flush out pending queuecommand instances
>   3) do part of the disconnect
>   4) fail all in-flight commands
>   5) reconnect
>
> we should aim for:
>
>   1) block new queuecommand instances
>   2) fail all in-flight commands
>   3) disconnect and reconnect
>
> to avoid the need to keep track of pending queuecommand instances,
> and instead re-use the existing infrastructure to fail all in-flight
> commands, which we have the infrastructure for, and which we need
> to do anyway.

Hello Christoph,

What I'm wondering about is whether it will be possible with the above 
approach to trigger path failover before (2 * SCSI timeout) has expired 
? Starting SCSI error handling immediately after the block layer has 
reported the first SCSI timeout is only safe if all ongoing SCSI 
commands are canceled in some way. Is this what the function 
blk_abort_request() is intended for ? As far as I can see invoking that 
function or any function with a similar purpose is only safe after the 
queuecommand() callback function has finished. However, 
blk_mq_run_hw_queue() invokes mq_ops->queue_rq() without holding any 
lock. So it's not clear to me how to safely cancel ongoing blk-mq 
requests without waiting until these have timed out. I hope that this 
means that overlooked something ?

Bart.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11  9:58                         ` Bart Van Assche
@ 2015-05-11 11:47                           ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-05-11 11:47 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Sagi Grimberg, Doug Ledford, James Bottomley,
	Sagi Grimberg, Sebastian Parschauer, Jens Axboe, linux-scsi,
	Hannes Reinecke

On Mon, May 11, 2015 at 11:58:59AM +0200, Bart Van Assche wrote:
> Your proposal absolutely makes sense to me but unfortunately I do not have
> the time available now to implement it. Would it be acceptable if I rework
> scsi_wait_for_queuecommand() such that per-CPU counters are introduced in
> blk-mq instead of one counter per hctx ?

If we can't fix this for real I'd prefer to do nothing, or at least
almost nothing for now, e.g. keep your original patch and ignore the
blk-mq case for now.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11 10:58                         ` Bart Van Assche
@ 2015-05-11 11:50                           ` Christoph Hellwig
  2015-05-12  8:49                             ` Bart Van Assche
  0 siblings, 1 reply; 59+ messages in thread
From: Christoph Hellwig @ 2015-05-11 11:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Sagi Grimberg, Doug Ledford, James Bottomley,
	Sagi Grimberg, Sebastian Parschauer, Jens Axboe, linux-scsi,
	Hannes Reinecke

On Mon, May 11, 2015 at 12:58:03PM +0200, Bart Van Assche wrote:
> What I'm wondering about is whether it will be possible with the above
> approach to trigger path failover before (2 * SCSI timeout) has expired ?
> Starting SCSI error handling immediately after the block layer has reported
> the first SCSI timeout is only safe if all ongoing SCSI commands are
> canceled in some way. Is this what the function blk_abort_request() is
> intended for ? As far as I can see invoking that function or any function
> with a similar purpose is only safe after the queuecommand() callback
> function has finished. However, blk_mq_run_hw_queue() invokes
> mq_ops->queue_rq() without holding any lock. So it's not clear to me how to
> safely cancel ongoing blk-mq requests without waiting until these have timed
> out. I hope that this means that overlooked something ?

For the blk-mq case invoking it earlier should be fine - the
REQ_ATOM_STARTED and REQ_ATOM_COMPLETE bit ops are specifily designed
so that calling the timeout handler on any request is fine.  I'm not
sure about the !blk-mq case, though.

^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-11 11:50                           ` Christoph Hellwig
@ 2015-05-12  8:49                             ` Bart Van Assche
  2015-05-12 18:02                               ` Christoph Hellwig
  0 siblings, 1 reply; 59+ messages in thread
From: Bart Van Assche @ 2015-05-12  8:49 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Doug Ledford, James Bottomley, Sagi Grimberg,
	Sebastian Parschauer, Jens Axboe, linux-scsi, Hannes Reinecke

On 05/11/15 13:50, Christoph Hellwig wrote:
> On Mon, May 11, 2015 at 12:58:03PM +0200, Bart Van Assche wrote:
>> What I'm wondering about is whether it will be possible with the above
>> approach to trigger path failover before (2 * SCSI timeout) has expired ?
>> Starting SCSI error handling immediately after the block layer has reported
>> the first SCSI timeout is only safe if all ongoing SCSI commands are
>> canceled in some way. Is this what the function blk_abort_request() is
>> intended for ? As far as I can see invoking that function or any function
>> with a similar purpose is only safe after the queuecommand() callback
>> function has finished. However, blk_mq_run_hw_queue() invokes
>> mq_ops->queue_rq() without holding any lock. So it's not clear to me how to
>> safely cancel ongoing blk-mq requests without waiting until these have timed
>> out. I hope that this means that overlooked something ?
>
> For the blk-mq case invoking it earlier should be fine - the
> REQ_ATOM_STARTED and REQ_ATOM_COMPLETE bit ops are specifily designed
> so that calling the timeout handler on any request is fine.  I'm not
> sure about the !blk-mq case, though.

Hello Christoph,

Thanks for the feedback. However, I'm still wondering what will happen 
if blk_abort_request() causes e.g. blk_rq_unmap_user() or 
blk_update_request() to be called while mq_ops->queue_rq() or 
q->request_fn() is still in progress ? More in general, I'm not sure it 
is possible to avoid that blk_abort_request() races with a request 
queuing function by only letting the block layer set an additional 
request flag. Setting an additional flag just before queue_rq() or 
request_fn() is called would not allow to detect when these callback 
functions have finished. Setting a flag just after queue_rq() or 
request_fn() have returned without introducing additional locking or 
atomic operations would make it possible that blk_mark_rq_complete() is 
called from the I/O completion path before that new flag is set. This 
means that with this last approach may make it necessary to increase / 
decrease a request reference count around the queue_rq() or request_fn() 
call + flag set operation. Another possible approach would be to replace 
the REQ_ATOM_STARTED and REQ_ATOM_COMPLETE flags with a request state 
variable that is modified atomically. Further feedback is welcome.

Bart.


^ permalink raw reply	[flat|nested] 59+ messages in thread

* Re: hosts resets in SRP and the rest of the world, was: Re: [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand()
  2015-05-12  8:49                             ` Bart Van Assche
@ 2015-05-12 18:02                               ` Christoph Hellwig
  0 siblings, 0 replies; 59+ messages in thread
From: Christoph Hellwig @ 2015-05-12 18:02 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Sagi Grimberg, Doug Ledford, James Bottomley,
	Sagi Grimberg, Sebastian Parschauer, Jens Axboe, linux-scsi,
	Hannes Reinecke

On Tue, May 12, 2015 at 10:49:47AM +0200, Bart Van Assche wrote:
> Thanks for the feedback. However, I'm still wondering what will happen if
> blk_abort_request() causes e.g. blk_rq_unmap_user() or blk_update_request()
> to be called while mq_ops->queue_rq() or q->request_fn() is still in
> progress ?

We will never call the blk-mq ->timeout handler before the LLDD
called blk_mq_start_request, which the driver should only call once it's
ready to handle ->timeout.

For normal blk-mq drivers this works fine, but for SCSI we still have
the problem that we need'd really need to move this into the
->queuecommand function to be fully safe against the driver internal
data structures, the current version only helps with the SCSI level
structures.

^ permalink raw reply	[flat|nested] 59+ messages in thread

end of thread, other threads:[~2015-05-12 18:02 UTC | newest]

Thread overview: 59+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-04-30  8:56 [PATCH 0/12] IB/srp patches for kernel v4.2 Bart Van Assche
2015-04-30  8:56 ` [PATCH 01/12] scsi_transport_srp: Introduce srp_wait_for_queuecommand() Bart Van Assche
     [not found]   ` <5541EE4A.30803-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30  9:32     ` Sagi Grimberg
2015-04-30  9:37     ` Christoph Hellwig
2015-04-30 10:26       ` Bart Van Assche
2015-04-30 10:32         ` Sagi Grimberg
2015-04-30 10:58           ` Bart Van Assche
     [not found]             ` <55420AEA.10108-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30 14:13               ` Sagi Grimberg
2015-04-30 17:25               ` Christoph Hellwig
2015-05-06  9:59                 ` Bart Van Assche
2015-05-11  7:50                   ` hosts resets in SRP and the rest of the world, was: " Christoph Hellwig
2015-05-11  8:54                     ` Bart Van Assche
2015-05-11  9:31                       ` Christoph Hellwig
2015-05-11  9:58                         ` Bart Van Assche
2015-05-11 11:47                           ` Christoph Hellwig
2015-05-11 10:58                         ` Bart Van Assche
2015-05-11 11:50                           ` Christoph Hellwig
2015-05-12  8:49                             ` Bart Van Assche
2015-05-12 18:02                               ` Christoph Hellwig
2015-04-30  8:57 ` [PATCH 02/12] scsi_transport_srp: Fix a race condition Bart Van Assche
     [not found]   ` <5541EE66.7090608-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30  9:44     ` Sagi Grimberg
     [not found]       ` <5541F96F.8090503-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-04-30 10:20         ` Bart Van Assche
2015-04-30  8:58 ` [PATCH 04/12] IB/srp: Fix connection state tracking Bart Van Assche
2015-04-30  9:51   ` Sagi Grimberg
2015-04-30 11:25     ` Bart Van Assche
     [not found]       ` <5542111E.1080305-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30 15:00         ` Sagi Grimberg
     [not found]           ` <5542439D.1000107-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-05-05  9:31             ` Bart Van Assche
     [not found]               ` <55488E06.8040308-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-05-05  9:45                 ` Sagi Grimberg
     [not found]                   ` <5548911F.8060505-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-05-05  9:59                     ` Bart Van Assche
2015-04-30 16:08   ` Doug Ledford
     [not found]     ` <1430410094.102408.71.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-05  9:21       ` Bart Van Assche
     [not found]         ` <55488BAE.7070006-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-05-05 14:10           ` Doug Ledford
2015-05-05 14:26             ` Bart Van Assche
2015-05-05 15:10               ` Doug Ledford
2015-05-05 15:27                 ` Bart Van Assche
     [not found]                   ` <5548E155.70007-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-05-05 16:10                     ` Doug Ledford
     [not found]                       ` <1430842201.2407.226.camel-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2015-05-06  9:29                         ` Bart Van Assche
     [not found]                           ` <5549DEEC.9050501-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-05-07 13:44                             ` Doug Ledford
2015-04-30  8:58 ` [PATCH 05/12] IB/srp: Fix reconnection failure handling Bart Van Assche
2015-04-30  8:59 ` [PATCH 06/12] scsi_transport_srp: Reduce failover time Bart Van Assche
2015-04-30 10:13   ` Sagi Grimberg
2015-04-30 11:02     ` Bart Van Assche
     [not found]       ` <55420BAA.7060507-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30 15:14         ` Sagi Grimberg
     [not found]           ` <554246E6.9020503-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2015-05-05  9:38             ` Bart Van Assche
2015-04-30  9:00 ` [PATCH 07/12] IB/srp: Remove superfluous casts Bart Van Assche
2015-04-30 10:13   ` Sagi Grimberg
2015-04-30  9:00 ` [PATCH 08/12] IB/srp: Rearrange module description Bart Van Assche
     [not found]   ` <5541EF39.6040301-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30 10:15     ` Sagi Grimberg
2015-04-30  9:01 ` [PATCH 09/12] IB/srp: Remove a superfluous check from srp_free_req_data() Bart Van Assche
     [not found]   ` <5541EF4F.6050200-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30 10:18     ` Sagi Grimberg
2015-04-30 10:37       ` Bart Van Assche
2015-04-30  9:01 ` [PATCH 10/12] IB/srp: Remove !ch->target tests from the reconnect code Bart Van Assche
2015-04-30 10:19   ` Sagi Grimberg
     [not found] ` <5541EE21.3050809-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30  8:57   ` [PATCH 03/12] IB/srp: Remove an extraneous scsi_host_put() from an error path Bart Van Assche
2015-04-30  9:44     ` Sagi Grimberg
2015-04-30  9:02   ` [PATCH 11/12] IB/srp: Add 64-bit LUN support Bart Van Assche
2015-04-30  9:02   ` [PATCH 12/12] IB/srp: Make CM timeout dependent on subnet timeout Bart Van Assche
     [not found]     ` <5541EFB3.6030704-XdAiOPVOjttBDgjK7y7TUQ@public.gmane.org>
2015-04-30 10:27       ` Sagi Grimberg
2015-04-30 10:45         ` Bart Van Assche

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.