* [PATCH v2 0/12] IB/srp: Add multichannel support
@ 2014-10-07 13:01 Bart Van Assche
  2014-10-07 13:03 ` [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag() Bart Van Assche
                   ` (7 more replies)
  0 siblings, 8 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:01 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

Although the SRP protocol supports multichannel operation, although
RDMA HCAs that support multiple completion vectors have been available
for a considerable time, and although multichannel operation yields
better performance than using a single channel, the Linux SRP initiator
does not yet support multichannel operation. Hence this patch series,
which adds multichannel support to the SRP initiator driver.

The changes compared to the previous version of this patch series are as 
follows:
* Added a function to the block layer that allows SCSI LLDs to query
   the blk-mq hardware context index chosen by the block layer. Removed
   the mq_queuecommand callback again.
* Added support for multiple hardware queues in the TCQ functions in
   the SCSI core.
* Split a few patches and elaborated the patch descriptions to make it
   easier to review this patch series.
* Added two new patches: one patch that makes the SRP initiator always
   use block layer tags and another patch that realizes a micro-
   optimization, namely elimination of the free requests list.

The patches in this series are:
0001-blk-mq-Use-all-available-hardware-queues.patch
0002-blk-mq-Add-blk_mq_unique_tag.patch
0003-scsi-mq-Add-support-for-multiple-hardware-queues.patch
0004-scsi_tcq.h-Add-support-for-multiple-hardware-queues.patch
0005-IB-srp-Move-ib_destroy_cm_id-call-into-srp_free_ch_i.patch
0006-IB-srp-Remove-stale-connection-retry-mechanism.patch
0007-IB-srp-Avoid-that-I-O-hangs-due-to-a-cable-pull-duri.patch
0008-IB-srp-Introduce-two-new-srp_target_port-member-vari.patch
0009-IB-srp-Separate-target-and-channel-variables.patch
0010-IB-srp-Use-block-layer-tags.patch
0011-IB-srp-Eliminate-free_reqs-list.patch
0012-IB-srp-Add-multichannel-support.patch

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* [PATCH v2 01/12] blk-mq: Use all available hardware queues
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
@ 2014-10-07 13:02   ` Bart Van Assche
  2014-10-07 14:37     ` Jens Axboe
  2014-10-07 13:03   ` [PATCH v2 03/12] scsi-mq: Add support for multiple " Bart Van Assche
                     ` (4 subsequent siblings)
  5 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:02 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
  above example the range of the number of CPU cores associated
  with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets
  it is now guaranteed that all CPU cores associated with a single
  HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Jens Axboe <axboe-b10kYP2dOMg@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Ming Lei <ming.lei-Z7WLFzj8eWMS+FvcfC7Uqw@public.gmane.org>
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@
 static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
 			      const int cpu)
 {
-	return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+	return cpu * nr_queues / nr_cpus;
 }
 
 static int get_first_sibling(unsigned int cpu)
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag()
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
@ 2014-10-07 13:03 ` Bart Van Assche
  2014-10-11 11:08   ` Christoph Hellwig
       [not found]   ` <5433E493.9030304-HInyCGIudOg@public.gmane.org>
  2014-10-07 13:04 ` [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues Bart Van Assche
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

The queuecommand() callback functions in SCSI low-level drivers
need to know which hardware context has been selected by the
block layer. Since this information is not available in the
request structure, and since passing the hctx pointer directly to
the queuecommand callback function would require modification of
all SCSI LLDs, add a function to the block layer that allows
querying the hardware context index.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@canonical.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
---
 block/blk-mq-tag.c     | 27 +++++++++++++++++++++++++++
 block/blk-mq.c         |  2 ++
 include/linux/blk-mq.h | 22 ++++++++++++++++++++++
 3 files changed, 51 insertions(+)

diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c
index 88d512f..2c63a2b 100644
--- a/block/blk-mq-tag.c
+++ b/block/blk-mq-tag.c
@@ -592,6 +592,33 @@ int blk_mq_tag_update_depth(struct blk_mq_tags *tags, unsigned int tdepth)
 	return 0;
 }
 
+/**
+ * blk_mq_unique_tag() - return a tag that is unique queue-wide
+ * @rq: request for which to compute a unique tag
+ *
+ * The tag field in struct request is unique per hardware queue but not over
+ * all hardware queues. Hence this function that returns a tag with the
+ * hardware context index in the upper bits and the per hardware queue tag in
+ * the lower bits.
+ *
 + * Note: When called for a request that is queued on a non-multiqueue request
+ * queue, the hardware context index is set to zero.
+ */
+u32 blk_mq_unique_tag(struct request *rq)
+{
+	struct request_queue *q = rq->q;
+	struct blk_mq_hw_ctx *hctx;
+	int hwq = 0;
+
+	if (q->mq_ops) {
+		hctx = q->mq_ops->map_queue(q, rq->mq_ctx->cpu);
+		hwq = hctx->queue_num;
+	}
+
+	return blk_mq_build_unique_tag(hwq, rq->tag);
+}
+EXPORT_SYMBOL(blk_mq_unique_tag);
+
 ssize_t blk_mq_tag_sysfs_show(struct blk_mq_tags *tags, char *page)
 {
 	char *orig_page = page;
diff --git a/block/blk-mq.c b/block/blk-mq.c
index df8e1e0..bf1959c 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2018,6 +2018,8 @@ static int blk_mq_alloc_rq_maps(struct blk_mq_tag_set *set)
  */
 int blk_mq_alloc_tag_set(struct blk_mq_tag_set *set)
 {
+	BUILD_BUG_ON(BLK_MQ_MAX_DEPTH > 1 << BLK_MQ_UNIQUE_TAG_BITS);
+
 	if (!set->nr_hw_queues)
 		return -EINVAL;
 	if (!set->queue_depth)
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index eac4f31..6050a23 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -167,6 +167,28 @@ struct request *blk_mq_alloc_request(struct request_queue *q, int rw,
 		gfp_t gfp, bool reserved);
 struct request *blk_mq_tag_to_rq(struct blk_mq_tags *tags, unsigned int tag);
 
+enum {
+	BLK_MQ_UNIQUE_TAG_BITS = 16,
+	BLK_MQ_UNIQUE_TAG_MASK = (1 << BLK_MQ_UNIQUE_TAG_BITS) - 1,
+};
+
+u32 blk_mq_unique_tag(struct request *rq);
+
+static inline u32 blk_mq_build_unique_tag(int hwq, int tag)
+{
+	return (hwq << BLK_MQ_UNIQUE_TAG_BITS) | (tag & BLK_MQ_UNIQUE_TAG_MASK);
+}
+
+static inline u16 blk_mq_unique_tag_to_hwq(u32 unique_tag)
+{
+	return unique_tag >> BLK_MQ_UNIQUE_TAG_BITS;
+}
+
+static inline u16 blk_mq_unique_tag_to_tag(u32 unique_tag)
+{
+	return unique_tag & BLK_MQ_UNIQUE_TAG_MASK;
+}
+
 struct blk_mq_hw_ctx *blk_mq_map_queue(struct request_queue *, const int ctx_index);
 struct blk_mq_hw_ctx *blk_mq_alloc_single_hw_queue(struct blk_mq_tag_set *, unsigned int, int);
 
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 03/12] scsi-mq: Add support for multiple hardware queues
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
  2014-10-07 13:02   ` [PATCH v2 01/12] blk-mq: Use all available " Bart Van Assche
@ 2014-10-07 13:03   ` Bart Van Assche
       [not found]     ` <5433E4AB.8030306-HInyCGIudOg@public.gmane.org>
  2014-10-07 13:04   ` [PATCH v2 05/12] IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib() Bart Van Assche
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:03 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

Allow a SCSI LLD to declare how many hardware queues it supports
by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().

Note: it is assumed that each hardware queue has a queue depth of
shost->can_queue. In other words, the total queue depth per host
is (number of hardware queues) * (shost->can_queue).

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
---
 drivers/scsi/scsi_lib.c  | 2 +-
 include/scsi/scsi_host.h | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index db8c449..f829c42 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -2072,7 +2072,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
 
 	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
 	shost->tag_set.ops = &scsi_mq_ops;
-	shost->tag_set.nr_hw_queues = 1;
+	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
 	shost->tag_set.queue_depth = shost->can_queue;
 	shost->tag_set.cmd_size = cmd_size;
 	shost->tag_set.numa_node = NUMA_NO_NODE;
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index cafb260..d38cab9 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -638,6 +638,10 @@ struct Scsi_Host {
 	short unsigned int sg_prot_tablesize;
 	unsigned int max_sectors;
 	unsigned long dma_boundary;
+	/*
+	 * In scsi-mq mode, the number of hardware queues supported by the LLD.
+	 */
+	unsigned nr_hw_queues;
 	/* 
 	 * Used to assign serial numbers to the cmds.
 	 * Protected by the host lock.
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
  2014-10-07 13:03 ` [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag() Bart Van Assche
@ 2014-10-07 13:04 ` Bart Van Assche
  2014-10-19 16:12   ` Sagi Grimberg
  2014-10-28  2:06   ` Martin K. Petersen
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

Modify scsi_find_tag() and scsi_host_find_tag() such that these
functions can translate a tag generated by blk_mq_unique_tag().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagig@mellanox.com>
---
 include/scsi/scsi_tcq.h | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/include/scsi/scsi_tcq.h b/include/scsi/scsi_tcq.h
index e645835..ea1ca9c 100644
--- a/include/scsi/scsi_tcq.h
+++ b/include/scsi/scsi_tcq.h
@@ -111,18 +111,21 @@ static inline int scsi_populate_tag_msg(struct scsi_cmnd *cmd, char *msg)
 }
 
 static inline struct scsi_cmnd *scsi_mq_find_tag(struct Scsi_Host *shost,
-		unsigned int hw_ctx, int tag)
+						 int unique_tag)
 {
-	struct request *req;
+	u16 hwq = blk_mq_unique_tag_to_hwq(unique_tag);
+	struct request *req = NULL;
 
-	req = blk_mq_tag_to_rq(shost->tag_set.tags[hw_ctx], tag);
+	if (hwq < shost->tag_set.nr_hw_queues)
+		req = blk_mq_tag_to_rq(shost->tag_set.tags[hwq],
+				       blk_mq_unique_tag_to_tag(unique_tag));
 	return req ? (struct scsi_cmnd *)req->special : NULL;
 }
 
 /**
  * scsi_find_tag - find a tagged command by device
  * @SDpnt:	pointer to the ScSI device
- * @tag:	the tag number
+ * @tag:	tag generated by blk_mq_unique_tag()
  *
  * Notes:
  *	Only works with tags allocated by the generic blk layer.
@@ -133,9 +136,9 @@ static inline struct scsi_cmnd *scsi_find_tag(struct scsi_device *sdev, int tag)
 
         if (tag != SCSI_NO_TAG) {
 		if (shost_use_blk_mq(sdev->host))
-			return scsi_mq_find_tag(sdev->host, 0, tag);
+			return scsi_mq_find_tag(sdev->host, tag);
 
-        	req = blk_queue_find_tag(sdev->request_queue, tag);
+		req = blk_queue_find_tag(sdev->request_queue, tag);
 	        return req ? (struct scsi_cmnd *)req->special : NULL;
 	}
 
@@ -174,7 +177,7 @@ static inline int scsi_init_shared_tag_map(struct Scsi_Host *shost, int depth)
 /**
  * scsi_host_find_tag - find the tagged command by host
  * @shost:	pointer to scsi_host
- * @tag:	tag of the scsi_cmnd
+ * @tag:	tag generated by blk_mq_unique_tag()
  *
  * Notes:
  *	Only works with tags allocated by the generic blk layer.
@@ -186,7 +189,7 @@ static inline struct scsi_cmnd *scsi_host_find_tag(struct Scsi_Host *shost,
 
 	if (tag != SCSI_NO_TAG) {
 		if (shost_use_blk_mq(shost))
-			return scsi_mq_find_tag(shost, 0, tag);
+			return scsi_mq_find_tag(shost, tag);
 		req = blk_map_queue_find_tag(shost->bqt, tag);
 		return req ? (struct scsi_cmnd *)req->special : NULL;
 	}
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 05/12] IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib()
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
  2014-10-07 13:02   ` [PATCH v2 01/12] blk-mq: Use all available " Bart Van Assche
  2014-10-07 13:03   ` [PATCH v2 03/12] scsi-mq: Add support for multiple " Bart Van Assche
@ 2014-10-07 13:04   ` Bart Van Assche
  2014-10-07 13:04   ` [PATCH v2 06/12] IB/srp: Remove stale connection retry mechanism Bart Van Assche
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

The patch that adds multichannel support into the SRP initiator
driver introduces an additional call to srp_free_ch_ib(). This
patch helps to keep that later patch simple.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 62d2a18..d3c712f 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -555,6 +555,11 @@ static void srp_free_target_ib(struct srp_target_port *target)
 	struct srp_device *dev = target->srp_host->srp_dev;
 	int i;
 
+	if (target->cm_id) {
+		ib_destroy_cm_id(target->cm_id);
+		target->cm_id = NULL;
+	}
+
 	if (dev->use_fast_reg) {
 		if (target->fr_pool)
 			srp_destroy_fr_pool(target->fr_pool);
@@ -868,7 +873,6 @@ static void srp_remove_target(struct srp_target_port *target)
 	scsi_remove_host(target->scsi_host);
 	srp_stop_rport_timers(target->rport);
 	srp_disconnect_target(target);
-	ib_destroy_cm_id(target->cm_id);
 	srp_free_target_ib(target);
 	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
@@ -3043,7 +3047,7 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret) {
 		shost_printk(KERN_ERR, target->scsi_host,
 			     PFX "Connection failed\n");
-		goto err_cm_id;
+		goto err_free_ib;
 	}
 
 	ret = srp_add_target(host, target);
@@ -3067,9 +3071,6 @@ out:
 err_disconnect:
 	srp_disconnect_target(target);
 
-err_cm_id:
-	ib_destroy_cm_id(target->cm_id);
-
 err_free_ib:
 	srp_free_target_ib(target);
 
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 06/12] IB/srp: Remove stale connection retry mechanism
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
                     ` (2 preceding siblings ...)
  2014-10-07 13:04   ` [PATCH v2 05/12] IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib() Bart Van Assche
@ 2014-10-07 13:04   ` Bart Van Assche
  2014-10-07 13:05   ` [PATCH v2 09/12] IB/srp: Separate target and channel variables Bart Van Assche
  2014-10-07 13:06   ` [PATCH v2 11/12] IB/srp: Eliminate free_reqs list Bart Van Assche
  5 siblings, 0 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:04 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

Attempting to connect three times may be insufficient after an
initiator system tries to relogin, especially if the relogin
attempt occurs before the SRP target service ID has been
registered. Since the srp_daemon retries a failed login attempt
anyway, remove the stale connection retry mechanism.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 16 +++-------------
 1 file changed, 3 insertions(+), 13 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index d3c712f..9608e7a 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -904,7 +904,6 @@ static void srp_rport_delete(struct srp_rport *rport)
 
 static int srp_connect_target(struct srp_target_port *target)
 {
-	int retries = 3;
 	int ret;
 
 	WARN_ON_ONCE(target->connected);
@@ -945,19 +944,10 @@ static int srp_connect_target(struct srp_target_port *target)
 			break;
 
 		case SRP_STALE_CONN:
-			/* Our current CM id was stale, and is now in timewait.
-			 * Try to reconnect with a new one.
-			 */
-			if (!retries-- || srp_new_cm_id(target)) {
-				shost_printk(KERN_ERR, target->scsi_host, PFX
-					     "giving up on stale connection\n");
-				target->status = -ECONNRESET;
-				return target->status;
-			}
-
 			shost_printk(KERN_ERR, target->scsi_host, PFX
-				     "retrying stale connection\n");
-			break;
+				     "giving up on stale connection\n");
+			target->status = -ECONNRESET;
+			return target->status;
 
 		default:
 			return target->status;
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
                   ` (2 preceding siblings ...)
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
@ 2014-10-07 13:05 ` Bart Van Assche
  2014-10-19 16:27   ` Sagi Grimberg
  2014-10-07 13:05 ` [PATCH v2 08/12] IB/srp: Introduce two new srp_target_port member variables Bart Van Assche
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

If a cable is pulled during LUN scanning it can happen that the
SRP rport and the SCSI host have been created but no LUNs have been
added to the SCSI host. Since multipathd only sends SCSI commands
to a SCSI target if one or more SCSI devices are present, and since
there is no keepalive mechanism for IB queue pairs, no data will be
sent over the QP after a failed LUN scan followed by a successful
reconnect, and hence a subsequent cable pull will not be detected.
Avoid this by not creating an rport or SCSI host if a cable is
pulled during a SCSI LUN scan.

Note: so far the above behavior has only been observed with the
kernel module parameter ch_count set to a value >= 2.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 60 +++++++++++++++++++++++++++++++------
 drivers/infiniband/ulp/srp/ib_srp.h |  1 +
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 9608e7a..a662c29 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1111,6 +1111,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	int i, ret;
 
 	srp_disconnect_target(target);
+
+	if (target->state == SRP_TARGET_SCANNING)
+		return -ENODEV;
+
 	/*
 	 * Now get a new local CM ID so that we avoid confusing the target in
 	 * case things are really fouled up. Doing so also ensures that all CM
@@ -2607,11 +2611,23 @@ static struct scsi_host_template srp_template = {
 	.shost_attrs			= srp_host_attrs
 };
 
+static int srp_sdev_count(struct Scsi_Host *host)
+{
+	struct scsi_device *sdev;
+	int c = 0;
+
+	shost_for_each_device(sdev, host)
+		c++;
+
+	return c;
+}
+
 static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 {
 	struct srp_rport_identifiers ids;
 	struct srp_rport *rport;
 
+	target->state = SRP_TARGET_SCANNING;
 	sprintf(target->target_name, "SRP.T10:%016llX",
 		 (unsigned long long) be64_to_cpu(target->id_ext));
 
@@ -2634,11 +2650,26 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
 	list_add_tail(&target->list, &host->target_list);
 	spin_unlock(&host->target_lock);
 
-	target->state = SRP_TARGET_LIVE;
-
 	scsi_scan_target(&target->scsi_host->shost_gendev,
 			 0, target->scsi_id, SCAN_WILD_CARD, 0);
 
+	if (!target->connected || target->qp_in_error) {
+		shost_printk(KERN_INFO, target->scsi_host,
+			     PFX "SCSI scan failed - removing SCSI host\n");
+		srp_queue_remove_work(target);
+		goto out;
+	}
+
+	pr_debug(PFX "%s: SCSI scan succeeded - detected %d LUNs\n",
+		 dev_name(&target->scsi_host->shost_gendev),
+		 srp_sdev_count(target->scsi_host));
+
+	spin_lock_irq(&target->lock);
+	if (target->state == SRP_TARGET_SCANNING)
+		target->state = SRP_TARGET_LIVE;
+	spin_unlock_irq(&target->lock);
+
+out:
 	return 0;
 }
 
@@ -2982,6 +3013,12 @@ static ssize_t srp_create_target(struct device *dev,
 	target->tl_retry_count	= 7;
 	target->queue_size	= SRP_DEFAULT_QUEUE_SIZE;
 
+	/*
+	 * Avoid that the SCSI host can be removed by srp_remove_target()
+	 * before this function returns.
+	 */
+	scsi_host_get(target->scsi_host);
+
 	mutex_lock(&host->add_target_mutex);
 
 	ret = srp_parse_options(buf, target);
@@ -3044,18 +3081,23 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err_disconnect;
 
-	shost_printk(KERN_DEBUG, target->scsi_host, PFX
-		     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
-		     be64_to_cpu(target->id_ext),
-		     be64_to_cpu(target->ioc_guid),
-		     be16_to_cpu(target->path.pkey),
-		     be64_to_cpu(target->service_id),
-		     target->path.sgid.raw, target->path.dgid.raw);
+	if (target->state != SRP_TARGET_REMOVED) {
+		shost_printk(KERN_DEBUG, target->scsi_host, PFX
+			     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
+			     be64_to_cpu(target->id_ext),
+			     be64_to_cpu(target->ioc_guid),
+			     be16_to_cpu(target->path.pkey),
+			     be64_to_cpu(target->service_id),
+			     target->path.sgid.raw, target->orig_dgid);
+	}
 
 	ret = count;
 
 out:
 	mutex_unlock(&host->add_target_mutex);
+
+	scsi_host_put(target->scsi_host);
+
 	return ret;
 
 err_disconnect:
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index e46ecb1..00c7c48 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -73,6 +73,7 @@ enum {
 };
 
 enum srp_target_state {
+	SRP_TARGET_SCANNING,
 	SRP_TARGET_LIVE,
 	SRP_TARGET_REMOVED,
 };
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 08/12] IB/srp: Introduce two new srp_target_port member variables
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
                   ` (3 preceding siblings ...)
  2014-10-07 13:05 ` [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning Bart Van Assche
@ 2014-10-07 13:05 ` Bart Van Assche
  2014-10-19 16:30   ` Sagi Grimberg
  2014-10-07 13:06 ` [PATCH v2 10/12] IB/srp: Use block layer tags Bart Van Assche
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

Introduce the srp_target_port member variables 'sgid' and 'pkey'.
Change the type of 'orig_dgid' from __be16[8] into union ib_gid.
This patch does not change any functionality but makes the
"Separate target and channel variables" patch easier to verify.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 39 ++++++++++++++++++++++---------------
 drivers/infiniband/ulp/srp/ib_srp.h |  4 +++-
 2 files changed, 26 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index a662c29..5685062 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -262,7 +262,7 @@ static int srp_init_qp(struct srp_target_port *target,
 
 	ret = ib_find_pkey(target->srp_host->srp_dev->dev,
 			   target->srp_host->port,
-			   be16_to_cpu(target->path.pkey),
+			   be16_to_cpu(target->pkey),
 			   &attr->pkey_index);
 	if (ret)
 		goto out;
@@ -295,6 +295,10 @@ static int srp_new_cm_id(struct srp_target_port *target)
 	if (target->cm_id)
 		ib_destroy_cm_id(target->cm_id);
 	target->cm_id = new_cm_id;
+	target->path.sgid = target->sgid;
+	target->path.dgid = target->orig_dgid;
+	target->path.pkey = target->pkey;
+	target->path.service_id = target->service_id;
 
 	return 0;
 }
@@ -689,7 +693,7 @@ static int srp_send_req(struct srp_target_port *target)
 	 */
 	if (target->io_class == SRP_REV10_IB_IO_CLASS) {
 		memcpy(req->priv.initiator_port_id,
-		       &target->path.sgid.global.interface_id, 8);
+		       &target->sgid.global.interface_id, 8);
 		memcpy(req->priv.initiator_port_id + 8,
 		       &target->initiator_ext, 8);
 		memcpy(req->priv.target_port_id,     &target->ioc_guid, 8);
@@ -698,7 +702,7 @@ static int srp_send_req(struct srp_target_port *target)
 		memcpy(req->priv.initiator_port_id,
 		       &target->initiator_ext, 8);
 		memcpy(req->priv.initiator_port_id + 8,
-		       &target->path.sgid.global.interface_id, 8);
+		       &target->sgid.global.interface_id, 8);
 		memcpy(req->priv.target_port_id,     &target->id_ext, 8);
 		memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8);
 	}
@@ -2175,8 +2179,8 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
 			else
 				shost_printk(KERN_WARNING, shost, PFX
 					     "SRP LOGIN from %pI6 to %pI6 REJECTED, reason 0x%08x\n",
-					     target->path.sgid.raw,
-					     target->orig_dgid, reason);
+					     target->sgid.raw,
+					     target->orig_dgid.raw, reason);
 		} else
 			shost_printk(KERN_WARNING, shost,
 				     "  REJ reason: IB_CM_REJ_CONSUMER_DEFINED,"
@@ -2464,7 +2468,7 @@ static ssize_t show_pkey(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "0x%04x\n", be16_to_cpu(target->path.pkey));
+	return sprintf(buf, "0x%04x\n", be16_to_cpu(target->pkey));
 }
 
 static ssize_t show_sgid(struct device *dev, struct device_attribute *attr,
@@ -2472,7 +2476,7 @@ static ssize_t show_sgid(struct device *dev, struct device_attribute *attr,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "%pI6\n", target->path.sgid.raw);
+	return sprintf(buf, "%pI6\n", target->sgid.raw);
 }
 
 static ssize_t show_dgid(struct device *dev, struct device_attribute *attr,
@@ -2488,7 +2492,7 @@ static ssize_t show_orig_dgid(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "%pI6\n", target->orig_dgid);
+	return sprintf(buf, "%pI6\n", target->orig_dgid.raw);
 }
 
 static ssize_t show_req_lim(struct device *dev,
@@ -2826,11 +2830,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 			}
 
 			for (i = 0; i < 16; ++i) {
-				strlcpy(dgid, p + i * 2, 3);
-				target->path.dgid.raw[i] = simple_strtoul(dgid, NULL, 16);
+				strlcpy(dgid, p + i * 2, sizeof(dgid));
+				if (sscanf(dgid, "%hhx",
+					   &target->orig_dgid.raw[i]) < 1) {
+					ret = -EINVAL;
+					kfree(p);
+					goto out;
+				}
 			}
 			kfree(p);
-			memcpy(target->orig_dgid, target->path.dgid.raw, 16);
 			break;
 
 		case SRP_OPT_PKEY:
@@ -2838,7 +2846,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 				pr_warn("bad P_Key parameter '%s'\n", p);
 				goto out;
 			}
-			target->path.pkey = cpu_to_be16(token);
+			target->pkey = cpu_to_be16(token);
 			break;
 
 		case SRP_OPT_SERVICE_ID:
@@ -2848,7 +2856,6 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
 				goto out;
 			}
 			target->service_id = cpu_to_be64(simple_strtoull(p, NULL, 16));
-			target->path.service_id = target->service_id;
 			kfree(p);
 			break;
 
@@ -3058,7 +3065,7 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err_free_mem;
 
-	ret = ib_query_gid(ibdev, host->port, 0, &target->path.sgid);
+	ret = ib_query_gid(ibdev, host->port, 0, &target->sgid);
 	if (ret)
 		goto err_free_mem;
 
@@ -3086,9 +3093,9 @@ static ssize_t srp_create_target(struct device *dev,
 			     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
 			     be64_to_cpu(target->id_ext),
 			     be64_to_cpu(target->ioc_guid),
-			     be16_to_cpu(target->path.pkey),
+			     be16_to_cpu(target->pkey),
 			     be64_to_cpu(target->service_id),
-			     target->path.sgid.raw, target->orig_dgid);
+			     target->sgid.raw, target->orig_dgid.raw);
 	}
 
 	ret = count;
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 00c7c48..8635ab6 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -157,6 +157,7 @@ struct srp_target_port {
 	 * command processing. Try to keep them packed into cachelines.
 	 */
 
+	union ib_gid		sgid;
 	__be64			id_ext;
 	__be64			ioc_guid;
 	__be64			service_id;
@@ -173,8 +174,9 @@ struct srp_target_port {
 	int			comp_vector;
 	int			tl_retry_count;
 
+	union ib_gid		orig_dgid;
+	__be16			pkey;
 	struct ib_sa_path_rec	path;
-	__be16			orig_dgid[8];
 	struct ib_sa_query     *path_query;
 	int			path_query_id;
 
-- 
1.8.4.5
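[Editorial note, not part of the patch.] The hunk above replaces the old `simple_strtoul()` dgid parsing with `sscanf("%hhx", ...)` so that malformed hex input is rejected instead of silently parsed as 0. A minimal userspace sketch of the same idea (the helper name `parse_gid` and the standalone form are illustrative assumptions; the kernel code operates on `target->orig_dgid.raw` in place):

```c
#include <stdio.h>
#include <string.h>

/* Userspace sketch of the new dgid parsing: two hex characters per byte.
 * sscanf("%hhx") reports failure on non-hex input, whereas the old
 * simple_strtoul() call silently returned 0 on garbage. */
static int parse_gid(const char *hex, unsigned char raw[16])
{
	char byte[3];
	int i;

	if (strlen(hex) != 32)		/* 16 bytes -> 32 hex digits */
		return -1;
	for (i = 0; i < 16; ++i) {
		/* two hex digits plus NUL, mirroring strlcpy(dgid, p + i * 2, 3) */
		memcpy(byte, hex + i * 2, 2);
		byte[2] = '\0';
		if (sscanf(byte, "%hhx", &raw[i]) < 1)
			return -1;	/* reject malformed input */
	}
	return 0;
}
```

With this, `parse_gid("fe80...", raw)` fills `raw[0] == 0xfe` onward, and any non-hex digit or short string yields -1, matching the `-EINVAL` path added in the patch.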


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 09/12] IB/srp: Separate target and channel variables
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
                     ` (3 preceding siblings ...)
  2014-10-07 13:04   ` [PATCH v2 06/12] IB/srp: Remove stale connection retry mechanism Bart Van Assche
@ 2014-10-07 13:05   ` Bart Van Assche
  2014-10-19 16:48     ` Sagi Grimberg
  2014-10-07 13:06   ` [PATCH v2 11/12] IB/srp: Eliminate free_reqs list Bart Van Assche
  5 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:05 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

Changes in this patch:
- Move channel variables into a new structure (struct srp_rdma_ch).
- Add an srp_target_port pointer, 'lock' and 'comp_vector' members
  in struct srp_rdma_ch.
- Add code to initialize these three new member variables.
- Many mechanical "target->" to "ch->" changes.
- The cm_id and completion handler context pointers are now of type
  srp_rdma_ch * instead of srp_target_port *.
- Three kzalloc(a * b, f) calls have been changed into kcalloc(a, b, f)
  calls to avoid triggering a checkpatch warning.
- Two superfluous casts from u64 to unsigned long long have been
  dropped: u64 has been defined as unsigned long long for all
  architectures supported by the Linux kernel for a considerable time.
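
[Editorial note, not part of the patch.] The kzalloc-to-kcalloc conversion mentioned above matters because kcalloc() checks the element-count multiplication for overflow. A minimal userspace sketch of that check (the helper name `xcalloc` is an illustrative assumption; in the kernel this logic lives inside kcalloc() itself):

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch of why kcalloc(n, size, flags) is preferred over
 * kzalloc(n * size, flags): the multiplication is checked for overflow
 * before allocating, and the memory is returned zeroed. */
static void *xcalloc(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;		/* n * size would overflow size_t */
	return calloc(n, size);		/* zeroed, like kzalloc/kcalloc */
}
```

An overflowing request such as `xcalloc(SIZE_MAX / 2, 4)` is rejected with NULL instead of being silently truncated to a too-small allocation, which is the bug class checkpatch warns about for open-coded `kzalloc(a * b, f)`.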

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 674 +++++++++++++++++++-----------------
 drivers/infiniband/ulp/srp/ib_srp.h |  64 ++--
 2 files changed, 403 insertions(+), 335 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 5685062..cc0bf83b 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -125,8 +125,8 @@ MODULE_PARM_DESC(dev_loss_tmo,
 
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
-static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
-static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
+static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
+static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
 
 static struct scsi_transport_template *ib_srp_transport_template;
@@ -283,22 +283,23 @@ out:
 	return ret;
 }
 
-static int srp_new_cm_id(struct srp_target_port *target)
+static int srp_new_cm_id(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_cm_id *new_cm_id;
 
 	new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
-				    srp_cm_handler, target);
+				    srp_cm_handler, ch);
 	if (IS_ERR(new_cm_id))
 		return PTR_ERR(new_cm_id);
 
-	if (target->cm_id)
-		ib_destroy_cm_id(target->cm_id);
-	target->cm_id = new_cm_id;
-	target->path.sgid = target->sgid;
-	target->path.dgid = target->orig_dgid;
-	target->path.pkey = target->pkey;
-	target->path.service_id = target->service_id;
+	if (ch->cm_id)
+		ib_destroy_cm_id(ch->cm_id);
+	ch->cm_id = new_cm_id;
+	ch->path.sgid = target->sgid;
+	ch->path.dgid = target->orig_dgid;
+	ch->path.pkey = target->pkey;
+	ch->path.service_id = target->service_id;
 
 	return 0;
 }
@@ -447,8 +448,9 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
 				  dev->max_pages_per_mr);
 }
 
-static int srp_create_target_ib(struct srp_target_port *target)
+static int srp_create_ch_ib(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_qp_init_attr *init_attr;
 	struct ib_cq *recv_cq, *send_cq;
@@ -462,15 +464,15 @@ static int srp_create_target_ib(struct srp_target_port *target)
 	if (!init_attr)
 		return -ENOMEM;
 
-	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
-			       target->queue_size, target->comp_vector);
+	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
+			       target->queue_size, ch->comp_vector);
 	if (IS_ERR(recv_cq)) {
 		ret = PTR_ERR(recv_cq);
 		goto err;
 	}
 
-	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
-			       m * target->queue_size, target->comp_vector);
+	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
+			       m * target->queue_size, ch->comp_vector);
 	if (IS_ERR(send_cq)) {
 		ret = PTR_ERR(send_cq);
 		goto err_recv_cq;
@@ -506,9 +508,9 @@ static int srp_create_target_ib(struct srp_target_port *target)
 				     "FR pool allocation failed (%d)\n", ret);
 			goto err_qp;
 		}
-		if (target->fr_pool)
-			srp_destroy_fr_pool(target->fr_pool);
-		target->fr_pool = fr_pool;
+		if (ch->fr_pool)
+			srp_destroy_fr_pool(ch->fr_pool);
+		ch->fr_pool = fr_pool;
 	} else if (!dev->use_fast_reg && dev->has_fmr) {
 		fmr_pool = srp_alloc_fmr_pool(target);
 		if (IS_ERR(fmr_pool)) {
@@ -517,21 +519,21 @@ static int srp_create_target_ib(struct srp_target_port *target)
 				     "FMR pool allocation failed (%d)\n", ret);
 			goto err_qp;
 		}
-		if (target->fmr_pool)
-			ib_destroy_fmr_pool(target->fmr_pool);
-		target->fmr_pool = fmr_pool;
+		if (ch->fmr_pool)
+			ib_destroy_fmr_pool(ch->fmr_pool);
+		ch->fmr_pool = fmr_pool;
 	}
 
-	if (target->qp)
-		ib_destroy_qp(target->qp);
-	if (target->recv_cq)
-		ib_destroy_cq(target->recv_cq);
-	if (target->send_cq)
-		ib_destroy_cq(target->send_cq);
+	if (ch->qp)
+		ib_destroy_qp(ch->qp);
+	if (ch->recv_cq)
+		ib_destroy_cq(ch->recv_cq);
+	if (ch->send_cq)
+		ib_destroy_cq(ch->send_cq);
 
-	target->qp = qp;
-	target->recv_cq = recv_cq;
-	target->send_cq = send_cq;
+	ch->qp = qp;
+	ch->recv_cq = recv_cq;
+	ch->send_cq = send_cq;
 
 	kfree(init_attr);
 	return 0;
@@ -552,98 +554,102 @@ err:
 
 /*
  * Note: this function may be called without srp_alloc_iu_bufs() having been
- * invoked. Hence the target->[rt]x_ring checks.
+ * invoked. Hence the ch->[rt]x_ring checks.
  */
-static void srp_free_target_ib(struct srp_target_port *target)
+static void srp_free_ch_ib(struct srp_target_port *target,
+			   struct srp_rdma_ch *ch)
 {
 	struct srp_device *dev = target->srp_host->srp_dev;
 	int i;
 
-	if (target->cm_id) {
-		ib_destroy_cm_id(target->cm_id);
-		target->cm_id = NULL;
+	if (ch->cm_id) {
+		ib_destroy_cm_id(ch->cm_id);
+		ch->cm_id = NULL;
 	}
 
 	if (dev->use_fast_reg) {
-		if (target->fr_pool)
-			srp_destroy_fr_pool(target->fr_pool);
+		if (ch->fr_pool)
+			srp_destroy_fr_pool(ch->fr_pool);
 	} else {
-		if (target->fmr_pool)
-			ib_destroy_fmr_pool(target->fmr_pool);
+		if (ch->fmr_pool)
+			ib_destroy_fmr_pool(ch->fmr_pool);
 	}
-	ib_destroy_qp(target->qp);
-	ib_destroy_cq(target->send_cq);
-	ib_destroy_cq(target->recv_cq);
+	ib_destroy_qp(ch->qp);
+	ib_destroy_cq(ch->send_cq);
+	ib_destroy_cq(ch->recv_cq);
 
-	target->qp = NULL;
-	target->send_cq = target->recv_cq = NULL;
+	ch->qp = NULL;
+	ch->send_cq = ch->recv_cq = NULL;
 
-	if (target->rx_ring) {
+	if (ch->rx_ring) {
 		for (i = 0; i < target->queue_size; ++i)
-			srp_free_iu(target->srp_host, target->rx_ring[i]);
-		kfree(target->rx_ring);
-		target->rx_ring = NULL;
+			srp_free_iu(target->srp_host, ch->rx_ring[i]);
+		kfree(ch->rx_ring);
+		ch->rx_ring = NULL;
 	}
-	if (target->tx_ring) {
+	if (ch->tx_ring) {
 		for (i = 0; i < target->queue_size; ++i)
-			srp_free_iu(target->srp_host, target->tx_ring[i]);
-		kfree(target->tx_ring);
-		target->tx_ring = NULL;
+			srp_free_iu(target->srp_host, ch->tx_ring[i]);
+		kfree(ch->tx_ring);
+		ch->tx_ring = NULL;
 	}
 }
 
 static void srp_path_rec_completion(int status,
 				    struct ib_sa_path_rec *pathrec,
-				    void *target_ptr)
+				    void *ch_ptr)
 {
-	struct srp_target_port *target = target_ptr;
+	struct srp_rdma_ch *ch = ch_ptr;
+	struct srp_target_port *target = ch->target;
 
-	target->status = status;
+	ch->status = status;
 	if (status)
 		shost_printk(KERN_ERR, target->scsi_host,
 			     PFX "Got failed path rec status %d\n", status);
 	else
-		target->path = *pathrec;
-	complete(&target->done);
+		ch->path = *pathrec;
+	complete(&ch->done);
 }
 
-static int srp_lookup_path(struct srp_target_port *target)
+static int srp_lookup_path(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	int ret;
 
-	target->path.numb_path = 1;
-
-	init_completion(&target->done);
-
-	target->path_query_id = ib_sa_path_rec_get(&srp_sa_client,
-						   target->srp_host->srp_dev->dev,
-						   target->srp_host->port,
-						   &target->path,
-						   IB_SA_PATH_REC_SERVICE_ID	|
-						   IB_SA_PATH_REC_DGID		|
-						   IB_SA_PATH_REC_SGID		|
-						   IB_SA_PATH_REC_NUMB_PATH	|
-						   IB_SA_PATH_REC_PKEY,
-						   SRP_PATH_REC_TIMEOUT_MS,
-						   GFP_KERNEL,
-						   srp_path_rec_completion,
-						   target, &target->path_query);
-	if (target->path_query_id < 0)
-		return target->path_query_id;
-
-	ret = wait_for_completion_interruptible(&target->done);
+	ch->path.numb_path = 1;
+
+	init_completion(&ch->done);
+
+	ch->path_query_id = ib_sa_path_rec_get(&srp_sa_client,
+					       target->srp_host->srp_dev->dev,
+					       target->srp_host->port,
+					       &ch->path,
+					       IB_SA_PATH_REC_SERVICE_ID |
+					       IB_SA_PATH_REC_DGID	 |
+					       IB_SA_PATH_REC_SGID	 |
+					       IB_SA_PATH_REC_NUMB_PATH	 |
+					       IB_SA_PATH_REC_PKEY,
+					       SRP_PATH_REC_TIMEOUT_MS,
+					       GFP_KERNEL,
+					       srp_path_rec_completion,
+					       ch, &ch->path_query);
+	if (ch->path_query_id < 0)
+		return ch->path_query_id;
+
+	ret = wait_for_completion_interruptible(&ch->done);
 	if (ret < 0)
 		return ret;
 
-	if (target->status < 0)
+	if (ch->status < 0)
 		shost_printk(KERN_WARNING, target->scsi_host,
 			     PFX "Path record query failed\n");
 
-	return target->status;
+	return ch->status;
 }
 
-static int srp_send_req(struct srp_target_port *target)
+static int srp_send_req(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct {
 		struct ib_cm_req_param param;
 		struct srp_login_req   priv;
@@ -654,11 +660,11 @@ static int srp_send_req(struct srp_target_port *target)
 	if (!req)
 		return -ENOMEM;
 
-	req->param.primary_path 	      = &target->path;
+	req->param.primary_path		      = &ch->path;
 	req->param.alternate_path 	      = NULL;
 	req->param.service_id 		      = target->service_id;
-	req->param.qp_num 		      = target->qp->qp_num;
-	req->param.qp_type 		      = target->qp->qp_type;
+	req->param.qp_num		      = ch->qp->qp_num;
+	req->param.qp_type		      = ch->qp->qp_type;
 	req->param.private_data 	      = &req->priv;
 	req->param.private_data_len 	      = sizeof req->priv;
 	req->param.flow_control 	      = 1;
@@ -722,7 +728,7 @@ static int srp_send_req(struct srp_target_port *target)
 		       &target->srp_host->srp_dev->dev->node_guid, 8);
 	}
 
-	status = ib_send_cm_req(target->cm_id, &req->param);
+	status = ib_send_cm_req(ch->cm_id, &req->param);
 
 	kfree(req);
 
@@ -763,28 +769,31 @@ static bool srp_change_conn_state(struct srp_target_port *target,
 
 static void srp_disconnect_target(struct srp_target_port *target)
 {
+	struct srp_rdma_ch *ch = &target->ch;
+
 	if (srp_change_conn_state(target, false)) {
 		/* XXX should send SRP_I_LOGOUT request */
 
-		if (ib_send_cm_dreq(target->cm_id, NULL, 0)) {
+		if (ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
 			shost_printk(KERN_DEBUG, target->scsi_host,
 				     PFX "Sending CM DREQ failed\n");
 		}
 	}
 }
 
-static void srp_free_req_data(struct srp_target_port *target)
+static void srp_free_req_data(struct srp_target_port *target,
+			      struct srp_rdma_ch *ch)
 {
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_device *ibdev = dev->dev;
 	struct srp_request *req;
 	int i;
 
-	if (!target->req_ring)
+	if (!ch->req_ring)
 		return;
 
 	for (i = 0; i < target->req_ring_size; ++i) {
-		req = &target->req_ring[i];
+		req = &ch->req_ring[i];
 		if (dev->use_fast_reg)
 			kfree(req->fr_list);
 		else
@@ -798,12 +807,13 @@ static void srp_free_req_data(struct srp_target_port *target)
 		kfree(req->indirect_desc);
 	}
 
-	kfree(target->req_ring);
-	target->req_ring = NULL;
+	kfree(ch->req_ring);
+	ch->req_ring = NULL;
 }
 
-static int srp_alloc_req_data(struct srp_target_port *target)
+static int srp_alloc_req_data(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *srp_dev = target->srp_host->srp_dev;
 	struct ib_device *ibdev = srp_dev->dev;
 	struct srp_request *req;
@@ -811,15 +821,15 @@ static int srp_alloc_req_data(struct srp_target_port *target)
 	dma_addr_t dma_addr;
 	int i, ret = -ENOMEM;
 
-	INIT_LIST_HEAD(&target->free_reqs);
+	INIT_LIST_HEAD(&ch->free_reqs);
 
-	target->req_ring = kzalloc(target->req_ring_size *
-				   sizeof(*target->req_ring), GFP_KERNEL);
-	if (!target->req_ring)
+	ch->req_ring = kcalloc(target->req_ring_size, sizeof(*ch->req_ring),
+			       GFP_KERNEL);
+	if (!ch->req_ring)
 		goto out;
 
 	for (i = 0; i < target->req_ring_size; ++i) {
-		req = &target->req_ring[i];
+		req = &ch->req_ring[i];
 		mr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
 				  GFP_KERNEL);
 		if (!mr_list)
@@ -844,7 +854,7 @@ static int srp_alloc_req_data(struct srp_target_port *target)
 
 		req->indirect_dma_addr = dma_addr;
 		req->index = i;
-		list_add_tail(&req->list, &target->free_reqs);
+		list_add_tail(&req->list, &ch->free_reqs);
 	}
 	ret = 0;
 
@@ -869,6 +879,8 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
 
 static void srp_remove_target(struct srp_target_port *target)
 {
+	struct srp_rdma_ch *ch = &target->ch;
+
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
 	srp_del_scsi_host_attr(target->scsi_host);
@@ -877,10 +889,10 @@ static void srp_remove_target(struct srp_target_port *target)
 	scsi_remove_host(target->scsi_host);
 	srp_stop_rport_timers(target->rport);
 	srp_disconnect_target(target);
-	srp_free_target_ib(target);
+	srp_free_ch_ib(target, ch);
 	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
-	srp_free_req_data(target);
+	srp_free_req_data(target, ch);
 
 	spin_lock(&target->srp_host->target_lock);
 	list_del(&target->list);
@@ -906,24 +918,25 @@ static void srp_rport_delete(struct srp_rport *rport)
 	srp_queue_remove_work(target);
 }
 
-static int srp_connect_target(struct srp_target_port *target)
+static int srp_connect_ch(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	int ret;
 
 	WARN_ON_ONCE(target->connected);
 
 	target->qp_in_error = false;
 
-	ret = srp_lookup_path(target);
+	ret = srp_lookup_path(ch);
 	if (ret)
 		return ret;
 
 	while (1) {
-		init_completion(&target->done);
-		ret = srp_send_req(target);
+		init_completion(&ch->done);
+		ret = srp_send_req(ch);
 		if (ret)
 			return ret;
-		ret = wait_for_completion_interruptible(&target->done);
+		ret = wait_for_completion_interruptible(&ch->done);
 		if (ret < 0)
 			return ret;
 
@@ -933,13 +946,13 @@ static int srp_connect_target(struct srp_target_port *target)
 		 * back, or SRP_DLID_REDIRECT if we get a lid/qp
 		 * redirect REJ back.
 		 */
-		switch (target->status) {
+		switch (ch->status) {
 		case 0:
 			srp_change_conn_state(target, true);
 			return 0;
 
 		case SRP_PORT_REDIRECT:
-			ret = srp_lookup_path(target);
+			ret = srp_lookup_path(ch);
 			if (ret)
 				return ret;
 			break;
@@ -950,16 +963,16 @@ static int srp_connect_target(struct srp_target_port *target)
 		case SRP_STALE_CONN:
 			shost_printk(KERN_ERR, target->scsi_host, PFX
 				     "giving up on stale connection\n");
-			target->status = -ECONNRESET;
-			return target->status;
+			ch->status = -ECONNRESET;
+			return ch->status;
 
 		default:
-			return target->status;
+			return ch->status;
 		}
 	}
 }
 
-static int srp_inv_rkey(struct srp_target_port *target, u32 rkey)
+static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey)
 {
 	struct ib_send_wr *bad_wr;
 	struct ib_send_wr wr = {
@@ -971,13 +984,14 @@ static int srp_inv_rkey(struct srp_target_port *target, u32 rkey)
 		.ex.invalidate_rkey = rkey,
 	};
 
-	return ib_post_send(target->qp, &wr, &bad_wr);
+	return ib_post_send(ch->qp, &wr, &bad_wr);
 }
 
 static void srp_unmap_data(struct scsi_cmnd *scmnd,
-			   struct srp_target_port *target,
+			   struct srp_rdma_ch *ch,
 			   struct srp_request *req)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_device *ibdev = dev->dev;
 	int i, res;
@@ -991,7 +1005,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
 		struct srp_fr_desc **pfr;
 
 		for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++) {
-			res = srp_inv_rkey(target, (*pfr)->mr->rkey);
+			res = srp_inv_rkey(ch, (*pfr)->mr->rkey);
 			if (res < 0) {
 				shost_printk(KERN_ERR, target->scsi_host, PFX
 				  "Queueing INV WR for rkey %#x failed (%d)\n",
@@ -1001,7 +1015,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
 			}
 		}
 		if (req->nmdesc)
-			srp_fr_pool_put(target->fr_pool, req->fr_list,
+			srp_fr_pool_put(ch->fr_pool, req->fr_list,
 					req->nmdesc);
 	} else {
 		struct ib_pool_fmr **pfmr;
@@ -1016,7 +1030,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
 
 /**
  * srp_claim_req - Take ownership of the scmnd associated with a request.
- * @target: SRP target port.
+ * @ch: SRP RDMA channel.
  * @req: SRP request.
  * @sdev: If not NULL, only take ownership for this SCSI device.
  * @scmnd: If NULL, take ownership of @req->scmnd. If not NULL, only take
@@ -1025,14 +1039,14 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
  * Return value:
  * Either NULL or a pointer to the SCSI command the caller became owner of.
  */
-static struct scsi_cmnd *srp_claim_req(struct srp_target_port *target,
+static struct scsi_cmnd *srp_claim_req(struct srp_rdma_ch *ch,
 				       struct srp_request *req,
 				       struct scsi_device *sdev,
 				       struct scsi_cmnd *scmnd)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&target->lock, flags);
+	spin_lock_irqsave(&ch->lock, flags);
 	if (req->scmnd &&
 	    (!sdev || req->scmnd->device == sdev) &&
 	    (!scmnd || req->scmnd == scmnd)) {
@@ -1041,40 +1055,38 @@ static struct scsi_cmnd *srp_claim_req(struct srp_target_port *target,
 	} else {
 		scmnd = NULL;
 	}
-	spin_unlock_irqrestore(&target->lock, flags);
+	spin_unlock_irqrestore(&ch->lock, flags);
 
 	return scmnd;
 }
 
 /**
  * srp_free_req() - Unmap data and add request to the free request list.
- * @target: SRP target port.
+ * @ch:     SRP RDMA channel.
  * @req:    Request to be freed.
  * @scmnd:  SCSI command associated with @req.
  * @req_lim_delta: Amount to be added to @target->req_lim.
  */
-static void srp_free_req(struct srp_target_port *target,
-			 struct srp_request *req, struct scsi_cmnd *scmnd,
-			 s32 req_lim_delta)
+static void srp_free_req(struct srp_rdma_ch *ch, struct srp_request *req,
+			 struct scsi_cmnd *scmnd, s32 req_lim_delta)
 {
 	unsigned long flags;
 
-	srp_unmap_data(scmnd, target, req);
+	srp_unmap_data(scmnd, ch, req);
 
-	spin_lock_irqsave(&target->lock, flags);
-	target->req_lim += req_lim_delta;
-	list_add_tail(&req->list, &target->free_reqs);
-	spin_unlock_irqrestore(&target->lock, flags);
+	spin_lock_irqsave(&ch->lock, flags);
+	ch->req_lim += req_lim_delta;
+	list_add_tail(&req->list, &ch->free_reqs);
+	spin_unlock_irqrestore(&ch->lock, flags);
 }
 
-static void srp_finish_req(struct srp_target_port *target,
-			   struct srp_request *req, struct scsi_device *sdev,
-			   int result)
+static void srp_finish_req(struct srp_rdma_ch *ch, struct srp_request *req,
+			   struct scsi_device *sdev, int result)
 {
-	struct scsi_cmnd *scmnd = srp_claim_req(target, req, sdev, NULL);
+	struct scsi_cmnd *scmnd = srp_claim_req(ch, req, sdev, NULL);
 
 	if (scmnd) {
-		srp_free_req(target, req, scmnd, 0);
+		srp_free_req(ch, req, scmnd, 0);
 		scmnd->result = result;
 		scmnd->scsi_done(scmnd);
 	}
@@ -1083,6 +1095,7 @@ static void srp_finish_req(struct srp_target_port *target,
 static void srp_terminate_io(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
+	struct srp_rdma_ch *ch = &target->ch;
 	struct Scsi_Host *shost = target->scsi_host;
 	struct scsi_device *sdev;
 	int i;
@@ -1095,8 +1108,9 @@ static void srp_terminate_io(struct srp_rport *rport)
 		WARN_ON_ONCE(sdev->request_queue->request_fn_active);
 
 	for (i = 0; i < target->req_ring_size; ++i) {
-		struct srp_request *req = &target->req_ring[i];
-		srp_finish_req(target, req, NULL, DID_TRANSPORT_FAILFAST << 16);
+		struct srp_request *req = &ch->req_ring[i];
+
+		srp_finish_req(ch, req, NULL, DID_TRANSPORT_FAILFAST << 16);
 	}
 }
 
@@ -1112,6 +1126,7 @@ static void srp_terminate_io(struct srp_rport *rport)
 static int srp_rport_reconnect(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
+	struct srp_rdma_ch *ch = &target->ch;
 	int i, ret;
 
 	srp_disconnect_target(target);
@@ -1124,11 +1139,12 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	 * case things are really fouled up. Doing so also ensures that all CM
 	 * callbacks will have finished before a new QP is allocated.
 	 */
-	ret = srp_new_cm_id(target);
+	ret = srp_new_cm_id(ch);
 
 	for (i = 0; i < target->req_ring_size; ++i) {
-		struct srp_request *req = &target->req_ring[i];
-		srp_finish_req(target, req, NULL, DID_RESET << 16);
+		struct srp_request *req = &ch->req_ring[i];
+
+		srp_finish_req(ch, req, NULL, DID_RESET << 16);
 	}
 
 	/*
@@ -1136,14 +1152,14 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	 * QP. This guarantees that all callback functions for the old QP have
 	 * finished before any send requests are posted on the new QP.
 	 */
-	ret += srp_create_target_ib(target);
+	ret += srp_create_ch_ib(ch);
 
-	INIT_LIST_HEAD(&target->free_tx);
+	INIT_LIST_HEAD(&ch->free_tx);
 	for (i = 0; i < target->queue_size; ++i)
-		list_add(&target->tx_ring[i]->list, &target->free_tx);
+		list_add(&ch->tx_ring[i]->list, &ch->free_tx);
 
 	if (ret == 0)
-		ret = srp_connect_target(target);
+		ret = srp_connect_ch(ch);
 
 	if (ret == 0)
 		shost_printk(KERN_INFO, target->scsi_host,
@@ -1167,12 +1183,12 @@ static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr,
 }
 
 static int srp_map_finish_fmr(struct srp_map_state *state,
-			      struct srp_target_port *target)
+			      struct srp_rdma_ch *ch)
 {
 	struct ib_pool_fmr *fmr;
 	u64 io_addr = 0;
 
-	fmr = ib_fmr_pool_map_phys(target->fmr_pool, state->pages,
+	fmr = ib_fmr_pool_map_phys(ch->fmr_pool, state->pages,
 				   state->npages, io_addr);
 	if (IS_ERR(fmr))
 		return PTR_ERR(fmr);
@@ -1186,15 +1202,16 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
 }
 
 static int srp_map_finish_fr(struct srp_map_state *state,
-			     struct srp_target_port *target)
+			     struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_send_wr *bad_wr;
 	struct ib_send_wr wr;
 	struct srp_fr_desc *desc;
 	u32 rkey;
 
-	desc = srp_fr_pool_get(target->fr_pool);
+	desc = srp_fr_pool_get(ch->fr_pool);
 	if (!desc)
 		return -ENOMEM;
 
@@ -1223,12 +1240,13 @@ static int srp_map_finish_fr(struct srp_map_state *state,
 	srp_map_desc(state, state->base_dma_addr, state->dma_len,
 		     desc->mr->rkey);
 
-	return ib_post_send(target->qp, &wr, &bad_wr);
+	return ib_post_send(ch->qp, &wr, &bad_wr);
 }
 
 static int srp_finish_mapping(struct srp_map_state *state,
-			      struct srp_target_port *target)
+			      struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	int ret = 0;
 
 	if (state->npages == 0)
@@ -1239,8 +1257,8 @@ static int srp_finish_mapping(struct srp_map_state *state,
 			     target->rkey);
 	else
 		ret = target->srp_host->srp_dev->use_fast_reg ?
-			srp_map_finish_fr(state, target) :
-			srp_map_finish_fmr(state, target);
+			srp_map_finish_fr(state, ch) :
+			srp_map_finish_fmr(state, ch);
 
 	if (ret == 0) {
 		state->npages = 0;
@@ -1260,10 +1278,11 @@ static void srp_map_update_start(struct srp_map_state *state,
 }
 
 static int srp_map_sg_entry(struct srp_map_state *state,
-			    struct srp_target_port *target,
+			    struct srp_rdma_ch *ch,
 			    struct scatterlist *sg, int sg_index,
 			    bool use_mr)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_device *ibdev = dev->dev;
 	dma_addr_t dma_addr = ib_sg_dma_address(ibdev, sg);
@@ -1292,7 +1311,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 	 */
 	if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
 	    dma_len > dev->mr_max_size) {
-		ret = srp_finish_mapping(state, target);
+		ret = srp_finish_mapping(state, ch);
 		if (ret)
 			return ret;
 
@@ -1313,7 +1332,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 	while (dma_len) {
 		unsigned offset = dma_addr & ~dev->mr_page_mask;
 		if (state->npages == dev->max_pages_per_mr || offset != 0) {
-			ret = srp_finish_mapping(state, target);
+			ret = srp_finish_mapping(state, ch);
 			if (ret)
 				return ret;
 
@@ -1337,17 +1356,18 @@ static int srp_map_sg_entry(struct srp_map_state *state,
 	 */
 	ret = 0;
 	if (len != dev->mr_page_size) {
-		ret = srp_finish_mapping(state, target);
+		ret = srp_finish_mapping(state, ch);
 		if (!ret)
 			srp_map_update_start(state, NULL, 0, 0);
 	}
 	return ret;
 }
 
-static int srp_map_sg(struct srp_map_state *state,
-		      struct srp_target_port *target, struct srp_request *req,
-		      struct scatterlist *scat, int count)
+static int srp_map_sg(struct srp_map_state *state, struct srp_rdma_ch *ch,
+		      struct srp_request *req, struct scatterlist *scat,
+		      int count)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_device *dev = target->srp_host->srp_dev;
 	struct ib_device *ibdev = dev->dev;
 	struct scatterlist *sg;
@@ -1358,14 +1378,14 @@ static int srp_map_sg(struct srp_map_state *state,
 	state->pages	= req->map_page;
 	if (dev->use_fast_reg) {
 		state->next_fr = req->fr_list;
-		use_mr = !!target->fr_pool;
+		use_mr = !!ch->fr_pool;
 	} else {
 		state->next_fmr = req->fmr_list;
-		use_mr = !!target->fmr_pool;
+		use_mr = !!ch->fmr_pool;
 	}
 
 	for_each_sg(scat, sg, count, i) {
-		if (srp_map_sg_entry(state, target, sg, i, use_mr)) {
+		if (srp_map_sg_entry(state, ch, sg, i, use_mr)) {
 			/*
 			 * Memory registration failed, so backtrack to the
 			 * first unmapped entry and continue on without using
@@ -1387,7 +1407,7 @@ backtrack:
 		}
 	}
 
-	if (use_mr && srp_finish_mapping(state, target))
+	if (use_mr && srp_finish_mapping(state, ch))
 		goto backtrack;
 
 	req->nmdesc = state->nmdesc;
@@ -1395,9 +1415,10 @@ backtrack:
 	return 0;
 }
 
-static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
+static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
 			struct srp_request *req)
 {
+	struct srp_target_port *target = ch->target;
 	struct scatterlist *scat;
 	struct srp_cmd *cmd = req->cmd->buf;
 	int len, nents, count;
@@ -1459,7 +1480,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
 				   target->indirect_size, DMA_TO_DEVICE);
 
 	memset(&state, 0, sizeof(state));
-	srp_map_sg(&state, target, req, scat, count);
+	srp_map_sg(&state, ch, req, scat, count);
 
 	/* We've mapped the request, now pull as much of the indirect
 	 * descriptor table as we can into the command buffer. If this
@@ -1520,20 +1541,20 @@ map_complete:
 /*
  * Return an IU and possible credit to the free pool
  */
-static void srp_put_tx_iu(struct srp_target_port *target, struct srp_iu *iu,
+static void srp_put_tx_iu(struct srp_rdma_ch *ch, struct srp_iu *iu,
 			  enum srp_iu_type iu_type)
 {
 	unsigned long flags;
 
-	spin_lock_irqsave(&target->lock, flags);
-	list_add(&iu->list, &target->free_tx);
+	spin_lock_irqsave(&ch->lock, flags);
+	list_add(&iu->list, &ch->free_tx);
 	if (iu_type != SRP_IU_RSP)
-		++target->req_lim;
-	spin_unlock_irqrestore(&target->lock, flags);
+		++ch->req_lim;
+	spin_unlock_irqrestore(&ch->lock, flags);
 }
 
 /*
- * Must be called with target->lock held to protect req_lim and free_tx.
+ * Must be called with ch->lock held to protect req_lim and free_tx.
  * If IU is not sent, it must be returned using srp_put_tx_iu().
  *
  * Note:
@@ -1545,35 +1566,36 @@ static void srp_put_tx_iu(struct srp_target_port *target, struct srp_iu *iu,
  * - SRP_IU_RSP: 1, since a conforming SRP target never sends more than
  *   one unanswered SRP request to an initiator.
  */
-static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
+static struct srp_iu *__srp_get_tx_iu(struct srp_rdma_ch *ch,
 				      enum srp_iu_type iu_type)
 {
+	struct srp_target_port *target = ch->target;
 	s32 rsv = (iu_type == SRP_IU_TSK_MGMT) ? 0 : SRP_TSK_MGMT_SQ_SIZE;
 	struct srp_iu *iu;
 
-	srp_send_completion(target->send_cq, target);
+	srp_send_completion(ch->send_cq, ch);
 
-	if (list_empty(&target->free_tx))
+	if (list_empty(&ch->free_tx))
 		return NULL;
 
 	/* Initiator responses to target requests do not consume credits */
 	if (iu_type != SRP_IU_RSP) {
-		if (target->req_lim <= rsv) {
+		if (ch->req_lim <= rsv) {
 			++target->zero_req_lim;
 			return NULL;
 		}
 
-		--target->req_lim;
+		--ch->req_lim;
 	}
 
-	iu = list_first_entry(&target->free_tx, struct srp_iu, list);
+	iu = list_first_entry(&ch->free_tx, struct srp_iu, list);
 	list_del(&iu->list);
 	return iu;
 }
 
-static int srp_post_send(struct srp_target_port *target,
-			 struct srp_iu *iu, int len)
+static int srp_post_send(struct srp_rdma_ch *ch, struct srp_iu *iu, int len)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_sge list;
 	struct ib_send_wr wr, *bad_wr;
 
@@ -1588,11 +1610,12 @@ static int srp_post_send(struct srp_target_port *target,
 	wr.opcode     = IB_WR_SEND;
 	wr.send_flags = IB_SEND_SIGNALED;
 
-	return ib_post_send(target->qp, &wr, &bad_wr);
+	return ib_post_send(ch->qp, &wr, &bad_wr);
 }
 
-static int srp_post_recv(struct srp_target_port *target, struct srp_iu *iu)
+static int srp_post_recv(struct srp_rdma_ch *ch, struct srp_iu *iu)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_recv_wr wr, *bad_wr;
 	struct ib_sge list;
 
@@ -1605,35 +1628,36 @@ static int srp_post_recv(struct srp_target_port *target, struct srp_iu *iu)
 	wr.sg_list  = &list;
 	wr.num_sge  = 1;
 
-	return ib_post_recv(target->qp, &wr, &bad_wr);
+	return ib_post_recv(ch->qp, &wr, &bad_wr);
 }
 
-static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
+static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_request *req;
 	struct scsi_cmnd *scmnd;
 	unsigned long flags;
 
 	if (unlikely(rsp->tag & SRP_TAG_TSK_MGMT)) {
-		spin_lock_irqsave(&target->lock, flags);
-		target->req_lim += be32_to_cpu(rsp->req_lim_delta);
-		spin_unlock_irqrestore(&target->lock, flags);
+		spin_lock_irqsave(&ch->lock, flags);
+		ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
+		spin_unlock_irqrestore(&ch->lock, flags);
 
-		target->tsk_mgmt_status = -1;
+		ch->tsk_mgmt_status = -1;
 		if (be32_to_cpu(rsp->resp_data_len) >= 4)
-			target->tsk_mgmt_status = rsp->data[3];
-		complete(&target->tsk_mgmt_done);
+			ch->tsk_mgmt_status = rsp->data[3];
+		complete(&ch->tsk_mgmt_done);
 	} else {
-		req = &target->req_ring[rsp->tag];
-		scmnd = srp_claim_req(target, req, NULL, NULL);
+		req = &ch->req_ring[rsp->tag];
+		scmnd = srp_claim_req(ch, req, NULL, NULL);
 		if (!scmnd) {
 			shost_printk(KERN_ERR, target->scsi_host,
 				     "Null scmnd for RSP w/tag %016llx\n",
 				     (unsigned long long) rsp->tag);
 
-			spin_lock_irqsave(&target->lock, flags);
-			target->req_lim += be32_to_cpu(rsp->req_lim_delta);
-			spin_unlock_irqrestore(&target->lock, flags);
+			spin_lock_irqsave(&ch->lock, flags);
+			ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
+			spin_unlock_irqrestore(&ch->lock, flags);
 
 			return;
 		}
@@ -1655,7 +1679,7 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
 		else if (unlikely(rsp->flags & SRP_RSP_FLAG_DOOVER))
 			scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_out_res_cnt));
 
-		srp_free_req(target, req, scmnd,
+		srp_free_req(ch, req, scmnd,
 			     be32_to_cpu(rsp->req_lim_delta));
 
 		scmnd->host_scribble = NULL;
@@ -1663,18 +1687,19 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
 	}
 }
 
-static int srp_response_common(struct srp_target_port *target, s32 req_delta,
+static int srp_response_common(struct srp_rdma_ch *ch, s32 req_delta,
 			       void *rsp, int len)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_device *dev = target->srp_host->srp_dev->dev;
 	unsigned long flags;
 	struct srp_iu *iu;
 	int err;
 
-	spin_lock_irqsave(&target->lock, flags);
-	target->req_lim += req_delta;
-	iu = __srp_get_tx_iu(target, SRP_IU_RSP);
-	spin_unlock_irqrestore(&target->lock, flags);
+	spin_lock_irqsave(&ch->lock, flags);
+	ch->req_lim += req_delta;
+	iu = __srp_get_tx_iu(ch, SRP_IU_RSP);
+	spin_unlock_irqrestore(&ch->lock, flags);
 
 	if (!iu) {
 		shost_printk(KERN_ERR, target->scsi_host, PFX
@@ -1686,17 +1711,17 @@ static int srp_response_common(struct srp_target_port *target, s32 req_delta,
 	memcpy(iu->buf, rsp, len);
 	ib_dma_sync_single_for_device(dev, iu->dma, len, DMA_TO_DEVICE);
 
-	err = srp_post_send(target, iu, len);
+	err = srp_post_send(ch, iu, len);
 	if (err) {
 		shost_printk(KERN_ERR, target->scsi_host, PFX
 			     "unable to post response: %d\n", err);
-		srp_put_tx_iu(target, iu, SRP_IU_RSP);
+		srp_put_tx_iu(ch, iu, SRP_IU_RSP);
 	}
 
 	return err;
 }
 
-static void srp_process_cred_req(struct srp_target_port *target,
+static void srp_process_cred_req(struct srp_rdma_ch *ch,
 				 struct srp_cred_req *req)
 {
 	struct srp_cred_rsp rsp = {
@@ -1705,14 +1730,15 @@ static void srp_process_cred_req(struct srp_target_port *target,
 	};
 	s32 delta = be32_to_cpu(req->req_lim_delta);
 
-	if (srp_response_common(target, delta, &rsp, sizeof rsp))
-		shost_printk(KERN_ERR, target->scsi_host, PFX
+	if (srp_response_common(ch, delta, &rsp, sizeof(rsp)))
+		shost_printk(KERN_ERR, ch->target->scsi_host, PFX
 			     "problems processing SRP_CRED_REQ\n");
 }
 
-static void srp_process_aer_req(struct srp_target_port *target,
+static void srp_process_aer_req(struct srp_rdma_ch *ch,
 				struct srp_aer_req *req)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_aer_rsp rsp = {
 		.opcode = SRP_AER_RSP,
 		.tag = req->tag,
@@ -1722,19 +1748,20 @@ static void srp_process_aer_req(struct srp_target_port *target,
 	shost_printk(KERN_ERR, target->scsi_host, PFX
 		     "ignoring AER for LUN %llu\n", be64_to_cpu(req->lun));
 
-	if (srp_response_common(target, delta, &rsp, sizeof rsp))
+	if (srp_response_common(ch, delta, &rsp, sizeof(rsp)))
 		shost_printk(KERN_ERR, target->scsi_host, PFX
 			     "problems processing SRP_AER_REQ\n");
 }
 
-static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
+static void srp_handle_recv(struct srp_rdma_ch *ch, struct ib_wc *wc)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_device *dev = target->srp_host->srp_dev->dev;
 	struct srp_iu *iu = (struct srp_iu *) (uintptr_t) wc->wr_id;
 	int res;
 	u8 opcode;
 
-	ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_ti_iu_len,
+	ib_dma_sync_single_for_cpu(dev, iu->dma, ch->max_ti_iu_len,
 				   DMA_FROM_DEVICE);
 
 	opcode = *(u8 *) iu->buf;
@@ -1748,15 +1775,15 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 
 	switch (opcode) {
 	case SRP_RSP:
-		srp_process_rsp(target, iu->buf);
+		srp_process_rsp(ch, iu->buf);
 		break;
 
 	case SRP_CRED_REQ:
-		srp_process_cred_req(target, iu->buf);
+		srp_process_cred_req(ch, iu->buf);
 		break;
 
 	case SRP_AER_REQ:
-		srp_process_aer_req(target, iu->buf);
+		srp_process_aer_req(ch, iu->buf);
 		break;
 
 	case SRP_T_LOGOUT:
@@ -1771,10 +1798,10 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
 		break;
 	}
 
-	ib_dma_sync_single_for_device(dev, iu->dma, target->max_ti_iu_len,
+	ib_dma_sync_single_for_device(dev, iu->dma, ch->max_ti_iu_len,
 				      DMA_FROM_DEVICE);
 
-	res = srp_post_recv(target, iu);
+	res = srp_post_recv(ch, iu);
 	if (res != 0)
 		shost_printk(KERN_ERR, target->scsi_host,
 			     PFX "Recv failed with error code %d\n", res);
@@ -1819,33 +1846,35 @@ static void srp_handle_qp_err(u64 wr_id, enum ib_wc_status wc_status,
 	target->qp_in_error = true;
 }
 
-static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
+static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr)
 {
-	struct srp_target_port *target = target_ptr;
+	struct srp_rdma_ch *ch = ch_ptr;
 	struct ib_wc wc;
 
 	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
 		if (likely(wc.status == IB_WC_SUCCESS)) {
-			srp_handle_recv(target, &wc);
+			srp_handle_recv(ch, &wc);
 		} else {
-			srp_handle_qp_err(wc.wr_id, wc.status, false, target);
+			srp_handle_qp_err(wc.wr_id, wc.status, false,
+					  ch->target);
 		}
 	}
 }
 
-static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
+static void srp_send_completion(struct ib_cq *cq, void *ch_ptr)
 {
-	struct srp_target_port *target = target_ptr;
+	struct srp_rdma_ch *ch = ch_ptr;
 	struct ib_wc wc;
 	struct srp_iu *iu;
 
 	while (ib_poll_cq(cq, 1, &wc) > 0) {
 		if (likely(wc.status == IB_WC_SUCCESS)) {
 			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
-			list_add(&iu->list, &target->free_tx);
+			list_add(&iu->list, &ch->free_tx);
 		} else {
-			srp_handle_qp_err(wc.wr_id, wc.status, true, target);
+			srp_handle_qp_err(wc.wr_id, wc.status, true,
+					  ch->target);
 		}
 	}
 }
@@ -1854,6 +1883,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(shost);
 	struct srp_rport *rport = target->rport;
+	struct srp_rdma_ch *ch;
 	struct srp_request *req;
 	struct srp_iu *iu;
 	struct srp_cmd *cmd;
@@ -1875,14 +1905,16 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	if (unlikely(scmnd->result))
 		goto err;
 
-	spin_lock_irqsave(&target->lock, flags);
-	iu = __srp_get_tx_iu(target, SRP_IU_CMD);
+	ch = &target->ch;
+
+	spin_lock_irqsave(&ch->lock, flags);
+	iu = __srp_get_tx_iu(ch, SRP_IU_CMD);
 	if (!iu)
 		goto err_unlock;
 
-	req = list_first_entry(&target->free_reqs, struct srp_request, list);
+	req = list_first_entry(&ch->free_reqs, struct srp_request, list);
 	list_del(&req->list);
-	spin_unlock_irqrestore(&target->lock, flags);
+	spin_unlock_irqrestore(&ch->lock, flags);
 
 	dev = target->srp_host->srp_dev->dev;
 	ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_iu_len,
@@ -1901,7 +1933,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	req->scmnd    = scmnd;
 	req->cmd      = iu;
 
-	len = srp_map_data(scmnd, target, req);
+	len = srp_map_data(scmnd, ch, req);
 	if (len < 0) {
 		shost_printk(KERN_ERR, target->scsi_host,
 			     PFX "Failed to map data (%d)\n", len);
@@ -1919,7 +1951,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	ib_dma_sync_single_for_device(dev, iu->dma, target->max_iu_len,
 				      DMA_TO_DEVICE);
 
-	if (srp_post_send(target, iu, len)) {
+	if (srp_post_send(ch, iu, len)) {
 		shost_printk(KERN_ERR, target->scsi_host, PFX "Send failed\n");
 		goto err_unmap;
 	}
@@ -1933,10 +1965,10 @@ unlock_rport:
 	return ret;
 
 err_unmap:
-	srp_unmap_data(scmnd, target, req);
+	srp_unmap_data(scmnd, ch, req);
 
 err_iu:
-	srp_put_tx_iu(target, iu, SRP_IU_CMD);
+	srp_put_tx_iu(ch, iu, SRP_IU_CMD);
 
 	/*
 	 * Avoid that the loops that iterate over the request ring can
@@ -1944,11 +1976,11 @@ err_iu:
 	 */
 	req->scmnd = NULL;
 
-	spin_lock_irqsave(&target->lock, flags);
-	list_add(&req->list, &target->free_reqs);
+	spin_lock_irqsave(&ch->lock, flags);
+	list_add(&req->list, &ch->free_reqs);
 
 err_unlock:
-	spin_unlock_irqrestore(&target->lock, flags);
+	spin_unlock_irqrestore(&ch->lock, flags);
 
 err:
 	if (scmnd->result) {
@@ -1963,53 +1995,54 @@ err:
 
 /*
  * Note: the resources allocated in this function are freed in
- * srp_free_target_ib().
+ * srp_free_ch_ib().
  */
-static int srp_alloc_iu_bufs(struct srp_target_port *target)
+static int srp_alloc_iu_bufs(struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	int i;
 
-	target->rx_ring = kzalloc(target->queue_size * sizeof(*target->rx_ring),
-				  GFP_KERNEL);
-	if (!target->rx_ring)
+	ch->rx_ring = kcalloc(target->queue_size, sizeof(*ch->rx_ring),
+			      GFP_KERNEL);
+	if (!ch->rx_ring)
 		goto err_no_ring;
-	target->tx_ring = kzalloc(target->queue_size * sizeof(*target->tx_ring),
-				  GFP_KERNEL);
-	if (!target->tx_ring)
+	ch->tx_ring = kcalloc(target->queue_size, sizeof(*ch->tx_ring),
+			      GFP_KERNEL);
+	if (!ch->tx_ring)
 		goto err_no_ring;
 
 	for (i = 0; i < target->queue_size; ++i) {
-		target->rx_ring[i] = srp_alloc_iu(target->srp_host,
-						  target->max_ti_iu_len,
-						  GFP_KERNEL, DMA_FROM_DEVICE);
-		if (!target->rx_ring[i])
+		ch->rx_ring[i] = srp_alloc_iu(target->srp_host,
+					      ch->max_ti_iu_len,
+					      GFP_KERNEL, DMA_FROM_DEVICE);
+		if (!ch->rx_ring[i])
 			goto err;
 	}
 
 	for (i = 0; i < target->queue_size; ++i) {
-		target->tx_ring[i] = srp_alloc_iu(target->srp_host,
-						  target->max_iu_len,
-						  GFP_KERNEL, DMA_TO_DEVICE);
-		if (!target->tx_ring[i])
+		ch->tx_ring[i] = srp_alloc_iu(target->srp_host,
+					      target->max_iu_len,
+					      GFP_KERNEL, DMA_TO_DEVICE);
+		if (!ch->tx_ring[i])
 			goto err;
 
-		list_add(&target->tx_ring[i]->list, &target->free_tx);
+		list_add(&ch->tx_ring[i]->list, &ch->free_tx);
 	}
 
 	return 0;
 
 err:
 	for (i = 0; i < target->queue_size; ++i) {
-		srp_free_iu(target->srp_host, target->rx_ring[i]);
-		srp_free_iu(target->srp_host, target->tx_ring[i]);
+		srp_free_iu(target->srp_host, ch->rx_ring[i]);
+		srp_free_iu(target->srp_host, ch->tx_ring[i]);
 	}
 
 
 err_no_ring:
-	kfree(target->tx_ring);
-	target->tx_ring = NULL;
-	kfree(target->rx_ring);
-	target->rx_ring = NULL;
+	kfree(ch->tx_ring);
+	ch->tx_ring = NULL;
+	kfree(ch->rx_ring);
+	ch->rx_ring = NULL;
 
 	return -ENOMEM;
 }
@@ -2043,23 +2076,24 @@ static uint32_t srp_compute_rq_tmo(struct ib_qp_attr *qp_attr, int attr_mask)
 
 static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
 			       struct srp_login_rsp *lrsp,
-			       struct srp_target_port *target)
+			       struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct ib_qp_attr *qp_attr = NULL;
 	int attr_mask = 0;
 	int ret;
 	int i;
 
 	if (lrsp->opcode == SRP_LOGIN_RSP) {
-		target->max_ti_iu_len = be32_to_cpu(lrsp->max_ti_iu_len);
-		target->req_lim       = be32_to_cpu(lrsp->req_lim_delta);
+		ch->max_ti_iu_len = be32_to_cpu(lrsp->max_ti_iu_len);
+		ch->req_lim       = be32_to_cpu(lrsp->req_lim_delta);
 
 		/*
 		 * Reserve credits for task management so we don't
 		 * bounce requests back to the SCSI mid-layer.
 		 */
 		target->scsi_host->can_queue
-			= min(target->req_lim - SRP_TSK_MGMT_SQ_SIZE,
+			= min(ch->req_lim - SRP_TSK_MGMT_SQ_SIZE,
 			      target->scsi_host->can_queue);
 		target->scsi_host->cmd_per_lun
 			= min_t(int, target->scsi_host->can_queue,
@@ -2071,8 +2105,8 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
 		goto error;
 	}
 
-	if (!target->rx_ring) {
-		ret = srp_alloc_iu_bufs(target);
+	if (!ch->rx_ring) {
+		ret = srp_alloc_iu_bufs(ch);
 		if (ret)
 			goto error;
 	}
@@ -2087,13 +2121,14 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
 	if (ret)
 		goto error_free;
 
-	ret = ib_modify_qp(target->qp, qp_attr, attr_mask);
+	ret = ib_modify_qp(ch->qp, qp_attr, attr_mask);
 	if (ret)
 		goto error_free;
 
 	for (i = 0; i < target->queue_size; i++) {
-		struct srp_iu *iu = target->rx_ring[i];
-		ret = srp_post_recv(target, iu);
+		struct srp_iu *iu = ch->rx_ring[i];
+
+		ret = srp_post_recv(ch, iu);
 		if (ret)
 			goto error_free;
 	}
@@ -2105,7 +2140,7 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
 
 	target->rq_tmo_jiffies = srp_compute_rq_tmo(qp_attr, attr_mask);
 
-	ret = ib_modify_qp(target->qp, qp_attr, attr_mask);
+	ret = ib_modify_qp(ch->qp, qp_attr, attr_mask);
 	if (ret)
 		goto error_free;
 
@@ -2115,13 +2150,14 @@ error_free:
 	kfree(qp_attr);
 
 error:
-	target->status = ret;
+	ch->status = ret;
 }
 
 static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
 			       struct ib_cm_event *event,
-			       struct srp_target_port *target)
+			       struct srp_rdma_ch *ch)
 {
+	struct srp_target_port *target = ch->target;
 	struct Scsi_Host *shost = target->scsi_host;
 	struct ib_class_port_info *cpi;
 	int opcode;
@@ -2129,12 +2165,12 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
 	switch (event->param.rej_rcvd.reason) {
 	case IB_CM_REJ_PORT_CM_REDIRECT:
 		cpi = event->param.rej_rcvd.ari;
-		target->path.dlid = cpi->redirect_lid;
-		target->path.pkey = cpi->redirect_pkey;
+		ch->path.dlid = cpi->redirect_lid;
+		ch->path.pkey = cpi->redirect_pkey;
 		cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff;
-		memcpy(target->path.dgid.raw, cpi->redirect_gid, 16);
+		memcpy(ch->path.dgid.raw, cpi->redirect_gid, 16);
 
-		target->status = target->path.dlid ?
+		ch->status = ch->path.dlid ?
 			SRP_DLID_REDIRECT : SRP_PORT_REDIRECT;
 		break;
 
@@ -2145,26 +2181,26 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
 			 * reject reason code 25 when they mean 24
 			 * (port redirect).
 			 */
-			memcpy(target->path.dgid.raw,
+			memcpy(ch->path.dgid.raw,
 			       event->param.rej_rcvd.ari, 16);
 
 			shost_printk(KERN_DEBUG, shost,
 				     PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n",
-				     (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix),
-				     (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id));
+				     be64_to_cpu(ch->path.dgid.global.subnet_prefix),
+				     be64_to_cpu(ch->path.dgid.global.interface_id));
 
-			target->status = SRP_PORT_REDIRECT;
+			ch->status = SRP_PORT_REDIRECT;
 		} else {
 			shost_printk(KERN_WARNING, shost,
 				     "  REJ reason: IB_CM_REJ_PORT_REDIRECT\n");
-			target->status = -ECONNRESET;
+			ch->status = -ECONNRESET;
 		}
 		break;
 
 	case IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID:
 		shost_printk(KERN_WARNING, shost,
 			    "  REJ reason: IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID\n");
-		target->status = -ECONNRESET;
+		ch->status = -ECONNRESET;
 		break;
 
 	case IB_CM_REJ_CONSUMER_DEFINED:
@@ -2185,24 +2221,25 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
 			shost_printk(KERN_WARNING, shost,
 				     "  REJ reason: IB_CM_REJ_CONSUMER_DEFINED,"
 				     " opcode 0x%02x\n", opcode);
-		target->status = -ECONNRESET;
+		ch->status = -ECONNRESET;
 		break;
 
 	case IB_CM_REJ_STALE_CONN:
 		shost_printk(KERN_WARNING, shost, "  REJ reason: stale connection\n");
-		target->status = SRP_STALE_CONN;
+		ch->status = SRP_STALE_CONN;
 		break;
 
 	default:
 		shost_printk(KERN_WARNING, shost, "  REJ reason 0x%x\n",
 			     event->param.rej_rcvd.reason);
-		target->status = -ECONNRESET;
+		ch->status = -ECONNRESET;
 	}
 }
 
 static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 {
-	struct srp_target_port *target = cm_id->context;
+	struct srp_rdma_ch *ch = cm_id->context;
+	struct srp_target_port *target = ch->target;
 	int comp = 0;
 
 	switch (event->event) {
@@ -2210,19 +2247,19 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 		shost_printk(KERN_DEBUG, target->scsi_host,
 			     PFX "Sending CM REQ failed\n");
 		comp = 1;
-		target->status = -ECONNRESET;
+		ch->status = -ECONNRESET;
 		break;
 
 	case IB_CM_REP_RECEIVED:
 		comp = 1;
-		srp_cm_rep_handler(cm_id, event->private_data, target);
+		srp_cm_rep_handler(cm_id, event->private_data, ch);
 		break;
 
 	case IB_CM_REJ_RECEIVED:
 		shost_printk(KERN_DEBUG, target->scsi_host, PFX "REJ received\n");
 		comp = 1;
 
-		srp_cm_rej_handler(cm_id, event, target);
+		srp_cm_rej_handler(cm_id, event, ch);
 		break;
 
 	case IB_CM_DREQ_RECEIVED:
@@ -2240,7 +2277,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 			     PFX "connection closed\n");
 		comp = 1;
 
-		target->status = 0;
+		ch->status = 0;
 		break;
 
 	case IB_CM_MRA_RECEIVED:
@@ -2255,7 +2292,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
 	}
 
 	if (comp)
-		complete(&target->done);
+		complete(&ch->done);
 
 	return 0;
 }
@@ -2311,9 +2348,10 @@ srp_change_queue_depth(struct scsi_device *sdev, int qdepth, int reason)
 	return sdev->queue_depth;
 }
 
-static int srp_send_tsk_mgmt(struct srp_target_port *target,
-			     u64 req_tag, unsigned int lun, u8 func)
+static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag,
+			     unsigned int lun, u8 func)
 {
+	struct srp_target_port *target = ch->target;
 	struct srp_rport *rport = target->rport;
 	struct ib_device *dev = target->srp_host->srp_dev->dev;
 	struct srp_iu *iu;
@@ -2322,16 +2360,16 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
 	if (!target->connected || target->qp_in_error)
 		return -1;
 
-	init_completion(&target->tsk_mgmt_done);
+	init_completion(&ch->tsk_mgmt_done);
 
 	/*
-	 * Lock the rport mutex to avoid that srp_create_target_ib() is
+	 * Lock the rport mutex to avoid that srp_create_ch_ib() is
 	 * invoked while a task management function is being sent.
 	 */
 	mutex_lock(&rport->mutex);
-	spin_lock_irq(&target->lock);
-	iu = __srp_get_tx_iu(target, SRP_IU_TSK_MGMT);
-	spin_unlock_irq(&target->lock);
+	spin_lock_irq(&ch->lock);
+	iu = __srp_get_tx_iu(ch, SRP_IU_TSK_MGMT);
+	spin_unlock_irq(&ch->lock);
 
 	if (!iu) {
 		mutex_unlock(&rport->mutex);
@@ -2352,15 +2390,15 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
 
 	ib_dma_sync_single_for_device(dev, iu->dma, sizeof *tsk_mgmt,
 				      DMA_TO_DEVICE);
-	if (srp_post_send(target, iu, sizeof *tsk_mgmt)) {
-		srp_put_tx_iu(target, iu, SRP_IU_TSK_MGMT);
+	if (srp_post_send(ch, iu, sizeof(*tsk_mgmt))) {
+		srp_put_tx_iu(ch, iu, SRP_IU_TSK_MGMT);
 		mutex_unlock(&rport->mutex);
 
 		return -1;
 	}
 	mutex_unlock(&rport->mutex);
 
-	if (!wait_for_completion_timeout(&target->tsk_mgmt_done,
+	if (!wait_for_completion_timeout(&ch->tsk_mgmt_done,
 					 msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS)))
 		return -1;
 
@@ -2371,20 +2409,22 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
+	struct srp_rdma_ch *ch;
 	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
-	if (!req || !srp_claim_req(target, req, NULL, scmnd))
+	ch = &target->ch;
+	if (!req || !srp_claim_req(ch, req, NULL, scmnd))
 		return SUCCESS;
-	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
+	if (srp_send_tsk_mgmt(ch, req->index, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
 	else if (target->rport->state == SRP_RPORT_LOST)
 		ret = FAST_IO_FAIL;
 	else
 		ret = FAILED;
-	srp_free_req(target, req, scmnd, 0);
+	srp_free_req(ch, req, scmnd, 0);
 	scmnd->result = DID_ABORT << 16;
 	scmnd->scsi_done(scmnd);
 
@@ -2394,19 +2434,21 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
+	struct srp_rdma_ch *ch = &target->ch;
 	int i;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
 
-	if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun,
+	if (srp_send_tsk_mgmt(ch, SRP_TAG_NO_REQ, scmnd->device->lun,
 			      SRP_TSK_LUN_RESET))
 		return FAILED;
-	if (target->tsk_mgmt_status)
+	if (ch->tsk_mgmt_status)
 		return FAILED;
 
 	for (i = 0; i < target->req_ring_size; ++i) {
-		struct srp_request *req = &target->req_ring[i];
-		srp_finish_req(target, req, scmnd->device, DID_RESET << 16);
+		struct srp_request *req = &ch->req_ring[i];
+
+		srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
 	}
 
 	return SUCCESS;
@@ -2483,8 +2525,9 @@ static ssize_t show_dgid(struct device *dev, struct device_attribute *attr,
 			 char *buf)
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+	struct srp_rdma_ch *ch = &target->ch;
 
-	return sprintf(buf, "%pI6\n", target->path.dgid.raw);
+	return sprintf(buf, "%pI6\n", ch->path.dgid.raw);
 }
 
 static ssize_t show_orig_dgid(struct device *dev,
@@ -2500,7 +2543,7 @@ static ssize_t show_req_lim(struct device *dev,
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
 
-	return sprintf(buf, "%d\n", target->req_lim);
+	return sprintf(buf, "%d\n", target->ch.req_lim);
 }
 
 static ssize_t show_zero_req_lim(struct device *dev,
@@ -2992,6 +3035,7 @@ static ssize_t srp_create_target(struct device *dev,
 		container_of(dev, struct srp_host, dev);
 	struct Scsi_Host *target_host;
 	struct srp_target_port *target;
+	struct srp_rdma_ch *ch;
 	struct srp_device *srp_dev = host->srp_dev;
 	struct ib_device *ibdev = srp_dev->dev;
 	int ret;
@@ -3060,8 +3104,12 @@ static ssize_t srp_create_target(struct device *dev,
 	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
-	INIT_LIST_HEAD(&target->free_tx);
-	ret = srp_alloc_req_data(target);
+	ch = &target->ch;
+	ch->target = target;
+	ch->comp_vector = target->comp_vector;
+	spin_lock_init(&ch->lock);
+	INIT_LIST_HEAD(&ch->free_tx);
+	ret = srp_alloc_req_data(ch);
 	if (ret)
 		goto err_free_mem;
 
@@ -3069,15 +3117,15 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err_free_mem;
 
-	ret = srp_create_target_ib(target);
+	ret = srp_create_ch_ib(ch);
 	if (ret)
 		goto err_free_mem;
 
-	ret = srp_new_cm_id(target);
+	ret = srp_new_cm_id(ch);
 	if (ret)
 		goto err_free_ib;
 
-	ret = srp_connect_target(target);
+	ret = srp_connect_ch(ch);
 	if (ret) {
 		shost_printk(KERN_ERR, target->scsi_host,
 			     PFX "Connection failed\n");
@@ -3111,10 +3159,10 @@ err_disconnect:
 	srp_disconnect_target(target);
 
 err_free_ib:
-	srp_free_target_ib(target);
+	srp_free_ch_ib(target, ch);
 
 err_free_mem:
-	srp_free_req_data(target);
+	srp_free_req_data(target, ch);
 
 err:
 	scsi_host_put(target_host);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 8635ab6..74530d9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -130,7 +130,11 @@ struct srp_request {
 	short			index;
 };
 
-struct srp_target_port {
+/**
+ * struct srp_rdma_ch
+ * @comp_vector: Completion vector used by this RDMA channel.
+ */
+struct srp_rdma_ch {
 	/* These are RW in the hot path, and commonly used together */
 	struct list_head	free_tx;
 	struct list_head	free_reqs;
@@ -138,13 +142,48 @@ struct srp_target_port {
 	s32			req_lim;
 
 	/* These are read-only in the hot path */
-	struct ib_cq	       *send_cq ____cacheline_aligned_in_smp;
+	struct srp_target_port *target ____cacheline_aligned_in_smp;
+	struct ib_cq	       *send_cq;
 	struct ib_cq	       *recv_cq;
 	struct ib_qp	       *qp;
 	union {
 		struct ib_fmr_pool     *fmr_pool;
 		struct srp_fr_pool     *fr_pool;
 	};
+
+	/* Everything above this point is used in the hot path of
+	 * command processing. Try to keep them packed into cachelines.
+	 */
+
+	struct completion	done;
+	int			status;
+
+	struct ib_sa_path_rec	path;
+	struct ib_sa_query     *path_query;
+	int			path_query_id;
+
+	struct ib_cm_id	       *cm_id;
+	struct srp_iu	      **tx_ring;
+	struct srp_iu	      **rx_ring;
+	struct srp_request     *req_ring;
+	int			max_ti_iu_len;
+	int			comp_vector;
+
+	struct completion	tsk_mgmt_done;
+	u8			tsk_mgmt_status;
+};
+
+/**
+ * struct srp_target_port
+ * @comp_vector: Completion vector used by the first RDMA channel created for
+ *   this target port.
+ */
+struct srp_target_port {
+	/* read and written in the hot path */
+	spinlock_t		lock;
+
+	struct srp_rdma_ch	ch;
+	/* read only in the hot path */
 	u32			lkey;
 	u32			rkey;
 	enum srp_target_state	state;
@@ -153,10 +192,7 @@ struct srp_target_port {
 	unsigned int		indirect_size;
 	bool			allow_ext_sg;
 
-	/* Everything above this point is used in the hot path of
-	 * command processing. Try to keep them packed into cachelines.
-	 */
-
+	/* other member variables */
 	union ib_gid		sgid;
 	__be64			id_ext;
 	__be64			ioc_guid;
@@ -176,33 +212,17 @@ struct srp_target_port {
 
 	union ib_gid		orig_dgid;
 	__be16			pkey;
-	struct ib_sa_path_rec	path;
-	struct ib_sa_query     *path_query;
-	int			path_query_id;
 
 	u32			rq_tmo_jiffies;
 	bool			connected;
 
-	struct ib_cm_id	       *cm_id;
-
-	int			max_ti_iu_len;
-
 	int			zero_req_lim;
 
-	struct srp_iu	       **tx_ring;
-	struct srp_iu	       **rx_ring;
-	struct srp_request	*req_ring;
-
 	struct work_struct	tl_err_work;
 	struct work_struct	remove_work;
 
 	struct list_head	list;
-	struct completion	done;
-	int			status;
 	bool			qp_in_error;
-
-	struct completion	tsk_mgmt_done;
-	u8			tsk_mgmt_status;
 };
 
 struct srp_iu {
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
                   ` (4 preceding siblings ...)
  2014-10-07 13:05 ` [PATCH v2 08/12] IB/srp: Introduce two new srp_target_port member variables Bart Van Assche
@ 2014-10-07 13:06 ` Bart Van Assche
       [not found]   ` <5433E557.3010505-HInyCGIudOg@public.gmane.org>
  2014-10-07 13:07 ` [PATCH v2 12/12] IB/srp: Add multichannel support Bart Van Assche
  2014-10-08 13:16 ` [PATCH] blk-mq: Use all available hardware queues Bart Van Assche
  7 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

Since the block layer already contains functionality to assign
a tag to each request, use that functionality instead of
reimplementing it in the SRP initiator driver.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 30 +++++++++++++++++++++++++-----
 drivers/infiniband/ulp/srp/ib_srp.h |  1 -
 2 files changed, 25 insertions(+), 6 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index cc0bf83b..224ef25 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -853,7 +853,6 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
 			goto out;
 
 		req->indirect_dma_addr = dma_addr;
-		req->index = i;
 		list_add_tail(&req->list, &ch->free_reqs);
 	}
 	ret = 0;
@@ -1648,8 +1647,11 @@ static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
 			ch->tsk_mgmt_status = rsp->data[3];
 		complete(&ch->tsk_mgmt_done);
 	} else {
-		req = &ch->req_ring[rsp->tag];
-		scmnd = srp_claim_req(ch, req, NULL, NULL);
+		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
+		if (scmnd) {
+			req = (void *)scmnd->host_scribble;
+			scmnd = srp_claim_req(ch, req, NULL, scmnd);
+		}
 		if (!scmnd) {
 			shost_printk(KERN_ERR, target->scsi_host,
 				     "Null scmnd for RSP w/tag %016llx\n",
@@ -1889,6 +1891,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	struct srp_cmd *cmd;
 	struct ib_device *dev;
 	unsigned long flags;
+	u32 tag;
 	int len, ret;
 	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
 
@@ -1905,6 +1908,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	if (unlikely(scmnd->result))
 		goto err;
 
+	tag = blk_mq_unique_tag(scmnd->request);
 	ch = &target->ch;
 
 	spin_lock_irqsave(&ch->lock, flags);
@@ -1927,7 +1931,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 
 	cmd->opcode = SRP_CMD;
 	cmd->lun    = cpu_to_be64((u64) scmnd->device->lun << 48);
-	cmd->tag    = req->index;
+	cmd->tag    = tag;
 	memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);
 
 	req->scmnd    = scmnd;
@@ -2409,6 +2413,7 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
+	u32 tag;
 	struct srp_rdma_ch *ch;
 	int ret;
 
@@ -2417,7 +2422,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	ch = &target->ch;
 	if (!req || !srp_claim_req(ch, req, NULL, scmnd))
 		return SUCCESS;
-	if (srp_send_tsk_mgmt(ch, req->index, scmnd->device->lun,
+	tag = blk_mq_unique_tag(scmnd->request);
+	if (srp_send_tsk_mgmt(ch, tag, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
 	else if (target->rport->state == SRP_RPORT_LOST)
@@ -2463,6 +2469,15 @@ static int srp_reset_host(struct scsi_cmnd *scmnd)
 	return srp_reconnect_rport(target->rport) == 0 ? SUCCESS : FAILED;
 }
 
+static int srp_slave_alloc(struct scsi_device *sdev)
+{
+	sdev->tagged_supported = 1;
+
+	scsi_activate_tcq(sdev, sdev->queue_depth);
+
+	return 0;
+}
+
 static int srp_slave_configure(struct scsi_device *sdev)
 {
 	struct Scsi_Host *shost = sdev->host;
@@ -2641,6 +2656,7 @@ static struct scsi_host_template srp_template = {
 	.module				= THIS_MODULE,
 	.name				= "InfiniBand SRP initiator",
 	.proc_name			= DRV_NAME,
+	.slave_alloc			= srp_slave_alloc,
 	.slave_configure		= srp_slave_configure,
 	.info				= srp_target_info,
 	.queuecommand			= srp_queuecommand,
@@ -3076,6 +3092,10 @@ static ssize_t srp_create_target(struct device *dev,
 	if (ret)
 		goto err;
 
+	ret = scsi_init_shared_tag_map(target_host, target_host->can_queue);
+	if (ret)
+		goto err;
+
 	target->req_ring_size = target->queue_size - SRP_TSK_MGMT_SQ_SIZE;
 
 	if (!srp_conn_unique(target->srp_host, target)) {
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 74530d9..75e8f36 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -127,7 +127,6 @@ struct srp_request {
 	struct srp_direct_buf  *indirect_desc;
 	dma_addr_t		indirect_dma_addr;
 	short			nmdesc;
-	short			index;
 };
 
 /**
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* [PATCH v2 11/12] IB/srp: Eliminate free_reqs list
       [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
                     ` (4 preceding siblings ...)
  2014-10-07 13:05   ` [PATCH v2 09/12] IB/srp: Separate target and channel variables Bart Van Assche
@ 2014-10-07 13:06   ` Bart Van Assche
       [not found]     ` <5433E56E.6010600-HInyCGIudOg@public.gmane.org>
  5 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

The free_reqs list is no longer needed now that we are using
tags assigned by the block layer. Hence remove it.

Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
---
 drivers/infiniband/ulp/srp/ib_srp.c | 24 +++++++++---------------
 drivers/infiniband/ulp/srp/ib_srp.h |  1 -
 2 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index 224ef25..eccaf65 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -821,8 +821,6 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
 	dma_addr_t dma_addr;
 	int i, ret = -ENOMEM;
 
-	INIT_LIST_HEAD(&ch->free_reqs);
-
 	ch->req_ring = kcalloc(target->req_ring_size, sizeof(*ch->req_ring),
 			       GFP_KERNEL);
 	if (!ch->req_ring)
@@ -853,7 +851,6 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
 			goto out;
 
 		req->indirect_dma_addr = dma_addr;
-		list_add_tail(&req->list, &ch->free_reqs);
 	}
 	ret = 0;
 
@@ -1075,7 +1072,6 @@ static void srp_free_req(struct srp_rdma_ch *ch, struct srp_request *req,
 
 	spin_lock_irqsave(&ch->lock, flags);
 	ch->req_lim += req_lim_delta;
-	list_add_tail(&req->list, &ch->free_reqs);
 	spin_unlock_irqrestore(&ch->lock, flags);
 }
 
@@ -1892,6 +1888,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 	struct ib_device *dev;
 	unsigned long flags;
 	u32 tag;
+	u16 idx;
 	int len, ret;
 	const bool in_scsi_eh = !in_interrupt() && current == shost->ehandler;
 
@@ -1910,16 +1907,19 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 
 	tag = blk_mq_unique_tag(scmnd->request);
 	ch = &target->ch;
+	idx = blk_mq_unique_tag_to_tag(tag);
+	WARN_ONCE(idx >= target->req_ring_size, "%s: tag %#x: idx %d >= %d\n",
+		  dev_name(&shost->shost_gendev), tag, idx,
+		  target->req_ring_size);
 
 	spin_lock_irqsave(&ch->lock, flags);
 	iu = __srp_get_tx_iu(ch, SRP_IU_CMD);
-	if (!iu)
-		goto err_unlock;
-
-	req = list_first_entry(&ch->free_reqs, struct srp_request, list);
-	list_del(&req->list);
 	spin_unlock_irqrestore(&ch->lock, flags);
 
+	if (!iu)
+		goto err;
+
+	req = &ch->req_ring[idx];
 	dev = target->srp_host->srp_dev->dev;
 	ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_iu_len,
 				   DMA_TO_DEVICE);
@@ -1980,12 +1980,6 @@ err_iu:
 	 */
 	req->scmnd = NULL;
 
-	spin_lock_irqsave(&ch->lock, flags);
-	list_add(&req->list, &ch->free_reqs);
-
-err_unlock:
-	spin_unlock_irqrestore(&ch->lock, flags);
-
 err:
 	if (scmnd->result) {
 		scmnd->scsi_done(scmnd);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index 75e8f36..bb185d4 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -136,7 +136,6 @@ struct srp_request {
 struct srp_rdma_ch {
 	/* These are RW in the hot path, and commonly used together */
 	struct list_head	free_tx;
-	struct list_head	free_reqs;
 	spinlock_t		lock;
 	s32			req_lim;
 
-- 
1.8.4.5

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
                   ` (5 preceding siblings ...)
  2014-10-07 13:06 ` [PATCH v2 10/12] IB/srp: Use block layer tags Bart Van Assche
@ 2014-10-07 13:07 ` Bart Van Assche
  2014-10-17 11:01   ` EH action after scsi_remove_host, was: " Christoph Hellwig
                     ` (2 more replies)
  2014-10-08 13:16 ` [PATCH] blk-mq: Use all available hardware queues Bart Van Assche
  7 siblings, 3 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-07 13:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

Improve performance by using multiple RDMA/RC channels per SCSI
host for communication with an SRP target. About the
implementation:
- Introduce a loop over all channels in the code that uses
  target->ch.
- Set the SRP_MULTICHAN_MULTI flag during login for the creation
  of the second and subsequent channels.
- RDMA completion vectors are chosen such that RDMA completion
  interrupts are handled by the CPU socket that submitted the I/O
  request. This patch assumes that if a system contains n CPU
  sockets and m RDMA completion vectors have been assigned to an
  RDMA HCA, IRQ affinity has been configured such that completion
  vectors [i*m/n..(i+1)*m/n) are bound to CPU socket i, with
  0 <= i < n.
- Modify srp_free_ch_ib() and srp_free_req_data() such that it is
  safe to invoke these functions after the corresponding allocation
  function has failed.
- Add a ch_count sysfs attribute per target port.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
---
 Documentation/ABI/stable/sysfs-driver-ib_srp |  25 ++-
 drivers/infiniband/ulp/srp/ib_srp.c          | 291 ++++++++++++++++++++-------
 drivers/infiniband/ulp/srp/ib_srp.h          |   3 +-
 3 files changed, 238 insertions(+), 81 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
index b9688de..d5a459e 100644
--- a/Documentation/ABI/stable/sysfs-driver-ib_srp
+++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
@@ -55,12 +55,12 @@ Description:	Interface for making ib_srp connect to a new target.
 		  only safe with partial memory descriptor list support enabled
 		  (allow_ext_sg=1).
 		* comp_vector, a number in the range 0..n-1 specifying the
-		  MSI-X completion vector. Some HCA's allocate multiple (n)
-		  MSI-X vectors per HCA port. If the IRQ affinity masks of
-		  these interrupts have been configured such that each MSI-X
-		  interrupt is handled by a different CPU then the comp_vector
-		  parameter can be used to spread the SRP completion workload
-		  over multiple CPU's.
+		  MSI-X completion vector of the first RDMA channel. Some
+		  HCA's allocate multiple (n) MSI-X vectors per HCA port. If
+		  the IRQ affinity masks of these interrupts have been
+		  configured such that each MSI-X interrupt is handled by a
+		  different CPU then the comp_vector parameter can be used to
+		  spread the SRP completion workload over multiple CPU's.
 		* tl_retry_count, a number in the range 2..7 specifying the
 		  IB RC retry count.
 		* queue_size, the maximum number of commands that the
@@ -88,6 +88,13 @@ Description:	Whether ib_srp is allowed to include a partial memory
 		descriptor list in an SRP_CMD when communicating with an SRP
 		target.
 
+What:		/sys/class/scsi_host/host<n>/ch_count
+Date:		November 1, 2014
+KernelVersion:	3.18
+Contact:	linux-rdma@vger.kernel.org
+Description:	Number of RDMA channels used for communication with the SRP
+		target.
+
 What:		/sys/class/scsi_host/host<n>/cmd_sg_entries
 Date:		May 19, 2011
 KernelVersion:	2.6.39
@@ -95,6 +102,12 @@ Contact:	linux-rdma@vger.kernel.org
 Description:	Maximum number of data buffer descriptors that may be sent to
 		the target in a single SRP_CMD request.
 
+What:		/sys/class/scsi_host/host<n>/comp_vector
+Date:		September 2, 2013
+KernelVersion:	3.11
+Contact:	linux-rdma@vger.kernel.org
+Description:	Completion vector used for the first RDMA channel.
+
 What:		/sys/class/scsi_host/host<n>/dgid
 Date:		June 17, 2006
 KernelVersion:	2.6.17
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
index eccaf65..80699a9 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -123,6 +123,11 @@ MODULE_PARM_DESC(dev_loss_tmo,
 		 " if fast_io_fail_tmo has not been set. \"off\" means that"
 		 " this functionality is disabled.");
 
+static unsigned ch_count;
+module_param(ch_count, uint, 0444);
+MODULE_PARM_DESC(ch_count,
+		 "Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA.");
+
 static void srp_add_one(struct ib_device *device);
 static void srp_remove_one(struct ib_device *device);
 static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
@@ -562,11 +567,26 @@ static void srp_free_ch_ib(struct srp_target_port *target,
 	struct srp_device *dev = target->srp_host->srp_dev;
 	int i;
 
+	if (!ch->target)
+		return;
+
+	/*
+	 * Prevent the SCSI error handler from using this channel after it
+	 * has been freed. The SCSI error handler may continue trying to
+	 * perform recovery actions even after scsi_remove_host() has
+	 * returned.
+	 */
+	ch->target = NULL;
+
 	if (ch->cm_id) {
 		ib_destroy_cm_id(ch->cm_id);
 		ch->cm_id = NULL;
 	}
 
+	/* Return if srp_new_cm_id() succeeded but srp_create_ch_ib() failed. */
+	if (!ch->qp)
+		return;
+
 	if (dev->use_fast_reg) {
 		if (ch->fr_pool)
 			srp_destroy_fr_pool(ch->fr_pool);
@@ -647,7 +667,7 @@ static int srp_lookup_path(struct srp_rdma_ch *ch)
 	return ch->status;
 }
 
-static int srp_send_req(struct srp_rdma_ch *ch)
+static int srp_send_req(struct srp_rdma_ch *ch, bool multich)
 {
 	struct srp_target_port *target = ch->target;
 	struct {
@@ -688,6 +708,8 @@ static int srp_send_req(struct srp_rdma_ch *ch)
 	req->priv.req_it_iu_len = cpu_to_be32(target->max_iu_len);
 	req->priv.req_buf_fmt 	= cpu_to_be16(SRP_BUF_FORMAT_DIRECT |
 					      SRP_BUF_FORMAT_INDIRECT);
+	req->priv.req_flags	= (multich ? SRP_MULTICHAN_MULTI :
+				   SRP_MULTICHAN_SINGLE);
 	/*
 	 * In the published SRP specification (draft rev. 16a), the
 	 * port identifier format is 8 bytes of ID extension followed
@@ -769,14 +791,18 @@ static bool srp_change_conn_state(struct srp_target_port *target,
 
 static void srp_disconnect_target(struct srp_target_port *target)
 {
-	struct srp_rdma_ch *ch = &target->ch;
+	struct srp_rdma_ch *ch;
+	int i;
 
 	if (srp_change_conn_state(target, false)) {
 		/* XXX should send SRP_I_LOGOUT request */
 
-		if (ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
-			shost_printk(KERN_DEBUG, target->scsi_host,
-				     PFX "Sending CM DREQ failed\n");
+		for (i = 0; i < target->ch_count; i++) {
+			ch = &target->ch[i];
+			if (ch->cm_id && ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
+				shost_printk(KERN_DEBUG, target->scsi_host,
+					     PFX "Sending CM DREQ failed\n");
+			}
 		}
 	}
 }
@@ -789,7 +815,7 @@ static void srp_free_req_data(struct srp_target_port *target,
 	struct srp_request *req;
 	int i;
 
-	if (!ch->req_ring)
+	if (!ch->target || !ch->req_ring)
 		return;
 
 	for (i = 0; i < target->req_ring_size; ++i) {
@@ -875,7 +901,8 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
 
 static void srp_remove_target(struct srp_target_port *target)
 {
-	struct srp_rdma_ch *ch = &target->ch;
+	struct srp_rdma_ch *ch;
+	int i;
 
 	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
 
@@ -885,10 +912,18 @@ static void srp_remove_target(struct srp_target_port *target)
 	scsi_remove_host(target->scsi_host);
 	srp_stop_rport_timers(target->rport);
 	srp_disconnect_target(target);
-	srp_free_ch_ib(target, ch);
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		srp_free_ch_ib(target, ch);
+	}
 	cancel_work_sync(&target->tl_err_work);
 	srp_rport_put(target->rport);
-	srp_free_req_data(target, ch);
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		srp_free_req_data(target, ch);
+	}
+	kfree(target->ch);
+	target->ch = NULL;
 
 	spin_lock(&target->srp_host->target_lock);
 	list_del(&target->list);
@@ -914,12 +949,12 @@ static void srp_rport_delete(struct srp_rport *rport)
 	srp_queue_remove_work(target);
 }
 
-static int srp_connect_ch(struct srp_rdma_ch *ch)
+static int srp_connect_ch(struct srp_rdma_ch *ch, bool multich)
 {
 	struct srp_target_port *target = ch->target;
 	int ret;
 
-	WARN_ON_ONCE(target->connected);
+	WARN_ON_ONCE(!multich && target->connected);
 
 	target->qp_in_error = false;
 
@@ -929,7 +964,7 @@ static int srp_connect_ch(struct srp_rdma_ch *ch)
 
 	while (1) {
 		init_completion(&ch->done);
-		ret = srp_send_req(ch);
+		ret = srp_send_req(ch, multich);
 		if (ret)
 			return ret;
 		ret = wait_for_completion_interruptible(&ch->done);
@@ -1090,10 +1125,10 @@ static void srp_finish_req(struct srp_rdma_ch *ch, struct srp_request *req,
 static void srp_terminate_io(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
-	struct srp_rdma_ch *ch = &target->ch;
+	struct srp_rdma_ch *ch;
 	struct Scsi_Host *shost = target->scsi_host;
 	struct scsi_device *sdev;
-	int i;
+	int i, j;
 
 	/*
 	 * Invoking srp_terminate_io() while srp_queuecommand() is running
@@ -1102,10 +1137,15 @@ static void srp_terminate_io(struct srp_rport *rport)
 	shost_for_each_device(sdev, shost)
 		WARN_ON_ONCE(sdev->request_queue->request_fn_active);
 
-	for (i = 0; i < target->req_ring_size; ++i) {
-		struct srp_request *req = &ch->req_ring[i];
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
 
-		srp_finish_req(ch, req, NULL, DID_TRANSPORT_FAILFAST << 16);
+		for (j = 0; j < target->req_ring_size; ++j) {
+			struct srp_request *req = &ch->req_ring[j];
+
+			srp_finish_req(ch, req, NULL,
+				       DID_TRANSPORT_FAILFAST << 16);
+		}
 	}
 }
 
@@ -1121,8 +1161,9 @@ static void srp_terminate_io(struct srp_rport *rport)
 static int srp_rport_reconnect(struct srp_rport *rport)
 {
 	struct srp_target_port *target = rport->lld_data;
-	struct srp_rdma_ch *ch = &target->ch;
-	int i, ret;
+	struct srp_rdma_ch *ch;
+	int i, j, ret = 0;
+	bool multich = false;
 
 	srp_disconnect_target(target);
 
@@ -1134,27 +1175,47 @@ static int srp_rport_reconnect(struct srp_rport *rport)
 	 * case things are really fouled up. Doing so also ensures that all CM
 	 * callbacks will have finished before a new QP is allocated.
 	 */
-	ret = srp_new_cm_id(ch);
-
-	for (i = 0; i < target->req_ring_size; ++i) {
-		struct srp_request *req = &ch->req_ring[i];
-
-		srp_finish_req(ch, req, NULL, DID_RESET << 16);
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		if (!ch->target)
+			break;
+		ret += srp_new_cm_id(ch);
 	}
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		if (!ch->target)
+			break;
+		for (j = 0; j < target->req_ring_size; ++j) {
+			struct srp_request *req = &ch->req_ring[j];
 
-	/*
-	 * Whether or not creating a new CM ID succeeded, create a new
-	 * QP. This guarantees that all callback functions for the old QP have
-	 * finished before any send requests are posted on the new QP.
-	 */
-	ret += srp_create_ch_ib(ch);
-
-	INIT_LIST_HEAD(&ch->free_tx);
-	for (i = 0; i < target->queue_size; ++i)
-		list_add(&ch->tx_ring[i]->list, &ch->free_tx);
+			srp_finish_req(ch, req, NULL, DID_RESET << 16);
+		}
+	}
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		if (!ch->target)
+			break;
+		/*
+		 * Whether or not creating a new CM ID succeeded, create a new
+		 * QP. This guarantees that all completion callback function
+		 * invocations have finished before request resetting starts.
+		 */
+		ret += srp_create_ch_ib(ch);
 
-	if (ret == 0)
-		ret = srp_connect_ch(ch);
+		INIT_LIST_HEAD(&ch->free_tx);
+		for (j = 0; j < target->queue_size; ++j)
+			list_add(&ch->tx_ring[j]->list, &ch->free_tx);
+	}
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		if (ret || !ch->target) {
+			if (i > 1)
+				ret = 0;
+			break;
+		}
+		ret = srp_connect_ch(ch, multich);
+		multich = true;
+	}
 
 	if (ret == 0)
 		shost_printk(KERN_INFO, target->scsi_host,
@@ -1643,6 +1704,9 @@ static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
 			ch->tsk_mgmt_status = rsp->data[3];
 		complete(&ch->tsk_mgmt_done);
 	} else {
+		if (blk_mq_unique_tag_to_hwq(rsp->tag) != ch - target->ch)
+			pr_err("Channel idx mismatch: tag %#llx <> ch %#lx\n",
+			       rsp->tag, ch - target->ch);
 		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
 		if (scmnd) {
 			req = (void *)scmnd->host_scribble;
@@ -1650,8 +1714,8 @@ static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
 		}
 		if (!scmnd) {
 			shost_printk(KERN_ERR, target->scsi_host,
-				     "Null scmnd for RSP w/tag %016llx\n",
-				     (unsigned long long) rsp->tag);
+				     "Null scmnd for RSP w/tag %#016llx received on ch %ld / QP %#x\n",
+				     rsp->tag, ch - target->ch, ch->qp->qp_num);
 
 			spin_lock_irqsave(&ch->lock, flags);
 			ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
@@ -1906,7 +1970,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
 		goto err;
 
 	tag = blk_mq_unique_tag(scmnd->request);
-	ch = &target->ch;
+	ch = &target->ch[blk_mq_unique_tag_to_hwq(tag)];
 	idx = blk_mq_unique_tag_to_tag(tag);
 	WARN_ONCE(idx >= target->req_ring_size, "%s: tag %#x: idx %d >= %d\n",
 		  dev_name(&shost->shost_gendev), tag, idx,
@@ -2408,15 +2472,23 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
 	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
 	u32 tag;
+	u16 ch_idx;
 	struct srp_rdma_ch *ch;
 	int ret;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
 
-	ch = &target->ch;
-	if (!req || !srp_claim_req(ch, req, NULL, scmnd))
+	if (!req)
 		return SUCCESS;
 	tag = blk_mq_unique_tag(scmnd->request);
+	ch_idx = blk_mq_unique_tag_to_hwq(tag);
+	if (WARN_ON_ONCE(ch_idx >= target->ch_count))
+		return SUCCESS;
+	ch = &target->ch[ch_idx];
+	if (!srp_claim_req(ch, req, NULL, scmnd))
+		return SUCCESS;
+	shost_printk(KERN_ERR, target->scsi_host,
+		     "Sending SRP abort for tag %#x\n", tag);
 	if (srp_send_tsk_mgmt(ch, tag, scmnd->device->lun,
 			      SRP_TSK_ABORT_TASK) == 0)
 		ret = SUCCESS;
@@ -2434,21 +2506,25 @@ static int srp_abort(struct scsi_cmnd *scmnd)
 static int srp_reset_device(struct scsi_cmnd *scmnd)
 {
 	struct srp_target_port *target = host_to_target(scmnd->device->host);
-	struct srp_rdma_ch *ch = &target->ch;
+	struct srp_rdma_ch *ch;
-	int i;
+	int i, j;
 
 	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
 
+	ch = &target->ch[0];
 	if (srp_send_tsk_mgmt(ch, SRP_TAG_NO_REQ, scmnd->device->lun,
 			      SRP_TSK_LUN_RESET))
 		return FAILED;
 	if (ch->tsk_mgmt_status)
 		return FAILED;
 
-	for (i = 0; i < target->req_ring_size; ++i) {
-		struct srp_request *req = &ch->req_ring[i];
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		for (j = 0; j < target->req_ring_size; ++j) {
+			struct srp_request *req = &ch->req_ring[j];
 
-		srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
+			srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
+		}
 	}
 
 	return SUCCESS;
@@ -2534,7 +2610,7 @@ static ssize_t show_dgid(struct device *dev, struct device_attribute *attr,
 			 char *buf)
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
-	struct srp_rdma_ch *ch = &target->ch;
+	struct srp_rdma_ch *ch = &target->ch[0];
 
 	return sprintf(buf, "%pI6\n", ch->path.dgid.raw);
 }
@@ -2551,8 +2627,14 @@ static ssize_t show_req_lim(struct device *dev,
 			    struct device_attribute *attr, char *buf)
 {
 	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+	struct srp_rdma_ch *ch;
+	int i, req_lim = INT_MAX;
 
-	return sprintf(buf, "%d\n", target->ch.req_lim);
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		req_lim = min(req_lim, ch->req_lim);
+	}
+	return sprintf(buf, "%d\n", req_lim);
 }
 
 static ssize_t show_zero_req_lim(struct device *dev,
@@ -2579,6 +2661,14 @@ static ssize_t show_local_ib_device(struct device *dev,
 	return sprintf(buf, "%s\n", target->srp_host->srp_dev->dev->name);
 }
 
+static ssize_t show_ch_count(struct device *dev, struct device_attribute *attr,
+			     char *buf)
+{
+	struct srp_target_port *target = host_to_target(class_to_shost(dev));
+
+	return sprintf(buf, "%d\n", target->ch_count);
+}
+
 static ssize_t show_comp_vector(struct device *dev,
 				struct device_attribute *attr, char *buf)
 {
@@ -2622,6 +2712,7 @@ static DEVICE_ATTR(req_lim,         S_IRUGO, show_req_lim,         NULL);
 static DEVICE_ATTR(zero_req_lim,    S_IRUGO, show_zero_req_lim,	   NULL);
 static DEVICE_ATTR(local_ib_port,   S_IRUGO, show_local_ib_port,   NULL);
 static DEVICE_ATTR(local_ib_device, S_IRUGO, show_local_ib_device, NULL);
+static DEVICE_ATTR(ch_count,        S_IRUGO, show_ch_count,        NULL);
 static DEVICE_ATTR(comp_vector,     S_IRUGO, show_comp_vector,     NULL);
 static DEVICE_ATTR(tl_retry_count,  S_IRUGO, show_tl_retry_count,  NULL);
 static DEVICE_ATTR(cmd_sg_entries,  S_IRUGO, show_cmd_sg_entries,  NULL);
@@ -2639,6 +2730,7 @@ static struct device_attribute *srp_host_attrs[] = {
 	&dev_attr_zero_req_lim,
 	&dev_attr_local_ib_port,
 	&dev_attr_local_ib_device,
+	&dev_attr_ch_count,
 	&dev_attr_comp_vector,
 	&dev_attr_tl_retry_count,
 	&dev_attr_cmd_sg_entries,
@@ -3048,7 +3140,8 @@ static ssize_t srp_create_target(struct device *dev,
 	struct srp_rdma_ch *ch;
 	struct srp_device *srp_dev = host->srp_dev;
 	struct ib_device *ibdev = srp_dev->dev;
-	int ret;
+	int ret, node_idx, node, cpu, i;
+	bool multich = false;
 
 	target_host = scsi_host_alloc(&srp_template,
 				      sizeof (struct srp_target_port));
@@ -3118,34 +3211,82 @@ static ssize_t srp_create_target(struct device *dev,
 	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
 	INIT_WORK(&target->remove_work, srp_remove_work);
 	spin_lock_init(&target->lock);
-	ch = &target->ch;
-	ch->target = target;
-	ch->comp_vector = target->comp_vector;
-	spin_lock_init(&ch->lock);
-	INIT_LIST_HEAD(&ch->free_tx);
-	ret = srp_alloc_req_data(ch);
-	if (ret)
-		goto err_free_mem;
-
 	ret = ib_query_gid(ibdev, host->port, 0, &target->sgid);
 	if (ret)
-		goto err_free_mem;
+		goto err;
 
-	ret = srp_create_ch_ib(ch);
-	if (ret)
-		goto err_free_mem;
+	ret = -ENOMEM;
+	target->ch_count = max_t(unsigned, num_online_nodes(),
+				 min(ch_count ? :
+				     min(4 * num_online_nodes(),
+					 ibdev->num_comp_vectors),
+				     num_online_cpus()));
+	target->ch = kcalloc(target->ch_count, sizeof(*target->ch),
+			     GFP_KERNEL);
+	if (!target->ch)
+		goto err;
 
-	ret = srp_new_cm_id(ch);
-	if (ret)
-		goto err_free_ib;
+	node_idx = 0;
+	for_each_online_node(node) {
+		const int ch_start = (node_idx * target->ch_count /
+				      num_online_nodes());
+		const int ch_end = ((node_idx + 1) * target->ch_count /
+				    num_online_nodes());
+		const int cv_start = (node_idx * ibdev->num_comp_vectors /
+				      num_online_nodes() + target->comp_vector)
+				     % ibdev->num_comp_vectors;
+		const int cv_end = ((node_idx + 1) * ibdev->num_comp_vectors /
+				    num_online_nodes() + target->comp_vector)
+				   % ibdev->num_comp_vectors;
+		int cpu_idx = 0;
+
+		for_each_online_cpu(cpu) {
+			if (cpu_to_node(cpu) != node)
+				continue;
+			if (ch_start + cpu_idx >= ch_end)
+				continue;
+			ch = &target->ch[ch_start + cpu_idx];
+			ch->target = target;
+			ch->comp_vector = cv_start == cv_end ? cv_start :
+				cv_start + cpu_idx % (cv_end - cv_start);
+			spin_lock_init(&ch->lock);
+			INIT_LIST_HEAD(&ch->free_tx);
+			ret = srp_new_cm_id(ch);
+			if (ret)
+				goto err_disconnect;
 
-	ret = srp_connect_ch(ch);
-	if (ret) {
-		shost_printk(KERN_ERR, target->scsi_host,
-			     PFX "Connection failed\n");
-		goto err_free_ib;
+			ret = srp_create_ch_ib(ch);
+			if (ret)
+				goto err_disconnect;
+
+			ret = srp_alloc_req_data(ch);
+			if (ret)
+				goto err_disconnect;
+
+			ret = srp_connect_ch(ch, multich);
+			if (ret) {
+				shost_printk(KERN_ERR, target->scsi_host,
+					     PFX "Connection %d/%d failed\n",
+					     ch_start + cpu_idx,
+					     target->ch_count);
+				if (node_idx == 0 && cpu_idx == 0) {
+					goto err_disconnect;
+				} else {
+					srp_free_ch_ib(target, ch);
+					srp_free_req_data(target, ch);
+					target->ch_count = ch - target->ch;
+					break;
+				}
+			}
+
+			multich = true;
+			cpu_idx++;
+		}
+		node_idx++;
 	}
 
+	target->scsi_host->nr_hw_queues = target->ch_count;
+
 	ret = srp_add_target(host, target);
 	if (ret)
 		goto err_disconnect;
@@ -3172,11 +3313,13 @@ out:
 err_disconnect:
 	srp_disconnect_target(target);
 
-err_free_ib:
-	srp_free_ch_ib(target, ch);
+	for (i = 0; i < target->ch_count; i++) {
+		ch = &target->ch[i];
+		srp_free_ch_ib(target, ch);
+		srp_free_req_data(target, ch);
+	}
 
-err_free_mem:
-	srp_free_req_data(target, ch);
+	kfree(target->ch);
 
 err:
 	scsi_host_put(target_host);
diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
index bb185d4..5b7dada 100644
--- a/drivers/infiniband/ulp/srp/ib_srp.h
+++ b/drivers/infiniband/ulp/srp/ib_srp.h
@@ -180,8 +180,9 @@ struct srp_target_port {
 	/* read and written in the hot path */
 	spinlock_t		lock;
 
-	struct srp_rdma_ch	ch;
 	/* read only in the hot path */
+	struct srp_rdma_ch	*ch;
+	u32			ch_count;
 	u32			lkey;
 	u32			rkey;
 	enum srp_target_state	state;
-- 
1.8.4.5



* Re: [PATCH v2 01/12] blk-mq: Use all available hardware queues
  2014-10-07 13:02   ` [PATCH v2 01/12] blk-mq: Use all available " Bart Van Assche
@ 2014-10-07 14:37     ` Jens Axboe
       [not found]       ` <5433FA8F.3050100-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Jens Axboe @ 2014-10-07 14:37 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Sagi Grimberg, Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi, linux-rdma

On 10/07/2014 07:02 AM, Bart Van Assche wrote:
> Suppose that a system has two CPU sockets, three cores per socket,
> that it does not support hyperthreading and that four hardware
> queues are provided by a block driver. With the current algorithm
> this will lead to the following assignment of CPU cores to hardware
> queues:
> 
>   HWQ 0: 0 1
>   HWQ 1: 2 3
>   HWQ 2: 4 5
>   HWQ 3: (none)
> 
> This patch changes the queue assignment into:
> 
>   HWQ 0: 0 1
>   HWQ 1: 2
>   HWQ 2: 3 4
>   HWQ 3: 5
> 
> In other words, this patch has the following three effects:
> - All four hardware queues are used instead of only three.
> - CPU cores are spread more evenly over hardware queues. For the
>   above example the range of the number of CPU cores associated
>   with a single HWQ is reduced from [0..2] to [1..2].
> - If the number of HWQ's is a multiple of the number of CPU sockets
>   it is now guaranteed that all CPU cores associated with a single
>   HWQ reside on the same CPU socket.
> 
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
> Cc: Jens Axboe <axboe@fb.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Ming Lei <ming.lei@canonical.com>
> ---
>  block/blk-mq-cpumap.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
> index 1065d7c..8e56455 100644
> --- a/block/blk-mq-cpumap.c
> +++ b/block/blk-mq-cpumap.c
> @@ -17,7 +17,7 @@
>  static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
>  			      const int cpu)
>  {
> -	return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
> +	return cpu * nr_queues / nr_cpus;
>  }
>  
>  static int get_first_sibling(unsigned int cpu)

Lets do this separate, as explained last time, it needs to be evaluated
on its own and doesn't really belong in this series of patches.

-- 
Jens Axboe



* [PATCH] blk-mq: Use all available hardware queues
  2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
                   ` (6 preceding siblings ...)
  2014-10-07 13:07 ` [PATCH v2 12/12] IB/srp: Add multichannel support Bart Van Assche
@ 2014-10-08 13:16 ` Bart Van Assche
  7 siblings, 0 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-08 13:16 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Sagi Grimberg, Sebastian Parschauer,
	Robert Elliott, Ming Lei, linux-kernel

Suppose that a system has two CPU sockets, three cores per socket,
that it does not support hyperthreading and that four hardware
queues are provided by a block driver. With the current algorithm
this will lead to the following assignment of CPU cores to hardware
queues:

  HWQ 0: 0 1
  HWQ 1: 2 3
  HWQ 2: 4 5
  HWQ 3: (none)

This patch changes the queue assignment into:

  HWQ 0: 0 1
  HWQ 1: 2
  HWQ 2: 3 4
  HWQ 3: 5

In other words, this patch has the following three effects:
- All four hardware queues are used instead of only three.
- CPU cores are spread more evenly over hardware queues. For the
  above example the range of the number of CPU cores associated
  with a single HWQ is reduced from [0..2] to [1..2].
- If the number of HWQ's is a multiple of the number of CPU sockets
  it is now guaranteed that all CPU cores associated with a single
  HWQ reside on the same CPU socket.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@canonical.com>
---
 block/blk-mq-cpumap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/blk-mq-cpumap.c b/block/blk-mq-cpumap.c
index 1065d7c..8e56455 100644
--- a/block/blk-mq-cpumap.c
+++ b/block/blk-mq-cpumap.c
@@ -17,7 +17,7 @@
 static int cpu_to_queue_index(unsigned int nr_cpus, unsigned int nr_queues,
 			      const int cpu)
 {
-	return cpu / ((nr_cpus + nr_queues - 1) / nr_queues);
+	return cpu * nr_queues / nr_cpus;
 }
 
 static int get_first_sibling(unsigned int cpu)
-- 
1.8.4.5


^ permalink raw reply related	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 01/12] blk-mq: Use all available hardware queues
       [not found]       ` <5433FA8F.3050100-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
@ 2014-10-08 13:21         ` Bart Van Assche
       [not found]           ` <54353A74.7040406-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-08 13:21 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Christoph Hellwig, Sagi Grimberg, Sebastian Parschauer,
	Robert Elliott, Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma

On 10/07/14 16:37, Jens Axboe wrote:
> Lets do this separate, as explained last time, it needs to be evaluated
> on its own and doesn't really belong in this series of patches.

Hello Jens,

A few minutes ago I resent this patch to you with LKML in CC. I hope 
Christoph agrees with leaving this patch out of the series without me 
having to resend the whole series. BTW, Sagi has promised off-list to 
review the other patches in this series.

Bart.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag()
  2014-10-07 13:03 ` [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag() Bart Van Assche
@ 2014-10-11 11:08   ` Christoph Hellwig
  2014-10-13  9:21     ` Bart Van Assche
       [not found]   ` <5433E493.9030304-HInyCGIudOg@public.gmane.org>
  1 sibling, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-11 11:08 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

> +static inline u32 blk_mq_build_unique_tag(int hwq, int tag)
> +{
> +	return (hwq << BLK_MQ_UNIQUE_TAG_BITS) | (tag & BLK_MQ_UNIQUE_TAG_MASK);
> +}

Is there any value in having this as a separate helper?


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 01/12] blk-mq: Use all available hardware queues
       [not found]           ` <54353A74.7040406-HInyCGIudOg@public.gmane.org>
@ 2014-10-11 11:11             ` Christoph Hellwig
       [not found]               ` <20141011111114.GB9593-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-11 11:11 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On Wed, Oct 08, 2014 at 03:21:56PM +0200, Bart Van Assche wrote:
> On 10/07/14 16:37, Jens Axboe wrote:
> >Lets do this separate, as explained last time, it needs to be evaluated
> >on its own and doesn't really belong in this series of patches.
> 
> Hello Jens,
> 
> A few minutes ago I have resent this patch to you with the LKML in CC. I
> hope that Christoph agrees with leaving out this patch from this series
> without me having to resend this series. BTW, Sagi promised me off-list to
> review the other patches in this series.

Ignoring patches is one of the easier tasks, np.  Do you want me to
merge all the srp updates?  They normally go through the IB tree, but
I can pick them up this time so that we don't need to synchronize the
two trees.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag()
  2014-10-11 11:08   ` Christoph Hellwig
@ 2014-10-13  9:21     ` Bart Van Assche
       [not found]       ` <543B99B2.1010307-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-13  9:21 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/11/14 13:08, Christoph Hellwig wrote:
>> +static inline u32 blk_mq_build_unique_tag(int hwq, int tag)
>> +{
>> +	return (hwq << BLK_MQ_UNIQUE_TAG_BITS) | (tag & BLK_MQ_UNIQUE_TAG_MASK);
>> +}
>
> Is there any value in having this as a separate helper?

Hello Christoph,

With the approach to block layer tag management proposed in this patch 
series, SCSI LLDs no longer need to call this function. This means that 
blk_mq_build_unique_tag() can be eliminated by inlining it into 
blk_mq_unique_tag(). Would you like me to rework this patch accordingly?

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 01/12] blk-mq: Use all available hardware queues
       [not found]               ` <20141011111114.GB9593-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-10-13  9:45                 ` Bart Van Assche
       [not found]                   ` <543B9F47.2090204-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-13  9:45 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/11/14 13:11, Christoph Hellwig wrote:
> On Wed, Oct 08, 2014 at 03:21:56PM +0200, Bart Van Assche wrote:
>> On 10/07/14 16:37, Jens Axboe wrote:
>>> Lets do this separate, as explained last time, it needs to be evaluated
>>> on its own and doesn't really belong in this series of patches.
>>
>> Hello Jens,
>>
>> A few minutes ago I have resent this patch to you with the LKML in CC. I
>> hope that Christoph agrees with leaving out this patch from this series
>> without me having to resend this series. BTW, Sagi promised me off-list to
>> review the other patches in this series.
>
> Ignoring patches is one of the easier tasks, np.  Do you want me to
> merge all the srp updates?  They normally go through the IB tree, but
> I can pick them up this time so that we don't need to synchronize the
> two trees.

Hello Christoph,

Since patch 1/12 has already been sent separately to Jens, patches 
2/12..12/12 remain. The SRP initiator changes in this series depend on 
the blk-mq and scsi-mq features added in patches 2/12..4/12. I think we 
should keep these patches in a single kernel tree to avoid having to 
deal with dependencies between kernel trees.

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag()
       [not found]       ` <543B99B2.1010307-HInyCGIudOg@public.gmane.org>
@ 2014-10-13 10:15         ` Christoph Hellwig
  2014-10-19 16:14           ` Sagi Grimberg
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-13 10:15 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On Mon, Oct 13, 2014 at 11:21:54AM +0200, Bart Van Assche wrote:
> With the approach for block layer tag management proposed in this patch
> series SCSI LLDs no longer need to call this function. This means that the
> blk_mq_build_unique_tag() function can be eliminated by inlining it into
> blk_mq_unique_tag(). Would you like me to rework this patch accordingly ?

Yes, please.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]   ` <5433E557.3010505-HInyCGIudOg@public.gmane.org>
@ 2014-10-17 10:58     ` Christoph Hellwig
       [not found]       ` <20141017105858.GA7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2014-10-22 22:03     ` Elliott, Robert (Server Storage)
  1 sibling, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-17 10:58 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index cc0bf83b..224ef25 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -853,7 +853,6 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
>  			goto out;
>  
>  		req->indirect_dma_addr = dma_addr;
> -		req->index = i;
>  		list_add_tail(&req->list, &ch->free_reqs);
>  	}

Seems like a nice optimization for the future would be to preallocate
the srp requests with the block ones and the scsi command.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 11/12] IB/srp: Eliminate free_reqs list
       [not found]     ` <5433E56E.6010600-HInyCGIudOg@public.gmane.org>
@ 2014-10-17 10:59       ` Christoph Hellwig
       [not found]         ` <20141017105939.GB7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-17 10:59 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On Tue, Oct 07, 2014 at 03:06:54PM +0200, Bart Van Assche wrote:
> The free_reqs list is no longer needed now that we are using
> tags assigned by the block layer. Hence remove it.

Is there any good reason not to fold this into the previous patch?


^ permalink raw reply	[flat|nested] 83+ messages in thread

* EH action after scsi_remove_host, was: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-07 13:07 ` [PATCH v2 12/12] IB/srp: Add multichannel support Bart Van Assche
@ 2014-10-17 11:01   ` Christoph Hellwig
  2014-10-20 13:53     ` Bart Van Assche
  2014-10-17 11:06   ` Christoph Hellwig
       [not found]   ` <5433E585.607-HInyCGIudOg@public.gmane.org>
  2 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-17 11:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, linux-scsi

On Tue, Oct 07, 2014 at 03:07:17PM +0200, Bart Van Assche wrote:
> +	/*
> +	 * Avoid that the SCSI error handler tries to use this channel after
> +	 * it has been freed. The SCSI error handler can namely continue
> +	 * trying to perform recovery actions after scsi_remove_host()
> +	 * returned.
> +	 */
> +	ch->target = NULL;

Do you have a reproducer for that?  I think we should fix the root
cause.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-07 13:07 ` [PATCH v2 12/12] IB/srp: Add multichannel support Bart Van Assche
  2014-10-17 11:01   ` EH action after scsi_remove_host, was: " Christoph Hellwig
@ 2014-10-17 11:06   ` Christoph Hellwig
       [not found]     ` <20141017110627.GD7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
       [not found]   ` <5433E585.607-HInyCGIudOg@public.gmane.org>
  2 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-17 11:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

>  	} else {
> +		if (blk_mq_unique_tag_to_hwq(rsp->tag) != ch - target->ch)
> +			pr_err("Channel idx mismatch: tag %#llx <> ch %#lx\n",
> +			       rsp->tag, ch - target->ch);
>  		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);

Shouldn't we do this validity check inside scsi_host_find_tag(), so that
all callers get it? That means adding an argument to it, but there are
very few callers at the moment.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 01/12] blk-mq: Use all available hardware queues
       [not found]                   ` <543B9F47.2090204-HInyCGIudOg@public.gmane.org>
@ 2014-10-17 13:20                     ` Christoph Hellwig
       [not found]                       ` <20141017132053.GF16538-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-17 13:20 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On Mon, Oct 13, 2014 at 11:45:43AM +0200, Bart Van Assche wrote:
> Since patch 1/12 already has been sent separately to Jens patches
> 2/12..12/12 remain. The SRP initiator changes in this series depend on the
> blk-mq and scsi-mq features added in patches 2/12..4/12. I think we should
> avoid splitting these patches over multiple kernel trees to avoid having to
> deal with dependencies between kernel trees.

I'd like to pull it in, and Roland has already indicated that he's fine
with it.

Sagi, can I get a review from you for the remaining SRP patches?


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 01/12] blk-mq: Use all available hardware queues
       [not found]                       ` <20141017132053.GF16538-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-10-17 14:11                         ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-17 14:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Robert Elliott, Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma


>> On Mon, Oct 13, 2014 at 11:45:43AM +0200, Bart Van Assche wrote:
>> Since patch 1/12 already has been sent separately to Jens patches
>> 2/12..12/12 remain. The SRP initiator changes in this series depend on the
>> blk-mq and scsi-mq features added in patches 2/12..4/12. I think we should
>> avoid splitting these patches over multiple kernel trees to avoid having to
>> deal with dependencies between kernel trees.
> 
> I'd like to pull it in and Roland already did indicate that he's fine
> with it.
> 
> Sagi, can I get a review from you for the remaining SRP patches?

As I promised Bart, it's on my todo list. Unfortunately this week I was violently pulled to another project...
It's a holiday here in Israel, so I'm planning to review the set by mid next week.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 03/12] scsi-mq: Add support for multiple hardware queues
       [not found]     ` <5433E4AB.8030306-HInyCGIudOg@public.gmane.org>
@ 2014-10-19 15:54       ` Sagi Grimberg
  2014-10-28  2:01       ` Martin K. Petersen
  1 sibling, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 15:54 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/7/2014 4:03 PM, Bart Van Assche wrote:
> Allow a SCSI LLD to declare how many hardware queues it supports
> by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().
>
> Note: it is assumed that each hardware queue has a queue depth of
> shost->can_queue. In other words, the total queue depth per host
> is (number of hardware queues) * (shost->can_queue).
>
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: Christoph Hellwig <hch-jcswGhMUV9g@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>   drivers/scsi/scsi_lib.c  | 2 +-
>   include/scsi/scsi_host.h | 4 ++++
>   2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index db8c449..f829c42 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2072,7 +2072,7 @@ int scsi_mq_setup_tags(struct Scsi_Host *shost)
>
>   	memset(&shost->tag_set, 0, sizeof(shost->tag_set));
>   	shost->tag_set.ops = &scsi_mq_ops;
> -	shost->tag_set.nr_hw_queues = 1;
> +	shost->tag_set.nr_hw_queues = shost->nr_hw_queues ? : 1;
>   	shost->tag_set.queue_depth = shost->can_queue;
>   	shost->tag_set.cmd_size = cmd_size;
>   	shost->tag_set.numa_node = NUMA_NO_NODE;
> diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
> index cafb260..d38cab9 100644
> --- a/include/scsi/scsi_host.h
> +++ b/include/scsi/scsi_host.h
> @@ -638,6 +638,10 @@ struct Scsi_Host {
>   	short unsigned int sg_prot_tablesize;
>   	unsigned int max_sectors;
>   	unsigned long dma_boundary;
> +	/*
> +	 * In scsi-mq mode, the number of hardware queues supported by the LLD.
> +	 */
> +	unsigned nr_hw_queues;
>   	/*
>   	 * Used to assign serial numbers to the cmds.
>   	 * Protected by the host lock.
>


Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues
  2014-10-07 13:04 ` [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues Bart Van Assche
@ 2014-10-19 16:12   ` Sagi Grimberg
       [not found]     ` <5443E2DF.1040605-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  2014-10-28  2:06   ` Martin K. Petersen
  1 sibling, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 16:12 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/7/2014 4:04 PM, Bart Van Assche wrote:
> Modify scsi_find_tag() and scsi_host_find_tag() such that these
> functions can translate a tag generated by blk_mq_unique_tag().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> ---
>   include/scsi/scsi_tcq.h | 19 +++++++++++--------
>   1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/include/scsi/scsi_tcq.h b/include/scsi/scsi_tcq.h
> index e645835..ea1ca9c 100644
> --- a/include/scsi/scsi_tcq.h
> +++ b/include/scsi/scsi_tcq.h
> @@ -111,18 +111,21 @@ static inline int scsi_populate_tag_msg(struct scsi_cmnd *cmd, char *msg)
>   }
>
>   static inline struct scsi_cmnd *scsi_mq_find_tag(struct Scsi_Host *shost,
> -		unsigned int hw_ctx, int tag)
> +						 int unique_tag)
>   {
> -	struct request *req;
> +	u16 hwq = blk_mq_unique_tag_to_hwq(unique_tag);
> +	struct request *req = NULL;
>
> -	req = blk_mq_tag_to_rq(shost->tag_set.tags[hw_ctx], tag);
> +	if (hwq < shost->tag_set.nr_hw_queues)
> +		req = blk_mq_tag_to_rq(shost->tag_set.tags[hwq],
> +				       blk_mq_unique_tag_to_tag(unique_tag));
>   	return req ? (struct scsi_cmnd *)req->special : NULL;
>   }
>
>   /**
>    * scsi_find_tag - find a tagged command by device
>    * @SDpnt:	pointer to the ScSI device
> - * @tag:	the tag number
> + * @tag:	tag generated by blk_mq_unique_tag()
>    *
>    * Notes:
>    *	Only works with tags allocated by the generic blk layer.
> @@ -133,9 +136,9 @@ static inline struct scsi_cmnd *scsi_find_tag(struct scsi_device *sdev, int tag)
>
>           if (tag != SCSI_NO_TAG) {
>   		if (shost_use_blk_mq(sdev->host))
> -			return scsi_mq_find_tag(sdev->host, 0, tag);
> +			return scsi_mq_find_tag(sdev->host, tag);
>
> -        	req = blk_queue_find_tag(sdev->request_queue, tag);
> +		req = blk_queue_find_tag(sdev->request_queue, tag);

Why is this line different?

>   	        return req ? (struct scsi_cmnd *)req->special : NULL;
>   	}
>
> @@ -174,7 +177,7 @@ static inline int scsi_init_shared_tag_map(struct Scsi_Host *shost, int depth)
>   /**
>    * scsi_host_find_tag - find the tagged command by host
>    * @shost:	pointer to scsi_host
> - * @tag:	tag of the scsi_cmnd
> + * @tag:	tag generated by blk_mq_unique_tag()
>    *
>    * Notes:
>    *	Only works with tags allocated by the generic blk layer.
> @@ -186,7 +189,7 @@ static inline struct scsi_cmnd *scsi_host_find_tag(struct Scsi_Host *shost,
>
>   	if (tag != SCSI_NO_TAG) {
>   		if (shost_use_blk_mq(shost))
> -			return scsi_mq_find_tag(shost, 0, tag);
> +			return scsi_mq_find_tag(shost, tag);
>   		req = blk_map_queue_find_tag(shost->bqt, tag);
>   		return req ? (struct scsi_cmnd *)req->special : NULL;
>   	}
>


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag()
  2014-10-13 10:15         ` Christoph Hellwig
@ 2014-10-19 16:14           ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 16:14 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/13/2014 1:15 PM, Christoph Hellwig wrote:
> On Mon, Oct 13, 2014 at 11:21:54AM +0200, Bart Van Assche wrote:
>> With the approach for block layer tag management proposed in this patch
>> series SCSI LLDs no longer need to call this function. This means that the
>> blk_mq_build_unique_tag() function can be eliminated by inlining it into
>> blk_mq_unique_tag(). Would you like me to rework this patch accordingly ?
>
> Yes, please.
>

With this bit you can add:

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning
  2014-10-07 13:05 ` [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning Bart Van Assche
@ 2014-10-19 16:27   ` Sagi Grimberg
       [not found]     ` <5443E66F.7050901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 16:27 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/7/2014 4:05 PM, Bart Van Assche wrote:
> If a cable is pulled during LUN scanning it can happen that the
> SRP rport and the SCSI host have been created but no LUNs have been
> added to the SCSI host. Since multipathd only sends SCSI commands
> to a SCSI target if one or more SCSI devices are present, and since
> there is no keepalive mechanism for IB queue pairs, this means that
> after a failed LUN scan and a successful reconnect no data will be
> sent over the QP, and hence a subsequent cable pull will not be
> detected. Avoid this by not creating an rport or SCSI host if a
> cable is pulled during a SCSI LUN scan.
>
> Note: so far the above behavior has only been observed with the
> kernel module parameter ch_count set to a value >= 2.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 60 +++++++++++++++++++++++++++++++------
>   drivers/infiniband/ulp/srp/ib_srp.h |  1 +
>   2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 9608e7a..a662c29 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -1111,6 +1111,10 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>   	int i, ret;
>
>   	srp_disconnect_target(target);
> +
> +	if (target->state == SRP_TARGET_SCANNING)
> +		return -ENODEV;
> +
>   	/*
>   	 * Now get a new local CM ID so that we avoid confusing the target in
>   	 * case things are really fouled up. Doing so also ensures that all CM
> @@ -2607,11 +2611,23 @@ static struct scsi_host_template srp_template = {
>   	.shost_attrs			= srp_host_attrs
>   };
>
> +static int srp_sdev_count(struct Scsi_Host *host)
> +{
> +	struct scsi_device *sdev;
> +	int c = 0;
> +
> +	shost_for_each_device(sdev, host)
> +		c++;
> +
> +	return c;
> +}
> +

Is this really an SRP-specific routine?
Can you move it to a more natural location?

>   static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
>   {
>   	struct srp_rport_identifiers ids;
>   	struct srp_rport *rport;
>
> +	target->state = SRP_TARGET_SCANNING;
>   	sprintf(target->target_name, "SRP.T10:%016llX",
>   		 (unsigned long long) be64_to_cpu(target->id_ext));
>
> @@ -2634,11 +2650,26 @@ static int srp_add_target(struct srp_host *host, struct srp_target_port *target)
>   	list_add_tail(&target->list, &host->target_list);
>   	spin_unlock(&host->target_lock);
>
> -	target->state = SRP_TARGET_LIVE;
> -
>   	scsi_scan_target(&target->scsi_host->shost_gendev,
>   			 0, target->scsi_id, SCAN_WILD_CARD, 0);
>
> +	if (!target->connected || target->qp_in_error) {
> +		shost_printk(KERN_INFO, target->scsi_host,
> +			     PFX "SCSI scan failed - removing SCSI host\n");
> +		srp_queue_remove_work(target);
> +		goto out;
> +	}

So my impression is that by checking target->qp_in_error you are
relying on the SRP EH having been invoked here (RC error). What if the
SCSI EH was invoked prior to that? Did you test this path?

> +
> +	pr_debug(PFX "%s: SCSI scan succeeded - detected %d LUNs\n",
> +		 dev_name(&target->scsi_host->shost_gendev),
> +		 srp_sdev_count(target->scsi_host));
> +
> +	spin_lock_irq(&target->lock);
> +	if (target->state == SRP_TARGET_SCANNING)
> +		target->state = SRP_TARGET_LIVE;
> +	spin_unlock_irq(&target->lock);
> +
> +out:
>   	return 0;
>   }
>
> @@ -2982,6 +3013,12 @@ static ssize_t srp_create_target(struct device *dev,
>   	target->tl_retry_count	= 7;
>   	target->queue_size	= SRP_DEFAULT_QUEUE_SIZE;
>
> +	/*
> +	 * Avoid that the SCSI host can be removed by srp_remove_target()
> +	 * before this function returns.
> +	 */
> +	scsi_host_get(target->scsi_host);
> +
>   	mutex_lock(&host->add_target_mutex);
>
>   	ret = srp_parse_options(buf, target);
> @@ -3044,18 +3081,23 @@ static ssize_t srp_create_target(struct device *dev,
>   	if (ret)
>   		goto err_disconnect;
>
> -	shost_printk(KERN_DEBUG, target->scsi_host, PFX
> -		     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
> -		     be64_to_cpu(target->id_ext),
> -		     be64_to_cpu(target->ioc_guid),
> -		     be16_to_cpu(target->path.pkey),
> -		     be64_to_cpu(target->service_id),
> -		     target->path.sgid.raw, target->path.dgid.raw);
> +	if (target->state != SRP_TARGET_REMOVED) {
> +		shost_printk(KERN_DEBUG, target->scsi_host, PFX
> +			     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
> +			     be64_to_cpu(target->id_ext),
> +			     be64_to_cpu(target->ioc_guid),
> +			     be16_to_cpu(target->path.pkey),
> +			     be64_to_cpu(target->service_id),
> +			     target->path.sgid.raw, target->orig_dgid);
> +	}
>
>   	ret = count;
>
>   out:
>   	mutex_unlock(&host->add_target_mutex);
> +
> +	scsi_host_put(target->scsi_host);
> +
>   	return ret;
>
>   err_disconnect:
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index e46ecb1..00c7c48 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -73,6 +73,7 @@ enum {
>   };
>
>   enum srp_target_state {
> +	SRP_TARGET_SCANNING,
>   	SRP_TARGET_LIVE,
>   	SRP_TARGET_REMOVED,
>   };
>


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 08/12] IB/srp: Introduce two new srp_target_port member variables
  2014-10-07 13:05 ` [PATCH v2 08/12] IB/srp: Introduce two new srp_target_port member variables Bart Van Assche
@ 2014-10-19 16:30   ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 16:30 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/7/2014 4:05 PM, Bart Van Assche wrote:
> Introduce the srp_target_port member variables 'sgid' and 'pkey'.
> Change the type of 'orig_dgid' from __be16[8] into union ib_gid.
> This patch does not change any functionality but makes the
> "Separate target and channel variables" patch easier to verify.
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 39 ++++++++++++++++++++++---------------
>   drivers/infiniband/ulp/srp/ib_srp.h |  4 +++-
>   2 files changed, 26 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index a662c29..5685062 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -262,7 +262,7 @@ static int srp_init_qp(struct srp_target_port *target,
>
>   	ret = ib_find_pkey(target->srp_host->srp_dev->dev,
>   			   target->srp_host->port,
> -			   be16_to_cpu(target->path.pkey),
> +			   be16_to_cpu(target->pkey),
>   			   &attr->pkey_index);
>   	if (ret)
>   		goto out;
> @@ -295,6 +295,10 @@ static int srp_new_cm_id(struct srp_target_port *target)
>   	if (target->cm_id)
>   		ib_destroy_cm_id(target->cm_id);
>   	target->cm_id = new_cm_id;
> +	target->path.sgid = target->sgid;
> +	target->path.dgid = target->orig_dgid;
> +	target->path.pkey = target->pkey;
> +	target->path.service_id = target->service_id;
>
>   	return 0;
>   }
> @@ -689,7 +693,7 @@ static int srp_send_req(struct srp_target_port *target)
>   	 */
>   	if (target->io_class == SRP_REV10_IB_IO_CLASS) {
>   		memcpy(req->priv.initiator_port_id,
> -		       &target->path.sgid.global.interface_id, 8);
> +		       &target->sgid.global.interface_id, 8);
>   		memcpy(req->priv.initiator_port_id + 8,
>   		       &target->initiator_ext, 8);
>   		memcpy(req->priv.target_port_id,     &target->ioc_guid, 8);
> @@ -698,7 +702,7 @@ static int srp_send_req(struct srp_target_port *target)
>   		memcpy(req->priv.initiator_port_id,
>   		       &target->initiator_ext, 8);
>   		memcpy(req->priv.initiator_port_id + 8,
> -		       &target->path.sgid.global.interface_id, 8);
> +		       &target->sgid.global.interface_id, 8);
>   		memcpy(req->priv.target_port_id,     &target->id_ext, 8);
>   		memcpy(req->priv.target_port_id + 8, &target->ioc_guid, 8);
>   	}
> @@ -2175,8 +2179,8 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
>   			else
>   				shost_printk(KERN_WARNING, shost, PFX
>   					     "SRP LOGIN from %pI6 to %pI6 REJECTED, reason 0x%08x\n",
> -					     target->path.sgid.raw,
> -					     target->orig_dgid, reason);
> +					     target->sgid.raw,
> +					     target->orig_dgid.raw, reason);
>   		} else
>   			shost_printk(KERN_WARNING, shost,
>   				     "  REJ reason: IB_CM_REJ_CONSUMER_DEFINED,"
> @@ -2464,7 +2468,7 @@ static ssize_t show_pkey(struct device *dev, struct device_attribute *attr,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "0x%04x\n", be16_to_cpu(target->path.pkey));
> +	return sprintf(buf, "0x%04x\n", be16_to_cpu(target->pkey));
>   }
>
>   static ssize_t show_sgid(struct device *dev, struct device_attribute *attr,
> @@ -2472,7 +2476,7 @@ static ssize_t show_sgid(struct device *dev, struct device_attribute *attr,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "%pI6\n", target->path.sgid.raw);
> +	return sprintf(buf, "%pI6\n", target->sgid.raw);
>   }
>
>   static ssize_t show_dgid(struct device *dev, struct device_attribute *attr,
> @@ -2488,7 +2492,7 @@ static ssize_t show_orig_dgid(struct device *dev,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "%pI6\n", target->orig_dgid);
> +	return sprintf(buf, "%pI6\n", target->orig_dgid.raw);
>   }
>
>   static ssize_t show_req_lim(struct device *dev,
> @@ -2826,11 +2830,15 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
>   			}
>
>   			for (i = 0; i < 16; ++i) {
> -				strlcpy(dgid, p + i * 2, 3);
> -				target->path.dgid.raw[i] = simple_strtoul(dgid, NULL, 16);
> +				strlcpy(dgid, p + i * 2, sizeof(dgid));
> +				if (sscanf(dgid, "%hhx",
> +					   &target->orig_dgid.raw[i]) < 1) {
> +					ret = -EINVAL;
> +					kfree(p);
> +					goto out;
> +				}
>   			}
>   			kfree(p);
> -			memcpy(target->orig_dgid, target->path.dgid.raw, 16);
>   			break;
>
>   		case SRP_OPT_PKEY:
> @@ -2838,7 +2846,7 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
>   				pr_warn("bad P_Key parameter '%s'\n", p);
>   				goto out;
>   			}
> -			target->path.pkey = cpu_to_be16(token);
> +			target->pkey = cpu_to_be16(token);
>   			break;
>
>   		case SRP_OPT_SERVICE_ID:
> @@ -2848,7 +2856,6 @@ static int srp_parse_options(const char *buf, struct srp_target_port *target)
>   				goto out;
>   			}
>   			target->service_id = cpu_to_be64(simple_strtoull(p, NULL, 16));
> -			target->path.service_id = target->service_id;
>   			kfree(p);
>   			break;
>
> @@ -3058,7 +3065,7 @@ static ssize_t srp_create_target(struct device *dev,
>   	if (ret)
>   		goto err_free_mem;
>
> -	ret = ib_query_gid(ibdev, host->port, 0, &target->path.sgid);
> +	ret = ib_query_gid(ibdev, host->port, 0, &target->sgid);
>   	if (ret)
>   		goto err_free_mem;
>
> @@ -3086,9 +3093,9 @@ static ssize_t srp_create_target(struct device *dev,
>   			     "new target: id_ext %016llx ioc_guid %016llx pkey %04x service_id %016llx sgid %pI6 dgid %pI6\n",
>   			     be64_to_cpu(target->id_ext),
>   			     be64_to_cpu(target->ioc_guid),
> -			     be16_to_cpu(target->path.pkey),
> +			     be16_to_cpu(target->pkey),
>   			     be64_to_cpu(target->service_id),
> -			     target->path.sgid.raw, target->orig_dgid);
> +			     target->sgid.raw, target->orig_dgid.raw);
>   	}
>
>   	ret = count;
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index 00c7c48..8635ab6 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -157,6 +157,7 @@ struct srp_target_port {
>   	 * command processing. Try to keep them packed into cachelines.
>   	 */
>
> +	union ib_gid		sgid;
>   	__be64			id_ext;
>   	__be64			ioc_guid;
>   	__be64			service_id;
> @@ -173,8 +174,9 @@ struct srp_target_port {
>   	int			comp_vector;
>   	int			tl_retry_count;
>
> +	union ib_gid		orig_dgid;
> +	__be16			pkey;
>   	struct ib_sa_path_rec	path;
> -	__be16			orig_dgid[8];
>   	struct ib_sa_query     *path_query;
>   	int			path_query_id;
>
>

Yeah, looks good.

Reviewed-by: Sagi Grimberg <sagig@mellanox.com>


* Re: [PATCH v2 09/12] IB/srp: Separate target and channel variables
  2014-10-07 13:05   ` [PATCH v2 09/12] IB/srp: Separate target and channel variables Bart Van Assche
@ 2014-10-19 16:48     ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 16:48 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/7/2014 4:05 PM, Bart Van Assche wrote:
> Changes in this patch:
> - Move channel variables into a new structure (struct srp_rdma_ch).
> - Add an srp_target_port pointer, 'lock' and 'comp_vector' members
>    in struct srp_rdma_ch.
> - Add code to initialize these three new member variables.
> - Many mechanical "target->" to "ch->" changes.
> - The cm_id and completion handler context pointers are now of type
>    srp_rdma_ch * instead of srp_target_port *.
> - Three kzalloc(a * b, f) calls have been changed into kcalloc(a, b, f)
>    so that this patch does not trigger a checkpatch warning.
> - Two casts from u64 to unsigned long long have been dropped because
>    they are superfluous: for a considerable time u64 has been defined
>    as unsigned long long on all architectures supported by the Linux
>    kernel.
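
The kzalloc-to-kcalloc conversion mentioned in the changelog above can be
sketched in userspace as follows. This is a minimal illustration only:
checked_calloc() is a hypothetical stand-in built on calloc(), not kernel
code, but it shows the overflow check that makes the two-argument form
safer than an open-coded n * size product.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace sketch (assumption: checked_calloc() is an illustrative
 * stand-in, not the kernel's kcalloc()) of why kcalloc(n, size, f) is
 * preferred over kzalloc(n * size, f): the two-argument form can reject
 * a multiplication overflow, while the open-coded product silently
 * wraps to a too-small allocation. */
static void *checked_calloc(size_t n, size_t size)
{
	/* Refuse the request if n * size would overflow size_t. */
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return calloc(n, size);	/* calloc() zero-fills, like kcalloc() */
}
```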

This patch is pretty exhausting...
Didn't find anything wrong but I didn't carefully review every bit of it.

You can add:
Acked-by: Sagi Grimberg <sagig@mellanox.com>

>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>
> Cc: Sagi Grimberg <sagig@mellanox.com>
> Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
> ---
>   drivers/infiniband/ulp/srp/ib_srp.c | 674 +++++++++++++++++++-----------------
>   drivers/infiniband/ulp/srp/ib_srp.h |  64 ++--
>   2 files changed, 403 insertions(+), 335 deletions(-)
>
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index 5685062..cc0bf83b 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -125,8 +125,8 @@ MODULE_PARM_DESC(dev_loss_tmo,
>
>   static void srp_add_one(struct ib_device *device);
>   static void srp_remove_one(struct ib_device *device);
> -static void srp_recv_completion(struct ib_cq *cq, void *target_ptr);
> -static void srp_send_completion(struct ib_cq *cq, void *target_ptr);
> +static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr);
> +static void srp_send_completion(struct ib_cq *cq, void *ch_ptr);
>   static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event);
>
>   static struct scsi_transport_template *ib_srp_transport_template;
> @@ -283,22 +283,23 @@ out:
>   	return ret;
>   }
>
> -static int srp_new_cm_id(struct srp_target_port *target)
> +static int srp_new_cm_id(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct ib_cm_id *new_cm_id;
>
>   	new_cm_id = ib_create_cm_id(target->srp_host->srp_dev->dev,
> -				    srp_cm_handler, target);
> +				    srp_cm_handler, ch);
>   	if (IS_ERR(new_cm_id))
>   		return PTR_ERR(new_cm_id);
>
> -	if (target->cm_id)
> -		ib_destroy_cm_id(target->cm_id);
> -	target->cm_id = new_cm_id;
> -	target->path.sgid = target->sgid;
> -	target->path.dgid = target->orig_dgid;
> -	target->path.pkey = target->pkey;
> -	target->path.service_id = target->service_id;
> +	if (ch->cm_id)
> +		ib_destroy_cm_id(ch->cm_id);
> +	ch->cm_id = new_cm_id;
> +	ch->path.sgid = target->sgid;
> +	ch->path.dgid = target->orig_dgid;
> +	ch->path.pkey = target->pkey;
> +	ch->path.service_id = target->service_id;
>
>   	return 0;
>   }
> @@ -447,8 +448,9 @@ static struct srp_fr_pool *srp_alloc_fr_pool(struct srp_target_port *target)
>   				  dev->max_pages_per_mr);
>   }
>
> -static int srp_create_target_ib(struct srp_target_port *target)
> +static int srp_create_ch_ib(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	struct ib_qp_init_attr *init_attr;
>   	struct ib_cq *recv_cq, *send_cq;
> @@ -462,15 +464,15 @@ static int srp_create_target_ib(struct srp_target_port *target)
>   	if (!init_attr)
>   		return -ENOMEM;
>
> -	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, target,
> -			       target->queue_size, target->comp_vector);
> +	recv_cq = ib_create_cq(dev->dev, srp_recv_completion, NULL, ch,
> +			       target->queue_size, ch->comp_vector);
>   	if (IS_ERR(recv_cq)) {
>   		ret = PTR_ERR(recv_cq);
>   		goto err;
>   	}
>
> -	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, target,
> -			       m * target->queue_size, target->comp_vector);
> +	send_cq = ib_create_cq(dev->dev, srp_send_completion, NULL, ch,
> +			       m * target->queue_size, ch->comp_vector);
>   	if (IS_ERR(send_cq)) {
>   		ret = PTR_ERR(send_cq);
>   		goto err_recv_cq;
> @@ -506,9 +508,9 @@ static int srp_create_target_ib(struct srp_target_port *target)
>   				     "FR pool allocation failed (%d)\n", ret);
>   			goto err_qp;
>   		}
> -		if (target->fr_pool)
> -			srp_destroy_fr_pool(target->fr_pool);
> -		target->fr_pool = fr_pool;
> +		if (ch->fr_pool)
> +			srp_destroy_fr_pool(ch->fr_pool);
> +		ch->fr_pool = fr_pool;
>   	} else if (!dev->use_fast_reg && dev->has_fmr) {
>   		fmr_pool = srp_alloc_fmr_pool(target);
>   		if (IS_ERR(fmr_pool)) {
> @@ -517,21 +519,21 @@ static int srp_create_target_ib(struct srp_target_port *target)
>   				     "FMR pool allocation failed (%d)\n", ret);
>   			goto err_qp;
>   		}
> -		if (target->fmr_pool)
> -			ib_destroy_fmr_pool(target->fmr_pool);
> -		target->fmr_pool = fmr_pool;
> +		if (ch->fmr_pool)
> +			ib_destroy_fmr_pool(ch->fmr_pool);
> +		ch->fmr_pool = fmr_pool;
>   	}
>
> -	if (target->qp)
> -		ib_destroy_qp(target->qp);
> -	if (target->recv_cq)
> -		ib_destroy_cq(target->recv_cq);
> -	if (target->send_cq)
> -		ib_destroy_cq(target->send_cq);
> +	if (ch->qp)
> +		ib_destroy_qp(ch->qp);
> +	if (ch->recv_cq)
> +		ib_destroy_cq(ch->recv_cq);
> +	if (ch->send_cq)
> +		ib_destroy_cq(ch->send_cq);
>
> -	target->qp = qp;
> -	target->recv_cq = recv_cq;
> -	target->send_cq = send_cq;
> +	ch->qp = qp;
> +	ch->recv_cq = recv_cq;
> +	ch->send_cq = send_cq;
>
>   	kfree(init_attr);
>   	return 0;
> @@ -552,98 +554,102 @@ err:
>
>   /*
>    * Note: this function may be called without srp_alloc_iu_bufs() having been
> - * invoked. Hence the target->[rt]x_ring checks.
> + * invoked. Hence the ch->[rt]x_ring checks.
>    */
> -static void srp_free_target_ib(struct srp_target_port *target)
> +static void srp_free_ch_ib(struct srp_target_port *target,
> +			   struct srp_rdma_ch *ch)
>   {
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	int i;
>
> -	if (target->cm_id) {
> -		ib_destroy_cm_id(target->cm_id);
> -		target->cm_id = NULL;
> +	if (ch->cm_id) {
> +		ib_destroy_cm_id(ch->cm_id);
> +		ch->cm_id = NULL;
>   	}
>
>   	if (dev->use_fast_reg) {
> -		if (target->fr_pool)
> -			srp_destroy_fr_pool(target->fr_pool);
> +		if (ch->fr_pool)
> +			srp_destroy_fr_pool(ch->fr_pool);
>   	} else {
> -		if (target->fmr_pool)
> -			ib_destroy_fmr_pool(target->fmr_pool);
> +		if (ch->fmr_pool)
> +			ib_destroy_fmr_pool(ch->fmr_pool);
>   	}
> -	ib_destroy_qp(target->qp);
> -	ib_destroy_cq(target->send_cq);
> -	ib_destroy_cq(target->recv_cq);
> +	ib_destroy_qp(ch->qp);
> +	ib_destroy_cq(ch->send_cq);
> +	ib_destroy_cq(ch->recv_cq);
>
> -	target->qp = NULL;
> -	target->send_cq = target->recv_cq = NULL;
> +	ch->qp = NULL;
> +	ch->send_cq = ch->recv_cq = NULL;
>
> -	if (target->rx_ring) {
> +	if (ch->rx_ring) {
>   		for (i = 0; i < target->queue_size; ++i)
> -			srp_free_iu(target->srp_host, target->rx_ring[i]);
> -		kfree(target->rx_ring);
> -		target->rx_ring = NULL;
> +			srp_free_iu(target->srp_host, ch->rx_ring[i]);
> +		kfree(ch->rx_ring);
> +		ch->rx_ring = NULL;
>   	}
> -	if (target->tx_ring) {
> +	if (ch->tx_ring) {
>   		for (i = 0; i < target->queue_size; ++i)
> -			srp_free_iu(target->srp_host, target->tx_ring[i]);
> -		kfree(target->tx_ring);
> -		target->tx_ring = NULL;
> +			srp_free_iu(target->srp_host, ch->tx_ring[i]);
> +		kfree(ch->tx_ring);
> +		ch->tx_ring = NULL;
>   	}
>   }
>
>   static void srp_path_rec_completion(int status,
>   				    struct ib_sa_path_rec *pathrec,
> -				    void *target_ptr)
> +				    void *ch_ptr)
>   {
> -	struct srp_target_port *target = target_ptr;
> +	struct srp_rdma_ch *ch = ch_ptr;
> +	struct srp_target_port *target = ch->target;
>
> -	target->status = status;
> +	ch->status = status;
>   	if (status)
>   		shost_printk(KERN_ERR, target->scsi_host,
>   			     PFX "Got failed path rec status %d\n", status);
>   	else
> -		target->path = *pathrec;
> -	complete(&target->done);
> +		ch->path = *pathrec;
> +	complete(&ch->done);
>   }
>
> -static int srp_lookup_path(struct srp_target_port *target)
> +static int srp_lookup_path(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	int ret;
>
> -	target->path.numb_path = 1;
> -
> -	init_completion(&target->done);
> -
> -	target->path_query_id = ib_sa_path_rec_get(&srp_sa_client,
> -						   target->srp_host->srp_dev->dev,
> -						   target->srp_host->port,
> -						   &target->path,
> -						   IB_SA_PATH_REC_SERVICE_ID	|
> -						   IB_SA_PATH_REC_DGID		|
> -						   IB_SA_PATH_REC_SGID		|
> -						   IB_SA_PATH_REC_NUMB_PATH	|
> -						   IB_SA_PATH_REC_PKEY,
> -						   SRP_PATH_REC_TIMEOUT_MS,
> -						   GFP_KERNEL,
> -						   srp_path_rec_completion,
> -						   target, &target->path_query);
> -	if (target->path_query_id < 0)
> -		return target->path_query_id;
> -
> -	ret = wait_for_completion_interruptible(&target->done);
> +	ch->path.numb_path = 1;
> +
> +	init_completion(&ch->done);
> +
> +	ch->path_query_id = ib_sa_path_rec_get(&srp_sa_client,
> +					       target->srp_host->srp_dev->dev,
> +					       target->srp_host->port,
> +					       &ch->path,
> +					       IB_SA_PATH_REC_SERVICE_ID |
> +					       IB_SA_PATH_REC_DGID	 |
> +					       IB_SA_PATH_REC_SGID	 |
> +					       IB_SA_PATH_REC_NUMB_PATH	 |
> +					       IB_SA_PATH_REC_PKEY,
> +					       SRP_PATH_REC_TIMEOUT_MS,
> +					       GFP_KERNEL,
> +					       srp_path_rec_completion,
> +					       ch, &ch->path_query);
> +	if (ch->path_query_id < 0)
> +		return ch->path_query_id;
> +
> +	ret = wait_for_completion_interruptible(&ch->done);
>   	if (ret < 0)
>   		return ret;
>
> -	if (target->status < 0)
> +	if (ch->status < 0)
>   		shost_printk(KERN_WARNING, target->scsi_host,
>   			     PFX "Path record query failed\n");
>
> -	return target->status;
> +	return ch->status;
>   }
>
> -static int srp_send_req(struct srp_target_port *target)
> +static int srp_send_req(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct {
>   		struct ib_cm_req_param param;
>   		struct srp_login_req   priv;
> @@ -654,11 +660,11 @@ static int srp_send_req(struct srp_target_port *target)
>   	if (!req)
>   		return -ENOMEM;
>
> -	req->param.primary_path 	      = &target->path;
> +	req->param.primary_path		      = &ch->path;
>   	req->param.alternate_path 	      = NULL;
>   	req->param.service_id 		      = target->service_id;
> -	req->param.qp_num 		      = target->qp->qp_num;
> -	req->param.qp_type 		      = target->qp->qp_type;
> +	req->param.qp_num		      = ch->qp->qp_num;
> +	req->param.qp_type		      = ch->qp->qp_type;
>   	req->param.private_data 	      = &req->priv;
>   	req->param.private_data_len 	      = sizeof req->priv;
>   	req->param.flow_control 	      = 1;
> @@ -722,7 +728,7 @@ static int srp_send_req(struct srp_target_port *target)
>   		       &target->srp_host->srp_dev->dev->node_guid, 8);
>   	}
>
> -	status = ib_send_cm_req(target->cm_id, &req->param);
> +	status = ib_send_cm_req(ch->cm_id, &req->param);
>
>   	kfree(req);
>
> @@ -763,28 +769,31 @@ static bool srp_change_conn_state(struct srp_target_port *target,
>
>   static void srp_disconnect_target(struct srp_target_port *target)
>   {
> +	struct srp_rdma_ch *ch = &target->ch;
> +
>   	if (srp_change_conn_state(target, false)) {
>   		/* XXX should send SRP_I_LOGOUT request */
>
> -		if (ib_send_cm_dreq(target->cm_id, NULL, 0)) {
> +		if (ib_send_cm_dreq(ch->cm_id, NULL, 0)) {
>   			shost_printk(KERN_DEBUG, target->scsi_host,
>   				     PFX "Sending CM DREQ failed\n");
>   		}
>   	}
>   }
>
> -static void srp_free_req_data(struct srp_target_port *target)
> +static void srp_free_req_data(struct srp_target_port *target,
> +			      struct srp_rdma_ch *ch)
>   {
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	struct ib_device *ibdev = dev->dev;
>   	struct srp_request *req;
>   	int i;
>
> -	if (!target->req_ring)
> +	if (!ch->req_ring)
>   		return;
>
>   	for (i = 0; i < target->req_ring_size; ++i) {
> -		req = &target->req_ring[i];
> +		req = &ch->req_ring[i];
>   		if (dev->use_fast_reg)
>   			kfree(req->fr_list);
>   		else
> @@ -798,12 +807,13 @@ static void srp_free_req_data(struct srp_target_port *target)
>   		kfree(req->indirect_desc);
>   	}
>
> -	kfree(target->req_ring);
> -	target->req_ring = NULL;
> +	kfree(ch->req_ring);
> +	ch->req_ring = NULL;
>   }
>
> -static int srp_alloc_req_data(struct srp_target_port *target)
> +static int srp_alloc_req_data(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_device *srp_dev = target->srp_host->srp_dev;
>   	struct ib_device *ibdev = srp_dev->dev;
>   	struct srp_request *req;
> @@ -811,15 +821,15 @@ static int srp_alloc_req_data(struct srp_target_port *target)
>   	dma_addr_t dma_addr;
>   	int i, ret = -ENOMEM;
>
> -	INIT_LIST_HEAD(&target->free_reqs);
> +	INIT_LIST_HEAD(&ch->free_reqs);
>
> -	target->req_ring = kzalloc(target->req_ring_size *
> -				   sizeof(*target->req_ring), GFP_KERNEL);
> -	if (!target->req_ring)
> +	ch->req_ring = kcalloc(target->req_ring_size, sizeof(*ch->req_ring),
> +			       GFP_KERNEL);
> +	if (!ch->req_ring)
>   		goto out;
>
>   	for (i = 0; i < target->req_ring_size; ++i) {
> -		req = &target->req_ring[i];
> +		req = &ch->req_ring[i];
>   		mr_list = kmalloc(target->cmd_sg_cnt * sizeof(void *),
>   				  GFP_KERNEL);
>   		if (!mr_list)
> @@ -844,7 +854,7 @@ static int srp_alloc_req_data(struct srp_target_port *target)
>
>   		req->indirect_dma_addr = dma_addr;
>   		req->index = i;
> -		list_add_tail(&req->list, &target->free_reqs);
> +		list_add_tail(&req->list, &ch->free_reqs);
>   	}
>   	ret = 0;
>
> @@ -869,6 +879,8 @@ static void srp_del_scsi_host_attr(struct Scsi_Host *shost)
>
>   static void srp_remove_target(struct srp_target_port *target)
>   {
> +	struct srp_rdma_ch *ch = &target->ch;
> +
>   	WARN_ON_ONCE(target->state != SRP_TARGET_REMOVED);
>
>   	srp_del_scsi_host_attr(target->scsi_host);
> @@ -877,10 +889,10 @@ static void srp_remove_target(struct srp_target_port *target)
>   	scsi_remove_host(target->scsi_host);
>   	srp_stop_rport_timers(target->rport);
>   	srp_disconnect_target(target);
> -	srp_free_target_ib(target);
> +	srp_free_ch_ib(target, ch);
>   	cancel_work_sync(&target->tl_err_work);
>   	srp_rport_put(target->rport);
> -	srp_free_req_data(target);
> +	srp_free_req_data(target, ch);
>
>   	spin_lock(&target->srp_host->target_lock);
>   	list_del(&target->list);
> @@ -906,24 +918,25 @@ static void srp_rport_delete(struct srp_rport *rport)
>   	srp_queue_remove_work(target);
>   }
>
> -static int srp_connect_target(struct srp_target_port *target)
> +static int srp_connect_ch(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	int ret;
>
>   	WARN_ON_ONCE(target->connected);
>
>   	target->qp_in_error = false;
>
> -	ret = srp_lookup_path(target);
> +	ret = srp_lookup_path(ch);
>   	if (ret)
>   		return ret;
>
>   	while (1) {
> -		init_completion(&target->done);
> -		ret = srp_send_req(target);
> +		init_completion(&ch->done);
> +		ret = srp_send_req(ch);
>   		if (ret)
>   			return ret;
> -		ret = wait_for_completion_interruptible(&target->done);
> +		ret = wait_for_completion_interruptible(&ch->done);
>   		if (ret < 0)
>   			return ret;
>
> @@ -933,13 +946,13 @@ static int srp_connect_target(struct srp_target_port *target)
>   		 * back, or SRP_DLID_REDIRECT if we get a lid/qp
>   		 * redirect REJ back.
>   		 */
> -		switch (target->status) {
> +		switch (ch->status) {
>   		case 0:
>   			srp_change_conn_state(target, true);
>   			return 0;
>
>   		case SRP_PORT_REDIRECT:
> -			ret = srp_lookup_path(target);
> +			ret = srp_lookup_path(ch);
>   			if (ret)
>   				return ret;
>   			break;
> @@ -950,16 +963,16 @@ static int srp_connect_target(struct srp_target_port *target)
>   		case SRP_STALE_CONN:
>   			shost_printk(KERN_ERR, target->scsi_host, PFX
>   				     "giving up on stale connection\n");
> -			target->status = -ECONNRESET;
> -			return target->status;
> +			ch->status = -ECONNRESET;
> +			return ch->status;
>
>   		default:
> -			return target->status;
> +			return ch->status;
>   		}
>   	}
>   }
>
> -static int srp_inv_rkey(struct srp_target_port *target, u32 rkey)
> +static int srp_inv_rkey(struct srp_rdma_ch *ch, u32 rkey)
>   {
>   	struct ib_send_wr *bad_wr;
>   	struct ib_send_wr wr = {
> @@ -971,13 +984,14 @@ static int srp_inv_rkey(struct srp_target_port *target, u32 rkey)
>   		.ex.invalidate_rkey = rkey,
>   	};
>
> -	return ib_post_send(target->qp, &wr, &bad_wr);
> +	return ib_post_send(ch->qp, &wr, &bad_wr);
>   }
>
>   static void srp_unmap_data(struct scsi_cmnd *scmnd,
> -			   struct srp_target_port *target,
> +			   struct srp_rdma_ch *ch,
>   			   struct srp_request *req)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	struct ib_device *ibdev = dev->dev;
>   	int i, res;
> @@ -991,7 +1005,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
>   		struct srp_fr_desc **pfr;
>
>   		for (i = req->nmdesc, pfr = req->fr_list; i > 0; i--, pfr++) {
> -			res = srp_inv_rkey(target, (*pfr)->mr->rkey);
> +			res = srp_inv_rkey(ch, (*pfr)->mr->rkey);
>   			if (res < 0) {
>   				shost_printk(KERN_ERR, target->scsi_host, PFX
>   				  "Queueing INV WR for rkey %#x failed (%d)\n",
> @@ -1001,7 +1015,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
>   			}
>   		}
>   		if (req->nmdesc)
> -			srp_fr_pool_put(target->fr_pool, req->fr_list,
> +			srp_fr_pool_put(ch->fr_pool, req->fr_list,
>   					req->nmdesc);
>   	} else {
>   		struct ib_pool_fmr **pfmr;
> @@ -1016,7 +1030,7 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
>
>   /**
>    * srp_claim_req - Take ownership of the scmnd associated with a request.
> - * @target: SRP target port.
> + * @ch: SRP RDMA channel.
>    * @req: SRP request.
>    * @sdev: If not NULL, only take ownership for this SCSI device.
>    * @scmnd: If NULL, take ownership of @req->scmnd. If not NULL, only take
> @@ -1025,14 +1039,14 @@ static void srp_unmap_data(struct scsi_cmnd *scmnd,
>    * Return value:
>    * Either NULL or a pointer to the SCSI command the caller became owner of.
>    */
> -static struct scsi_cmnd *srp_claim_req(struct srp_target_port *target,
> +static struct scsi_cmnd *srp_claim_req(struct srp_rdma_ch *ch,
>   				       struct srp_request *req,
>   				       struct scsi_device *sdev,
>   				       struct scsi_cmnd *scmnd)
>   {
>   	unsigned long flags;
>
> -	spin_lock_irqsave(&target->lock, flags);
> +	spin_lock_irqsave(&ch->lock, flags);
>   	if (req->scmnd &&
>   	    (!sdev || req->scmnd->device == sdev) &&
>   	    (!scmnd || req->scmnd == scmnd)) {
> @@ -1041,40 +1055,38 @@ static struct scsi_cmnd *srp_claim_req(struct srp_target_port *target,
>   	} else {
>   		scmnd = NULL;
>   	}
> -	spin_unlock_irqrestore(&target->lock, flags);
> +	spin_unlock_irqrestore(&ch->lock, flags);
>
>   	return scmnd;
>   }
>
>   /**
>    * srp_free_req() - Unmap data and add request to the free request list.
> - * @target: SRP target port.
> + * @ch:     SRP RDMA channel.
>    * @req:    Request to be freed.
>    * @scmnd:  SCSI command associated with @req.
>    * @req_lim_delta: Amount to be added to @target->req_lim.
>    */
> -static void srp_free_req(struct srp_target_port *target,
> -			 struct srp_request *req, struct scsi_cmnd *scmnd,
> -			 s32 req_lim_delta)
> +static void srp_free_req(struct srp_rdma_ch *ch, struct srp_request *req,
> +			 struct scsi_cmnd *scmnd, s32 req_lim_delta)
>   {
>   	unsigned long flags;
>
> -	srp_unmap_data(scmnd, target, req);
> +	srp_unmap_data(scmnd, ch, req);
>
> -	spin_lock_irqsave(&target->lock, flags);
> -	target->req_lim += req_lim_delta;
> -	list_add_tail(&req->list, &target->free_reqs);
> -	spin_unlock_irqrestore(&target->lock, flags);
> +	spin_lock_irqsave(&ch->lock, flags);
> +	ch->req_lim += req_lim_delta;
> +	list_add_tail(&req->list, &ch->free_reqs);
> +	spin_unlock_irqrestore(&ch->lock, flags);
>   }
>
> -static void srp_finish_req(struct srp_target_port *target,
> -			   struct srp_request *req, struct scsi_device *sdev,
> -			   int result)
> +static void srp_finish_req(struct srp_rdma_ch *ch, struct srp_request *req,
> +			   struct scsi_device *sdev, int result)
>   {
> -	struct scsi_cmnd *scmnd = srp_claim_req(target, req, sdev, NULL);
> +	struct scsi_cmnd *scmnd = srp_claim_req(ch, req, sdev, NULL);
>
>   	if (scmnd) {
> -		srp_free_req(target, req, scmnd, 0);
> +		srp_free_req(ch, req, scmnd, 0);
>   		scmnd->result = result;
>   		scmnd->scsi_done(scmnd);
>   	}
> @@ -1083,6 +1095,7 @@ static void srp_finish_req(struct srp_target_port *target,
>   static void srp_terminate_io(struct srp_rport *rport)
>   {
>   	struct srp_target_port *target = rport->lld_data;
> +	struct srp_rdma_ch *ch = &target->ch;
>   	struct Scsi_Host *shost = target->scsi_host;
>   	struct scsi_device *sdev;
>   	int i;
> @@ -1095,8 +1108,9 @@ static void srp_terminate_io(struct srp_rport *rport)
>   		WARN_ON_ONCE(sdev->request_queue->request_fn_active);
>
>   	for (i = 0; i < target->req_ring_size; ++i) {
> -		struct srp_request *req = &target->req_ring[i];
> -		srp_finish_req(target, req, NULL, DID_TRANSPORT_FAILFAST << 16);
> +		struct srp_request *req = &ch->req_ring[i];
> +
> +		srp_finish_req(ch, req, NULL, DID_TRANSPORT_FAILFAST << 16);
>   	}
>   }
>
> @@ -1112,6 +1126,7 @@ static void srp_terminate_io(struct srp_rport *rport)
>   static int srp_rport_reconnect(struct srp_rport *rport)
>   {
>   	struct srp_target_port *target = rport->lld_data;
> +	struct srp_rdma_ch *ch = &target->ch;
>   	int i, ret;
>
>   	srp_disconnect_target(target);
> @@ -1124,11 +1139,12 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>   	 * case things are really fouled up. Doing so also ensures that all CM
>   	 * callbacks will have finished before a new QP is allocated.
>   	 */
> -	ret = srp_new_cm_id(target);
> +	ret = srp_new_cm_id(ch);
>
>   	for (i = 0; i < target->req_ring_size; ++i) {
> -		struct srp_request *req = &target->req_ring[i];
> -		srp_finish_req(target, req, NULL, DID_RESET << 16);
> +		struct srp_request *req = &ch->req_ring[i];
> +
> +		srp_finish_req(ch, req, NULL, DID_RESET << 16);
>   	}
>
>   	/*
> @@ -1136,14 +1152,14 @@ static int srp_rport_reconnect(struct srp_rport *rport)
>   	 * QP. This guarantees that all callback functions for the old QP have
>   	 * finished before any send requests are posted on the new QP.
>   	 */
> -	ret += srp_create_target_ib(target);
> +	ret += srp_create_ch_ib(ch);
>
> -	INIT_LIST_HEAD(&target->free_tx);
> +	INIT_LIST_HEAD(&ch->free_tx);
>   	for (i = 0; i < target->queue_size; ++i)
> -		list_add(&target->tx_ring[i]->list, &target->free_tx);
> +		list_add(&ch->tx_ring[i]->list, &ch->free_tx);
>
>   	if (ret == 0)
> -		ret = srp_connect_target(target);
> +		ret = srp_connect_ch(ch);
>
>   	if (ret == 0)
>   		shost_printk(KERN_INFO, target->scsi_host,
> @@ -1167,12 +1183,12 @@ static void srp_map_desc(struct srp_map_state *state, dma_addr_t dma_addr,
>   }
>
>   static int srp_map_finish_fmr(struct srp_map_state *state,
> -			      struct srp_target_port *target)
> +			      struct srp_rdma_ch *ch)
>   {
>   	struct ib_pool_fmr *fmr;
>   	u64 io_addr = 0;
>
> -	fmr = ib_fmr_pool_map_phys(target->fmr_pool, state->pages,
> +	fmr = ib_fmr_pool_map_phys(ch->fmr_pool, state->pages,
>   				   state->npages, io_addr);
>   	if (IS_ERR(fmr))
>   		return PTR_ERR(fmr);
> @@ -1186,15 +1202,16 @@ static int srp_map_finish_fmr(struct srp_map_state *state,
>   }
>
>   static int srp_map_finish_fr(struct srp_map_state *state,
> -			     struct srp_target_port *target)
> +			     struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	struct ib_send_wr *bad_wr;
>   	struct ib_send_wr wr;
>   	struct srp_fr_desc *desc;
>   	u32 rkey;
>
> -	desc = srp_fr_pool_get(target->fr_pool);
> +	desc = srp_fr_pool_get(ch->fr_pool);
>   	if (!desc)
>   		return -ENOMEM;
>
> @@ -1223,12 +1240,13 @@ static int srp_map_finish_fr(struct srp_map_state *state,
>   	srp_map_desc(state, state->base_dma_addr, state->dma_len,
>   		     desc->mr->rkey);
>
> -	return ib_post_send(target->qp, &wr, &bad_wr);
> +	return ib_post_send(ch->qp, &wr, &bad_wr);
>   }
>
>   static int srp_finish_mapping(struct srp_map_state *state,
> -			      struct srp_target_port *target)
> +			      struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	int ret = 0;
>
>   	if (state->npages == 0)
> @@ -1239,8 +1257,8 @@ static int srp_finish_mapping(struct srp_map_state *state,
>   			     target->rkey);
>   	else
>   		ret = target->srp_host->srp_dev->use_fast_reg ?
> -			srp_map_finish_fr(state, target) :
> -			srp_map_finish_fmr(state, target);
> +			srp_map_finish_fr(state, ch) :
> +			srp_map_finish_fmr(state, ch);
>
>   	if (ret == 0) {
>   		state->npages = 0;
> @@ -1260,10 +1278,11 @@ static void srp_map_update_start(struct srp_map_state *state,
>   }
>
>   static int srp_map_sg_entry(struct srp_map_state *state,
> -			    struct srp_target_port *target,
> +			    struct srp_rdma_ch *ch,
>   			    struct scatterlist *sg, int sg_index,
>   			    bool use_mr)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	struct ib_device *ibdev = dev->dev;
>   	dma_addr_t dma_addr = ib_sg_dma_address(ibdev, sg);
> @@ -1292,7 +1311,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
>   	 */
>   	if ((!dev->use_fast_reg && dma_addr & ~dev->mr_page_mask) ||
>   	    dma_len > dev->mr_max_size) {
> -		ret = srp_finish_mapping(state, target);
> +		ret = srp_finish_mapping(state, ch);
>   		if (ret)
>   			return ret;
>
> @@ -1313,7 +1332,7 @@ static int srp_map_sg_entry(struct srp_map_state *state,
>   	while (dma_len) {
>   		unsigned offset = dma_addr & ~dev->mr_page_mask;
>   		if (state->npages == dev->max_pages_per_mr || offset != 0) {
> -			ret = srp_finish_mapping(state, target);
> +			ret = srp_finish_mapping(state, ch);
>   			if (ret)
>   				return ret;
>
> @@ -1337,17 +1356,18 @@ static int srp_map_sg_entry(struct srp_map_state *state,
>   	 */
>   	ret = 0;
>   	if (len != dev->mr_page_size) {
> -		ret = srp_finish_mapping(state, target);
> +		ret = srp_finish_mapping(state, ch);
>   		if (!ret)
>   			srp_map_update_start(state, NULL, 0, 0);
>   	}
>   	return ret;
>   }
>
> -static int srp_map_sg(struct srp_map_state *state,
> -		      struct srp_target_port *target, struct srp_request *req,
> -		      struct scatterlist *scat, int count)
> +static int srp_map_sg(struct srp_map_state *state, struct srp_rdma_ch *ch,
> +		      struct srp_request *req, struct scatterlist *scat,
> +		      int count)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_device *dev = target->srp_host->srp_dev;
>   	struct ib_device *ibdev = dev->dev;
>   	struct scatterlist *sg;
> @@ -1358,14 +1378,14 @@ static int srp_map_sg(struct srp_map_state *state,
>   	state->pages	= req->map_page;
>   	if (dev->use_fast_reg) {
>   		state->next_fr = req->fr_list;
> -		use_mr = !!target->fr_pool;
> +		use_mr = !!ch->fr_pool;
>   	} else {
>   		state->next_fmr = req->fmr_list;
> -		use_mr = !!target->fmr_pool;
> +		use_mr = !!ch->fmr_pool;
>   	}
>
>   	for_each_sg(scat, sg, count, i) {
> -		if (srp_map_sg_entry(state, target, sg, i, use_mr)) {
> +		if (srp_map_sg_entry(state, ch, sg, i, use_mr)) {
>   			/*
>   			 * Memory registration failed, so backtrack to the
>   			 * first unmapped entry and continue on without using
> @@ -1387,7 +1407,7 @@ backtrack:
>   		}
>   	}
>
> -	if (use_mr && srp_finish_mapping(state, target))
> +	if (use_mr && srp_finish_mapping(state, ch))
>   		goto backtrack;
>
>   	req->nmdesc = state->nmdesc;
> @@ -1395,9 +1415,10 @@ backtrack:
>   	return 0;
>   }
>
> -static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
> +static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_rdma_ch *ch,
>   			struct srp_request *req)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct scatterlist *scat;
>   	struct srp_cmd *cmd = req->cmd->buf;
>   	int len, nents, count;
> @@ -1459,7 +1480,7 @@ static int srp_map_data(struct scsi_cmnd *scmnd, struct srp_target_port *target,
>   				   target->indirect_size, DMA_TO_DEVICE);
>
>   	memset(&state, 0, sizeof(state));
> -	srp_map_sg(&state, target, req, scat, count);
> +	srp_map_sg(&state, ch, req, scat, count);
>
>   	/* We've mapped the request, now pull as much of the indirect
>   	 * descriptor table as we can into the command buffer. If this
> @@ -1520,20 +1541,20 @@ map_complete:
>   /*
>    * Return an IU and possible credit to the free pool
>    */
> -static void srp_put_tx_iu(struct srp_target_port *target, struct srp_iu *iu,
> +static void srp_put_tx_iu(struct srp_rdma_ch *ch, struct srp_iu *iu,
>   			  enum srp_iu_type iu_type)
>   {
>   	unsigned long flags;
>
> -	spin_lock_irqsave(&target->lock, flags);
> -	list_add(&iu->list, &target->free_tx);
> +	spin_lock_irqsave(&ch->lock, flags);
> +	list_add(&iu->list, &ch->free_tx);
>   	if (iu_type != SRP_IU_RSP)
> -		++target->req_lim;
> -	spin_unlock_irqrestore(&target->lock, flags);
> +		++ch->req_lim;
> +	spin_unlock_irqrestore(&ch->lock, flags);
>   }
>
>   /*
> - * Must be called with target->lock held to protect req_lim and free_tx.
> + * Must be called with ch->lock held to protect req_lim and free_tx.
>    * If IU is not sent, it must be returned using srp_put_tx_iu().
>    *
>    * Note:
> @@ -1545,35 +1566,36 @@ static void srp_put_tx_iu(struct srp_target_port *target, struct srp_iu *iu,
>    * - SRP_IU_RSP: 1, since a conforming SRP target never sends more than
>    *   one unanswered SRP request to an initiator.
>    */
> -static struct srp_iu *__srp_get_tx_iu(struct srp_target_port *target,
> +static struct srp_iu *__srp_get_tx_iu(struct srp_rdma_ch *ch,
>   				      enum srp_iu_type iu_type)
>   {
> +	struct srp_target_port *target = ch->target;
>   	s32 rsv = (iu_type == SRP_IU_TSK_MGMT) ? 0 : SRP_TSK_MGMT_SQ_SIZE;
>   	struct srp_iu *iu;
>
> -	srp_send_completion(target->send_cq, target);
> +	srp_send_completion(ch->send_cq, ch);
>
> -	if (list_empty(&target->free_tx))
> +	if (list_empty(&ch->free_tx))
>   		return NULL;
>
>   	/* Initiator responses to target requests do not consume credits */
>   	if (iu_type != SRP_IU_RSP) {
> -		if (target->req_lim <= rsv) {
> +		if (ch->req_lim <= rsv) {
>   			++target->zero_req_lim;
>   			return NULL;
>   		}
>
> -		--target->req_lim;
> +		--ch->req_lim;
>   	}
>
> -	iu = list_first_entry(&target->free_tx, struct srp_iu, list);
> +	iu = list_first_entry(&ch->free_tx, struct srp_iu, list);
>   	list_del(&iu->list);
>   	return iu;
>   }
>
> -static int srp_post_send(struct srp_target_port *target,
> -			 struct srp_iu *iu, int len)
> +static int srp_post_send(struct srp_rdma_ch *ch, struct srp_iu *iu, int len)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct ib_sge list;
>   	struct ib_send_wr wr, *bad_wr;
>
> @@ -1588,11 +1610,12 @@ static int srp_post_send(struct srp_target_port *target,
>   	wr.opcode     = IB_WR_SEND;
>   	wr.send_flags = IB_SEND_SIGNALED;
>
> -	return ib_post_send(target->qp, &wr, &bad_wr);
> +	return ib_post_send(ch->qp, &wr, &bad_wr);
>   }
>
> -static int srp_post_recv(struct srp_target_port *target, struct srp_iu *iu)
> +static int srp_post_recv(struct srp_rdma_ch *ch, struct srp_iu *iu)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct ib_recv_wr wr, *bad_wr;
>   	struct ib_sge list;
>
> @@ -1605,35 +1628,36 @@ static int srp_post_recv(struct srp_target_port *target, struct srp_iu *iu)
>   	wr.sg_list  = &list;
>   	wr.num_sge  = 1;
>
> -	return ib_post_recv(target->qp, &wr, &bad_wr);
> +	return ib_post_recv(ch->qp, &wr, &bad_wr);
>   }
>
> -static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
> +static void srp_process_rsp(struct srp_rdma_ch *ch, struct srp_rsp *rsp)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_request *req;
>   	struct scsi_cmnd *scmnd;
>   	unsigned long flags;
>
>   	if (unlikely(rsp->tag & SRP_TAG_TSK_MGMT)) {
> -		spin_lock_irqsave(&target->lock, flags);
> -		target->req_lim += be32_to_cpu(rsp->req_lim_delta);
> -		spin_unlock_irqrestore(&target->lock, flags);
> +		spin_lock_irqsave(&ch->lock, flags);
> +		ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
> +		spin_unlock_irqrestore(&ch->lock, flags);
>
> -		target->tsk_mgmt_status = -1;
> +		ch->tsk_mgmt_status = -1;
>   		if (be32_to_cpu(rsp->resp_data_len) >= 4)
> -			target->tsk_mgmt_status = rsp->data[3];
> -		complete(&target->tsk_mgmt_done);
> +			ch->tsk_mgmt_status = rsp->data[3];
> +		complete(&ch->tsk_mgmt_done);
>   	} else {
> -		req = &target->req_ring[rsp->tag];
> -		scmnd = srp_claim_req(target, req, NULL, NULL);
> +		req = &ch->req_ring[rsp->tag];
> +		scmnd = srp_claim_req(ch, req, NULL, NULL);
>   		if (!scmnd) {
>   			shost_printk(KERN_ERR, target->scsi_host,
>   				     "Null scmnd for RSP w/tag %016llx\n",
>   				     (unsigned long long) rsp->tag);
>
> -			spin_lock_irqsave(&target->lock, flags);
> -			target->req_lim += be32_to_cpu(rsp->req_lim_delta);
> -			spin_unlock_irqrestore(&target->lock, flags);
> +			spin_lock_irqsave(&ch->lock, flags);
> +			ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
> +			spin_unlock_irqrestore(&ch->lock, flags);
>
>   			return;
>   		}
> @@ -1655,7 +1679,7 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
>   		else if (unlikely(rsp->flags & SRP_RSP_FLAG_DOOVER))
>   			scsi_set_resid(scmnd, -be32_to_cpu(rsp->data_out_res_cnt));
>
> -		srp_free_req(target, req, scmnd,
> +		srp_free_req(ch, req, scmnd,
>   			     be32_to_cpu(rsp->req_lim_delta));
>
>   		scmnd->host_scribble = NULL;
> @@ -1663,18 +1687,19 @@ static void srp_process_rsp(struct srp_target_port *target, struct srp_rsp *rsp)
>   	}
>   }
>
> -static int srp_response_common(struct srp_target_port *target, s32 req_delta,
> +static int srp_response_common(struct srp_rdma_ch *ch, s32 req_delta,
>   			       void *rsp, int len)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct ib_device *dev = target->srp_host->srp_dev->dev;
>   	unsigned long flags;
>   	struct srp_iu *iu;
>   	int err;
>
> -	spin_lock_irqsave(&target->lock, flags);
> -	target->req_lim += req_delta;
> -	iu = __srp_get_tx_iu(target, SRP_IU_RSP);
> -	spin_unlock_irqrestore(&target->lock, flags);
> +	spin_lock_irqsave(&ch->lock, flags);
> +	ch->req_lim += req_delta;
> +	iu = __srp_get_tx_iu(ch, SRP_IU_RSP);
> +	spin_unlock_irqrestore(&ch->lock, flags);
>
>   	if (!iu) {
>   		shost_printk(KERN_ERR, target->scsi_host, PFX
> @@ -1686,17 +1711,17 @@ static int srp_response_common(struct srp_target_port *target, s32 req_delta,
>   	memcpy(iu->buf, rsp, len);
>   	ib_dma_sync_single_for_device(dev, iu->dma, len, DMA_TO_DEVICE);
>
> -	err = srp_post_send(target, iu, len);
> +	err = srp_post_send(ch, iu, len);
>   	if (err) {
>   		shost_printk(KERN_ERR, target->scsi_host, PFX
>   			     "unable to post response: %d\n", err);
> -		srp_put_tx_iu(target, iu, SRP_IU_RSP);
> +		srp_put_tx_iu(ch, iu, SRP_IU_RSP);
>   	}
>
>   	return err;
>   }
>
> -static void srp_process_cred_req(struct srp_target_port *target,
> +static void srp_process_cred_req(struct srp_rdma_ch *ch,
>   				 struct srp_cred_req *req)
>   {
>   	struct srp_cred_rsp rsp = {
> @@ -1705,14 +1730,15 @@ static void srp_process_cred_req(struct srp_target_port *target,
>   	};
>   	s32 delta = be32_to_cpu(req->req_lim_delta);
>
> -	if (srp_response_common(target, delta, &rsp, sizeof rsp))
> -		shost_printk(KERN_ERR, target->scsi_host, PFX
> +	if (srp_response_common(ch, delta, &rsp, sizeof(rsp)))
> +		shost_printk(KERN_ERR, ch->target->scsi_host, PFX
>   			     "problems processing SRP_CRED_REQ\n");
>   }
>
> -static void srp_process_aer_req(struct srp_target_port *target,
> +static void srp_process_aer_req(struct srp_rdma_ch *ch,
>   				struct srp_aer_req *req)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_aer_rsp rsp = {
>   		.opcode = SRP_AER_RSP,
>   		.tag = req->tag,
> @@ -1722,19 +1748,20 @@ static void srp_process_aer_req(struct srp_target_port *target,
>   	shost_printk(KERN_ERR, target->scsi_host, PFX
>   		     "ignoring AER for LUN %llu\n", be64_to_cpu(req->lun));
>
> -	if (srp_response_common(target, delta, &rsp, sizeof rsp))
> +	if (srp_response_common(ch, delta, &rsp, sizeof(rsp)))
>   		shost_printk(KERN_ERR, target->scsi_host, PFX
>   			     "problems processing SRP_AER_REQ\n");
>   }
>
> -static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
> +static void srp_handle_recv(struct srp_rdma_ch *ch, struct ib_wc *wc)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct ib_device *dev = target->srp_host->srp_dev->dev;
>   	struct srp_iu *iu = (struct srp_iu *) (uintptr_t) wc->wr_id;
>   	int res;
>   	u8 opcode;
>
> -	ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_ti_iu_len,
> +	ib_dma_sync_single_for_cpu(dev, iu->dma, ch->max_ti_iu_len,
>   				   DMA_FROM_DEVICE);
>
>   	opcode = *(u8 *) iu->buf;
> @@ -1748,15 +1775,15 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
>
>   	switch (opcode) {
>   	case SRP_RSP:
> -		srp_process_rsp(target, iu->buf);
> +		srp_process_rsp(ch, iu->buf);
>   		break;
>
>   	case SRP_CRED_REQ:
> -		srp_process_cred_req(target, iu->buf);
> +		srp_process_cred_req(ch, iu->buf);
>   		break;
>
>   	case SRP_AER_REQ:
> -		srp_process_aer_req(target, iu->buf);
> +		srp_process_aer_req(ch, iu->buf);
>   		break;
>
>   	case SRP_T_LOGOUT:
> @@ -1771,10 +1798,10 @@ static void srp_handle_recv(struct srp_target_port *target, struct ib_wc *wc)
>   		break;
>   	}
>
> -	ib_dma_sync_single_for_device(dev, iu->dma, target->max_ti_iu_len,
> +	ib_dma_sync_single_for_device(dev, iu->dma, ch->max_ti_iu_len,
>   				      DMA_FROM_DEVICE);
>
> -	res = srp_post_recv(target, iu);
> +	res = srp_post_recv(ch, iu);
>   	if (res != 0)
>   		shost_printk(KERN_ERR, target->scsi_host,
>   			     PFX "Recv failed with error code %d\n", res);
> @@ -1819,33 +1846,35 @@ static void srp_handle_qp_err(u64 wr_id, enum ib_wc_status wc_status,
>   	target->qp_in_error = true;
>   }
>
> -static void srp_recv_completion(struct ib_cq *cq, void *target_ptr)
> +static void srp_recv_completion(struct ib_cq *cq, void *ch_ptr)
>   {
> -	struct srp_target_port *target = target_ptr;
> +	struct srp_rdma_ch *ch = ch_ptr;
>   	struct ib_wc wc;
>
>   	ib_req_notify_cq(cq, IB_CQ_NEXT_COMP);
>   	while (ib_poll_cq(cq, 1, &wc) > 0) {
>   		if (likely(wc.status == IB_WC_SUCCESS)) {
> -			srp_handle_recv(target, &wc);
> +			srp_handle_recv(ch, &wc);
>   		} else {
> -			srp_handle_qp_err(wc.wr_id, wc.status, false, target);
> +			srp_handle_qp_err(wc.wr_id, wc.status, false,
> +					  ch->target);
>   		}
>   	}
>   }
>
> -static void srp_send_completion(struct ib_cq *cq, void *target_ptr)
> +static void srp_send_completion(struct ib_cq *cq, void *ch_ptr)
>   {
> -	struct srp_target_port *target = target_ptr;
> +	struct srp_rdma_ch *ch = ch_ptr;
>   	struct ib_wc wc;
>   	struct srp_iu *iu;
>
>   	while (ib_poll_cq(cq, 1, &wc) > 0) {
>   		if (likely(wc.status == IB_WC_SUCCESS)) {
>   			iu = (struct srp_iu *) (uintptr_t) wc.wr_id;
> -			list_add(&iu->list, &target->free_tx);
> +			list_add(&iu->list, &ch->free_tx);
>   		} else {
> -			srp_handle_qp_err(wc.wr_id, wc.status, true, target);
> +			srp_handle_qp_err(wc.wr_id, wc.status, true,
> +					  ch->target);
>   		}
>   	}
>   }
> @@ -1854,6 +1883,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   {
>   	struct srp_target_port *target = host_to_target(shost);
>   	struct srp_rport *rport = target->rport;
> +	struct srp_rdma_ch *ch;
>   	struct srp_request *req;
>   	struct srp_iu *iu;
>   	struct srp_cmd *cmd;
> @@ -1875,14 +1905,16 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   	if (unlikely(scmnd->result))
>   		goto err;
>
> -	spin_lock_irqsave(&target->lock, flags);
> -	iu = __srp_get_tx_iu(target, SRP_IU_CMD);
> +	ch = &target->ch;
> +
> +	spin_lock_irqsave(&ch->lock, flags);
> +	iu = __srp_get_tx_iu(ch, SRP_IU_CMD);
>   	if (!iu)
>   		goto err_unlock;
>
> -	req = list_first_entry(&target->free_reqs, struct srp_request, list);
> +	req = list_first_entry(&ch->free_reqs, struct srp_request, list);
>   	list_del(&req->list);
> -	spin_unlock_irqrestore(&target->lock, flags);
> +	spin_unlock_irqrestore(&ch->lock, flags);
>
>   	dev = target->srp_host->srp_dev->dev;
>   	ib_dma_sync_single_for_cpu(dev, iu->dma, target->max_iu_len,
> @@ -1901,7 +1933,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   	req->scmnd    = scmnd;
>   	req->cmd      = iu;
>
> -	len = srp_map_data(scmnd, target, req);
> +	len = srp_map_data(scmnd, ch, req);
>   	if (len < 0) {
>   		shost_printk(KERN_ERR, target->scsi_host,
>   			     PFX "Failed to map data (%d)\n", len);
> @@ -1919,7 +1951,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   	ib_dma_sync_single_for_device(dev, iu->dma, target->max_iu_len,
>   				      DMA_TO_DEVICE);
>
> -	if (srp_post_send(target, iu, len)) {
> +	if (srp_post_send(ch, iu, len)) {
>   		shost_printk(KERN_ERR, target->scsi_host, PFX "Send failed\n");
>   		goto err_unmap;
>   	}
> @@ -1933,10 +1965,10 @@ unlock_rport:
>   	return ret;
>
>   err_unmap:
> -	srp_unmap_data(scmnd, target, req);
> +	srp_unmap_data(scmnd, ch, req);
>
>   err_iu:
> -	srp_put_tx_iu(target, iu, SRP_IU_CMD);
> +	srp_put_tx_iu(ch, iu, SRP_IU_CMD);
>
>   	/*
>   	 * Avoid that the loops that iterate over the request ring can
> @@ -1944,11 +1976,11 @@ err_iu:
>   	 */
>   	req->scmnd = NULL;
>
> -	spin_lock_irqsave(&target->lock, flags);
> -	list_add(&req->list, &target->free_reqs);
> +	spin_lock_irqsave(&ch->lock, flags);
> +	list_add(&req->list, &ch->free_reqs);
>
>   err_unlock:
> -	spin_unlock_irqrestore(&target->lock, flags);
> +	spin_unlock_irqrestore(&ch->lock, flags);
>
>   err:
>   	if (scmnd->result) {
> @@ -1963,53 +1995,54 @@ err:
>
>   /*
>    * Note: the resources allocated in this function are freed in
> - * srp_free_target_ib().
> + * srp_free_ch_ib().
>    */
> -static int srp_alloc_iu_bufs(struct srp_target_port *target)
> +static int srp_alloc_iu_bufs(struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	int i;
>
> -	target->rx_ring = kzalloc(target->queue_size * sizeof(*target->rx_ring),
> -				  GFP_KERNEL);
> -	if (!target->rx_ring)
> +	ch->rx_ring = kcalloc(target->queue_size, sizeof(*ch->rx_ring),
> +			      GFP_KERNEL);
> +	if (!ch->rx_ring)
>   		goto err_no_ring;
> -	target->tx_ring = kzalloc(target->queue_size * sizeof(*target->tx_ring),
> -				  GFP_KERNEL);
> -	if (!target->tx_ring)
> +	ch->tx_ring = kcalloc(target->queue_size, sizeof(*ch->tx_ring),
> +			      GFP_KERNEL);
> +	if (!ch->tx_ring)
>   		goto err_no_ring;
>
>   	for (i = 0; i < target->queue_size; ++i) {
> -		target->rx_ring[i] = srp_alloc_iu(target->srp_host,
> -						  target->max_ti_iu_len,
> -						  GFP_KERNEL, DMA_FROM_DEVICE);
> -		if (!target->rx_ring[i])
> +		ch->rx_ring[i] = srp_alloc_iu(target->srp_host,
> +					      ch->max_ti_iu_len,
> +					      GFP_KERNEL, DMA_FROM_DEVICE);
> +		if (!ch->rx_ring[i])
>   			goto err;
>   	}
>
>   	for (i = 0; i < target->queue_size; ++i) {
> -		target->tx_ring[i] = srp_alloc_iu(target->srp_host,
> -						  target->max_iu_len,
> -						  GFP_KERNEL, DMA_TO_DEVICE);
> -		if (!target->tx_ring[i])
> +		ch->tx_ring[i] = srp_alloc_iu(target->srp_host,
> +					      target->max_iu_len,
> +					      GFP_KERNEL, DMA_TO_DEVICE);
> +		if (!ch->tx_ring[i])
>   			goto err;
>
> -		list_add(&target->tx_ring[i]->list, &target->free_tx);
> +		list_add(&ch->tx_ring[i]->list, &ch->free_tx);
>   	}
>
>   	return 0;
>
>   err:
>   	for (i = 0; i < target->queue_size; ++i) {
> -		srp_free_iu(target->srp_host, target->rx_ring[i]);
> -		srp_free_iu(target->srp_host, target->tx_ring[i]);
> +		srp_free_iu(target->srp_host, ch->rx_ring[i]);
> +		srp_free_iu(target->srp_host, ch->tx_ring[i]);
>   	}
>
>
>   err_no_ring:
> -	kfree(target->tx_ring);
> -	target->tx_ring = NULL;
> -	kfree(target->rx_ring);
> -	target->rx_ring = NULL;
> +	kfree(ch->tx_ring);
> +	ch->tx_ring = NULL;
> +	kfree(ch->rx_ring);
> +	ch->rx_ring = NULL;
>
>   	return -ENOMEM;
>   }
> @@ -2043,23 +2076,24 @@ static uint32_t srp_compute_rq_tmo(struct ib_qp_attr *qp_attr, int attr_mask)
>
>   static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
>   			       struct srp_login_rsp *lrsp,
> -			       struct srp_target_port *target)
> +			       struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct ib_qp_attr *qp_attr = NULL;
>   	int attr_mask = 0;
>   	int ret;
>   	int i;
>
>   	if (lrsp->opcode == SRP_LOGIN_RSP) {
> -		target->max_ti_iu_len = be32_to_cpu(lrsp->max_ti_iu_len);
> -		target->req_lim       = be32_to_cpu(lrsp->req_lim_delta);
> +		ch->max_ti_iu_len = be32_to_cpu(lrsp->max_ti_iu_len);
> +		ch->req_lim       = be32_to_cpu(lrsp->req_lim_delta);
>
>   		/*
>   		 * Reserve credits for task management so we don't
>   		 * bounce requests back to the SCSI mid-layer.
>   		 */
>   		target->scsi_host->can_queue
> -			= min(target->req_lim - SRP_TSK_MGMT_SQ_SIZE,
> +			= min(ch->req_lim - SRP_TSK_MGMT_SQ_SIZE,
>   			      target->scsi_host->can_queue);
>   		target->scsi_host->cmd_per_lun
>   			= min_t(int, target->scsi_host->can_queue,
> @@ -2071,8 +2105,8 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
>   		goto error;
>   	}
>
> -	if (!target->rx_ring) {
> -		ret = srp_alloc_iu_bufs(target);
> +	if (!ch->rx_ring) {
> +		ret = srp_alloc_iu_bufs(ch);
>   		if (ret)
>   			goto error;
>   	}
> @@ -2087,13 +2121,14 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
>   	if (ret)
>   		goto error_free;
>
> -	ret = ib_modify_qp(target->qp, qp_attr, attr_mask);
> +	ret = ib_modify_qp(ch->qp, qp_attr, attr_mask);
>   	if (ret)
>   		goto error_free;
>
>   	for (i = 0; i < target->queue_size; i++) {
> -		struct srp_iu *iu = target->rx_ring[i];
> -		ret = srp_post_recv(target, iu);
> +		struct srp_iu *iu = ch->rx_ring[i];
> +
> +		ret = srp_post_recv(ch, iu);
>   		if (ret)
>   			goto error_free;
>   	}
> @@ -2105,7 +2140,7 @@ static void srp_cm_rep_handler(struct ib_cm_id *cm_id,
>
>   	target->rq_tmo_jiffies = srp_compute_rq_tmo(qp_attr, attr_mask);
>
> -	ret = ib_modify_qp(target->qp, qp_attr, attr_mask);
> +	ret = ib_modify_qp(ch->qp, qp_attr, attr_mask);
>   	if (ret)
>   		goto error_free;
>
> @@ -2115,13 +2150,14 @@ error_free:
>   	kfree(qp_attr);
>
>   error:
> -	target->status = ret;
> +	ch->status = ret;
>   }
>
>   static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
>   			       struct ib_cm_event *event,
> -			       struct srp_target_port *target)
> +			       struct srp_rdma_ch *ch)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct Scsi_Host *shost = target->scsi_host;
>   	struct ib_class_port_info *cpi;
>   	int opcode;
> @@ -2129,12 +2165,12 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
>   	switch (event->param.rej_rcvd.reason) {
>   	case IB_CM_REJ_PORT_CM_REDIRECT:
>   		cpi = event->param.rej_rcvd.ari;
> -		target->path.dlid = cpi->redirect_lid;
> -		target->path.pkey = cpi->redirect_pkey;
> +		ch->path.dlid = cpi->redirect_lid;
> +		ch->path.pkey = cpi->redirect_pkey;
>   		cm_id->remote_cm_qpn = be32_to_cpu(cpi->redirect_qp) & 0x00ffffff;
> -		memcpy(target->path.dgid.raw, cpi->redirect_gid, 16);
> +		memcpy(ch->path.dgid.raw, cpi->redirect_gid, 16);
>
> -		target->status = target->path.dlid ?
> +		ch->status = ch->path.dlid ?
>   			SRP_DLID_REDIRECT : SRP_PORT_REDIRECT;
>   		break;
>
> @@ -2145,26 +2181,26 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
>   			 * reject reason code 25 when they mean 24
>   			 * (port redirect).
>   			 */
> -			memcpy(target->path.dgid.raw,
> +			memcpy(ch->path.dgid.raw,
>   			       event->param.rej_rcvd.ari, 16);
>
>   			shost_printk(KERN_DEBUG, shost,
>   				     PFX "Topspin/Cisco redirect to target port GID %016llx%016llx\n",
> -				     (unsigned long long) be64_to_cpu(target->path.dgid.global.subnet_prefix),
> -				     (unsigned long long) be64_to_cpu(target->path.dgid.global.interface_id));
> +				     be64_to_cpu(ch->path.dgid.global.subnet_prefix),
> +				     be64_to_cpu(ch->path.dgid.global.interface_id));
>
> -			target->status = SRP_PORT_REDIRECT;
> +			ch->status = SRP_PORT_REDIRECT;
>   		} else {
>   			shost_printk(KERN_WARNING, shost,
>   				     "  REJ reason: IB_CM_REJ_PORT_REDIRECT\n");
> -			target->status = -ECONNRESET;
> +			ch->status = -ECONNRESET;
>   		}
>   		break;
>
>   	case IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID:
>   		shost_printk(KERN_WARNING, shost,
>   			    "  REJ reason: IB_CM_REJ_DUPLICATE_LOCAL_COMM_ID\n");
> -		target->status = -ECONNRESET;
> +		ch->status = -ECONNRESET;
>   		break;
>
>   	case IB_CM_REJ_CONSUMER_DEFINED:
> @@ -2185,24 +2221,25 @@ static void srp_cm_rej_handler(struct ib_cm_id *cm_id,
>   			shost_printk(KERN_WARNING, shost,
>   				     "  REJ reason: IB_CM_REJ_CONSUMER_DEFINED,"
>   				     " opcode 0x%02x\n", opcode);
> -		target->status = -ECONNRESET;
> +		ch->status = -ECONNRESET;
>   		break;
>
>   	case IB_CM_REJ_STALE_CONN:
>   		shost_printk(KERN_WARNING, shost, "  REJ reason: stale connection\n");
> -		target->status = SRP_STALE_CONN;
> +		ch->status = SRP_STALE_CONN;
>   		break;
>
>   	default:
>   		shost_printk(KERN_WARNING, shost, "  REJ reason 0x%x\n",
>   			     event->param.rej_rcvd.reason);
> -		target->status = -ECONNRESET;
> +		ch->status = -ECONNRESET;
>   	}
>   }
>
>   static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>   {
> -	struct srp_target_port *target = cm_id->context;
> +	struct srp_rdma_ch *ch = cm_id->context;
> +	struct srp_target_port *target = ch->target;
>   	int comp = 0;
>
>   	switch (event->event) {
> @@ -2210,19 +2247,19 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>   		shost_printk(KERN_DEBUG, target->scsi_host,
>   			     PFX "Sending CM REQ failed\n");
>   		comp = 1;
> -		target->status = -ECONNRESET;
> +		ch->status = -ECONNRESET;
>   		break;
>
>   	case IB_CM_REP_RECEIVED:
>   		comp = 1;
> -		srp_cm_rep_handler(cm_id, event->private_data, target);
> +		srp_cm_rep_handler(cm_id, event->private_data, ch);
>   		break;
>
>   	case IB_CM_REJ_RECEIVED:
>   		shost_printk(KERN_DEBUG, target->scsi_host, PFX "REJ received\n");
>   		comp = 1;
>
> -		srp_cm_rej_handler(cm_id, event, target);
> +		srp_cm_rej_handler(cm_id, event, ch);
>   		break;
>
>   	case IB_CM_DREQ_RECEIVED:
> @@ -2240,7 +2277,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>   			     PFX "connection closed\n");
>   		comp = 1;
>
> -		target->status = 0;
> +		ch->status = 0;
>   		break;
>
>   	case IB_CM_MRA_RECEIVED:
> @@ -2255,7 +2292,7 @@ static int srp_cm_handler(struct ib_cm_id *cm_id, struct ib_cm_event *event)
>   	}
>
>   	if (comp)
> -		complete(&target->done);
> +		complete(&ch->done);
>
>   	return 0;
>   }
> @@ -2311,9 +2348,10 @@ srp_change_queue_depth(struct scsi_device *sdev, int qdepth, int reason)
>   	return sdev->queue_depth;
>   }
>
> -static int srp_send_tsk_mgmt(struct srp_target_port *target,
> -			     u64 req_tag, unsigned int lun, u8 func)
> +static int srp_send_tsk_mgmt(struct srp_rdma_ch *ch, u64 req_tag,
> +			     unsigned int lun, u8 func)
>   {
> +	struct srp_target_port *target = ch->target;
>   	struct srp_rport *rport = target->rport;
>   	struct ib_device *dev = target->srp_host->srp_dev->dev;
>   	struct srp_iu *iu;
> @@ -2322,16 +2360,16 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
>   	if (!target->connected || target->qp_in_error)
>   		return -1;
>
> -	init_completion(&target->tsk_mgmt_done);
> +	init_completion(&ch->tsk_mgmt_done);
>
>   	/*
> -	 * Lock the rport mutex to avoid that srp_create_target_ib() is
> +	 * Lock the rport mutex to avoid that srp_create_ch_ib() is
>   	 * invoked while a task management function is being sent.
>   	 */
>   	mutex_lock(&rport->mutex);
> -	spin_lock_irq(&target->lock);
> -	iu = __srp_get_tx_iu(target, SRP_IU_TSK_MGMT);
> -	spin_unlock_irq(&target->lock);
> +	spin_lock_irq(&ch->lock);
> +	iu = __srp_get_tx_iu(ch, SRP_IU_TSK_MGMT);
> +	spin_unlock_irq(&ch->lock);
>
>   	if (!iu) {
>   		mutex_unlock(&rport->mutex);
> @@ -2352,15 +2390,15 @@ static int srp_send_tsk_mgmt(struct srp_target_port *target,
>
>   	ib_dma_sync_single_for_device(dev, iu->dma, sizeof *tsk_mgmt,
>   				      DMA_TO_DEVICE);
> -	if (srp_post_send(target, iu, sizeof *tsk_mgmt)) {
> -		srp_put_tx_iu(target, iu, SRP_IU_TSK_MGMT);
> +	if (srp_post_send(ch, iu, sizeof(*tsk_mgmt))) {
> +		srp_put_tx_iu(ch, iu, SRP_IU_TSK_MGMT);
>   		mutex_unlock(&rport->mutex);
>
>   		return -1;
>   	}
>   	mutex_unlock(&rport->mutex);
>
> -	if (!wait_for_completion_timeout(&target->tsk_mgmt_done,
> +	if (!wait_for_completion_timeout(&ch->tsk_mgmt_done,
>   					 msecs_to_jiffies(SRP_ABORT_TIMEOUT_MS)))
>   		return -1;
>
> @@ -2371,20 +2409,22 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>   {
>   	struct srp_target_port *target = host_to_target(scmnd->device->host);
>   	struct srp_request *req = (struct srp_request *) scmnd->host_scribble;
> +	struct srp_rdma_ch *ch;
>   	int ret;
>
>   	shost_printk(KERN_ERR, target->scsi_host, "SRP abort called\n");
>
> -	if (!req || !srp_claim_req(target, req, NULL, scmnd))
> +	ch = &target->ch;
> +	if (!req || !srp_claim_req(ch, req, NULL, scmnd))
>   		return SUCCESS;
> -	if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
> +	if (srp_send_tsk_mgmt(ch, req->index, scmnd->device->lun,
>   			      SRP_TSK_ABORT_TASK) == 0)
>   		ret = SUCCESS;
>   	else if (target->rport->state == SRP_RPORT_LOST)
>   		ret = FAST_IO_FAIL;
>   	else
>   		ret = FAILED;
> -	srp_free_req(target, req, scmnd, 0);
> +	srp_free_req(ch, req, scmnd, 0);
>   	scmnd->result = DID_ABORT << 16;
>   	scmnd->scsi_done(scmnd);
>
> @@ -2394,19 +2434,21 @@ static int srp_abort(struct scsi_cmnd *scmnd)
>   static int srp_reset_device(struct scsi_cmnd *scmnd)
>   {
>   	struct srp_target_port *target = host_to_target(scmnd->device->host);
> +	struct srp_rdma_ch *ch = &target->ch;
>   	int i;
>
>   	shost_printk(KERN_ERR, target->scsi_host, "SRP reset_device called\n");
>
> -	if (srp_send_tsk_mgmt(target, SRP_TAG_NO_REQ, scmnd->device->lun,
> +	if (srp_send_tsk_mgmt(ch, SRP_TAG_NO_REQ, scmnd->device->lun,
>   			      SRP_TSK_LUN_RESET))
>   		return FAILED;
> -	if (target->tsk_mgmt_status)
> +	if (ch->tsk_mgmt_status)
>   		return FAILED;
>
>   	for (i = 0; i < target->req_ring_size; ++i) {
> -		struct srp_request *req = &target->req_ring[i];
> -		srp_finish_req(target, req, scmnd->device, DID_RESET << 16);
> +		struct srp_request *req = &ch->req_ring[i];
> +
> +		srp_finish_req(ch, req, scmnd->device, DID_RESET << 16);
>   	}
>
>   	return SUCCESS;
> @@ -2483,8 +2525,9 @@ static ssize_t show_dgid(struct device *dev, struct device_attribute *attr,
>   			 char *buf)
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
> +	struct srp_rdma_ch *ch = &target->ch;
>
> -	return sprintf(buf, "%pI6\n", target->path.dgid.raw);
> +	return sprintf(buf, "%pI6\n", ch->path.dgid.raw);
>   }
>
>   static ssize_t show_orig_dgid(struct device *dev,
> @@ -2500,7 +2543,7 @@ static ssize_t show_req_lim(struct device *dev,
>   {
>   	struct srp_target_port *target = host_to_target(class_to_shost(dev));
>
> -	return sprintf(buf, "%d\n", target->req_lim);
> +	return sprintf(buf, "%d\n", target->ch.req_lim);
>   }
>
>   static ssize_t show_zero_req_lim(struct device *dev,
> @@ -2992,6 +3035,7 @@ static ssize_t srp_create_target(struct device *dev,
>   		container_of(dev, struct srp_host, dev);
>   	struct Scsi_Host *target_host;
>   	struct srp_target_port *target;
> +	struct srp_rdma_ch *ch;
>   	struct srp_device *srp_dev = host->srp_dev;
>   	struct ib_device *ibdev = srp_dev->dev;
>   	int ret;
> @@ -3060,8 +3104,12 @@ static ssize_t srp_create_target(struct device *dev,
>   	INIT_WORK(&target->tl_err_work, srp_tl_err_work);
>   	INIT_WORK(&target->remove_work, srp_remove_work);
>   	spin_lock_init(&target->lock);
> -	INIT_LIST_HEAD(&target->free_tx);
> -	ret = srp_alloc_req_data(target);
> +	ch = &target->ch;
> +	ch->target = target;
> +	ch->comp_vector = target->comp_vector;
> +	spin_lock_init(&ch->lock);
> +	INIT_LIST_HEAD(&ch->free_tx);
> +	ret = srp_alloc_req_data(ch);
>   	if (ret)
>   		goto err_free_mem;
>
> @@ -3069,15 +3117,15 @@ static ssize_t srp_create_target(struct device *dev,
>   	if (ret)
>   		goto err_free_mem;
>
> -	ret = srp_create_target_ib(target);
> +	ret = srp_create_ch_ib(ch);
>   	if (ret)
>   		goto err_free_mem;
>
> -	ret = srp_new_cm_id(target);
> +	ret = srp_new_cm_id(ch);
>   	if (ret)
>   		goto err_free_ib;
>
> -	ret = srp_connect_target(target);
> +	ret = srp_connect_ch(ch);
>   	if (ret) {
>   		shost_printk(KERN_ERR, target->scsi_host,
>   			     PFX "Connection failed\n");
> @@ -3111,10 +3159,10 @@ err_disconnect:
>   	srp_disconnect_target(target);
>
>   err_free_ib:
> -	srp_free_target_ib(target);
> +	srp_free_ch_ib(target, ch);
>
>   err_free_mem:
> -	srp_free_req_data(target);
> +	srp_free_req_data(target, ch);
>
>   err:
>   	scsi_host_put(target_host);
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.h b/drivers/infiniband/ulp/srp/ib_srp.h
> index 8635ab6..74530d9 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.h
> +++ b/drivers/infiniband/ulp/srp/ib_srp.h
> @@ -130,7 +130,11 @@ struct srp_request {
>   	short			index;
>   };
>
> -struct srp_target_port {
> +/**
> + * struct srp_rdma_ch
> + * @comp_vector: Completion vector used by this RDMA channel.
> + */
> +struct srp_rdma_ch {
>   	/* These are RW in the hot path, and commonly used together */
>   	struct list_head	free_tx;
>   	struct list_head	free_reqs;
> @@ -138,13 +142,48 @@ struct srp_target_port {
>   	s32			req_lim;
>
>   	/* These are read-only in the hot path */
> -	struct ib_cq	       *send_cq ____cacheline_aligned_in_smp;
> +	struct srp_target_port *target ____cacheline_aligned_in_smp;
> +	struct ib_cq	       *send_cq;
>   	struct ib_cq	       *recv_cq;
>   	struct ib_qp	       *qp;
>   	union {
>   		struct ib_fmr_pool     *fmr_pool;
>   		struct srp_fr_pool     *fr_pool;
>   	};
> +
> +	/* Everything above this point is used in the hot path of
> +	 * command processing. Try to keep them packed into cachelines.
> +	 */
> +
> +	struct completion	done;
> +	int			status;
> +
> +	struct ib_sa_path_rec	path;
> +	struct ib_sa_query     *path_query;
> +	int			path_query_id;
> +
> +	struct ib_cm_id	       *cm_id;
> +	struct srp_iu	      **tx_ring;
> +	struct srp_iu	      **rx_ring;
> +	struct srp_request     *req_ring;
> +	int			max_ti_iu_len;
> +	int			comp_vector;
> +
> +	struct completion	tsk_mgmt_done;
> +	u8			tsk_mgmt_status;
> +};
> +
> +/**
> + * struct srp_target_port
> + * @comp_vector: Completion vector used by the first RDMA channel created for
> + *   this target port.
> + */
> +struct srp_target_port {
> +	/* read and written in the hot path */
> +	spinlock_t		lock;
> +
> +	struct srp_rdma_ch	ch;
> +	/* read only in the hot path */
>   	u32			lkey;
>   	u32			rkey;
>   	enum srp_target_state	state;
> @@ -153,10 +192,7 @@ struct srp_target_port {
>   	unsigned int		indirect_size;
>   	bool			allow_ext_sg;
>
> -	/* Everything above this point is used in the hot path of
> -	 * command processing. Try to keep them packed into cachelines.
> -	 */
> -
> +	/* other member variables */
>   	union ib_gid		sgid;
>   	__be64			id_ext;
>   	__be64			ioc_guid;
> @@ -176,33 +212,17 @@ struct srp_target_port {
>
>   	union ib_gid		orig_dgid;
>   	__be16			pkey;
> -	struct ib_sa_path_rec	path;
> -	struct ib_sa_query     *path_query;
> -	int			path_query_id;
>
>   	u32			rq_tmo_jiffies;
>   	bool			connected;
>
> -	struct ib_cm_id	       *cm_id;
> -
> -	int			max_ti_iu_len;
> -
>   	int			zero_req_lim;
>
> -	struct srp_iu	       **tx_ring;
> -	struct srp_iu	       **rx_ring;
> -	struct srp_request	*req_ring;
> -
>   	struct work_struct	tl_err_work;
>   	struct work_struct	remove_work;
>
>   	struct list_head	list;
> -	struct completion	done;
> -	int			status;
>   	bool			qp_in_error;
> -
> -	struct completion	tsk_mgmt_done;
> -	u8			tsk_mgmt_status;
>   };
>
>   struct srp_iu {
>


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 11/12] IB/srp: Eliminate free_reqs list
       [not found]         ` <20141017105939.GB7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-10-19 16:59           ` Sagi Grimberg
  2014-10-20 11:47           ` Bart Van Assche
  1 sibling, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 16:59 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/17/2014 1:59 PM, Christoph Hellwig wrote:
> On Tue, Oct 07, 2014 at 03:06:54PM +0200, Bart Van Assche wrote:
>> The free_reqs list is no longer needed now that we are using
>> tags assigned by the block layer. Hence remove it.
>
> Is there any good reason not to fold this into the previous patch?
>

Agree.

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]   ` <5433E585.607-HInyCGIudOg@public.gmane.org>
@ 2014-10-19 17:36     ` Sagi Grimberg
  2014-10-20 12:56       ` Bart Van Assche
  2014-10-21  9:14     ` Sagi Grimberg
  1 sibling, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-19 17:36 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/7/2014 4:07 PM, Bart Van Assche wrote:
> Improve performance by using multiple RDMA/RC channels per SCSI
> host for communication with an SRP target. About the
> implementation:
> - Introduce a loop over all channels in the code that uses
>    target->ch.
> - Set the SRP_MULTICHAN_MULTI flag during login for the creation
>    of the second and subsequent channels.
> - RDMA completion vectors are chosen such that RDMA completion
>    interrupts are handled by the CPU socket that submitted the I/O
>    request. As one can see in this patch it has been assumed if a
>    system contains n CPU sockets and m RDMA completion vectors
>    have been assigned to an RDMA HCA that IRQ affinity has been
>    configured such that completion vectors [i*m/n..(i+1)*m/n) are
>    bound to CPU socket i with 0 <= i < n.
> - Modify srp_free_ch_ib() and srp_free_req_data() such that it
>    becomes safe to invoke these functions after the corresponding
>    allocation function failed.
> - Add a ch_count sysfs attribute per target port.
>
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>
> ---
>   Documentation/ABI/stable/sysfs-driver-ib_srp |  25 ++-
>   drivers/infiniband/ulp/srp/ib_srp.c          | 291 ++++++++++++++++++++-------
>   drivers/infiniband/ulp/srp/ib_srp.h          |   3 +-
>   3 files changed, 238 insertions(+), 81 deletions(-)
>
> diff --git a/Documentation/ABI/stable/sysfs-driver-ib_srp b/Documentation/ABI/stable/sysfs-driver-ib_srp
> index b9688de..d5a459e 100644
> --- a/Documentation/ABI/stable/sysfs-driver-ib_srp
> +++ b/Documentation/ABI/stable/sysfs-driver-ib_srp
> @@ -55,12 +55,12 @@ Description:	Interface for making ib_srp connect to a new target.
>   		  only safe with partial memory descriptor list support enabled
>   		  (allow_ext_sg=1).
>   		* comp_vector, a number in the range 0..n-1 specifying the
> -		  MSI-X completion vector. Some HCA's allocate multiple (n)
> -		  MSI-X vectors per HCA port. If the IRQ affinity masks of
> -		  these interrupts have been configured such that each MSI-X
> -		  interrupt is handled by a different CPU then the comp_vector
> -		  parameter can be used to spread the SRP completion workload
> -		  over multiple CPU's.
> +		  MSI-X completion vector of the first RDMA channel. Some
> +		  HCA's allocate multiple (n) MSI-X vectors per HCA port. If
> +		  the IRQ affinity masks of these interrupts have been
> +		  configured such that each MSI-X interrupt is handled by a
> +		  different CPU then the comp_vector parameter can be used to
> +		  spread the SRP completion workload over multiple CPU's.

This is fairly non-trivial for the user...

Aren't we requesting a bit too much awareness here?
Can't we just "make it work"? The user hands out ch_count - why can't
you do some least-used logic here?

Maybe we can even go with per-cpu QPs and discard comp_vector argument?
this would probably bring the best performance, wouldn't it?
(fall back to least-used logic in case the HW supports fewer vectors)

>   		* tl_retry_count, a number in the range 2..7 specifying the
>   		  IB RC retry count.
>   		* queue_size, the maximum number of commands that the
> @@ -88,6 +88,13 @@ Description:	Whether ib_srp is allowed to include a partial memory
>   		descriptor list in an SRP_CMD when communicating with an SRP
>   		target.
>
> +What:		/sys/class/scsi_host/host<n>/ch_count
> +Date:		November 1, 2014
> +KernelVersion:	3.18
> +Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	Number of RDMA channels used for communication with the SRP
> +		target.
> +
>   What:		/sys/class/scsi_host/host<n>/cmd_sg_entries
>   Date:		May 19, 2011
>   KernelVersion:	2.6.39
> @@ -95,6 +102,12 @@ Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>   Description:	Maximum number of data buffer descriptors that may be sent to
>   		the target in a single SRP_CMD request.
>
> +What:		/sys/class/scsi_host/host<n>/comp_vector
> +Date:		September 2, 2013
> +KernelVersion:	3.11
> +Contact:	linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> +Description:	Completion vector used for the first RDMA channel.
> +
>   What:		/sys/class/scsi_host/host<n>/dgid
>   Date:		June 17, 2006
>   KernelVersion:	2.6.17
> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
> index eccaf65..80699a9 100644
> --- a/drivers/infiniband/ulp/srp/ib_srp.c
> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
> @@ -123,6 +123,11 @@ MODULE_PARM_DESC(dev_loss_tmo,
>   		 " if fast_io_fail_tmo has not been set. \"off\" means that"
>   		 " this functionality is disabled.");
>
> +static unsigned ch_count;
> +module_param(ch_count, uint, 0444);
> +MODULE_PARM_DESC(ch_count,
> +		 "Number of RDMA channels to use for communication with an SRP target. Using more than one channel improves performance if the HCA supports multiple completion vectors. The default value is the minimum of four times the number of online CPU sockets and the number of completion vectors supported by the HCA.");
> +

Why? how did you get to this magic equation?

>   static void srp_add_one(struct ib_device *device);

...
So it's pretty late for today, this is where I got so far...
Will continue later this week.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]       ` <20141017105858.GA7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-10-20 11:44         ` Bart Van Assche
  0 siblings, 0 replies; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 11:44 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/17/14 12:58, Christoph Hellwig wrote:
>> diff --git a/drivers/infiniband/ulp/srp/ib_srp.c b/drivers/infiniband/ulp/srp/ib_srp.c
>> index cc0bf83b..224ef25 100644
>> --- a/drivers/infiniband/ulp/srp/ib_srp.c
>> +++ b/drivers/infiniband/ulp/srp/ib_srp.c
>> @@ -853,7 +853,6 @@ static int srp_alloc_req_data(struct srp_rdma_ch *ch)
>>   			goto out;
>>
>>   		req->indirect_dma_addr = dma_addr;
>> -		req->index = i;
>>   		list_add_tail(&req->list, &ch->free_reqs);
>>   	}
>
> Seems like a nice optimization for the future would be to preallocate
> the srp requests with the block ones and the scsi command.

Agreed. The reason why that optimization has not been included in this 
patch series is that it would require more work than the optimization 
in patch 10/12. This is because the free_reqs list is not only used when 
a SCSI command is submitted by the SCSI core but also when submitting a 
task management command or when replying to a request submitted by the 
target system. In other words, the free_reqs list would have to be 
modified such that it is only used for the latter two purposes.

Bart.



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 11/12] IB/srp: Eliminate free_reqs list
       [not found]         ` <20141017105939.GB7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2014-10-19 16:59           ` Sagi Grimberg
@ 2014-10-20 11:47           ` Bart Van Assche
  2014-10-21  8:49             ` Christoph Hellwig
  1 sibling, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 11:47 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/17/14 12:59, Christoph Hellwig wrote:
> On Tue, Oct 07, 2014 at 03:06:54PM +0200, Bart Van Assche wrote:
>> The free_reqs list is no longer needed now that we are using
>> tags assigned by the block layer. Hence remove it.
>
> Is there any good reason not to fold this into the previous patch?

The only reason why patches 10/12 and 11/12 are separate patches
is to reduce the size of individual patches and hence to make it
easier to review these patches. If everyone agrees I'm fine with
folding patch 11/12 into patch 10/12.

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]     ` <20141017110627.GD7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-10-20 11:57       ` Bart Van Assche
  2014-10-21  8:49         ` Christoph Hellwig
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 11:57 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/17/14 13:06, Christoph Hellwig wrote:
>>   	} else {
>> +		if (blk_mq_unique_tag_to_hwq(rsp->tag) != ch - target->ch)
>> +			pr_err("Channel idx mismatch: tag %#llx <> ch %#lx\n",
>> +			       rsp->tag, ch - target->ch);
>>   		scmnd = scsi_host_find_tag(target->scsi_host, rsp->tag);
>
> Shouldn't we do this validity check inside scsi_host_find_tag, so that
> all callers get it? That means adding an argument to it,  but there are
> very few callers at the moment.

Hello Christoph,

That pr_err() statement was convenient while debugging the multiqueue 
code in the SRP initiator driver but can be left out. Would you agree 
with leaving the above three lines of debug code out instead of adding 
an additional argument to scsi_host_find_tag()?

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues
       [not found]     ` <5443E2DF.1040605-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-10-20 12:01       ` Bart Van Assche
       [not found]         ` <5444F995.5080407-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 12:01 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/19/14 18:12, Sagi Grimberg wrote:
> On 10/7/2014 4:04 PM, Bart Van Assche wrote:
>> -            req = blk_queue_find_tag(sdev->request_queue, tag);
>> +        req = blk_queue_find_tag(sdev->request_queue, tag);
>
> Why is this line different?

This is because the indentation has been modified from "8x<space><tab>" 
into "<tab><tab>". I can leave out that change if you want.

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning
       [not found]     ` <5443E66F.7050901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-10-20 12:15       ` Bart Van Assche
  2014-10-21  8:50         ` Christoph Hellwig
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 12:15 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/19/14 18:27, Sagi Grimberg wrote:
> On 10/7/2014 4:05 PM, Bart Van Assche wrote:
>> +static int srp_sdev_count(struct Scsi_Host *host)
>> +{
>> +    struct scsi_device *sdev;
>> +    int c = 0;
>> +
>> +    shost_for_each_device(sdev, host)
>> +        c++;
>> +
>> +    return c;
>> +}
>> +
>
> Is this really an SRP specific routine?
> Can you move it to a more natural location?

How about renaming this function into shost_sdev_count() and moving its 
declaration to <scsi/scsi_device.h> and its implementation to 
drivers/scsi/scsi_lib.c ?

>>   static int srp_add_target(struct srp_host *host, struct
>> srp_target_port *target)
>>   {
>>       struct srp_rport_identifiers ids;
>>       struct srp_rport *rport;
>>
>> +    target->state = SRP_TARGET_SCANNING;
>>       sprintf(target->target_name, "SRP.T10:%016llX",
>>            (unsigned long long) be64_to_cpu(target->id_ext));
>>
>> @@ -2634,11 +2650,26 @@ static int srp_add_target(struct srp_host
>> *host, struct srp_target_port *target)
>>       list_add_tail(&target->list, &host->target_list);
>>       spin_unlock(&host->target_lock);
>>
>> -    target->state = SRP_TARGET_LIVE;
>> -
>>       scsi_scan_target(&target->scsi_host->shost_gendev,
>>                0, target->scsi_id, SCAN_WILD_CARD, 0);
>>
>> +    if (!target->connected || target->qp_in_error) {
>> +        shost_printk(KERN_INFO, target->scsi_host,
>> +                 PFX "SCSI scan failed - removing SCSI host\n");
>> +        srp_queue_remove_work(target);
>> +        goto out;
>> +    }
>
> So my impression is that by conditioning target->qp_in_error you are
> relying on the fact that SRP eh was invoked here (RC error), what if
> scsi eh was invoked prior to that? did you test this path?

This code path has been tested. It's not that hard to trigger this code 
path - setting the channel count (ch_count) parameter to a high value 
and running a loop at the target side that disables IB ports after a 
random delay between 0s and (2*srp_daemon_scan_interval) is sufficient. 
After not too many iterations this code will be hit because the higher 
the channel count the longer it takes to log in.

The SCSI EH is only activated after a SCSI command has failed. No SCSI 
commands are sent to a target system before the scsi_scan_target() call. 
This means that the SCSI EH can only get activated while 
scsi_scan_target() is in progress or after that function has finished.

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-19 17:36     ` Sagi Grimberg
@ 2014-10-20 12:56       ` Bart Van Assche
       [not found]         ` <54450690.709-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 12:56 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/19/14 19:36, Sagi Grimberg wrote:
> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>           * comp_vector, a number in the range 0..n-1 specifying the
>> -          MSI-X completion vector. Some HCA's allocate multiple (n)
>> -          MSI-X vectors per HCA port. If the IRQ affinity masks of
>> -          these interrupts have been configured such that each MSI-X
>> -          interrupt is handled by a different CPU then the comp_vector
>> -          parameter can be used to spread the SRP completion workload
>> -          over multiple CPU's.
>> +          MSI-X completion vector of the first RDMA channel. Some
>> +          HCA's allocate multiple (n) MSI-X vectors per HCA port. If
>> +          the IRQ affinity masks of these interrupts have been
>> +          configured such that each MSI-X interrupt is handled by a
>> +          different CPU then the comp_vector parameter can be used to
>> +          spread the SRP completion workload over multiple CPU's.
>
> This is fairly non-trivial for the user...
>
> Aren't we requesting a bit too much awareness here?
> Can't we just "make it work"? The user hands out ch_count - why can't
> you do some least-used logic here?
>
> Maybe we can even go with per-cpu QPs and discard comp_vector argument?
> this would probably bring the best performance, wouldn't it?
> (fall back to least-used logic in case the HW supports fewer vectors)

Hello Sagi,

The only reason the comp_vector parameter is still supported is because 
of backwards compatibility. What I expect is that users will set the 
ch_count parameter but not the comp_vector parameter.

Using one QP per CPU thread does not necessarily result in the best 
performance. In the tests I ran performance was about 4% better when 
using one QP for each pair of CPU threads (with hyperthreading enabled).

>> +static unsigned ch_count;
>> +module_param(ch_count, uint, 0444);
>> +MODULE_PARM_DESC(ch_count,
>> +         "Number of RDMA channels to use for communication with an
>> SRP target. Using more than one channel improves performance if the
>> HCA supports multiple completion vectors. The default value is the
>> minimum of four times the number of online CPU sockets and the number
>> of completion vectors supported by the HCA.");
>
> Why? how did you get to this magic equation?

On the systems I have access to, measurements have shown that this choice 
for the ch_count parameter results in a significant performance 
improvement without consuming too many system resources. The performance 
difference when using more than four channels was small. This means that 
the exact value of this parameter is not that important. What matters to 
me is that users can benefit from improved performance even if the 
ch_count kernel module parameter has been left to its default value.

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: EH action after scsi_remove_host, was: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-17 11:01   ` EH action after scsi_remove_host, was: " Christoph Hellwig
@ 2014-10-20 13:53     ` Bart Van Assche
  2014-10-21  8:51       ` Christoph Hellwig
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-20 13:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	linux-scsi

On 10/17/14 13:01, Christoph Hellwig wrote:
> On Tue, Oct 07, 2014 at 03:07:17PM +0200, Bart Van Assche wrote:
>> +	/*
>> +	 * Avoid that the SCSI error handler tries to use this channel after
>> +	 * it has been freed. The SCSI error handler can namely continue
>> +	 * trying to perform recovery actions after scsi_remove_host()
>> +	 * returned.
>> +	 */
>> +	ch->target = NULL;
>
> Do you have a reproducer for that?  I think we should fix the root
> cause.

Hello Christoph,

The above assignment statement has been reported to fix a kernel oops 
that could be triggered by cable pulling. Regarding fixing the root 
cause: some time ago I posted a patch series that makes 
scsi_remove_host() wait until all error handler callback functions have 
finished and that also prevents any new error handler calls from being 
initiated after scsi_remove_host() has finished 
(http://thread.gmane.org/gmane.linux.scsi/82572/focus=87985). Should I 
repost that patch series?

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 11/12] IB/srp: Eliminate free_reqs list
  2014-10-20 11:47           ` Bart Van Assche
@ 2014-10-21  8:49             ` Christoph Hellwig
  0 siblings, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-21  8:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

On Mon, Oct 20, 2014 at 01:47:53PM +0200, Bart Van Assche wrote:
> The only reason why patches 10/12 and 11/12 are separate patches
> is to reduce the size of individual patches and hence to make it
> easier to review these patches. If everyone agrees I'm fine with
> folding patch 11/12 into patch 10/12.

I would prefer to merge the two patches.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-20 11:57       ` Bart Van Assche
@ 2014-10-21  8:49         ` Christoph Hellwig
  0 siblings, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-21  8:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

On Mon, Oct 20, 2014 at 01:57:21PM +0200, Bart Van Assche wrote:
> That pr_err() statement was convenient while debugging the multiqueue code
> in the SRP initiator driver but can be left out. Would you agree with
> leaving the above three lines of debug code out instead of adding an
> additional argument to scsi_host_find_tag() ?

Feel free to remove the check.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues
       [not found]         ` <5444F995.5080407-HInyCGIudOg@public.gmane.org>
@ 2014-10-21  8:49           ` Christoph Hellwig
  2014-10-21  8:59             ` Sagi Grimberg
  0 siblings, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-21  8:49 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On Mon, Oct 20, 2014 at 02:01:25PM +0200, Bart Van Assche wrote:
> On 10/19/14 18:12, Sagi Grimberg wrote:
> >On 10/7/2014 4:04 PM, Bart Van Assche wrote:
> >>-            req = blk_queue_find_tag(sdev->request_queue, tag);
> >>+        req = blk_queue_find_tag(sdev->request_queue, tag);
> >
> >Why is this line different?
> 
> This is because the indentation has been modified from "8x<space><tab>" into
> "<tab><tab>". I can leave out that change if you want.

Please keep it.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning
  2014-10-20 12:15       ` Bart Van Assche
@ 2014-10-21  8:50         ` Christoph Hellwig
  0 siblings, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-21  8:50 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Sagi Grimberg, Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

On Mon, Oct 20, 2014 at 02:15:07PM +0200, Bart Van Assche wrote:
> How about renaming this function into shost_sdev_count() and moving its
> declaration to <scsi/scsi_device.h> and its implementation to
> drivers/scsi/scsi_lib.c ?

I'd prefer to defer this until we have an actual need for it elsewhere.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: EH action after scsi_remove_host, was: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-20 13:53     ` Bart Van Assche
@ 2014-10-21  8:51       ` Christoph Hellwig
  0 siblings, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-21  8:51 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, linux-scsi

On Mon, Oct 20, 2014 at 03:53:38PM +0200, Bart Van Assche wrote:
> The above assignment statement has been reported to fix a kernel oops that
> could be triggered by cable pulling. Regarding fixing the root cause: some
> time ago I had posted a patch series that makes scsi_remove_host() wait
> until all error handler callback functions have finished and also that
> prevents that any new error handler function calls are initiated after
> scsi_remove_host() has finished
> (http://thread.gmane.org/gmane.linux.scsi/82572/focus=87985). Should I
> repost that patch series ?

Please keep the workaround in srp for now, and then resend the series,
including a new patch to remove the workaround in srp.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues
  2014-10-21  8:49           ` Christoph Hellwig
@ 2014-10-21  8:59             ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-21  8:59 UTC (permalink / raw)
  To: Christoph Hellwig, Bart Van Assche
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/21/2014 11:49 AM, Christoph Hellwig wrote:
> On Mon, Oct 20, 2014 at 02:01:25PM +0200, Bart Van Assche wrote:
>> On 10/19/14 18:12, Sagi Grimberg wrote:
>>> On 10/7/2014 4:04 PM, Bart Van Assche wrote:
>>>> -            req = blk_queue_find_tag(sdev->request_queue, tag);
>>>> +        req = blk_queue_find_tag(sdev->request_queue, tag);
>>>
>>> Why is this line different?
>>
>> This is because the indentation has been modified from "8x<space><tab>" into
>> "<tab><tab>". I can leave out that change if you want.
>
> Please keep it.
>

I don't have a big objection to this, but the problem with leaving this
stuff in is that it tends to screw up git blame...

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]         ` <54450690.709-HInyCGIudOg@public.gmane.org>
@ 2014-10-21  9:10           ` Sagi Grimberg
       [not found]             ` <544622FE.5040906-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-21  9:10 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/20/2014 3:56 PM, Bart Van Assche wrote:
> On 10/19/14 19:36, Sagi Grimberg wrote:
>> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>>           * comp_vector, a number in the range 0..n-1 specifying the
>>> -          MSI-X completion vector. Some HCA's allocate multiple (n)
>>> -          MSI-X vectors per HCA port. If the IRQ affinity masks of
>>> -          these interrupts have been configured such that each MSI-X
>>> -          interrupt is handled by a different CPU then the comp_vector
>>> -          parameter can be used to spread the SRP completion workload
>>> -          over multiple CPU's.
>>> +          MSI-X completion vector of the first RDMA channel. Some
>>> +          HCA's allocate multiple (n) MSI-X vectors per HCA port. If
>>> +          the IRQ affinity masks of these interrupts have been
>>> +          configured such that each MSI-X interrupt is handled by a
>>> +          different CPU then the comp_vector parameter can be used to
>>> +          spread the SRP completion workload over multiple CPU's.
>>
>> This is fairly non-trivial for the user...
>>
>> Aren't we requesting a bit too much awareness here?
>> Can't we just "make it work"? The user hands out ch_count - why can't
>> you do some least-used logic here?
>>
>> Maybe we can even go with per-cpu QPs and discard comp_vector argument?
>> this would probably bring the best performance, wouldn't it?
>> (fall back to least-used logic in case the HW supports fewer vectors)
>
> Hello Sagi,
>
> The only reason the comp_vector parameter is still supported is because
> of backwards compatibility. What I expect is that users will set the
> ch_count parameter but not the comp_vector parameter.

Agreed...

>
> Using one QP per CPU thread does not necessarily result in the best
> performance. In the tests I ran performance was about 4% better when
> using one QP for each pair of CPU threads (with hyperthreading enabled).

I usually don't like basing defaults on empirical experiments with
specific workloads. IMO, either go full-blown MQ (per-CPU) or go SQ by
default.

But that is just my opinion...
you call it.

>
>>> +static unsigned ch_count;
>>> +module_param(ch_count, uint, 0444);
>>> +MODULE_PARM_DESC(ch_count,
>>> +         "Number of RDMA channels to use for communication with an
>>> SRP target. Using more than one channel improves performance if the
>>> HCA supports multiple completion vectors. The default value is the
>>> minimum of four times the number of online CPU sockets and the number
>>> of completion vectors supported by the HCA.");
>>
>> Why? how did you get to this magic equation?
>
> On the systems I have access to measurements have shown that this choice
> for the ch_count parameter results in a significant performance
> improvement without consuming too many system resources. The performance
> difference when using more than four channels was small. This means that
> the exact value of this parameter is not that important. What matters to
> me is that users can benefit from improved performance even if the
> ch_count kernel module parameter has been left to its default value.

I do like the idea of giving users high performance out-of-the-box. But
as I wrote above, I'm less fond of the idea of basing the default choice
on experiments.

Sagi.

>
> Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]   ` <5433E585.607-HInyCGIudOg@public.gmane.org>
  2014-10-19 17:36     ` Sagi Grimberg
@ 2014-10-21  9:14     ` Sagi Grimberg
  2014-10-29 12:36       ` Bart Van Assche
  1 sibling, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-21  9:14 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/7/2014 4:07 PM, Bart Van Assche wrote:
> Improve performance by using multiple RDMA/RC channels per SCSI
> host for communication with an SRP target. About the
> implementation:
> - Introduce a loop over all channels in the code that uses
>    target->ch.
> - Set the SRP_MULTICHAN_MULTI flag during login for the creation
>    of the second and subsequent channels.
> - RDMA completion vectors are chosen such that RDMA completion
>    interrupts are handled by the CPU socket that submitted the I/O
>    request. As one can see in this patch, it has been assumed that if
>    a system contains n CPU sockets and m RDMA completion vectors have
>    been assigned to an RDMA HCA, then IRQ affinity has been configured
>    such that completion vectors [i*m/n..(i+1)*m/n) are bound to CPU
>    socket i with 0 <= i < n.
> - Modify srp_free_ch_ib() and srp_free_req_data() such that it
>    becomes safe to invoke these functions after the corresponding
>    allocation function failed.
> - Add a ch_count sysfs attribute per target port.
>
> Signed-off-by: Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org>
> Cc: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> Cc: Sebastian Parschauer <sebastian.riemer-EIkl63zCoXaH+58JC4qpiA@public.gmane.org>

<SNIP>

>   			spin_lock_irqsave(&ch->lock, flags);
>   			ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
> @@ -1906,7 +1970,7 @@ static int srp_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *scmnd)
>   		goto err;

Bart,

Any chance you can share some perf output on this code?
I'm interested in knowing the contention on target->lock that is
still taken on the IO path across channels.

Can we think on how to avoid it?

Also, I would like to understand where the bottleneck transitioned to.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]   ` <5433E557.3010505-HInyCGIudOg@public.gmane.org>
  2014-10-17 10:58     ` Christoph Hellwig
@ 2014-10-22 22:03     ` Elliott, Robert (Server Storage)
       [not found]       ` <94D0CD8314A33A4D9D801C0FE68B4029593212E0-wwDBVnaDRpYSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
  2014-10-23  8:47       ` Christoph Hellwig
  1 sibling, 2 replies; 83+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-10-22 22:03 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Scales, Webb,
	Don Brace (PMC)


> -----Original Message-----
> From: Bart Van Assche [mailto:bvanassche@acm.org]
> Sent: Tuesday, 07 October, 2014 8:07 AM
...
> @@ -1927,7 +1931,7 @@ static int srp_queuecommand(struct Scsi_Host
> *shost, struct scsi_cmnd *scmnd)
> 
>  	cmd->opcode = SRP_CMD;
>  	cmd->lun    = cpu_to_be64((u64) scmnd->device->lun << 48);
> -	cmd->tag    = req->index;
> +	cmd->tag    = tag;
>  	memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);
> 
>  	req->scmnd    = scmnd;
...
> 
> +static int srp_slave_alloc(struct scsi_device *sdev)
> +{
> +	sdev->tagged_supported = 1;
> +
> +	scsi_activate_tcq(sdev, sdev->queue_depth);
> +
> +	return 0;
> +}
> +

Have you tested this with scsi_mod.use_blk_mq=n?

Trying similar changes in hpsa, we still receive some INQUIRY commands 
submitted through queuecommand with tag -1.  They are for devices for
which slave_alloc has not yet been run, implying this work needs to 
be done even earlier.  Maybe the midlayer is missing a slave_alloc
call somewhere?

---
Rob Elliott    HP Server Storage




^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]       ` <94D0CD8314A33A4D9D801C0FE68B4029593212E0-wwDBVnaDRpYSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
@ 2014-10-23  7:16         ` Bart Van Assche
  2014-10-23 17:43           ` Webb Scales
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-23  7:16 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage), Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Scales, Webb,
	Don Brace (PMC)

On 10/23/14 00:03, Elliott, Robert (Server Storage) wrote:
>> -----Original Message-----
>> From: Bart Van Assche [mailto:bvanassche-HInyCGIudOg@public.gmane.org]
>> Sent: Tuesday, 07 October, 2014 8:07 AM
> ...
>> @@ -1927,7 +1931,7 @@ static int srp_queuecommand(struct Scsi_Host
>> *shost, struct scsi_cmnd *scmnd)
>>
>>   	cmd->opcode = SRP_CMD;
>>   	cmd->lun    = cpu_to_be64((u64) scmnd->device->lun << 48);
>> -	cmd->tag    = req->index;
>> +	cmd->tag    = tag;
>>   	memcpy(cmd->cdb, scmnd->cmnd, scmnd->cmd_len);
>>
>>   	req->scmnd    = scmnd;
> ...
>>
>> +static int srp_slave_alloc(struct scsi_device *sdev)
>> +{
>> +	sdev->tagged_supported = 1;
>> +
>> +	scsi_activate_tcq(sdev, sdev->queue_depth);
>> +
>> +	return 0;
>> +}
>> +
>
> Have you tested this with scsi_mod.use_blk_mq=n?
>
> Trying similar changes in hpsa, we still receive some INQUIRY commands
> submitted through queuecommand with tag -1.  They are for devices for
> which slave_alloc has not yet been run, implying this work needs to
> be done even earlier.  Maybe the midlayer is missing a slave_alloc
> call somewhere?

Hello Rob,

All my tests with use_blk_mq=n were run with a WARN_ON_ONCE(req->tag < 
0) statement present in srp_queuecommand(). I haven't seen any kernel 
warning being triggered during the tests I ran.

I also had a look at scsi_alloc_sdev() in drivers/scsi/scsi_scan.c. The 
number of statements between queue allocation and the slave_alloc() call 
is limited. The only scenario I can think of which could cause 
queuecommand() to be invoked before slave_alloc() is a LUN scan 
initiated from user space via sysfs due to 
scsi_sysfs_device_initialize() being invoked before slave_alloc(). Does 
that make sense to you ?

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-10-22 22:03     ` Elliott, Robert (Server Storage)
       [not found]       ` <94D0CD8314A33A4D9D801C0FE68B4029593212E0-wwDBVnaDRpYSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
@ 2014-10-23  8:47       ` Christoph Hellwig
  2014-10-24  4:43         ` Elliott, Robert (Server Storage)
  1 sibling, 1 reply; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-23  8:47 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage)
  Cc: Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Ming Lei, linux-scsi, linux-rdma, Scales, Webb, Don Brace (PMC)

On Wed, Oct 22, 2014 at 10:03:24PM +0000, Elliott, Robert (Server Storage) wrote:
> Have you tested this with scsi_mod.use_blk_mq=n?
> 
> Trying similar changes in hpsa, we still receive some INQUIRY commands 
> submitted through queuecommand with tag -1.  They are for devices for
> which slave_alloc has not yet been run, implying this work needs to 
> be done even earlier.  Maybe the midlayer is missing a slave_alloc
> call somewhere?

Did that version of hpsa really enable tagging in slave_alloc
or just in slave_configure?  The latter would cause INQUIRY to be
sent untagged.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-10-23  7:16         ` Bart Van Assche
@ 2014-10-23 17:43           ` Webb Scales
       [not found]             ` <54493E5A.7050803-VXdhtT5mjnY@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Webb Scales @ 2014-10-23 17:43 UTC (permalink / raw)
  To: Bart Van Assche, Elliott, Robert (Server Storage), Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi, linux-rdma, Don Brace (PMC)

On 10/23/14 3:16 AM, Bart Van Assche wrote:
> All my tests with use_blk_mq=n were run with a WARN_ON_ONCE(req->tag < 
> 0) statement present in srp_queuecommand(). I haven't seen any kernel 
> warning being triggered during the tests I ran.
Bart, what's the data type of "req->tag", here?  (E.g., if it is 
"unsigned", it will never be less than zero, right?)


             Thanks,

                 Webb

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-10-23  8:47       ` Christoph Hellwig
@ 2014-10-24  4:43         ` Elliott, Robert (Server Storage)
  2014-10-24  6:45           ` Christoph Hellwig
  0 siblings, 1 reply; 83+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-10-24  4:43 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Ming Lei, linux-scsi, linux-rdma, Scales, Webb, Don Brace (PMC)



> -----Original Message-----
> From: Christoph Hellwig [mailto:hch@infradead.org]
> Sent: Thursday, October 23, 2014 3:48 AM
> To: Elliott, Robert (Server Storage)
> Cc: Bart Van Assche; Jens Axboe; Sagi Grimberg; Sebastian Parschauer;
> Ming Lei; linux-scsi@vger.kernel.org; linux-rdma; Scales, Webb; Don
> Brace (PMC)
> Subject: Re: [PATCH v2 10/12] IB/srp: Use block layer tags
> 
> On Wed, Oct 22, 2014 at 10:03:24PM +0000, Elliott, Robert (Server
> Storage) wrote:
> > Have you tested this with scsi_mod.use_blk_mq=n?
> >
> > Trying similar changes in hpsa, we still receive some INQUIRY commands
> > submitted through queuecommand with tag -1.  They are for devices for
> > which slave_alloc has not yet been run, implying this work needs to
> > be done even earlier.  Maybe the midlayer is missing a slave_alloc
> > call somewhere?
> 
> Did that version of hpsa really enable tagging in slave_alloc
> or just in slave_configure?  The latter would cause INQUIRY to be
> sent untagged.

Yes, it is slave_alloc, not slave_configure.  

However, it was looking at scmd->tag, which is always 0xff (at 
least in those early discovery commands).  scmd->request->tag 
looks like it is the field that has the correct values.

Also, I noticed that scmd->tag is just an 8 bit field, so
it could never represent a large number of tags.

Just to confirm: After calling scsi_init_shared_tag_map()
in non-mq mode, will scmd->request->tag be based on 
controller-wide tag allocation (never using the same
value at the same time for the request queues of multiple
devices in that controller)?



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]             ` <54493E5A.7050803-VXdhtT5mjnY@public.gmane.org>
@ 2014-10-24  6:45               ` Bart Van Assche
       [not found]                 ` <5449F571.7080308-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-24  6:45 UTC (permalink / raw)
  To: webbnh-VXdhtT5mjnY, Elliott, Robert (Server Storage), Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Don Brace (PMC)

On 10/23/14 19:43, Webb Scales wrote:
> On 10/23/14 3:16 AM, Bart Van Assche wrote:
>> All my tests with use_blk_mq=n were run with a WARN_ON_ONCE(req->tag <
>> 0) statement present in srp_queuecommand(). I haven't seen any kernel
>> warning being triggered during the tests I ran.
>
> Bart, what's the data type of "req->tag", here?  (E.g., if it is
> "unsigned", it will never be less than zero, right?)

Hello Webb,

This is what I found in "struct request" in <linux/blkdev.h>:

struct request {
	[ ... ]
	int tag;
	[ ... ]
};

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-10-24  4:43         ` Elliott, Robert (Server Storage)
@ 2014-10-24  6:45           ` Christoph Hellwig
       [not found]             ` <20141024064514.GA15654-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
  2014-11-03  7:52             ` Kashyap Desai
  0 siblings, 2 replies; 83+ messages in thread
From: Christoph Hellwig @ 2014-10-24  6:45 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage)
  Cc: Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Ming Lei, linux-scsi, linux-rdma, Scales, Webb, Don Brace (PMC)

On Fri, Oct 24, 2014 at 04:43:15AM +0000, Elliott, Robert (Server Storage) wrote:
> However, it was looking at scmd->tag, which is always 0xff (at 
> least in those early discovery commands).  scmd->request->tag 
> looks like it is the field that has the correct values.
> 
> Also, I noticed that scmd->tag is just an 8 bit field, so
> it could never represent a large number of tags.

Yes, we need to get rid of scmd->tag.  Hannes had a patchset to get
started on it, and I hope either he or someone else will have time to
get back to it ASAP.

> Just to confirm: After calling scsi_init_shared_tag_map()
> in non-mq mode, will scmd->request->tag be based on 
> controller-wide tag allocation (never using the same
> value at the same time for the request queues of multiple
> devices in that controller)?

Yes.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]                 ` <5449F571.7080308-HInyCGIudOg@public.gmane.org>
@ 2014-10-24 15:40                   ` Webb Scales
  0 siblings, 0 replies; 83+ messages in thread
From: Webb Scales @ 2014-10-24 15:40 UTC (permalink / raw)
  To: Bart Van Assche, Elliott, Robert (Server Storage), Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Don Brace (PMC)

On 10/24/14 2:45 AM, Bart Van Assche wrote:
> On 10/23/14 19:43, Webb Scales wrote:
>> On 10/23/14 3:16 AM, Bart Van Assche wrote:
>>> All my tests with use_blk_mq=n were run with a WARN_ON_ONCE(req->tag <
>>> 0) statement present in srp_queuecommand(). I haven't seen any kernel
>>> warning being triggered during the tests I ran.
>>
>> Bart, what's the data type of "req->tag", here?  (E.g., if it is
>> "unsigned", it will never be less than zero, right?)
>
> Hello Webb,
>
> This is what I found in "struct request" in <linux/blkdev.h>:
>
> struct request {
>     [ ... ]
>     int tag;
>     [ ... ]
> };
>
> Bart.
Good:  I just wanted to make sure that you weren't referencing the "tag" 
field in the "scsi_cmnd" struct (which _is_ unsigned).


                 Webb

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag()
       [not found]   ` <5433E493.9030304-HInyCGIudOg@public.gmane.org>
@ 2014-10-28  1:55     ` Martin K. Petersen
  0 siblings, 0 replies; 83+ messages in thread
From: Martin K. Petersen @ 2014-10-28  1:55 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi@vger.kernel.org, linux-rdma

>>>>> "Bart" == Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> writes:

Bart> The queuecommand() callback functions in SCSI low-level drivers
Bart> need to know which hardware context has been selected by the block
Bart> layer. Since this information is not available in the request
Bart> structure, and since passing the hctx pointer directly to the
Bart> queuecommand callback function would require modification of all
Bart> SCSI LLDs, add a function to the block layer that allows to query
Bart> the hardware context index.

I agree with consolidating the two functions. Otherwise OK.

Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 03/12] scsi-mq: Add support for multiple hardware queues
       [not found]     ` <5433E4AB.8030306-HInyCGIudOg@public.gmane.org>
  2014-10-19 15:54       ` Sagi Grimberg
@ 2014-10-28  2:01       ` Martin K. Petersen
  2014-10-29 12:22         ` Bart Van Assche
  1 sibling, 1 reply; 83+ messages in thread
From: Martin K. Petersen @ 2014-10-28  2:01 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi@vger.kernel.org, linux-rdma

>>>>> "Bart" == Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> writes:

Bart> Allow a SCSI LLD to declare how many hardware queues it supports
Bart> by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().

Bart> Note: it is assumed that each hardware queue has a queue depth of
Bart> shost-> can_queue. In other words, the total queue depth per host
Bart> is (number of hardware queues) * (shost->can_queue).

I suggest you emphasize that assumption in the header file.

Also: What about the host template?

Reviewed-by: Martin K. Petersen <martin.petersen-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues
  2014-10-07 13:04 ` [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues Bart Van Assche
  2014-10-19 16:12   ` Sagi Grimberg
@ 2014-10-28  2:06   ` Martin K. Petersen
  1 sibling, 0 replies; 83+ messages in thread
From: Martin K. Petersen @ 2014-10-28  2:06 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

>>>>> "Bart" == Bart Van Assche <bvanassche@acm.org> writes:

Bart> Modify scsi_find_tag() and scsi_host_find_tag() such that these
Bart> functions can translate a tag generated by blk_mq_unique_tag().

Looks good to me.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]             ` <544622FE.5040906-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-10-28 18:32               ` Sagi Grimberg
       [not found]                 ` <544FE13A.60807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-28 18:32 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/21/2014 12:10 PM, Sagi Grimberg wrote:
> On 10/20/2014 3:56 PM, Bart Van Assche wrote:
>> On 10/19/14 19:36, Sagi Grimberg wrote:
>>> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>>>           * comp_vector, a number in the range 0..n-1 specifying the
>>>> -          MSI-X completion vector. Some HCA's allocate multiple (n)
>>>> -          MSI-X vectors per HCA port. If the IRQ affinity masks of
>>>> -          these interrupts have been configured such that each MSI-X
>>>> -          interrupt is handled by a different CPU then the comp_vector
>>>> -          parameter can be used to spread the SRP completion workload
>>>> -          over multiple CPU's.
>>>> +          MSI-X completion vector of the first RDMA channel. Some
>>>> +          HCA's allocate multiple (n) MSI-X vectors per HCA port. If
>>>> +          the IRQ affinity masks of these interrupts have been
>>>> +          configured such that each MSI-X interrupt is handled by a
>>>> +          different CPU then the comp_vector parameter can be used to
>>>> +          spread the SRP completion workload over multiple CPU's.
>>>
>>> This is fairly non-trivial for the user...
>>>
>>> Aren't we requesting a bit too much awareness here?
>>> Can't we just "make it work"? The user hands out ch_count - why can't
>>> you do some least-used logic here?
>>>
>>> Maybe we can even go with per-cpu QPs and discard comp_vector argument?
>>> this would probably bring the best performance, wouldn't it?
>>> (fallback to least-used logic in case the HW supports fewer vectors)
>>
>> Hello Sagi,
>>
>> The only reason the comp_vector parameter is still supported is because
>> of backwards compatibility. What I expect is that users will set the
>> ch_count parameter but not the comp_vector parameter.
>

Hey Bart,

Another question I have with this. Say you have 8 cores on a single NUMA
node. First connection will attach to vectors 0-3 (ch_count=4) and so
are all the connections. Don't we want to spread that a little?

If we are not going per-cpu, why aren't we trying to spread vectors
around to try and reduce the interference?

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                 ` <544FE13A.60807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-10-29 10:52                   ` Bart Van Assche
  2014-10-30 14:19                     ` Sagi Grimberg
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-29 10:52 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/28/14 19:32, Sagi Grimberg wrote:
> On 10/21/2014 12:10 PM, Sagi Grimberg wrote:
>> On 10/20/2014 3:56 PM, Bart Van Assche wrote:
>>> On 10/19/14 19:36, Sagi Grimberg wrote:
>>>> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>>>>           * comp_vector, a number in the range 0..n-1 specifying the
>>>>> -          MSI-X completion vector. Some HCA's allocate multiple (n)
>>>>> -          MSI-X vectors per HCA port. If the IRQ affinity masks of
>>>>> -          these interrupts have been configured such that each MSI-X
>>>>> -          interrupt is handled by a different CPU then the
>>>>> comp_vector
>>>>> -          parameter can be used to spread the SRP completion workload
>>>>> -          over multiple CPU's.
>>>>> +          MSI-X completion vector of the first RDMA channel. Some
>>>>> +          HCA's allocate multiple (n) MSI-X vectors per HCA port. If
>>>>> +          the IRQ affinity masks of these interrupts have been
>>>>> +          configured such that each MSI-X interrupt is handled by a
>>>>> +          different CPU then the comp_vector parameter can be used to
>>>>> +          spread the SRP completion workload over multiple CPU's.
>>>>
>>>> This is fairly non-trivial for the user...
>>>>
>>>> Aren't we requesting a bit too much awareness here?
>>>> Can't we just "make it work"? The user hands out ch_count - why can't
>>>> you do some least-used logic here?
>>>>
>>>> Maybe we can even go with per-cpu QPs and discard comp_vector argument?
>>>> this would probably bring the best performance, wouldn't it?
>>>> (fallback to least-used logic in case the HW supports fewer vectors)
>>>
>>> The only reason the comp_vector parameter is still supported is because
>>> of backwards compatibility. What I expect is that users will set the
>>> ch_count parameter but not the comp_vector parameter.
>
> Another question I have with this. Say you have 8 cores on a single NUMA
> node. First connection will attach to vectors 0-3 (ch_count=4) and so
> are all the connections. Don't we want to spread that a little?
>
> If we are not going per-cpu, why aren't we trying to spread vectors
> around to try and reduce the interference?

Hello Sagi,

Sorry but your question is not entirely clear to me. Are you referring 
to spreading the workload over CPU's or over completion vectors? If a 
user wants to spread the completion workload maximally by using all 
completion vectors that can be achieved by setting ch_count to a value 
that is equal to or larger than the number of completion vectors.

As mentioned in the commit message, spreading the completion workload 
over CPU's is not entirely under control of the SRP initiator driver. It 
is assumed that a user assigns IRQ affinity such that the interrupts 
associated with different completion vectors are processed by different 
CPU threads. If there are more RDMA channels than completion vectors, 
the SRP initiator driver associates RDMA channels with completion 
vectors such that the workload is spread evenly over the completion vectors.

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 03/12] scsi-mq: Add support for multiple hardware queues
  2014-10-28  2:01       ` Martin K. Petersen
@ 2014-10-29 12:22         ` Bart Van Assche
  2014-10-29 12:27           ` Bart Van Assche
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-29 12:22 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

On 10/28/14 03:01, Martin K. Petersen wrote:
>>>>>> "Bart" == Bart Van Assche <bvanassche@acm.org> writes:
>
> Bart> Allow a SCSI LLD to declare how many hardware queues it supports
> Bart> by setting Scsi_Host.nr_hw_queues before calling scsi_add_host().
>
> Bart> Note: it is assumed that each hardware queue has a queue depth of
> Bart> shost-> can_queue. In other words, the total queue depth per host
> Bart> is (number of hardware queues) * (shost->can_queue).
>
> I suggest you emphasize that assumption in the header file.
>
> Also: What about the host template?
>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

Hello Martin,

Thanks for reviewing the blk-core and SCSI-core patches in this series. 
Regarding nr_hw_queues and the SCSI host template: setting that 
parameter in the host template is supported. I will mention this in the 
patch description. I will also clarify the queue size assumptions in 
scsi_host.h.

Bart.



^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 03/12] scsi-mq: Add support for multiple hardware queues
  2014-10-29 12:22         ` Bart Van Assche
@ 2014-10-29 12:27           ` Bart Van Assche
       [not found]             ` <5450DD49.6090108-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-29 12:27 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei, linux-scsi,
	linux-rdma

On 10/29/14 13:22, Bart Van Assche wrote:
> Regarding nr_hw_queues and the SCSI host template: setting that
> parameter in the host template is supported.

(replying to my own e-mail)

Let me correct this: setting nr_hw_queues in the host template is not 
yet supported. If anyone would consider this useful, such functionality 
would be easy to add.

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-21  9:14     ` Sagi Grimberg
@ 2014-10-29 12:36       ` Bart Van Assche
  2014-10-30 14:22         ` Sagi Grimberg
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-29 12:36 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/21/14 11:14, Sagi Grimberg wrote:
> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>               spin_lock_irqsave(&ch->lock, flags);
>>               ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
>> @@ -1906,7 +1970,7 @@ static int srp_queuecommand(struct Scsi_Host
>> *shost, struct scsi_cmnd *scmnd)
>>           goto err;
>
> Bart,
>
> Any chance you can share some perf output on this code?
> I'm interested of knowing the contention on target->lock that is
> still taken on the IO path across channels.
>
> Can we think on how to avoid it?
>
> Also, I would like to understand where the bottleneck transitioned to.

Hello Sagi,

Are you referring to target->lock? That lock isn't taken anywhere in 
the hot path. More in general, I haven't seen any lock contention in the 
perf output that was caused by the block layer, SCSI core, SRP initiator 
or HCA (mlx4) drivers. The code that showed up highest in the perf 
output was the direct I/O code, the code that is triggered by fio to 
submit I/O requests.

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 03/12] scsi-mq: Add support for multiple hardware queues
       [not found]             ` <5450DD49.6090108-HInyCGIudOg@public.gmane.org>
@ 2014-10-30  0:53               ` Martin K. Petersen
  0 siblings, 0 replies; 83+ messages in thread
From: Martin K. Petersen @ 2014-10-30  0:53 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Martin K. Petersen, Christoph Hellwig, Jens Axboe, Sagi Grimberg,
	Sebastian Parschauer, Robert Elliott, Ming Lei,
	linux-scsi@vger.kernel.org, linux-rdma

>>>>> "Bart" == Bart Van Assche <bvanassche-HInyCGIudOg@public.gmane.org> writes:

Bart> Let me correct this: setting nr_hw_queues in the host template is
Bart> not yet supported. If anyone would consider this useful, such
Bart> functionality would be easy to add.

It seemed like an obvious thing a device driver would want to describe
in its host template. That's why I asked.

We can add it if somebody wants it but it's a bit of a chicken and egg
problem. Anyway. Minor issue.

-- 
Martin K. Petersen	Oracle Linux Engineering
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-29 10:52                   ` Bart Van Assche
@ 2014-10-30 14:19                     ` Sagi Grimberg
  2014-10-30 14:36                       ` Bart Van Assche
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-30 14:19 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/29/2014 12:52 PM, Bart Van Assche wrote:
> On 10/28/14 19:32, Sagi Grimberg wrote:
>> On 10/21/2014 12:10 PM, Sagi Grimberg wrote:
>>> On 10/20/2014 3:56 PM, Bart Van Assche wrote:
>>>> On 10/19/14 19:36, Sagi Grimberg wrote:
>>>>> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>>>>>           * comp_vector, a number in the range 0..n-1 specifying the
>>>>>> -          MSI-X completion vector. Some HCA's allocate multiple (n)
>>>>>> -          MSI-X vectors per HCA port. If the IRQ affinity masks of
>>>>>> -          these interrupts have been configured such that each MSI-X
>>>>>> -          interrupt is handled by a different CPU then the
>>>>>> comp_vector
>>>>>> -          parameter can be used to spread the SRP completion
>>>>>> workload
>>>>>> -          over multiple CPU's.
>>>>>> +          MSI-X completion vector of the first RDMA channel. Some
>>>>>> +          HCA's allocate multiple (n) MSI-X vectors per HCA port. If
>>>>>> +          the IRQ affinity masks of these interrupts have been
>>>>>> +          configured such that each MSI-X interrupt is handled by a
>>>>>> +          different CPU then the comp_vector parameter can be
>>>>>> used to
>>>>>> +          spread the SRP completion workload over multiple CPU's.
>>>>>
>>>>> This is fairly non-trivial for the user...
>>>>>
>>>>> Aren't we requesting a bit too much awareness here?
>>>>> Can't we just "make it work"? The user hands out ch_count - why can't
>>>>> you do some least-used logic here?
>>>>>
>>>>> Maybe we can even go with per-cpu QPs and discard comp_vector
>>>>> argument?
>>>>> this would probably bring the best performance, wouldn't it?
>>>>> (fallback to least-used logic in case HW support less vectors)
>>>>
>>>> The only reason the comp_vector parameter is still supported is because
>>>> of backwards compatibility. What I expect is that users will set the
>>>> ch_count parameter but not the comp_vector parameter.
>>
>> Another thing I wonder about here: say you have 8 cores on a single NUMA
>> node. The first connection will attach to vectors 0-3 (ch_count=4), and so
>> will all the connections. Don't we want to spread that a little?
>>
>> If we are not going per-cpu, why aren't we trying to spread vectors
>> around to try and reduce the interference?
>
> Hello Sagi,
>
> Sorry but your question is not entirely clear to me. Are you referring
> to spreading the workload over CPU's or over completion vectors ?

I'm talking about completion vectors, but I assume both as I consider
spreading interrupt vectors across CPU cores a common practice.

> If a
> user wants to spread the completion workload maximally by using all
> completion vectors that can be achieved by setting ch_count to a value
> that is equal to or larger than the number of completion vectors.
>

I'm talking about the default.
My impression is that with the default settings, on a single NUMA node
with 8 cores, 2 different SRP connections (using 4 channels each) will
both be associated with completion vectors 0-3, while the second could
potentially use vectors 4-7 and reduce possible mutual interference,
right? (You said yourself that the user is not expected to use
comp_vector and that it is only kept for backward compatibility.)

Now, given that each connection uses fewer channels than there are
CPUs, don't you think such spreading logic would be helpful?

> As mentioned in the commit message, spreading the completion workload
> over CPU's is not entirely under control of the SRP initiator driver.

I was referring to completion vectors, but I consider a 1:1 mapping a
common usage when it comes to RDMA (and not only RDMA, by the way).

Feel free to correct me if I misunderstand the implementation.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-29 12:36       ` Bart Van Assche
@ 2014-10-30 14:22         ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-30 14:22 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/29/2014 2:36 PM, Bart Van Assche wrote:
> On 10/21/14 11:14, Sagi Grimberg wrote:
>> On 10/7/2014 4:07 PM, Bart Van Assche wrote:
>>>               spin_lock_irqsave(&ch->lock, flags);
>>>               ch->req_lim += be32_to_cpu(rsp->req_lim_delta);
>>> @@ -1906,7 +1970,7 @@ static int srp_queuecommand(struct Scsi_Host
>>> *shost, struct scsi_cmnd *scmnd)
>>>           goto err;
>>
>> Bart,
>>
>> Any chance you can share some perf output on this code?
>> I'm interested in knowing the contention on target->lock that is
>> still taken in the I/O path across channels.
>>
>> Can we think on how to avoid it?
>>
>> I'd also like to understand where the bottleneck moved to.
>
> Hello Sagi,
>
> Are you referring to target->lock ? That lock isn't taken anywhere in
> the hot path.

Right, my recollection was that we used to acquire target->lock in
srp_chkready(). I see that's not the case anymore.

> More in general, I haven't seen any lock contention in the
> perf output that was caused by the block layer, SCSI core, SRP initiator
> or HCA (mlx4) drivers. The code that showed up highest in the perf
> output was the direct I/O code, the code that is triggered by fio to
> submit I/O requests.

Interesting, you don't see more contention on SQ/RQ/CQ locks? I find
that surprising.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-30 14:19                     ` Sagi Grimberg
@ 2014-10-30 14:36                       ` Bart Van Assche
       [not found]                         ` <54524D08.4040203-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-30 14:36 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/30/14 15:19, Sagi Grimberg wrote:
> My impression here that in the default settings, on a 1 NUMA node with
> 8 cores, 2 different srp connections (using 4 channels each) will be
> associated with comp vectors 0-3. while it could potentially use
> vectors 4-7 and reduce possible mutual interference. right?

Hello Sagi,

That's correct. For this example, if use of all completion vectors is 
desired, additional configuration is required, e.g. setting ch_count 
to 8 in /etc/modprobe.d/ib_srp.conf. By the way, I'm not sure it is 
possible to avoid manual configuration and tuning entirely. As an 
example, with a six-core CPU at the initiator side and with 
hyperthreading enabled (12 CPU threads in total) I see higher IOPS 
results with ch_count=6 than with ch_count=8 or ch_count=12. I have 
not tried to determine why, but maybe this is because ch_count values 
below the number of CPU threads cause some interrupt coalescing.
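In concrete terms, that configuration is a single modprobe options line (the value 8 matches this example; adjust it to the number of completion vectors your HCA provides):

```
# /etc/modprobe.d/ib_srp.conf
options ib_srp ch_count=8
```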

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                         ` <54524D08.4040203-HInyCGIudOg@public.gmane.org>
@ 2014-10-30 15:06                           ` Sagi Grimberg
       [not found]                             ` <545253E3.7000009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-30 15:06 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/30/2014 4:36 PM, Bart Van Assche wrote:
> On 10/30/14 15:19, Sagi Grimberg wrote:
>> My impression here that in the default settings, on a 1 NUMA node with
>> 8 cores, 2 different srp connections (using 4 channels each) will be
>> associated with comp vectors 0-3. while it could potentially use
>> vectors 4-7 and reduce possible mutual interference. right?
>
> Hello Sagi,
>
> That's correct. For this example if use of all completion vectors is
> desired additional configuration is required, e.g. by setting ch_count
> to 8 in /etc/modprobe.d/ib_srp.conf.

That is why I think the user is still expected to be aware of the
configuration in order to get maximum performance. I would like best
performance to "just work". For example, I don't see any sort of
software queue count to configure; it "just works".

Now, I also agree this may mean more (or sometimes way more)
resources, but I suggest that if we go with a default of 4 per NUMA
node we should take care of such situations.

I'm not strict about this with respect to this patch set, but I think
we should consider this bit.

> By the way, I'm not sure it is
> possible to avoid manual configuration and tuning entirely. As an
> example, with a six core CPU at the initiator side and with
> hyperthreading enabled (12 CPU threads in total) I see higher IOPS
> results with ch_count=6 compared to ch_count=8 or ch_count=12.
> I have
> not tried to determine why but maybe this is because ch_count values
> that are below the number of CPU threads cause some interrupt coalescing.

I'm not aware of any implicit interrupt coalescing effect...

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                             ` <545253E3.7000009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-10-30 15:19                               ` Bart Van Assche
       [not found]                                 ` <545256E5.9010501-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-30 15:19 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/30/14 16:06, Sagi Grimberg wrote:
> I'm not aware of any implicit interrupt coalescing effect...

In case it was not clear what I was referring to: if multiple completion
queue handling routines run on the same CPU, then the average number of
work completions processed by each completion handling routine increases,
due to the increased time between generation of an interrupt and the
start of the completion handler routine. As you know, this helps overall
system throughput.

Bart.


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                                 ` <545256E5.9010501-HInyCGIudOg@public.gmane.org>
@ 2014-10-30 17:33                                   ` Sagi Grimberg
  2014-10-31  9:19                                     ` Bart Van Assche
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-10-30 17:33 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/30/2014 5:19 PM, Bart Van Assche wrote:
> On 10/30/14 16:06, Sagi Grimberg wrote:
>> I'm not aware of any implicit interrupt coalescing effect...
>
> In case it was not clear what I was referring to: if multiple completion
> queue handling routines run on the same CPU then the average number of
> work completions processed by each completion handling routine increases
> due to the increased time between generation of an interrupt and the
> start of the completion handler routine. As you know this helps overall
> system throughput.
>

Now I realize that we can hit serious problems here, since we never
solved the issue of the SRP polling routine possibly polling forever
within an interrupt (or at least until a hard lockup). It's interesting
that you weren't able to hit that with a high workload. Did you try
running this code on a virtual function? (I witnessed this issue in
iSER on a VM.)

Moreover, the fairness issue is even more likely to be encountered with
multichannel. Did you try to hit that? I really think this patchset
*needs* to deal with the two issues I mentioned, as the probability of
hitting them increases with a faster I/O stack.

I remember this was discussed lately, with consideration of whether to
use blk-iopoll. But I think the initial approach of bailing out of the
completion handler once we hit a budget is fine for now.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-10-30 17:33                                   ` Sagi Grimberg
@ 2014-10-31  9:19                                     ` Bart Van Assche
       [not found]                                       ` <5453541D.7040206-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-10-31  9:19 UTC (permalink / raw)
  To: Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi, linux-rdma

On 10/30/14 18:33, Sagi Grimberg wrote:
> Now I realize that we can hit serious problems here since we never
> solved the issue of srp polling routine that might poll forever within
> an interrupt (or at least until a hard lockup). Its interesting that
> you weren't able to hit that with a high workload. Did you try running
> this code on a virtual function (I witnessed this issue in iser on a VM).
>
> Moreover, the fairness issue is even more likely to be encountered in
> multichannel. Did you try to hit that? I really think this patchset
> *needs* to deal with the 2 issues I mentioned as the probability of
> hitting them increases with a faster IO stack.
>
> I remember this was discussed lately with consideration for using
> blk-iopoll or not. But I think that for now the initial approach of
> bailing out of the once we hit a budget is fine for now.

Hello Sagi,

As you mentioned, so far this fairness issue has only caused trouble with
iSER in a virtual machine guest. I have not yet seen anyone reporting a
QP servicing fairness problem for the SRP initiator. Although analyzing,
and if needed limiting, the maximum number of iterations in the SRP
polling routine is on my to-do list, addressing that issue is outside
the scope of this patch series.

Regarding the impact of this patch series on QP handling fairness: the 
time spent in the SRP RDMA completion handler depends on the number of 
completions processed at once. This number depends on:
(a) The number of CPU cores in the initiator system that submit I/O and
     that are associated with a single RDMA channel.
(b) The target system processing speed per RDMA channel.

This patch series reduces (a) by a factor ch_count. (b) is either
unaffected (linear scaling) or slightly reduced (less than linear
scaling). My conclusion is that if this patch series has an impact on QP
handling fairness, it will improve fairness, since the number of
completions processed at once either remains unchanged or is reduced.

Bart.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
       [not found]             ` <20141024064514.GA15654-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-10-31 17:34               ` Hannes Reinecke
  0 siblings, 0 replies; 83+ messages in thread
From: Hannes Reinecke @ 2014-10-31 17:34 UTC (permalink / raw)
  To: Christoph Hellwig, Elliott, Robert (Server Storage)
  Cc: Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma, Scales,
	Webb, Don Brace (PMC)

On 10/24/2014 08:45 AM, Christoph Hellwig wrote:
> On Fri, Oct 24, 2014 at 04:43:15AM +0000, Elliott, Robert (Server Storage) wrote:
>> However, it was looking at scmd->tag, which is always 0xff (at 
>> least in those early discovery commands).  scmd->request->tag 
>> looks like it is the field that has the correct values.
>>
>> Also, I noticed that scmd->tag is just an 8 bit field, so
>> it could never represent a large number of tags.
> 
> Yes, we need to get rid of scmd->tag.  Hannes had a patchset to get
> started on it, and I hope either he or someone else will have time to
> get back to it ASAP.
> 
I'm planning on doing so; just waiting for the NCR5380 cleanup
to get in.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare-l3A5Bk7waGM@public.gmane.org			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                                       ` <5453541D.7040206-HInyCGIudOg@public.gmane.org>
@ 2014-11-02 13:03                                         ` Sagi Grimberg
  2014-11-03  1:46                                           ` Elliott, Robert (Server Storage)
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-11-02 13:03 UTC (permalink / raw)
  To: Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Robert Elliott,
	Ming Lei, linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 10/31/2014 11:19 AM, Bart Van Assche wrote:
> On 10/30/14 18:33, Sagi Grimberg wrote:
>> Now I realize that we can hit serious problems here since we never
>> solved the issue of srp polling routine that might poll forever within
>> an interrupt (or at least until a hard lockup). Its interesting that
>> you weren't able to hit that with a high workload. Did you try running
>> this code on a virtual function (I witnessed this issue in iser on a VM).
>>
>> Moreover, the fairness issue is even more likely to be encountered in
>> multichannel. Did you try to hit that? I really think this patchset
>> *needs* to deal with the 2 issues I mentioned as the probability of
>> hitting them increases with a faster IO stack.
>>
>> I remember this was discussed lately with consideration for using
>> blk-iopoll or not. But I think that for now the initial approach of
>> bailing out of the once we hit a budget is fine for now.
>
> Hello Sagi,
>
> As you mentioned so far this fairness issue has only caused trouble with
> iSER in a virtual machine guest. I have not yet seen anyone reporting a
> QP servicing fairness problem for the SRP initiator.

IMHO, this is not an iSER-specific issue; it is easy to see from the
code that under a specific workload SRP will poll the receive completion
queue forever in interrupt context.

I encountered this issue on a virtual guest under a high workload (80+
sessions with heavy traffic on all of them) because the qemu smp_affinity
setting was broken (it might still be; I haven't checked for a while).
This caused all completion vectors to fire interrupts at core 0, creating
high event contention on a single event queue (leading to lockup
situations and starvation of other CQs). Using more completion queues
will make this situation worse.

I think running multichannel code with all MSI-X vector affinities
directed to a single CPU can invoke what I'm talking about.

> Although analyzing
> and if needed limiting the maximum number of iterations in the SRP
> polling routine is on my to-do list, addressing that issue is outside of
> the scope of this patch series.

Although neither of us has yet heard such complaints from SRP users,
I disagree, because this series might make the problems worse. But if
you want to address it later, I guess that's fine too.

>
> Regarding the impact of this patch series on QP handling fairness: the
> time spent in the SRP RDMA completion handler depends on the number of
> completions processed at once. This number depends on:
> (a) The number of CPU cores in the initiator system that submit I/O and
>      that are associated with a single RDMA channel.
> (b) The target system processing speed per RDMA channel.
>
> This patch series reduces (a) by a factor ch_count.

This is under the assumption that IRQ affinity is spread across several
CPUs, and that's fine, but we should *not* hit a hard lockup in case it
is not (and I suspect we can).

> (b) is either
> unaffected (linear scaling) or slightly reduced (less than linear
> scaling). My conclusion is that if this patch series has an impact on QP
> handling fairness that it will improve fairness since the number of
> completions processed at once either remains unchanged or that it is
> reduced.
>

I think that in the single-CPU completion queue processing case, this
can make the problem worse as well.

Sagi.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-11-02 13:03                                         ` Sagi Grimberg
@ 2014-11-03  1:46                                           ` Elliott, Robert (Server Storage)
  2014-11-04 11:46                                             ` Bart Van Assche
  0 siblings, 1 reply; 83+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-11-03  1:46 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi, linux-rdma



> -----Original Message-----
> From: Sagi Grimberg [mailto:sagig@dev.mellanox.co.il]
> Sent: Sunday, November 02, 2014 7:03 AM
> To: Bart Van Assche; Christoph Hellwig
> Cc: Jens Axboe; Sagi Grimberg; Sebastian Parschauer; Elliott, Robert
> (Server Storage); Ming Lei; linux-scsi@vger.kernel.org; linux-rdma
> Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
> 
...
> IMHO, this is not iSER specific issue, it is easily indicated from the
> code that a specific workload SRP will poll recv completion queue
> forever in an interrupt context.
> 
> I encountered this issue on a virtual guest in a high workload (80+
> sessions with heavy traffic on all) because qemu smp_affinity setting
> was broken (might still be, didn't check that for a while). This caused
> all completion vectors to fire interrupts to core 0 causing a high
> events contention on a single event queue (causing lockup situations
> and starvation of other CQs). Using more completion queues will enhance
> this situation.
> 
> I think running multichannel code when all MSIX vectors affinity are
> directed to a single CPU can invoke what I'm talking about.

That's not an SRP specific problem either.  If you ask just one CPU to
service interrupts and block layer completions for submissions from lots
of other CPUs, it's bound to become overloaded.

Setting rq_affinity=2 helps quite a bit for the block layer completion
work.  This patch proposed making that the default for blk-mq:
	https://lkml.org/lkml/2014/9/9/931

For SRP interrupt processing, irqbalance recently changed its default 
to ignore the affinity_hint; you now need to pass an option to honor
the hint, or provide a policy script to do so for selected irqs.  For
multi-million IOPS workloads, irqbalance takes far too long to reroute
them based on activity; you're likely to overload a CPU with 100% 
hardirq processing, creating self-detected stalls for the submitting
processes on that CPU and other problems.  Sending interrupts back 
to the submitting CPU provides self-throttling.
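The tunings Robert mentions translate to something like the following (a sketch; the block device name is a placeholder, and option spellings may differ between irqbalance versions):

```
# Steer block layer completion work back to the submitting CPU.
echo 2 > /sys/block/sdX/queue/rq_affinity

# Make irqbalance honor the driver-provided affinity_hint again
# (its default changed to ignoring the hint).
irqbalance --hintpolicy=exact
```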


^ permalink raw reply	[flat|nested] 83+ messages in thread

* RE: [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-10-24  6:45           ` Christoph Hellwig
       [not found]             ` <20141024064514.GA15654-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
@ 2014-11-03  7:52             ` Kashyap Desai
  2014-11-03  8:25               ` Christoph Hellwig
  1 sibling, 1 reply; 83+ messages in thread
From: Kashyap Desai @ 2014-11-03  7:52 UTC (permalink / raw)
  To: Christoph Hellwig, Elliott, Robert (Server Storage)
  Cc: Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Ming Lei, linux-scsi, linux-rdma, Scales, Webb, Don Brace (PMC)

> -----Original Message-----
> From: linux-scsi-owner@vger.kernel.org [mailto:linux-scsi-
> owner@vger.kernel.org] On Behalf Of Christoph Hellwig
> Sent: Friday, October 24, 2014 12:15 PM
> To: Elliott, Robert (Server Storage)
> Cc: Bart Van Assche; Jens Axboe; Sagi Grimberg; Sebastian Parschauer;
> Ming Lei; linux-scsi@vger.kernel.org; linux-rdma; Scales, Webb;
> Don Brace (PMC)
> Subject: Re: [PATCH v2 10/12] IB/srp: Use block layer tags
>
> On Fri, Oct 24, 2014 at 04:43:15AM +0000, Elliott, Robert (Server
> Storage) wrote:
> > However, it was looking at scmd->tag, which is always 0xff (at least
> > in those early discovery commands).  scmd->request->tag looks like it
> > is the field that has the correct values.
> >
> > Also, I noticed that scmd->tag is just an 8 bit field, so it could
> > never represent a large number of tags.
>
> Yes, we need to get rid of scmd->tag.  Hannes had a patchset to get
> started on it, and I hope either he or someone else will have time to
> get back to it ASAP.
>
> > Just to confirm: After calling scsi_init_shared_tag_map() in non-mq
> > mode, will scmd->request->tag be based on controller-wide tag
> > allocation (never using the same value at the same time for the
> > request queues of multiple devices in that controller)?
>
> Yes.

Hi everyone, I am making similar code changes for the megaraid_sas
driver and found this thread the most suitable place to get help from
you.

I also noticed that after using scsi_init_shared_tag_map() in the
driver, tags are allocated controller-wide. That is good for the
megaraid_sas driver: it can get rid of its internal command pool list
and start using block layer tags.

As explained by Robert, I also used the setting below in slave_alloc(),
so that the first Inquiry command received by the driver also has a
valid tag (in non-mq mode):
	sdev->tagged_supported = 1;
Without the above setting, the Inquiry command comes with tag 0xFF.
Looking at the code below in scsi_scan.c, it appears that the
tagged_supported value should be populated only after the actual value
is reported by the Inquiry command:

        if ((sdev->scsi_level >= SCSI_2) && (inq_result[7] & 2) &&
            !(*bflags & BLIST_NOTQ))
                sdev->tagged_supported = 1;

Does that mean what the driver sets in slave_alloc() is OK for getting
Inquiry commands with a valid tag?
What will happen if the target really does not support tagged command
queuing and the driver overwrites the flag in slave_alloc()?

Thanks, Kashyap


^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 10/12] IB/srp: Use block layer tags
  2014-11-03  7:52             ` Kashyap Desai
@ 2014-11-03  8:25               ` Christoph Hellwig
  0 siblings, 0 replies; 83+ messages in thread
From: Christoph Hellwig @ 2014-11-03  8:25 UTC (permalink / raw)
  To: Kashyap Desai
  Cc: Christoph Hellwig, Elliott, Robert (Server Storage),
	Bart Van Assche, Jens Axboe, Sagi Grimberg, Sebastian Parschauer,
	Ming Lei, linux-scsi, linux-rdma, Scales, Webb, Don Brace (PMC)

On Mon, Nov 03, 2014 at 01:22:18PM +0530, Kashyap Desai wrote:
> I also used below setting in slave_alloc(), so that first Inquiry command
> received by driver also has valid Tag in (Non-MQ mode).
> 	sdev->tagged_supported = 1
> Without above setting Inquiry command comes with 0xFF. If I see below code
> in scsi_scan.c, it looks like tagged_supported value should be popped by
> after actual value reported from Inquiry command.
> 
>         if ((sdev->scsi_level >= SCSI_2) && (inq_result[7] & 2) &&
>             !(*bflags & BLIST_NOTQ))
>                 sdev->tagged_supported = 1;
> 
> Does it means what driver set in slave_alloc is OK to get Inquiry commands
> with valid Tag  ?
> What will happen if Target really does not support Tagged command queue
> and driver overwrite it in slave_alloc ?

I think the right thing here is to better differentiate between
assigning a software tag and issuing a SCSI tagged command. That is, we
should assign a request->tag value to every command if host-wide tags
are enabled, independent of whether an actual tagged command is issued.

I'll send a series of patches for this ASAP.

^ permalink raw reply	[flat|nested] 83+ messages in thread

* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
  2014-11-03  1:46                                           ` Elliott, Robert (Server Storage)
@ 2014-11-04 11:46                                             ` Bart Van Assche
       [not found]                                               ` <5458BC8B.40202-HInyCGIudOg@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Bart Van Assche @ 2014-11-04 11:46 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage), Sagi Grimberg, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi, linux-rdma

On 11/03/14 02:46, Elliott, Robert (Server Storage) wrote:
>> -----Original Message-----
>> From: Sagi Grimberg [mailto:sagig@dev.mellanox.co.il]
>> Sent: Sunday, November 02, 2014 7:03 AM
>> To: Bart Van Assche; Christoph Hellwig
>> Cc: Jens Axboe; Sagi Grimberg; Sebastian Parschauer; Elliott, Robert
>> (Server Storage); Ming Lei; linux-scsi@vger.kernel.org; linux-rdma
>> Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
>>
> ...
>> IMHO, this is not an iSER-specific issue; it is clear from the code
>> that under a specific workload SRP will poll the recv completion queue
>> forever in interrupt context.
>>
>> I encountered this issue on a virtual guest under high workload (80+
>> sessions with heavy traffic on all of them) because the qemu smp_affinity
>> setting was broken (might still be, I haven't checked for a while). This
>> caused all completion vectors to fire interrupts at core 0, causing heavy
>> event contention on a single event queue (leading to lockup situations
>> and starvation of other CQs). Using more completion queues will only
>> amplify this situation.
>>
>> I think running the multichannel code with all MSI-X vector affinities
>> directed to a single CPU can trigger what I'm talking about.
>
> That's not an SRP specific problem either.  If you ask just one CPU to
> service interrupts and block layer completions for submissions from lots
> of other CPUs, it's bound to become overloaded.
>
> Setting rq_affinity=2 helps quite a bit for the block layer completion
> work.  This patch proposed making that the default for blk-mq:
> 	https://lkml.org/lkml/2014/9/9/931
>
> For SRP interrupt processing, irqbalance recently changed its default
> to ignore the affinity_hint; you now need to pass an option to honor
> the hint, or provide a policy script to do so for selected irqs.  For
> multi-million IOPS workloads, irqbalance takes far too long to reroute
> them based on activity; you're likely to overload a CPU with 100%
> hardirq processing, creating self-detected stalls for the submitting
> processes on that CPU and other problems.  Sending interrupts back
> to the submitting CPU provides self-throttling.
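
Concretely, both knobs Rob mentions are runtime-tunable from userspace; the
device name sdX and IRQ number 123 below are placeholders:

```shell
# Complete block layer requests on the submitting CPU:
echo 2 > /sys/block/sdX/queue/rq_affinity

# Pin a completion vector's interrupt to a CPU mask by hand, since
# irqbalance no longer honors affinity_hint by default (alternatively,
# run irqbalance with --hintpolicy=exact):
echo 4 > /proc/irq/123/smp_affinity     # CPU 2 (mask 0x4)
```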

Hello Sagi,

To me it seems like with Rob's reply all questions about this patch 
series have been answered. But I think Christoph is still waiting for a 
Reviewed-by tag from you for patch 12/12.

Bart.




* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                                               ` <5458BC8B.40202-HInyCGIudOg@public.gmane.org>
@ 2014-11-04 12:15                                                 ` Sagi Grimberg
       [not found]                                                   ` <5458C344.2040109-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Sagi Grimberg @ 2014-11-04 12:15 UTC (permalink / raw)
  To: Bart Van Assche, Elliott, Robert (Server Storage), Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 11/4/2014 1:46 PM, Bart Van Assche wrote:
> On 11/03/14 02:46, Elliott, Robert (Server Storage) wrote:
>>> -----Original Message-----
>>> From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org]
>>> Sent: Sunday, November 02, 2014 7:03 AM
>>> To: Bart Van Assche; Christoph Hellwig
>>> Cc: Jens Axboe; Sagi Grimberg; Sebastian Parschauer; Elliott, Robert
>>> (Server Storage); Ming Lei; linux-scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma
>>> Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
>>>
>> ...
>>> IMHO, this is not an iSER-specific issue; it is clear from the code
>>> that under a specific workload SRP will poll the recv completion queue
>>> forever in interrupt context.
>>>
>>> I encountered this issue on a virtual guest under high workload (80+
>>> sessions with heavy traffic on all of them) because the qemu smp_affinity
>>> setting was broken (might still be, I haven't checked for a while). This
>>> caused all completion vectors to fire interrupts at core 0, causing heavy
>>> event contention on a single event queue (leading to lockup situations
>>> and starvation of other CQs). Using more completion queues will only
>>> amplify this situation.
>>>
>>> I think running the multichannel code with all MSI-X vector affinities
>>> directed to a single CPU can trigger what I'm talking about.
>>
>> That's not an SRP specific problem either.  If you ask just one CPU to
>> service interrupts and block layer completions for submissions from lots
>> of other CPUs, it's bound to become overloaded.
>>
>> Setting rq_affinity=2 helps quite a bit for the block layer completion
>> work.  This patch proposed making that the default for blk-mq:
>>     https://lkml.org/lkml/2014/9/9/931
>>
>> For SRP interrupt processing, irqbalance recently changed its default
>> to ignore the affinity_hint; you now need to pass an option to honor
>> the hint, or provide a policy script to do so for selected irqs.  For
>> multi-million IOPS workloads, irqbalance takes far too long to reroute
>> them based on activity; you're likely to overload a CPU with 100%
>> hardirq processing, creating self-detected stalls for the submitting
>> processes on that CPU and other problems.  Sending interrupts back
>> to the submitting CPU provides self-throttling.
>
> Hello Sagi,
>
> To me it seems like with Rob's reply all questions about this patch
> series have been answered. But I think Christoph is still waiting for a
> Reviewed-by tag from you for patch 12/12.
>

Hey Bart & Rob,

I'm sorry, but I didn't get to reply to Rob's email yesterday.

I think that Rob and I are not talking about the same issue. In
case only a single core is servicing interrupts it is indeed expected
that it will spend 100% in hard-irq, that's acceptable since it is
pounded with completions all the time.

However, I'm referring to a condition where SRP will spend infinite
time servicing a single interrupt (while loop on ib_poll_cq that never
drains) which will lead to a hard lockup.

This *can* happen, and I do believe that with an optimized IO path
it is even more likely to.
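
The unbounded-poll concern can be modelled in a few lines of userspace C.
This is a simulation with invented names (cq_pending, poll_cq_budgeted),
not the IB verbs API: if the handler polls until the queue is empty while
another context keeps refilling it, the loop is unbounded; capping the work
done per invocation guarantees the handler returns.

```c
/* Pending completions on a (simulated) completion queue. */
static int cq_pending;

/* Drain at most `budget` completions per call; return the number handled.
 * A real handler would reschedule itself (e.g. via a softirq mechanism)
 * when it returns with work still pending, instead of looping in hard-irq
 * context until the queue drains. */
static int poll_cq_budgeted(int budget)
{
	int done = 0;

	while (cq_pending > 0 && done < budget) {
		cq_pending--;   /* "process" one completion */
		done++;
	}
	return done;
}
```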

Anyway, since I am sure you ran sufficient testing on this code (and
didn't see the issue), I don't want my concerns to block this code from
3.18. As I didn't find any other gating issues, you can add:

Reviewed-by: Sagi Grimberg <sagig-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>

Sagi.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


* RE: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                                                   ` <5458C344.2040109-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
@ 2014-11-05  4:57                                                     ` Elliott, Robert (Server Storage)
       [not found]                                                       ` <94D0CD8314A33A4D9D801C0FE68B40295937104F-2m9nI20wMFwSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
  0 siblings, 1 reply; 83+ messages in thread
From: Elliott, Robert (Server Storage) @ 2014-11-05  4:57 UTC (permalink / raw)
  To: Sagi Grimberg, Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma



> -----Original Message-----
> From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org]
> Sent: Tuesday, November 04, 2014 6:15 AM
> To: Bart Van Assche; Elliott, Robert (Server Storage); Christoph Hellwig
> Cc: Jens Axboe; Sagi Grimberg; Sebastian Parschauer; Ming Lei; linux-
> scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma
> Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
> 
...
> I think that Rob and I are not talking about the same issue. In
> case only a single core is servicing interrupts it is indeed expected
> that it will spend 100% in hard-irq, that's acceptable since it is
> pounded with completions all the time.
> 
> However, I'm referring to a condition where SRP will spend infinite
> time servicing a single interrupt (while loop on ib_poll_cq that never
> drains) which will lead to a hard lockup.
> 
> This *can* happen, and I do believe that with an optimized IO path
> it is even more likely to.

If the IB completions/interrupts are only for IOs submitted on this
CPU, then the CQ will eventually drain, because this CPU is not 
submitting anything new while stuck in the loop.

This can become bursty, though - submit a lot of IOs, then be busy
completing all of them and not submitting more, resulting in the 
queue depth bouncing from 0 to high to 0 to high.  I've seen
that with both hpsa and mpt3sas drivers.  The fio options
iodepth_batch, iodepth_batch_complete, and iodepth_low
can amplify and reduce that effect (using libaio).

I haven't found a good way for the LLD ISRs and the block
layer completion code to decide to yield the CPU based on how
much time they are taking - that would almost qualify as
a realtime kernel feature.  If you compile with
CONFIG_IRQ_TIME_ACCOUNTING, the kernel does keep track
of that information; perhaps that could be exported so
modules can use it?

---
Rob Elliott, HP Server Storage



* Re: [PATCH v2 12/12] IB/srp: Add multichannel support
       [not found]                                                       ` <94D0CD8314A33A4D9D801C0FE68B40295937104F-2m9nI20wMFwSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
@ 2014-11-05 11:22                                                         ` Sagi Grimberg
  0 siblings, 0 replies; 83+ messages in thread
From: Sagi Grimberg @ 2014-11-05 11:22 UTC (permalink / raw)
  To: Elliott, Robert (Server Storage), Bart Van Assche, Christoph Hellwig
  Cc: Jens Axboe, Sagi Grimberg, Sebastian Parschauer, Ming Lei,
	linux-scsi-u79uwXL29TY76Z2rM5mHXA, linux-rdma

On 11/5/2014 6:57 AM, Elliott, Robert (Server Storage) wrote:
>
>
>> -----Original Message-----
>> From: Sagi Grimberg [mailto:sagig-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org]
>> Sent: Tuesday, November 04, 2014 6:15 AM
>> To: Bart Van Assche; Elliott, Robert (Server Storage); Christoph Hellwig
>> Cc: Jens Axboe; Sagi Grimberg; Sebastian Parschauer; Ming Lei; linux-
>> scsi-u79uwXL29TY76Z2rM5mHXA@public.gmane.org; linux-rdma
>> Subject: Re: [PATCH v2 12/12] IB/srp: Add multichannel support
>>
> ...
>> I think that Rob and I are not talking about the same issue. In
>> case only a single core is servicing interrupts it is indeed expected
>> that it will spend 100% in hard-irq, that's acceptable since it is
>> pounded with completions all the time.
>>
>> However, I'm referring to a condition where SRP will spend infinite
>> time servicing a single interrupt (while loop on ib_poll_cq that never
>> drains) which will lead to a hard lockup.
>>
>> This *can* happen, and I do believe that with an optimized IO path
>> it is even more likely to.
>
> If the IB completions/interrupts are only for IOs submitted on this
> CPU, then the CQ will eventually drain, because this CPU is not
> submitting anything new while stuck in the loop.

They're not (or not necessarily). I'm talking about the case where the
IOs are submitted from another CPU. This creates a cycle in which the
submitter on CPU X keeps generating completions while the completer on
CPU Y keeps freeing up room for more submissions. This process can
never end while the completer is in hard-irq context.

>
> This can become bursty, though - submit a lot of IOs, then be busy
> completing all of them and not submitting more, resulting in the
> queue depth bouncing from 0 to high to 0 to high.  I've seen
> that with both hpsa and mpt3sas drivers.  The fio options
> iodepth_batch, iodepth_batch_complete, and iodepth_low
> can amplify and reduce that effect (using libaio).
>

blk-iopoll (or some other form of budgeting completions) should take
care of that.

Sagi.


end of thread, other threads:[~2014-11-05 11:22 UTC | newest]

Thread overview: 83+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-07 13:01 [PATCH v2 0/12] IB/srp: Add multichannel support Bart Van Assche
2014-10-07 13:03 ` [PATCH v2 02/12] blk-mq: Add blk_mq_unique_tag() Bart Van Assche
2014-10-11 11:08   ` Christoph Hellwig
2014-10-13  9:21     ` Bart Van Assche
     [not found]       ` <543B99B2.1010307-HInyCGIudOg@public.gmane.org>
2014-10-13 10:15         ` Christoph Hellwig
2014-10-19 16:14           ` Sagi Grimberg
     [not found]   ` <5433E493.9030304-HInyCGIudOg@public.gmane.org>
2014-10-28  1:55     ` Martin K. Petersen
2014-10-07 13:04 ` [PATCH v2 04/12] scsi_tcq.h: Add support for multiple hardware queues Bart Van Assche
2014-10-19 16:12   ` Sagi Grimberg
     [not found]     ` <5443E2DF.1040605-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-10-20 12:01       ` Bart Van Assche
     [not found]         ` <5444F995.5080407-HInyCGIudOg@public.gmane.org>
2014-10-21  8:49           ` Christoph Hellwig
2014-10-21  8:59             ` Sagi Grimberg
2014-10-28  2:06   ` Martin K. Petersen
     [not found] ` <5433E43D.3010107-HInyCGIudOg@public.gmane.org>
2014-10-07 13:02   ` [PATCH v2 01/12] blk-mq: Use all available " Bart Van Assche
2014-10-07 14:37     ` Jens Axboe
     [not found]       ` <5433FA8F.3050100-tSWWG44O7X1aa/9Udqfwiw@public.gmane.org>
2014-10-08 13:21         ` Bart Van Assche
     [not found]           ` <54353A74.7040406-HInyCGIudOg@public.gmane.org>
2014-10-11 11:11             ` Christoph Hellwig
     [not found]               ` <20141011111114.GB9593-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-10-13  9:45                 ` Bart Van Assche
     [not found]                   ` <543B9F47.2090204-HInyCGIudOg@public.gmane.org>
2014-10-17 13:20                     ` Christoph Hellwig
     [not found]                       ` <20141017132053.GF16538-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-10-17 14:11                         ` Sagi Grimberg
2014-10-07 13:03   ` [PATCH v2 03/12] scsi-mq: Add support for multiple " Bart Van Assche
     [not found]     ` <5433E4AB.8030306-HInyCGIudOg@public.gmane.org>
2014-10-19 15:54       ` Sagi Grimberg
2014-10-28  2:01       ` Martin K. Petersen
2014-10-29 12:22         ` Bart Van Assche
2014-10-29 12:27           ` Bart Van Assche
     [not found]             ` <5450DD49.6090108-HInyCGIudOg@public.gmane.org>
2014-10-30  0:53               ` Martin K. Petersen
2014-10-07 13:04   ` [PATCH v2 05/12] IB/srp: Move ib_destroy_cm_id() call into srp_free_ch_ib() Bart Van Assche
2014-10-07 13:04   ` [PATCH v2 06/12] IB/srp: Remove stale connection retry mechanism Bart Van Assche
2014-10-07 13:05   ` [PATCH v2 09/12] IB/srp: Separate target and channel variables Bart Van Assche
2014-10-19 16:48     ` Sagi Grimberg
2014-10-07 13:06   ` [PATCH v2 11/12] IB/srp: Eliminate free_reqs list Bart Van Assche
     [not found]     ` <5433E56E.6010600-HInyCGIudOg@public.gmane.org>
2014-10-17 10:59       ` Christoph Hellwig
     [not found]         ` <20141017105939.GB7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-10-19 16:59           ` Sagi Grimberg
2014-10-20 11:47           ` Bart Van Assche
2014-10-21  8:49             ` Christoph Hellwig
2014-10-07 13:05 ` [PATCH v2 07/12] IB/srp: Avoid that I/O hangs due to a cable pull during LUN scanning Bart Van Assche
2014-10-19 16:27   ` Sagi Grimberg
     [not found]     ` <5443E66F.7050901-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-10-20 12:15       ` Bart Van Assche
2014-10-21  8:50         ` Christoph Hellwig
2014-10-07 13:05 ` [PATCH v2 08/12] IB/srp: Introduce two new srp_target_port member variables Bart Van Assche
2014-10-19 16:30   ` Sagi Grimberg
2014-10-07 13:06 ` [PATCH v2 10/12] IB/srp: Use block layer tags Bart Van Assche
     [not found]   ` <5433E557.3010505-HInyCGIudOg@public.gmane.org>
2014-10-17 10:58     ` Christoph Hellwig
     [not found]       ` <20141017105858.GA7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-10-20 11:44         ` Bart Van Assche
2014-10-22 22:03     ` Elliott, Robert (Server Storage)
     [not found]       ` <94D0CD8314A33A4D9D801C0FE68B4029593212E0-wwDBVnaDRpYSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
2014-10-23  7:16         ` Bart Van Assche
2014-10-23 17:43           ` Webb Scales
     [not found]             ` <54493E5A.7050803-VXdhtT5mjnY@public.gmane.org>
2014-10-24  6:45               ` Bart Van Assche
     [not found]                 ` <5449F571.7080308-HInyCGIudOg@public.gmane.org>
2014-10-24 15:40                   ` Webb Scales
2014-10-23  8:47       ` Christoph Hellwig
2014-10-24  4:43         ` Elliott, Robert (Server Storage)
2014-10-24  6:45           ` Christoph Hellwig
     [not found]             ` <20141024064514.GA15654-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-10-31 17:34               ` Hannes Reinecke
2014-11-03  7:52             ` Kashyap Desai
2014-11-03  8:25               ` Christoph Hellwig
2014-10-07 13:07 ` [PATCH v2 12/12] IB/srp: Add multichannel support Bart Van Assche
2014-10-17 11:01   ` EH action after scsi_remove_host, was: " Christoph Hellwig
2014-10-20 13:53     ` Bart Van Assche
2014-10-21  8:51       ` Christoph Hellwig
2014-10-17 11:06   ` Christoph Hellwig
     [not found]     ` <20141017110627.GD7819-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
2014-10-20 11:57       ` Bart Van Assche
2014-10-21  8:49         ` Christoph Hellwig
     [not found]   ` <5433E585.607-HInyCGIudOg@public.gmane.org>
2014-10-19 17:36     ` Sagi Grimberg
2014-10-20 12:56       ` Bart Van Assche
     [not found]         ` <54450690.709-HInyCGIudOg@public.gmane.org>
2014-10-21  9:10           ` Sagi Grimberg
     [not found]             ` <544622FE.5040906-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-10-28 18:32               ` Sagi Grimberg
     [not found]                 ` <544FE13A.60807-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-10-29 10:52                   ` Bart Van Assche
2014-10-30 14:19                     ` Sagi Grimberg
2014-10-30 14:36                       ` Bart Van Assche
     [not found]                         ` <54524D08.4040203-HInyCGIudOg@public.gmane.org>
2014-10-30 15:06                           ` Sagi Grimberg
     [not found]                             ` <545253E3.7000009-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-10-30 15:19                               ` Bart Van Assche
     [not found]                                 ` <545256E5.9010501-HInyCGIudOg@public.gmane.org>
2014-10-30 17:33                                   ` Sagi Grimberg
2014-10-31  9:19                                     ` Bart Van Assche
     [not found]                                       ` <5453541D.7040206-HInyCGIudOg@public.gmane.org>
2014-11-02 13:03                                         ` Sagi Grimberg
2014-11-03  1:46                                           ` Elliott, Robert (Server Storage)
2014-11-04 11:46                                             ` Bart Van Assche
     [not found]                                               ` <5458BC8B.40202-HInyCGIudOg@public.gmane.org>
2014-11-04 12:15                                                 ` Sagi Grimberg
     [not found]                                                   ` <5458C344.2040109-LDSdmyG8hGV8YrgS2mwiifqBs+8SCbDb@public.gmane.org>
2014-11-05  4:57                                                     ` Elliott, Robert (Server Storage)
     [not found]                                                       ` <94D0CD8314A33A4D9D801C0FE68B40295937104F-2m9nI20wMFwSZAcGdq5asR6epYMZPwEe5NbjCUgZEJk@public.gmane.org>
2014-11-05 11:22                                                         ` Sagi Grimberg
2014-10-21  9:14     ` Sagi Grimberg
2014-10-29 12:36       ` Bart Van Assche
2014-10-30 14:22         ` Sagi Grimberg
2014-10-08 13:16 ` [PATCH] blk-mq: Use all available hardware queues Bart Van Assche
