Linux-NVME Archive on lore.kernel.org
* [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector
@ 2020-03-18 15:02 Max Gurtovoy
  2020-03-18 15:02 ` [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD Max Gurtovoy
                   ` (5 more replies)
  0 siblings, 6 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-18 15:02 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

This set is a renewed version of the feature for the NVMEoF/RDMA target. In
this series I've decided to implement it for the SRP target as well, which
had a similar implementation (SRQ per HCA), after previous requests from
the community. The logic is intended to save resource allocation (by
sharing resources) and to exploit the locality of completions to get the
best performance with Shared Receive Queues (SRQs). We create an SRQ per
completion vector (rather than per device) using a new API (a basic SRQ
pool, also added in this patchset) and associate each created QP/CQ/channel
with an appropriate SRQ. This also reduces the lock contention on the
single per-device SRQ (today's solution).

For NVMEoF, my testing environment included 4 initiators (CX5, CX5, CX4,
CX3) that were connected through 2 ports to 4 subsystems (1 ns per sub;
each initiator connected to a unique subsystem backed by a different
null_blk device) using a switch to the NVMEoF target (CX5).
I used the RoCE link layer. For SRP, I used 1 server with a RoCE loopback
connection for testing (results are not mentioned below). Hopefully I'll get
a Tested-by signature and feedback from Laurence and Rupesh on the SRP part
during the review process.

The results below were collected a while ago using NVMEoF.

Configuration:
 - irqbalance stopped on each server
 - set_irq_affinity.sh run on each interface
 - 2 initiators run traffic through port 1
 - 2 initiators run traffic through port 2
 - On the initiators, register_always=N is set
 - fio with 12 jobs, iodepth 128
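For reference, the setup above boils down to roughly the following commands. This is a sketch only: the interface name, block device, and the exact fio job parameters are illustrative assumptions, not taken from the posting (set_irq_affinity.sh is the Mellanox-provided affinity script mentioned above, and register_always is the nvme-rdma module parameter).

```shell
# Benchmark setup sketch -- names are placeholders
systemctl stop irqbalance                 # "irqbalance stopped on each server"
set_irq_affinity.sh ens1f0                # per-interface IRQ affinity
modprobe nvme-rdma register_always=N      # initiator-side module parameter
fio --name=randread --rw=randread --ioengine=libaio --direct=1 \
    --numjobs=12 --iodepth=128 --bs=4k --filename=/dev/nvme0n1
```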

Memory consumption calculation for recv buffers (target):
 - Multiple SRQ: SRQ_size * comp_num * ib_devs_num * inline_buffer_size
 - Single SRQ: SRQ_size * 1 * ib_devs_num * inline_buffer_size
 - MQ: RQ_size * CPU_num * ctrl_num * inline_buffer_size

Cases:
 1. Multiple SRQs with 1024 entries each:
    - Mem = 1024 * 24 * 2 * 4k = 192MiB (constant; does not depend on the number of initiators)
 2. Multiple SRQs with 256 entries each:
    - Mem = 256 * 24 * 2 * 4k = 48MiB (constant; does not depend on the number of initiators)
 3. MQ:
    - Mem = 256 * 24 * 8 * 4k = 192MiB (memory grows with every newly created controller)
 4. Single SRQ (current SRQ implementation):
    - Mem = 4096 * 1 * 2 * 4k = 32MiB (constant; does not depend on the number of initiators)

Results:

BS    1.read (target CPU)    2.read (target CPU)    3.read (target CPU)    4.read (target CPU)
---   --------------------   --------------------   --------------------   --------------------
1k    5.88M (80%)            5.45M (72%)            6.77M (91%)            2.2M (72%)
2k    3.56M (65%)            3.45M (59%)            3.72M (64%)            2.12M (59%)
4k    1.8M (33%)             1.87M (32%)            1.88M (32%)            1.59M (34%)

BS    1.write (target CPU)   2.write (target CPU)   3.write (target CPU)   4.write (target CPU)
---   --------------------   --------------------   --------------------   --------------------
1k    5.42M (63%)            5.14M (55%)            7.75M (82%)            2.14M (74%)
2k    4.15M (56%)            4.14M (51%)            4.16M (52%)            2.08M (73%)
4k    2.17M (28%)            2.17M (27%)            2.16M (28%)            1.62M (24%)


We can see the performance improvement between Case 2 and Case 4 (same order of resources).
We can see the benefit in resource consumption (memory and CPU) with a small performance
loss between Cases 2 and 3.
There is still an open question regarding the 1k performance difference between Case 1 and
Case 3, but I guess we can investigate and improve it incrementally.

Thanks to Idan Burstein and Oren Duer for suggesting this nice feature.

Changes from v1:
 - rename srq_set to srq_pool (Leon)
 - changed srpt to use ib_alloc_cq (patch 4/5)
 - removed caching of comp_vector in ib_cq
 - minor fixes based on Leon's review

Max Gurtovoy (5):
  IB/core: add a simple SRQ pool per PD
  nvmet-rdma: add srq pointer to rdma_cmd
  nvmet-rdma: use SRQ per completion vector
  RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any
  RDMA/srpt: use SRQ per completion vector

 drivers/infiniband/core/Makefile      |   2 +-
 drivers/infiniband/core/srq_pool.c    |  75 +++++++++++++
 drivers/infiniband/core/verbs.c       |   3 +
 drivers/infiniband/ulp/srpt/ib_srpt.c | 187 +++++++++++++++++++++++--------
 drivers/infiniband/ulp/srpt/ib_srpt.h |  28 ++++-
 drivers/nvme/target/rdma.c            | 203 ++++++++++++++++++++++++++--------
 include/rdma/ib_verbs.h               |   4 +
 include/rdma/srq_pool.h               |  18 +++
 8 files changed, 419 insertions(+), 101 deletions(-)
 create mode 100644 drivers/infiniband/core/srq_pool.c
 create mode 100644 include/rdma/srq_pool.h

-- 
1.8.3.1


_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD
  2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
@ 2020-03-18 15:02 ` Max Gurtovoy
  2020-03-20  5:59   ` Sagi Grimberg
  2020-03-18 15:02 ` [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd Max Gurtovoy
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-18 15:02 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

ULPs can use this API to create/destroy SRQs with the same
characteristics in order to implement logic that saves resources
without a significant performance penalty (e.g. create an SRQ per
completion vector and use shared receive buffers for multiple
controllers of the ULP).

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/infiniband/core/Makefile   |  2 +-
 drivers/infiniband/core/srq_pool.c | 75 ++++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/verbs.c    |  3 ++
 include/rdma/ib_verbs.h            |  4 ++
 include/rdma/srq_pool.h            | 18 +++++++++
 5 files changed, 101 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/srq_pool.c
 create mode 100644 include/rdma/srq_pool.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index d1b14887..ca377b0 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -12,7 +12,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
 				nldev.o restrack.o counters.o ib_core_uverbs.o \
-				trace.o
+				trace.o srq_pool.o
 
 ib_core-$(CONFIG_SECURITY_INFINIBAND) += security.o
 ib_core-$(CONFIG_CGROUP_RDMA) += cgroup.o
diff --git a/drivers/infiniband/core/srq_pool.c b/drivers/infiniband/core/srq_pool.c
new file mode 100644
index 0000000..68321f0
--- /dev/null
+++ b/drivers/infiniband/core/srq_pool.c
@@ -0,0 +1,75 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/*
+ * Copyright (c) 2020 Mellanox Technologies. All rights reserved.
+ */
+
+#include <rdma/srq_pool.h>
+
+struct ib_srq *rdma_srq_pool_get(struct ib_pd *pd)
+{
+	struct ib_srq *srq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pd->srq_lock, flags);
+	srq = list_first_entry_or_null(&pd->srqs, struct ib_srq, pd_entry);
+	if (srq)
+		list_del(&srq->pd_entry);
+	spin_unlock_irqrestore(&pd->srq_lock, flags);
+
+	return srq;
+}
+EXPORT_SYMBOL(rdma_srq_pool_get);
+
+void rdma_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pd->srq_lock, flags);
+	list_add(&srq->pd_entry, &pd->srqs);
+	spin_unlock_irqrestore(&pd->srq_lock, flags);
+}
+EXPORT_SYMBOL(rdma_srq_pool_put);
+
+int rdma_srq_pool_init(struct ib_pd *pd, int nr,
+		struct ib_srq_init_attr *srq_attr)
+{
+	struct ib_srq *srq;
+	unsigned long flags;
+	int ret, i;
+
+	for (i = 0; i < nr; i++) {
+		srq = ib_create_srq(pd, srq_attr);
+		if (IS_ERR(srq)) {
+			ret = PTR_ERR(srq);
+			goto out;
+		}
+
+		spin_lock_irqsave(&pd->srq_lock, flags);
+		list_add_tail(&srq->pd_entry, &pd->srqs);
+		spin_unlock_irqrestore(&pd->srq_lock, flags);
+	}
+
+	return 0;
+out:
+	rdma_srq_pool_destroy(pd);
+	return ret;
+}
+EXPORT_SYMBOL(rdma_srq_pool_init);
+
+void rdma_srq_pool_destroy(struct ib_pd *pd)
+{
+	struct ib_srq *srq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pd->srq_lock, flags);
+	while (!list_empty(&pd->srqs)) {
+		srq = list_first_entry(&pd->srqs, struct ib_srq, pd_entry);
+		list_del(&srq->pd_entry);
+
+		spin_unlock_irqrestore(&pd->srq_lock, flags);
+		ib_destroy_srq(srq);
+		spin_lock_irqsave(&pd->srq_lock, flags);
+	}
+	spin_unlock_irqrestore(&pd->srq_lock, flags);
+}
+EXPORT_SYMBOL(rdma_srq_pool_destroy);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e62c9df..0bb69d2 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -272,6 +272,8 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->flags = flags;
+	spin_lock_init(&pd->srq_lock);
+	INIT_LIST_HEAD(&pd->srqs);
 
 	pd->res.type = RDMA_RESTRACK_PD;
 	rdma_restrack_set_task(&pd->res, caller);
@@ -340,6 +342,7 @@ void ib_dealloc_pd_user(struct ib_pd *pd, struct ib_udata *udata)
 		pd->__internal_mr = NULL;
 	}
 
+	WARN_ON_ONCE(!list_empty(&pd->srqs));
 	/* uverbs manipulates usecnt with proper locking, while the kabi
 	   requires the caller to guarantee we can't race here. */
 	WARN_ON(atomic_read(&pd->usecnt));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 1f779fa..1dcfefb 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1517,6 +1517,9 @@ struct ib_pd {
 
 	u32			unsafe_global_rkey;
 
+	spinlock_t		srq_lock;
+	struct list_head	srqs;
+
 	/*
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
@@ -1585,6 +1588,7 @@ struct ib_srq {
 	void		       *srq_context;
 	enum ib_srq_type	srq_type;
 	atomic_t		usecnt;
+	struct list_head	pd_entry; /* srq pool entry */
 
 	struct {
 		struct ib_cq   *cq;
diff --git a/include/rdma/srq_pool.h b/include/rdma/srq_pool.h
new file mode 100644
index 0000000..ee83896
--- /dev/null
+++ b/include/rdma/srq_pool.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR Linux-OpenIB) */
+/*
+ * Copyright (c) 2020 Mellanox Technologies. All rights reserved.
+ */
+
+#ifndef _RDMA_SRQ_POOL_H
+#define _RDMA_SRQ_POOL_H
+
+#include <rdma/ib_verbs.h>
+
+struct ib_srq *rdma_srq_pool_get(struct ib_pd *pd);
+void rdma_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq);
+
+int rdma_srq_pool_init(struct ib_pd *pd, int nr,
+		struct ib_srq_init_attr *srq_attr);
+void rdma_srq_pool_destroy(struct ib_pd *pd);
+
+#endif /* _RDMA_SRQ_POOL_H */
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
  2020-03-18 15:02 ` [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD Max Gurtovoy
@ 2020-03-18 15:02 ` Max Gurtovoy
  2020-03-18 23:32   ` Jason Gunthorpe
  2020-03-19  4:05   ` Bart Van Assche
  2020-03-18 15:02 ` [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-18 15:02 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

This is a preparation patch for the SRQ per completion vector feature.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/rdma.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 37d262a..04062af 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -38,6 +38,7 @@ struct nvmet_rdma_cmd {
 	struct scatterlist	inline_sg[NVMET_RDMA_MAX_INLINE_SGE];
 	struct nvme_command     *nvme_cmd;
 	struct nvmet_rdma_queue	*queue;
+	struct ib_srq		*srq;
 };
 
 enum {
@@ -461,8 +462,8 @@ static int nvmet_rdma_post_recv(struct nvmet_rdma_device *ndev,
 		cmd->sge[0].addr, cmd->sge[0].length,
 		DMA_FROM_DEVICE);
 
-	if (ndev->srq)
-		ret = ib_post_srq_recv(ndev->srq, &cmd->wr, NULL);
+	if (cmd->srq)
+		ret = ib_post_srq_recv(cmd->srq, &cmd->wr, NULL);
 	else
 		ret = ib_post_recv(cmd->queue->cm_id->qp, &cmd->wr, NULL);
 
@@ -882,6 +883,7 @@ static int nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
 	ndev->srq_size = srq_size;
 
 	for (i = 0; i < srq_size; i++) {
+		ndev->srq_cmds[i].srq = srq;
 		ret = nvmet_rdma_post_recv(ndev, &ndev->srq_cmds[i]);
 		if (ret)
 			goto out_free_cmds;
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
  2020-03-18 15:02 ` [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD Max Gurtovoy
  2020-03-18 15:02 ` [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd Max Gurtovoy
@ 2020-03-18 15:02 ` Max Gurtovoy
  2020-03-19  4:09   ` Bart Van Assche
  2020-03-20  5:47   ` Sagi Grimberg
  2020-03-18 15:02 ` [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any Max Gurtovoy
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-18 15:02 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

In order to save resource allocation and to make better use of completion
locality (compared to the SRQ per device that exists today), allocate
Shared Receive Queues (SRQs) per completion vector. Associate each
created QP/CQ with an appropriate SRQ according to the queue index.
This association reduces lock contention in the fast path (compared to
the SRQ per device solution) and increases locality in memory buffers.
Add a new module parameter for the SRQ size so that it can be adjusted
to the expected load. Users should make sure the size is >= 256 to avoid
running out of resources.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/nvme/target/rdma.c | 205 ++++++++++++++++++++++++++++++++++-----------
 1 file changed, 154 insertions(+), 51 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 04062af..e415128 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -18,6 +18,7 @@
 #include <asm/unaligned.h>
 
 #include <rdma/ib_verbs.h>
+#include <rdma/srq_pool.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rw.h>
 
@@ -31,6 +32,8 @@
 #define NVMET_RDMA_MAX_INLINE_SGE		4
 #define NVMET_RDMA_MAX_INLINE_DATA_SIZE		max_t(int, SZ_16K, PAGE_SIZE)
 
+struct nvmet_rdma_srq;
+
 struct nvmet_rdma_cmd {
 	struct ib_sge		sge[NVMET_RDMA_MAX_INLINE_SGE + 1];
 	struct ib_cqe		cqe;
@@ -38,7 +41,7 @@ struct nvmet_rdma_cmd {
 	struct scatterlist	inline_sg[NVMET_RDMA_MAX_INLINE_SGE];
 	struct nvme_command     *nvme_cmd;
 	struct nvmet_rdma_queue	*queue;
-	struct ib_srq		*srq;
+	struct nvmet_rdma_srq   *nsrq;
 };
 
 enum {
@@ -80,6 +83,7 @@ struct nvmet_rdma_queue {
 	struct ib_cq		*cq;
 	atomic_t		sq_wr_avail;
 	struct nvmet_rdma_device *dev;
+	struct nvmet_rdma_srq   *nsrq;
 	spinlock_t		state_lock;
 	enum nvmet_rdma_queue_state state;
 	struct nvmet_cq		nvme_cq;
@@ -97,17 +101,24 @@ struct nvmet_rdma_queue {
 
 	int			idx;
 	int			host_qid;
+	int			comp_vector;
 	int			recv_queue_size;
 	int			send_queue_size;
 
 	struct list_head	queue_list;
 };
 
+struct nvmet_rdma_srq {
+	struct ib_srq            *srq;
+	struct nvmet_rdma_cmd    *cmds;
+	struct nvmet_rdma_device *ndev;
+};
+
 struct nvmet_rdma_device {
 	struct ib_device	*device;
 	struct ib_pd		*pd;
-	struct ib_srq		*srq;
-	struct nvmet_rdma_cmd	*srq_cmds;
+	struct nvmet_rdma_srq	**srqs;
+	int			srq_count;
 	size_t			srq_size;
 	struct kref		ref;
 	struct list_head	entry;
@@ -119,6 +130,16 @@ struct nvmet_rdma_device {
 module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444);
 MODULE_PARM_DESC(use_srq, "Use shared receive queue.");
 
+static int srq_size_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops srq_size_ops = {
+	.set = srq_size_set,
+	.get = param_get_int,
+};
+
+static int nvmet_rdma_srq_size = 1024;
+module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
+MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, must be >= 256 (default: 1024)");
+
 static DEFINE_IDA(nvmet_rdma_queue_ida);
 static LIST_HEAD(nvmet_rdma_queue_list);
 static DEFINE_MUTEX(nvmet_rdma_queue_mutex);
@@ -139,6 +160,17 @@ static int nvmet_rdma_alloc_rsp(struct nvmet_rdma_device *ndev,
 
 static const struct nvmet_fabrics_ops nvmet_rdma_ops;
 
+static int srq_size_set(const char *val, const struct kernel_param *kp)
+{
+	int n = 0, ret;
+
+	ret = kstrtoint(val, 10, &n);
+	if (ret != 0 || n < 256)
+		return -EINVAL;
+
+	return param_set_int(val, kp);
+}
+
 static int num_pages(int len)
 {
 	return 1 + (((len - 1) & PAGE_MASK) >> PAGE_SHIFT);
@@ -462,8 +494,8 @@ static int nvmet_rdma_post_recv(struct nvmet_rdma_device *ndev,
 		cmd->sge[0].addr, cmd->sge[0].length,
 		DMA_FROM_DEVICE);
 
-	if (cmd->srq)
-		ret = ib_post_srq_recv(cmd->srq, &cmd->wr, NULL);
+	if (cmd->nsrq)
+		ret = ib_post_srq_recv(cmd->nsrq->srq, &cmd->wr, NULL);
 	else
 		ret = ib_post_recv(cmd->queue->cm_id->qp, &cmd->wr, NULL);
 
@@ -841,30 +873,83 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 	nvmet_rdma_handle_command(queue, rsp);
 }
 
-static void nvmet_rdma_destroy_srq(struct nvmet_rdma_device *ndev)
+static void nvmet_rdma_destroy_srq(struct nvmet_rdma_srq *nsrq)
+{
+	nvmet_rdma_free_cmds(nsrq->ndev, nsrq->cmds, nsrq->ndev->srq_size,
+			     false);
+	rdma_srq_pool_put(nsrq->ndev->pd, nsrq->srq);
+
+	kfree(nsrq);
+}
+
+static void nvmet_rdma_destroy_srqs(struct nvmet_rdma_device *ndev)
 {
-	if (!ndev->srq)
+	int i;
+
+	if (!ndev->srqs)
 		return;
 
-	nvmet_rdma_free_cmds(ndev, ndev->srq_cmds, ndev->srq_size, false);
-	ib_destroy_srq(ndev->srq);
+	for (i = 0; i < ndev->srq_count; i++)
+		nvmet_rdma_destroy_srq(ndev->srqs[i]);
+
+	rdma_srq_pool_destroy(ndev->pd);
+	kfree(ndev->srqs);
+	ndev->srqs = NULL;
+	ndev->srq_count = 0;
+	ndev->srq_size = 0;
 }
 
-static int nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
+static struct nvmet_rdma_srq *
+nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
 {
-	struct ib_srq_init_attr srq_attr = { NULL, };
+	size_t srq_size = ndev->srq_size;
+	struct nvmet_rdma_srq *nsrq;
 	struct ib_srq *srq;
-	size_t srq_size;
 	int ret, i;
 
-	srq_size = 4095;	/* XXX: tune */
+	nsrq = kzalloc(sizeof(*nsrq), GFP_KERNEL);
+	if (!nsrq)
+		return ERR_PTR(-ENOMEM);
 
-	srq_attr.attr.max_wr = srq_size;
-	srq_attr.attr.max_sge = 1 + ndev->inline_page_count;
-	srq_attr.attr.srq_limit = 0;
-	srq_attr.srq_type = IB_SRQT_BASIC;
-	srq = ib_create_srq(ndev->pd, &srq_attr);
-	if (IS_ERR(srq)) {
+	srq = rdma_srq_pool_get(ndev->pd);
+	if (!srq) {
+		ret = -EAGAIN;
+		goto out_free_nsrq;
+	}
+
+	nsrq->cmds = nvmet_rdma_alloc_cmds(ndev, srq_size, false);
+	if (IS_ERR(nsrq->cmds)) {
+		ret = PTR_ERR(nsrq->cmds);
+		goto out_put_srq;
+	}
+
+	nsrq->srq = srq;
+	nsrq->ndev = ndev;
+
+	for (i = 0; i < srq_size; i++) {
+		nsrq->cmds[i].nsrq = nsrq;
+		ret = nvmet_rdma_post_recv(ndev, &nsrq->cmds[i]);
+		if (ret)
+			goto out_free_cmds;
+	}
+
+	return nsrq;
+
+out_free_cmds:
+	nvmet_rdma_free_cmds(ndev, nsrq->cmds, srq_size, false);
+out_put_srq:
+	rdma_srq_pool_put(ndev->pd, srq);
+out_free_nsrq:
+	kfree(nsrq);
+	return ERR_PTR(ret);
+}
+
+static int nvmet_rdma_init_srqs(struct nvmet_rdma_device *ndev)
+{
+	struct ib_srq_init_attr srq_attr = { NULL, };
+	int i, ret;
+
+	if (!ndev->device->attrs.max_srq_wr || !ndev->device->attrs.max_srq) {
 		/*
 		 * If SRQs aren't supported we just go ahead and use normal
 		 * non-shared receive queues.
@@ -873,31 +958,44 @@ static int nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
 		return 0;
 	}
 
-	ndev->srq_cmds = nvmet_rdma_alloc_cmds(ndev, srq_size, false);
-	if (IS_ERR(ndev->srq_cmds)) {
-		ret = PTR_ERR(ndev->srq_cmds);
-		goto out_destroy_srq;
-	}
+	ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
+			     nvmet_rdma_srq_size);
+	ndev->srq_count = min(ndev->device->num_comp_vectors,
+			      ndev->device->attrs.max_srq);
 
-	ndev->srq = srq;
-	ndev->srq_size = srq_size;
+	ndev->srqs = kcalloc(ndev->srq_count, sizeof(*ndev->srqs), GFP_KERNEL);
+	if (!ndev->srqs)
+		return -ENOMEM;
 
-	for (i = 0; i < srq_size; i++) {
-		ndev->srq_cmds[i].srq = srq;
-		ret = nvmet_rdma_post_recv(ndev, &ndev->srq_cmds[i]);
-		if (ret)
-			goto out_free_cmds;
+	srq_attr.attr.max_wr = ndev->srq_size;
+	srq_attr.attr.max_sge = 2;
+	srq_attr.attr.srq_limit = 0;
+	srq_attr.srq_type = IB_SRQT_BASIC;
+	ret = rdma_srq_pool_init(ndev->pd, ndev->srq_count, &srq_attr);
+	if (ret)
+		goto err_free;
+
+	for (i = 0; i < ndev->srq_count; i++) {
+		ndev->srqs[i] = nvmet_rdma_init_srq(ndev);
+		if (IS_ERR(ndev->srqs[i])) {
+			ret = PTR_ERR(ndev->srqs[i]);
+			goto err_srq;
+		}
 	}
 
 	return 0;
 
-out_free_cmds:
-	nvmet_rdma_free_cmds(ndev, ndev->srq_cmds, ndev->srq_size, false);
-out_destroy_srq:
-	ib_destroy_srq(srq);
+err_srq:
+	while (--i >= 0)
+		nvmet_rdma_destroy_srq(ndev->srqs[i]);
+	rdma_srq_pool_destroy(ndev->pd);
+err_free:
+	kfree(ndev->srqs);
+	ndev->srqs = NULL;
+	ndev->srq_count = 0;
+	ndev->srq_size = 0;
 	return ret;
 }
 
 static void nvmet_rdma_free_dev(struct kref *ref)
 {
 	struct nvmet_rdma_device *ndev =
@@ -907,7 +1005,7 @@ static void nvmet_rdma_free_dev(struct kref *ref)
 	list_del(&ndev->entry);
 	mutex_unlock(&device_list_mutex);
 
-	nvmet_rdma_destroy_srq(ndev);
+	nvmet_rdma_destroy_srqs(ndev);
 	ib_dealloc_pd(ndev->pd);
 
 	kfree(ndev);
@@ -953,7 +1051,7 @@ static void nvmet_rdma_free_dev(struct kref *ref)
 		goto out_free_dev;
 
 	if (nvmet_rdma_use_srq) {
-		ret = nvmet_rdma_init_srq(ndev);
+		ret = nvmet_rdma_init_srqs(ndev);
 		if (ret)
 			goto out_free_pd;
 	}
@@ -977,14 +1075,8 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 {
 	struct ib_qp_init_attr qp_attr;
 	struct nvmet_rdma_device *ndev = queue->dev;
-	int comp_vector, nr_cqe, ret, i;
+	int nr_cqe, ret, i;
 
-	/*
-	 * Spread the io queues across completion vectors,
-	 * but still keep all admin queues on vector 0.
-	 */
-	comp_vector = !queue->host_qid ? 0 :
-		queue->idx % ndev->device->num_comp_vectors;
 
 	/*
 	 * Reserve CQ slots for RECV + RDMA_READ/RDMA_WRITE + RDMA_SEND.
@@ -992,7 +1084,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 	nr_cqe = queue->recv_queue_size + 2 * queue->send_queue_size;
 
 	queue->cq = ib_alloc_cq(ndev->device, queue,
-			nr_cqe + 1, comp_vector,
+			nr_cqe + 1, queue->comp_vector,
 			IB_POLL_WORKQUEUE);
 	if (IS_ERR(queue->cq)) {
 		ret = PTR_ERR(queue->cq);
@@ -1014,8 +1106,8 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 	qp_attr.cap.max_send_sge = max(ndev->device->attrs.max_sge_rd,
 					ndev->device->attrs.max_send_sge);
 
-	if (ndev->srq) {
-		qp_attr.srq = ndev->srq;
+	if (queue->nsrq) {
+		qp_attr.srq = queue->nsrq->srq;
 	} else {
 		/* +1 for drain */
 		qp_attr.cap.max_recv_wr = 1 + queue->recv_queue_size;
@@ -1034,7 +1126,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 		 __func__, queue->cq->cqe, qp_attr.cap.max_send_sge,
 		 qp_attr.cap.max_send_wr, queue->cm_id);
 
-	if (!ndev->srq) {
+	if (!queue->nsrq) {
 		for (i = 0; i < queue->recv_queue_size; i++) {
 			queue->cmds[i].queue = queue;
 			ret = nvmet_rdma_post_recv(ndev, &queue->cmds[i]);
@@ -1070,7 +1162,7 @@ static void nvmet_rdma_free_queue(struct nvmet_rdma_queue *queue)
 	nvmet_sq_destroy(&queue->nvme_sq);
 
 	nvmet_rdma_destroy_queue_ib(queue);
-	if (!queue->dev->srq) {
+	if (!queue->nsrq) {
 		nvmet_rdma_free_cmds(queue->dev, queue->cmds,
 				queue->recv_queue_size,
 				!queue->host_qid);
@@ -1182,13 +1274,22 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 		goto out_destroy_sq;
 	}
 
+	/*
+	 * Spread the io queues across completion vectors,
+	 * but still keep all admin queues on vector 0.
+	 */
+	queue->comp_vector = !queue->host_qid ? 0 :
+		queue->idx % ndev->device->num_comp_vectors;
+
 	ret = nvmet_rdma_alloc_rsps(queue);
 	if (ret) {
 		ret = NVME_RDMA_CM_NO_RSC;
 		goto out_ida_remove;
 	}
 
-	if (!ndev->srq) {
+	if (ndev->srqs) {
+		queue->nsrq = ndev->srqs[queue->comp_vector % ndev->srq_count];
+	} else {
 		queue->cmds = nvmet_rdma_alloc_cmds(ndev,
 				queue->recv_queue_size,
 				!queue->host_qid);
@@ -1209,10 +1310,12 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 	return queue;
 
 out_free_cmds:
-	if (!ndev->srq) {
+	if (!queue->nsrq) {
 		nvmet_rdma_free_cmds(queue->dev, queue->cmds,
 				queue->recv_queue_size,
 				!queue->host_qid);
+	} else {
+		queue->nsrq = NULL;
 	}
 out_free_responses:
 	nvmet_rdma_free_rsps(queue);
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any
  2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
                   ` (2 preceding siblings ...)
  2020-03-18 15:02 ` [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
@ 2020-03-18 15:02 ` Max Gurtovoy
  2020-03-19  4:15   ` Bart Van Assche
  2020-03-18 15:02 ` [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector Max Gurtovoy
  2020-03-19  4:02 ` [PATCH v2 0/5] nvmet-rdma/srpt: " Bart Van Assche
  5 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-18 15:02 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

To spread the completion vectors fairly, add a global IDA for RDMA
channels and use the ida_simple_{get,remove} API. This is a
preparation for the SRQ per completion vector feature.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 21 +++++++++++++++++----
 drivers/infiniband/ulp/srpt/ib_srpt.h |  2 ++
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index 9855274..d0294d8 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -89,6 +89,8 @@ static int srpt_get_u64_x(char *buffer, const struct kernel_param *kp)
 		 "Using this value for ioc_guid, id_ext, and cm_listen_id instead of using the node_guid of the first HCA.");
 
 static struct ib_client srpt_client;
+/* Will be used to allocate an index for srpt_rdma_ch. */
+static DEFINE_IDA(srpt_channel_ida);
 /* Protects both rdma_cm_port and rdma_cm_id. */
 static DEFINE_MUTEX(rdma_cm_mutex);
 /* Port number RDMA/CM will bind to. */
@@ -1779,7 +1781,7 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 	struct srpt_device *sdev = sport->sdev;
 	const struct ib_device_attr *attrs = &sdev->device->attrs;
 	int sq_size = sport->port_attrib.srp_sq_size;
-	int i, ret;
+	int i, ret, comp_vector;
 
 	WARN_ON(ch->rq_size < 1);
 
@@ -1788,9 +1790,11 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 	if (!qp_init)
 		goto out;
 
+	/* Spread the io channels across completion vectors */
+	comp_vector = ch->idx % sdev->device->num_comp_vectors;
 retry:
-	ch->cq = ib_alloc_cq_any(sdev->device, ch, ch->rq_size + sq_size,
-				 IB_POLL_WORKQUEUE);
+	ch->cq = ib_alloc_cq(sdev->device, ch, ch->rq_size + sq_size,
+			     comp_vector, IB_POLL_WORKQUEUE);
 	if (IS_ERR(ch->cq)) {
 		ret = PTR_ERR(ch->cq);
 		pr_err("failed to create CQ cqe= %d ret= %d\n",
@@ -2119,6 +2123,8 @@ static void srpt_release_channel_work(struct work_struct *w)
 
 	kmem_cache_destroy(ch->req_buf_cache);
 
+	ida_simple_remove(&srpt_channel_ida, ch->idx);
+
 	kref_put(&ch->kref, srpt_free_ch);
 }
 
@@ -2215,6 +2221,10 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 		goto reject;
 	}
 
+	ch->idx = ida_simple_get(&srpt_channel_ida, 0, 0, GFP_KERNEL);
+	if (ch->idx < 0)
+		goto free_ch;
+
 	kref_init(&ch->kref);
 	ch->pkey = be16_to_cpu(pkey);
 	ch->nexus = nexus;
@@ -2243,7 +2253,7 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 	ch->rsp_buf_cache = kmem_cache_create("srpt-rsp-buf", ch->max_rsp_size,
 					      512, 0, NULL);
 	if (!ch->rsp_buf_cache)
-		goto free_ch;
+		goto free_idx;
 
 	ch->ioctx_ring = (struct srpt_send_ioctx **)
 		srpt_alloc_ioctx_ring(ch->sport->sdev, ch->rq_size,
@@ -2478,6 +2488,8 @@ static int srpt_cm_req_recv(struct srpt_device *const sdev,
 free_rsp_cache:
 	kmem_cache_destroy(ch->rsp_buf_cache);
 
+free_idx:
+	ida_simple_remove(&srpt_channel_ida, ch->idx);
 free_ch:
 	if (rdma_cm_id)
 		rdma_cm_id->context = NULL;
@@ -3916,6 +3928,7 @@ static void __exit srpt_cleanup_module(void)
 		rdma_destroy_id(rdma_cm_id);
 	ib_unregister_client(&srpt_client);
 	target_unregister_template(&srpt_template);
+	ida_destroy(&srpt_channel_ida);
 }
 
 module_init(srpt_init_module);
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index 2e1a698..5c40653 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -292,6 +292,7 @@ enum rdma_ch_state {
  * @sess:          Session information associated with this SRP channel.
  * @sess_name:     Session name.
  * @release_work:  Allows scheduling of srpt_release_channel().
+ * @idx:           channel index.
  */
 struct srpt_rdma_ch {
 	struct srpt_nexus	*nexus;
@@ -331,6 +332,7 @@ struct srpt_rdma_ch {
 	struct se_session	*sess;
 	u8			sess_name[40];
 	struct work_struct	release_work;
+	int			idx;
 };
 
 /**
-- 
1.8.3.1



^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector
  2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
                   ` (3 preceding siblings ...)
  2020-03-18 15:02 ` [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any Max Gurtovoy
@ 2020-03-18 15:02 ` Max Gurtovoy
  2020-03-19  4:20   ` Bart Van Assche
  2020-03-19  4:02 ` [PATCH v2 0/5] nvmet-rdma/srpt: " Bart Van Assche
  5 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-18 15:02 UTC (permalink / raw)
  To: linux-nvme, sagi, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch,
	Max Gurtovoy, idanb

In order to save resource allocation and to make better use of
completion locality (compared to the SRQ per device that exists
today), allocate Shared Receive Queues (SRQs) per completion vector.
Associate each created channel with an appropriate SRQ according to the
completion vector index. This association reduces lock contention in
the fast path (compared to the SRQ per device solution) and increases
locality in memory buffers.

Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
---
 drivers/infiniband/ulp/srpt/ib_srpt.c | 166 +++++++++++++++++++++++++---------
 drivers/infiniband/ulp/srpt/ib_srpt.h |  26 +++++-
 2 files changed, 145 insertions(+), 47 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index d0294d8..a30890d 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -813,6 +813,31 @@ static bool srpt_test_and_set_cmd_state(struct srpt_send_ioctx *ioctx,
 }
 
 /**
+ * srpt_srq_post_recv - post an initial IB receive request for SRQ
+ * @srq: SRPT SRQ context.
+ * @ioctx: Receive I/O context pointer.
+ */
+static int srpt_srq_post_recv(struct srpt_srq *srq,
+			      struct srpt_recv_ioctx *ioctx)
+{
+	struct srpt_device *sdev = srq->sdev;
+	struct ib_sge list;
+	struct ib_recv_wr wr;
+
+	list.addr = ioctx->ioctx.dma + ioctx->ioctx.offset;
+	list.length = srp_max_req_size;
+	list.lkey = sdev->lkey;
+
+	ioctx->ioctx.cqe.done = srpt_recv_done;
+	wr.wr_cqe = &ioctx->ioctx.cqe;
+	wr.next = NULL;
+	wr.sg_list = &list;
+	wr.num_sge = 1;
+
+	return ib_post_srq_recv(srq->ibsrq, &wr, NULL);
+}
+
+/**
  * srpt_post_recv - post an IB receive request
  * @sdev: SRPT HCA pointer.
  * @ch: SRPT RDMA channel.
@@ -836,7 +861,7 @@ static int srpt_post_recv(struct srpt_device *sdev, struct srpt_rdma_ch *ch,
 	wr.num_sge = 1;
 
 	if (sdev->use_srq)
-		return ib_post_srq_recv(sdev->srq, &wr, NULL);
+		return ib_post_srq_recv(ch->srq->ibsrq, &wr, NULL);
 	else
 		return ib_post_recv(ch->qp, &wr, NULL);
 }
@@ -1824,7 +1849,8 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 					SRPT_MAX_SG_PER_WQE);
 	qp_init->port_num = ch->sport->port;
 	if (sdev->use_srq) {
-		qp_init->srq = sdev->srq;
+		ch->srq = sdev->srqs[comp_vector % sdev->srq_count];
+		qp_init->srq = ch->srq->ibsrq;
 	} else {
 		qp_init->cap.max_recv_wr = ch->rq_size;
 		qp_init->cap.max_recv_sge = min(attrs->max_recv_sge,
@@ -1882,6 +1908,8 @@ static int srpt_create_ch_ib(struct srpt_rdma_ch *ch)
 
 static void srpt_destroy_ch_ib(struct srpt_rdma_ch *ch)
 {
+	if (ch->srq)
+		ch->srq = NULL;
 	ib_destroy_qp(ch->qp);
 	ib_free_cq(ch->cq);
 }
@@ -3030,20 +3058,73 @@ static struct se_wwn *srpt_lookup_wwn(const char *name)
 	return wwn;
 }
 
-static void srpt_free_srq(struct srpt_device *sdev)
+static void srpt_free_srq(struct srpt_srq *srq)
 {
-	if (!sdev->srq)
-		return;
 
-	ib_destroy_srq(sdev->srq);
-	srpt_free_ioctx_ring((struct srpt_ioctx **)sdev->ioctx_ring, sdev,
-			     sdev->srq_size, sdev->req_buf_cache,
+	srpt_free_ioctx_ring((struct srpt_ioctx **)srq->ioctx_ring, srq->sdev,
+			     srq->sdev->srq_size, srq->sdev->req_buf_cache,
 			     DMA_FROM_DEVICE);
+	rdma_srq_pool_put(srq->sdev->pd, srq->ibsrq);
+	kfree(srq);
+
+}
+
+static void srpt_free_srqs(struct srpt_device *sdev)
+{
+	int i;
+
+	if (!sdev->srqs)
+		return;
+
+	for (i = 0; i < sdev->srq_count; i++)
+		srpt_free_srq(sdev->srqs[i]);
 	kmem_cache_destroy(sdev->req_buf_cache);
-	sdev->srq = NULL;
+	rdma_srq_pool_destroy(sdev->pd);
+	kfree(sdev->srqs);
+	sdev->srqs = NULL;
 }
 
-static int srpt_alloc_srq(struct srpt_device *sdev)
+static struct srpt_srq *srpt_alloc_srq(struct srpt_device *sdev)
+{
+	struct srpt_srq	*srq;
+	int i, ret;
+
+	srq = kzalloc(sizeof(*srq), GFP_KERNEL);
+	if (!srq)
+		return ERR_PTR(-ENOMEM);
+
+	srq->ibsrq = rdma_srq_pool_get(sdev->pd);
+	if (!srq->ibsrq) {
+		ret = -EAGAIN;
+		goto free_srq;
+	}
+
+	srq->ioctx_ring = (struct srpt_recv_ioctx **)
+		srpt_alloc_ioctx_ring(sdev, sdev->srq_size,
+				      sizeof(*srq->ioctx_ring[0]),
+				      sdev->req_buf_cache, 0, DMA_FROM_DEVICE);
+	if (!srq->ioctx_ring) {
+		ret = -ENOMEM;
+		goto put_srq;
+	}
+
+	srq->sdev = sdev;
+
+	for (i = 0; i < sdev->srq_size; ++i) {
+		INIT_LIST_HEAD(&srq->ioctx_ring[i]->wait_list);
+		srpt_srq_post_recv(srq, srq->ioctx_ring[i]);
+	}
+
+	return srq;
+
+put_srq:
+	rdma_srq_pool_put(sdev->pd, srq->ibsrq);
+free_srq:
+	kfree(srq);
+	return ERR_PTR(ret);
+}
+
+static int srpt_alloc_srqs(struct srpt_device *sdev)
 {
 	struct ib_srq_init_attr srq_attr = {
 		.event_handler = srpt_srq_event,
@@ -3053,46 +3134,45 @@ static int srpt_alloc_srq(struct srpt_device *sdev)
 		.srq_type = IB_SRQT_BASIC,
 	};
 	struct ib_device *device = sdev->device;
-	struct ib_srq *srq;
-	int i;
+	int i, j, ret;
 
-	WARN_ON_ONCE(sdev->srq);
-	srq = ib_create_srq(sdev->pd, &srq_attr);
-	if (IS_ERR(srq)) {
-		pr_debug("ib_create_srq() failed: %ld\n", PTR_ERR(srq));
-		return PTR_ERR(srq);
-	}
+	WARN_ON_ONCE(sdev->srqs);
+	sdev->srqs = kcalloc(sdev->srq_count, sizeof(*sdev->srqs), GFP_KERNEL);
+	if (!sdev->srqs)
+		return -ENOMEM;
+
+	pr_debug("create SRQ set #wr= %d max_allow=%d dev= %s\n",
+		 sdev->srq_size, sdev->device->attrs.max_srq_wr,
+		 dev_name(&device->dev));
 
-	pr_debug("create SRQ #wr= %d max_allow=%d dev= %s\n", sdev->srq_size,
-		 sdev->device->attrs.max_srq_wr, dev_name(&device->dev));
+	ret = rdma_srq_pool_init(sdev->pd, sdev->srq_count, &srq_attr);
+	if (ret)
+		goto out_free;
 
 	sdev->req_buf_cache = kmem_cache_create("srpt-srq-req-buf",
 						srp_max_req_size, 0, 0, NULL);
 	if (!sdev->req_buf_cache)
-		goto free_srq;
+		goto out_free_pool;
 
-	sdev->ioctx_ring = (struct srpt_recv_ioctx **)
-		srpt_alloc_ioctx_ring(sdev, sdev->srq_size,
-				      sizeof(*sdev->ioctx_ring[0]),
-				      sdev->req_buf_cache, 0, DMA_FROM_DEVICE);
-	if (!sdev->ioctx_ring)
-		goto free_cache;
+	for (i = 0; i < sdev->srq_count; i++) {
+		sdev->srqs[i] = srpt_alloc_srq(sdev);
+		if (IS_ERR(sdev->srqs[i]))
+			goto free_srq;
+	}
 
 	sdev->use_srq = true;
-	sdev->srq = srq;
-
-	for (i = 0; i < sdev->srq_size; ++i) {
-		INIT_LIST_HEAD(&sdev->ioctx_ring[i]->wait_list);
-		srpt_post_recv(sdev, NULL, sdev->ioctx_ring[i]);
-	}
 
 	return 0;
 
-free_cache:
-	kmem_cache_destroy(sdev->req_buf_cache);
-
 free_srq:
-	ib_destroy_srq(srq);
+	for (j = 0; j < i; j++)
+		srpt_free_srq(sdev->srqs[j]);
+	kmem_cache_destroy(sdev->req_buf_cache);
+out_free_pool:
+	rdma_srq_pool_destroy(sdev->pd);
+out_free:
+	kfree(sdev->srqs);
+	sdev->srqs = NULL;
 	return -ENOMEM;
 }
 
@@ -3102,10 +3182,10 @@ static int srpt_use_srq(struct srpt_device *sdev, bool use_srq)
 	int ret = 0;
 
 	if (!use_srq) {
-		srpt_free_srq(sdev);
+		srpt_free_srqs(sdev);
 		sdev->use_srq = false;
-	} else if (use_srq && !sdev->srq) {
-		ret = srpt_alloc_srq(sdev);
+	} else if (use_srq && !sdev->srqs) {
+		ret = srpt_alloc_srqs(sdev);
 	}
 	pr_debug("%s(%s): use_srq = %d; ret = %d\n", __func__,
 		 dev_name(&device->dev), sdev->use_srq, ret);
@@ -3139,6 +3219,8 @@ static void srpt_add_one(struct ib_device *device)
 	sdev->lkey = sdev->pd->local_dma_lkey;
 
 	sdev->srq_size = min(srpt_srq_size, sdev->device->attrs.max_srq_wr);
+	sdev->srq_count = min(sdev->device->num_comp_vectors,
+			      sdev->device->attrs.max_srq);
 
 	srpt_use_srq(sdev, sdev->port[0].port_attrib.use_srq);
 
@@ -3216,7 +3298,7 @@ static void srpt_add_one(struct ib_device *device)
 	if (sdev->cm_id)
 		ib_destroy_cm_id(sdev->cm_id);
 err_ring:
-	srpt_free_srq(sdev);
+	srpt_free_srqs(sdev);
 	ib_dealloc_pd(sdev->pd);
 free_dev:
 	kfree(sdev);
@@ -3267,7 +3349,7 @@ static void srpt_remove_one(struct ib_device *device, void *client_data)
 	for (i = 0; i < sdev->device->phys_port_cnt; i++)
 		srpt_release_sport(&sdev->port[i]);
 
-	srpt_free_srq(sdev);
+	srpt_free_srqs(sdev);
 
 	ib_dealloc_pd(sdev->pd);
 
diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.h b/drivers/infiniband/ulp/srpt/ib_srpt.h
index 5c40653..cad8635 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.h
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.h
@@ -42,6 +42,7 @@
 #include <rdma/ib_verbs.h>
 #include <rdma/ib_sa.h>
 #include <rdma/ib_cm.h>
+#include <rdma/srq_pool.h>
 #include <rdma/rdma_cm.h>
 #include <rdma/rw.h>
 
@@ -56,6 +57,7 @@
 #define SRP_SERVICE_NAME_PREFIX		"SRP.T10:"
 
 struct srpt_nexus;
+struct srpt_srq;
 
 enum {
 	/*
@@ -255,6 +257,7 @@ enum rdma_ch_state {
 /**
  * struct srpt_rdma_ch - RDMA channel
  * @nexus:         I_T nexus this channel is associated with.
+ * @srq:           SRQ that this channel is associated with (if use_srq is true).
  * @qp:            IB queue pair used for communicating over this channel.
  * @ib_cm:	   See below.
  * @ib_cm.cm_id:   IB CM ID associated with the channel.
@@ -296,6 +299,7 @@ enum rdma_ch_state {
  */
 struct srpt_rdma_ch {
 	struct srpt_nexus	*nexus;
+	struct srpt_srq		*srq;
 	struct ib_qp		*qp;
 	union {
 		struct {
@@ -434,17 +438,29 @@ struct srpt_port {
 };
 
 /**
+ * struct srpt_srq - SRQ (shared receive queue) context for SRPT
+ * @ibsrq:         verbs SRQ pointer.
+ * @ioctx_ring:    Per-SRQ receive I/O context ring.
+ * @sdev:          Backpointer to the HCA information.
+ */
+struct srpt_srq {
+	struct ib_srq		*ibsrq;
+	struct srpt_recv_ioctx	**ioctx_ring;
+	struct srpt_device	*sdev;
+};
+
+/**
  * struct srpt_device - information associated by SRPT with a single HCA
  * @device:        Backpointer to the struct ib_device managed by the IB core.
  * @pd:            IB protection domain.
  * @lkey:          L_Key (local key) with write access to all local memory.
- * @srq:           Per-HCA SRQ (shared receive queue).
  * @cm_id:         Connection identifier.
- * @srq_size:      SRQ size.
+ * @srqs:          Array of SRQs.
+ * @srq_count:     Number of SRQs in @srqs.
+ * @srq_size:      Size of each SRQ in the array.
  * @sdev_mutex:	   Serializes use_srq changes.
  * @use_srq:       Whether or not to use SRQ.
  * @req_buf_cache: kmem_cache for @ioctx_ring buffers.
- * @ioctx_ring:    Per-HCA SRQ.
  * @event_handler: Per-HCA asynchronous IB event handler.
  * @list:          Node in srpt_dev_list.
  * @port:          Information about the ports owned by this HCA.
@@ -453,13 +469,13 @@ struct srpt_device {
 	struct ib_device	*device;
 	struct ib_pd		*pd;
 	u32			lkey;
-	struct ib_srq		*srq;
 	struct ib_cm_id		*cm_id;
+	struct srpt_srq		**srqs;
+	int			srq_count;
 	int			srq_size;
 	struct mutex		sdev_mutex;
 	bool			use_srq;
 	struct kmem_cache	*req_buf_cache;
-	struct srpt_recv_ioctx	**ioctx_ring;
 	struct ib_event_handler	event_handler;
 	struct list_head	list;
 	struct srpt_port        port[];
-- 
1.8.3.1



* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-18 15:02 ` [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd Max Gurtovoy
@ 2020-03-18 23:32   ` Jason Gunthorpe
  2020-03-19  8:48     ` Max Gurtovoy
  2020-03-19  4:05   ` Bart Van Assche
  1 sibling, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-03-18 23:32 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: loberman, bvanassche, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch, sagi

On Wed, Mar 18, 2020 at 05:02:54PM +0200, Max Gurtovoy wrote:
> This is a preparetion patch for the SRQ per completion vector feature.
> 
> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> ---
>  drivers/nvme/target/rdma.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)

Max, how are you sending these emails, and why don't they thread
properly on patchworks:

https://patchwork.kernel.org/project/linux-rdma/list/?series=258271

This patch is missing from the series

v1 had the same issue

Very strange. Can you fix it?

Jason


* Re: [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector
  2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
                   ` (4 preceding siblings ...)
  2020-03-18 15:02 ` [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector Max Gurtovoy
@ 2020-03-19  4:02 ` " Bart Van Assche
  5 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2020-03-19  4:02 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, sagi, hch, loberman, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb

On 2020-03-18 08:02, Max Gurtovoy wrote:
> I used RoCE link layer. For SRP, I used 1 server with RoCE loopback connection
> (results are not mentioned below) for testing. Hopefully I'll get a tested-by
> signature and feedback from Laurence and Rupesh on the SRP part during the review
> process.

Hi Max,

The MAD code in ib_srpt is not triggered when using RoCE. Please also
test SRP over IB.

Additionally, how does this patch series affect SRP performance?

Thanks,

Bart.


* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-18 15:02 ` [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd Max Gurtovoy
  2020-03-18 23:32   ` Jason Gunthorpe
@ 2020-03-19  4:05   ` Bart Van Assche
  1 sibling, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2020-03-19  4:05 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, sagi, hch, loberman, linux-rdma
  Cc: rgirase, vladimirk, leonro, shlomin, dledford, jgg, oren, kbusch, idanb

On 2020-03-18 08:02, Max Gurtovoy wrote:
> This is a preparetion patch for the SRQ per completion vector feature.
            ^^^^^^^^^^^
            preparation?

Thanks,

Bart.


* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-18 15:02 ` [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
@ 2020-03-19  4:09   ` Bart Van Assche
  2020-03-19  9:15     ` Max Gurtovoy
  2020-03-20  5:47   ` Sagi Grimberg
  1 sibling, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2020-03-19  4:09 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, sagi, hch, loberman, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb

On 2020-03-18 08:02, Max Gurtovoy wrote:
> In order to save resource allocation and utilize the completion
                   ^^^^^^^^^^^^^^^^^^^
                   resources?

> +static int nvmet_rdma_srq_size = 1024;
> +module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
> +MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should >= 256 (default: 1024)");

Is an SRQ overflow fatal? Isn't the SRQ size something that should be
computed by the nvmet_rdma driver such that SRQ overflows do not happen?

Thanks,

Bart.


* Re: [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any
  2020-03-18 15:02 ` [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any Max Gurtovoy
@ 2020-03-19  4:15   ` Bart Van Assche
  0 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2020-03-19  4:15 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, sagi, hch, loberman, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb

On 2020-03-18 08:02, Max Gurtovoy wrote:
> For spreading the completion vectors fairly, add global ida for RDMA
> channeles and use ida_simple_{get,remove} API. This is a preparation
  ^^^^^^^^^
  channels?

> +/* Will be used to allocate an index for srpt_rdma_ch. */
> +static DEFINE_IDA(srpt_channel_ida);

I don't think that we need a global data structure for spreading work
across completion vectors. How about assigning 0 to ch->idx if
SRP_MTCH_ACTION is not set during login and assigning 1 .. n to ch->idx
for all other channels for which SRP_MTCH_ACTION is set during login?

> +	 /* Spread the io channeles across completion vectors */
                       ^^^^^^^^^^^^
                       RDMA channels?

Thanks,

Bart.


* Re: [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector
  2020-03-18 15:02 ` [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector Max Gurtovoy
@ 2020-03-19  4:20   ` Bart Van Assche
  0 siblings, 0 replies; 28+ messages in thread
From: Bart Van Assche @ 2020-03-19  4:20 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, sagi, hch, loberman, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb

On 2020-03-18 08:02, Max Gurtovoy wrote:
> +	for (i = 0; i < sdev->srq_size; ++i) {
> +		INIT_LIST_HEAD(&srq->ioctx_ring[i]->wait_list);
> +		srpt_srq_post_recv(srq, srq->ioctx_ring[i]);
> +	}

Please check the srpt_srq_post_recv() return value.

Thanks,

Bart.


* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-18 23:32   ` Jason Gunthorpe
@ 2020-03-19  8:48     ` Max Gurtovoy
  2020-03-19  9:14       ` Leon Romanovsky
  2020-03-19 11:54       ` Jason Gunthorpe
  0 siblings, 2 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-19  8:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: loberman, bvanassche, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch, sagi


On 3/19/2020 1:32 AM, Jason Gunthorpe wrote:
> On Wed, Mar 18, 2020 at 05:02:54PM +0200, Max Gurtovoy wrote:
>> This is a preparetion patch for the SRQ per completion vector feature.
>>
>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>> ---
>>   drivers/nvme/target/rdma.c | 6 ++++--
>>   1 file changed, 4 insertions(+), 2 deletions(-)
> Max, how are you sending these emails, and why don't they thread
> properly on patchworks:
>
> https://patchwork.kernel.org/project/linux-rdma/list/?series=258271
>
> This patch is missing from the series
>
> v1 had the same issue
>
> Very strange. Can you fix it?

I'm using "git send-email".

Should I use some special flags or CC another email for it ?

How do you suggest sending patches so we'll see it on patchworks ?

>
> Jason


* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-19  8:48     ` Max Gurtovoy
@ 2020-03-19  9:14       ` Leon Romanovsky
  2020-03-19 10:55         ` Max Gurtovoy
  2020-03-19 11:54       ` Jason Gunthorpe
  1 sibling, 1 reply; 28+ messages in thread
From: Leon Romanovsky @ 2020-03-19  9:14 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: loberman, bvanassche, vladimirk, shlomin, linux-rdma, linux-nvme,
	idanb, dledford, Jason Gunthorpe, oren, kbusch, rgirase, hch,
	sagi

On Thu, Mar 19, 2020 at 10:48:22AM +0200, Max Gurtovoy wrote:
>
> On 3/19/2020 1:32 AM, Jason Gunthorpe wrote:
> > On Wed, Mar 18, 2020 at 05:02:54PM +0200, Max Gurtovoy wrote:
> > > This is a preparetion patch for the SRQ per completion vector feature.
> > >
> > > Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> > > ---
> > >   drivers/nvme/target/rdma.c | 6 ++++--
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > Max, how are you sending these emails, and why don't they thread
> > properly on patchworks:
> >
> > https://patchwork.kernel.org/project/linux-rdma/list/?series=258271
> >
> > This patch is missing from the series
> >
> > v1 had the same issue
> >
> > Very strange. Can you fix it?
>
> I'm using "git send-email".
>
> Should I use some special flags or CC another email for it ?
>
> How do you suggest sending patches so we'll see it on patchworks ?

It can be related to your SMTP relay, I see that you are sending
patches through labmailer, and it is known as unreliable. Use the
same SMTP server as you are using to send emails.

Thanks

>
> >
> > Jason


* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-19  4:09   ` Bart Van Assche
@ 2020-03-19  9:15     ` Max Gurtovoy
  2020-03-19 11:56       ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-19  9:15 UTC (permalink / raw)
  To: Bart Van Assche, linux-nvme, sagi, hch, loberman, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb


On 3/19/2020 6:09 AM, Bart Van Assche wrote:
> On 2020-03-18 08:02, Max Gurtovoy wrote:
>> In order to save resource allocation and utilize the completion
>                     ^^^^^^^^^^^^^^^^^^^
>                     resources?

thanks.


>
>> +static int nvmet_rdma_srq_size = 1024;
>> +module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
>> +MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should >= 256 (default: 1024)");
> Is an SRQ overflow fatal? Isn't the SRQ size something that should be
> computed by the nvmet_rdma driver such that SRQ overflows do not happen?

I've added the following code to make sure that the size is not greater 
than device capability:

+ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
+                            nvmet_rdma_srq_size);

In case the SRQ doesn't have enough credits it will send rnr to the 
initiator and the initiator will retry later on.

We might need to change the min_rnr_timer during the connection 
establishment, but still didn't see the need for it.

-Max

>
> Thanks,
>
> Bart.


* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-19  9:14       ` Leon Romanovsky
@ 2020-03-19 10:55         ` Max Gurtovoy
  0 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-19 10:55 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: loberman, bvanassche, vladimirk, shlomin, linux-rdma, linux-nvme,
	idanb, dledford, Jason Gunthorpe, oren, kbusch, rgirase, hch,
	sagi


On 3/19/2020 11:14 AM, Leon Romanovsky wrote:
> On Thu, Mar 19, 2020 at 10:48:22AM +0200, Max Gurtovoy wrote:
>> On 3/19/2020 1:32 AM, Jason Gunthorpe wrote:
>>> On Wed, Mar 18, 2020 at 05:02:54PM +0200, Max Gurtovoy wrote:
>>>> This is a preparetion patch for the SRQ per completion vector feature.
>>>>
>>>> Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
>>>> ---
>>>>    drivers/nvme/target/rdma.c | 6 ++++--
>>>>    1 file changed, 4 insertions(+), 2 deletions(-)
>>> Max, how are you sending these emails, and why don't they thread
>>> properly on patchworks:
>>>
>>> https://patchwork.kernel.org/project/linux-rdma/list/?series=258271
>>>
>>> This patch is missing from the series
>>>
>>> v1 had the same issue
>>>
>>> Very strange. Can you fix it?
>> I'm using "git send-email".
>>
>> Should I use some special flags or CC another email for it ?
>>
>> How do you suggest sending patches so we'll see it on patchworks ?
> It can be related to your SMTP relay, I see that you are sending
> patches through labmailer, and it is known as unreliable. Use the
> same SMTP server as you are using to send emails.

I'll use it in V3.

Thanks.


>
> Thanks
>
>>> Jason


* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-19  8:48     ` Max Gurtovoy
  2020-03-19  9:14       ` Leon Romanovsky
@ 2020-03-19 11:54       ` Jason Gunthorpe
  2020-03-19 14:08         ` Konstantin Ryabitsev
  2020-03-19 21:58         ` Konstantin Ryabitsev
  1 sibling, 2 replies; 28+ messages in thread
From: Jason Gunthorpe @ 2020-03-19 11:54 UTC (permalink / raw)
  To: Max Gurtovoy, Konstantin Ryabitsev
  Cc: loberman, bvanassche, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch, sagi

On Thu, Mar 19, 2020 at 10:48:22AM +0200, Max Gurtovoy wrote:
> 
> On 3/19/2020 1:32 AM, Jason Gunthorpe wrote:
> > On Wed, Mar 18, 2020 at 05:02:54PM +0200, Max Gurtovoy wrote:
> > > This is a preparetion patch for the SRQ per completion vector feature.
> > > 
> > > Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
> > >   drivers/nvme/target/rdma.c | 6 ++++--
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > Max, how are you sending these emails, and why don't they thread
> > properly on patchworks:
> > 
> > https://patchwork.kernel.org/project/linux-rdma/list/?series=258271
> > 
> > This patch is missing from the series
> > 
> > v1 had the same issue
> > 
> > Very strange. Can you fix it?
> 
> I'm using "git send-email".
> 
> Should I use some special flags or CC another email for it ?
> 
> How do you suggest sending patches so we'll see it on patchworks ?

I looked at these mails and they seem properly threaded/etc.

I've added Konstantin, perhaps he knows why patchworks is acting
weird here?

Jason


* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-19  9:15     ` Max Gurtovoy
@ 2020-03-19 11:56       ` Jason Gunthorpe
  2020-03-19 12:48         ` Max Gurtovoy
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-03-19 11:56 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: loberman, Bart Van Assche, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch, sagi

On Thu, Mar 19, 2020 at 11:15:50AM +0200, Max Gurtovoy wrote:
> 
> On 3/19/2020 6:09 AM, Bart Van Assche wrote:
> > On 2020-03-18 08:02, Max Gurtovoy wrote:
> > > In order to save resource allocation and utilize the completion
> >                     ^^^^^^^^^^^^^^^^^^^
> >                     resources?
> 
> thanks.
> 
> 
> > 
> > > +static int nvmet_rdma_srq_size = 1024;
> > > +module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
> > > +MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should >= 256 (default: 1024)");
> > Is an SRQ overflow fatal? Isn't the SRQ size something that should be
> > computed by the nvmet_rdma driver such that SRQ overflows do not happen?
> 
> I've added the following code to make sure that the size is not greater than
> device capability:
> 
> +ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
> +                            nvmet_rdma_srq_size);
> 
> In case the SRQ doesn't have enough credits it will send rnr to the
> initiator and the initiator will retry later on.

This is a pretty big change, in bad cases we could significantly
overflow the srq space available...

A big part of most verbs protocols is ensuring that the RQ does not
overflow.

Are we sure it is OK? With all initiator/targets out there?

Jason


* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-19 11:56       ` Jason Gunthorpe
@ 2020-03-19 12:48         ` Max Gurtovoy
  2020-03-19 13:53           ` Jason Gunthorpe
  0 siblings, 1 reply; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-19 12:48 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: loberman, Bart Van Assche, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch, sagi


On 3/19/2020 1:56 PM, Jason Gunthorpe wrote:
> On Thu, Mar 19, 2020 at 11:15:50AM +0200, Max Gurtovoy wrote:
>> On 3/19/2020 6:09 AM, Bart Van Assche wrote:
>>> On 2020-03-18 08:02, Max Gurtovoy wrote:
>>>> In order to save resource allocation and utilize the completion
>>>                      ^^^^^^^^^^^^^^^^^^^
>>>                      resources?
>> thanks.
>>
>>
>>>> +static int nvmet_rdma_srq_size = 1024;
>>>> +module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
>>>> +MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should >= 256 (default: 1024)");
>>> Is an SRQ overflow fatal? Isn't the SRQ size something that should be
>>> computed by the nvmet_rdma driver such that SRQ overflows do not happen?
>> I've added the following code to make sure that the size is not greater than
>> device capability:
>>
>> +ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
>> +                            nvmet_rdma_srq_size);
>>
>> In case the SRQ doesn't have enough credits it will send rnr to the
>> initiator and the initiator will retry later on.
> This is a pretty big change, in bad cases we could significantly
> overflow the srq space available...
>
> A big part of most verbs protocols to ensure that the RQ does not
> overflow.
>
> Are we sure it is OK? With all initiator/targets out there?

IMO if we set the SRQ size to utilize the wire, we're good.

So the best we can do is decrease the RNR timer to re-transmit faster -
I have patches for that as well.

Let me know if you prefer I send it in v3.

Nevertheless, this situation is better from the current SRQ per HCA 
implementation.


>
> Jason


* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-19 12:48         ` Max Gurtovoy
@ 2020-03-19 13:53           ` Jason Gunthorpe
  2020-03-19 14:49             ` Bart Van Assche
  0 siblings, 1 reply; 28+ messages in thread
From: Jason Gunthorpe @ 2020-03-19 13:53 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: loberman, Bart Van Assche, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch, sagi

On Thu, Mar 19, 2020 at 02:48:20PM +0200, Max Gurtovoy wrote:

> Nevertheless, this situation is better from the current SRQ per HCA
> implementation.

nvme/srp/etc already use srq? I see it in the target but not initiator?

Just worried about breaking some weird target somewhere

Jason


* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-19 11:54       ` Jason Gunthorpe
@ 2020-03-19 14:08         ` Konstantin Ryabitsev
  2020-03-19 21:58         ` Konstantin Ryabitsev
  1 sibling, 0 replies; 28+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-19 14:08 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: loberman, bvanassche, vladimirk, idanb, linux-rdma, rgirase,
	shlomin, linux-nvme, leonro, dledford, oren, kbusch,
	Max Gurtovoy, hch, sagi

On Thu, Mar 19, 2020 at 08:54:31AM -0300, Jason Gunthorpe wrote:
> > > Max, how are you sending these emails, and why don't they thread
> > > properly on patchworks:
> > > 
> > > https://patchwork.kernel.org/project/linux-rdma/list/?series=258271
> > > 
> > > This patch is missing from the series
> > > 
> > > v1 had the same issue
> > > 
> > > Very strange. Can you fix it?
> > 
> > I'm using "git send-email".
> > 
> > Should I use some special flags or CC another email for it ?
> > 
> > How do you suggest sending patches so we'll see it on patchworks ?
> 
> I looked at these mails and they seem properly threaded/etc.
> 
> I've added Konstantin, perhaps he knows why patchworks is acting
> weird here?

Not sure. Everything appears to be properly threaded. I see 2/5 arriving 
at precisely the same time as the cover letter, so 2/5 probably got 
processed before 0/5. I have no idea if that actually matters to 
patchwork -- a whole bunch of series would break if arrival ordering was 
that important. At least I assume that would be the case.

I'll check with upstream.

-K

* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-19 13:53           ` Jason Gunthorpe
@ 2020-03-19 14:49             ` Bart Van Assche
       [not found]               ` <50dd8f5d-d092-54bc-236d-1e702fb95240@mellanox.com>
  0 siblings, 1 reply; 28+ messages in thread
From: Bart Van Assche @ 2020-03-19 14:49 UTC (permalink / raw)
  To: Jason Gunthorpe, Max Gurtovoy
  Cc: loberman, sagi, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch

On 3/19/20 6:53 AM, Jason Gunthorpe wrote:
> On Thu, Mar 19, 2020 at 02:48:20PM +0200, Max Gurtovoy wrote:
> 
>> Nevertheless, this situation is better than the current SRQ-per-HCA
>> implementation.
> 
> nvme/srp/etc already use srq? I see it in the target but not initiator?
> 
> Just worried about breaking some weird target somewhere

 From the upstream SRP target driver:

static void srpt_get_ioc(struct srpt_port *sport, u32 slot,
			 struct ib_dm_mad *mad)
{
	[ ... ]
	if (sdev->use_srq)
		send_queue_depth = sdev->srq_size;
	else
		send_queue_depth = min(MAX_SRPT_RQ_SIZE,
				       sdev->device->attrs.max_qp_wr);
	[ ... ]
	iocp->send_queue_depth = cpu_to_be16(send_queue_depth);
	[ ... ]
}

I'm not sure the SRP initiator uses that data from the device management
I/O controller profile.

Anyway, with one SRQ per initiator it is possible for the initiator to
prevent SRQ overflows. I don't think that it is possible for an initiator
to prevent target side SRQ overflows if shared receive queues are shared
across multiple initiators.

Thanks,

Bart.

* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
       [not found]                 ` <6e3cc1c4-b24e-f607-42b3-5b83dd8c312c@mellanox.com>
@ 2020-03-19 16:27                   ` Max Gurtovoy
  0 siblings, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-19 16:27 UTC (permalink / raw)
  To: Bart Van Assche, Jason Gunthorpe
  Cc: loberman, sagi, vladimirk, idanb, linux-rdma, shlomin,
	linux-nvme, leonro, dledford, oren, kbusch, rgirase, hch

and again - hopefully last time I'm blocked...

On 3/19/2020 5:44 PM, Max Gurtovoy wrote:
>
> sending again - I think the previous mail was blocked.
>
> On 3/19/2020 5:10 PM, Max Gurtovoy wrote:
>>
>>
>> On 3/19/2020 4:49 PM, Bart Van Assche wrote:
>>> On 3/19/20 6:53 AM, Jason Gunthorpe wrote:
>>>> On Thu, Mar 19, 2020 at 02:48:20PM +0200, Max Gurtovoy wrote:
>>>>
>>>>> Nevertheless, this situation is better than the current SRQ-per-HCA
>>>>> implementation.
>>>>
>>>> nvme/srp/etc already use srq? I see it in the target but not 
>>>> initiator?
>>>>
>>>> Just worried about breaking some weird target somewhere
>>
>> Jason,
>>
>> The feature is only for the target side and has no influence on the 
>> initiator at the ULP level.
>>
>> The only thing I did was fix the SRQ implementation for nvme/rdma and 
>> srpt, which allocate 1 SRQ per device.
>>
>> Now there are N SRQs per device that try to act as a pure MQ 
>> implementation (without SRQ).
>>
>>>
>>> From the upstream SRP target driver:
>>>
>>> static void srpt_get_ioc(struct srpt_port *sport, u32 slot,
>>>              struct ib_dm_mad *mad)
>>> {
>>>     [ ... ]
>>>     if (sdev->use_srq)
>>>         send_queue_depth = sdev->srq_size;
>>>     else
>>>         send_queue_depth = min(MAX_SRPT_RQ_SIZE,
>>>                        sdev->device->attrs.max_qp_wr);
>>>     [ ... ]
>>>     iocp->send_queue_depth = cpu_to_be16(send_queue_depth);
>>>     [ ... ]
>>> }
>>>
>>> I'm not sure the SRP initiator uses that data from the device 
>>> management
>>> I/O controller profile.
>>>
>>> Anyway, with one SRQ per initiator it is possible for the initiator to
>>> prevent SRQ overflows. I don't think that it is possible for an 
>>> initiator
>>> to prevent target side SRQ overflows if shared receive queues are 
>>> shared
>>> across multiple initiators.
>>
>> I don't want to change initiator code to prevent overflow.
>>
>> As I explained earlier, the SRQs on the target side will be shared by 
>> all controllers of a specific device (instead of 1 global SRQ per 
>> device), and the receive buffers are shared as well.
>>
>> Not per initiator. This will cause lock contention.
>>
>> In case the target SRQ has no resources at a specific time, the low 
>> level (RC QP) is responsible for sending an RNR NAK to the initiator, 
>> and the initiator (RC QP) will retry in the transport layer and not in 
>> the ULP.
>>
>> This is controlled by the min_rnr_timer value, which is set to 0 by 
>> default (the maximum delay). For the SRQ case in general, IMO it is 
>> better to set it to 1 (the minimal delay) to avoid longer latency, 
>> since there is a chance that the SRQ is full.
>>
>> In my testing I didn't see a real need to set min_rnr_timer, but I 
>> have patches for that in case Jason thinks it should be part of this 
>> series, which is not so small even without it.
>>
>>
>>>
>>> Thanks,
>>>
>>> Bart.

* Re: [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd
  2020-03-19 11:54       ` Jason Gunthorpe
  2020-03-19 14:08         ` Konstantin Ryabitsev
@ 2020-03-19 21:58         ` Konstantin Ryabitsev
  1 sibling, 0 replies; 28+ messages in thread
From: Konstantin Ryabitsev @ 2020-03-19 21:58 UTC (permalink / raw)
  To: Jason Gunthorpe
  Cc: loberman, bvanassche, vladimirk, idanb, linux-rdma, rgirase,
	shlomin, linux-nvme, leonro, dledford, oren, kbusch,
	Max Gurtovoy, hch, sagi

On Thu, Mar 19, 2020 at 08:54:31AM -0300, Jason Gunthorpe wrote:
> > I'm using "git send-email".
> > 
> > Should I use some special flags or CC another email for it ?
> > 
> > How do you suggest sending patches so we'll see it on patchworks ?
> 
> I looked at these mails and they seem properly threaded/etc.
> 
> I've added Konstantin, perhaps he knows why patchworks is acting
> weird here?

Looks like it's hitting a race condition in patchwork that will be fixed 
in the next release.

Until then, you can grab these patches from lore.kernel.org (e.g. with 
b4), though it won't help with git-patchwork-bot integration.

-K

* Re: [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector
  2020-03-18 15:02 ` [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
  2020-03-19  4:09   ` Bart Van Assche
@ 2020-03-20  5:47   ` Sagi Grimberg
  1 sibling, 0 replies; 28+ messages in thread
From: Sagi Grimberg @ 2020-03-20  5:47 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, leonro, shlomin, dledford, jgg, oren, kbusch, idanb


> +static void nvmet_rdma_destroy_srqs(struct nvmet_rdma_device *ndev)
>   {
> -	if (!ndev->srq)
> +	int i;
> +
> +	if (!ndev->srqs)
>   		return;
>   
> -	nvmet_rdma_free_cmds(ndev, ndev->srq_cmds, ndev->srq_size, false);
> -	ib_destroy_srq(ndev->srq);
> +	for (i = 0; i < ndev->srq_count; i++)
> +		nvmet_rdma_destroy_srq(ndev->srqs[i]);
> +
> +	rdma_srq_pool_destroy(ndev->pd);
> +	kfree(ndev->srqs);
> +	ndev->srqs = NULL;
> +	ndev->srq_count = 0;
> +	ndev->srq_size = 0;

What is the point assigning these?

* Re: [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD
  2020-03-18 15:02 ` [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD Max Gurtovoy
@ 2020-03-20  5:59   ` Sagi Grimberg
  2020-03-20 13:21     ` Max Gurtovoy
  2020-03-20 14:27     ` Leon Romanovsky
  0 siblings, 2 replies; 28+ messages in thread
From: Sagi Grimberg @ 2020-03-20  5:59 UTC (permalink / raw)
  To: Max Gurtovoy, linux-nvme, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb


> ULPs can use this API to create/destroy SRQs with the same
> characteristics, implementing logic aimed at saving resources without a
> significant performance penalty (e.g. create an SRQ per completion
> vector and use shared receive buffers for multiple controllers of the
> ULP).

There is almost no logic in here. Is there a real point in having it
the way it is?

What is the point of creating a pool, getting all the SRQs, managing
them in the ULP (in an array), putting them back, and destroying them
as a pool?

I'd expect to have a refcount for each QP referencing an SRQ from the
pool, and also that the pool would manage the SRQs themselves.

SRQs are long-lived resources, unlike MRs, which are taken and restored
to the pool on a per-I/O basis...

It's not that I hate it or something, it's just not clear to me how
useful it is in this form...

* Re: [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD
  2020-03-20  5:59   ` Sagi Grimberg
@ 2020-03-20 13:21     ` Max Gurtovoy
  2020-03-20 14:27     ` Leon Romanovsky
  1 sibling, 0 replies; 28+ messages in thread
From: Max Gurtovoy @ 2020-03-20 13:21 UTC (permalink / raw)
  To: Sagi Grimberg, linux-nvme, hch, loberman, bvanassche, linux-rdma
  Cc: rgirase, vladimirk, shlomin, leonro, dledford, jgg, oren, kbusch, idanb


On 3/20/2020 7:59 AM, Sagi Grimberg wrote:
>
>> ULPs can use this API to create/destroy SRQs with the same
>> characteristics, implementing logic aimed at saving resources without a
>> significant performance penalty (e.g. create an SRQ per completion
>> vector and use shared receive buffers for multiple controllers of the
>> ULP).
>
> There is almost no logic in here. Is there a real point in having it
> the way it is?
>
> What is the point of creating a pool, getting all the SRQs, managing
> them in the ULP (in an array), putting them back, and destroying them
> as a pool?
>
> I'd expect to have a refcount for each QP referencing an SRQ from the
> pool, and also that the pool would manage the SRQs themselves.
>
> SRQs are long-lived resources, unlike MRs, which are taken and restored
> to the pool on a per-I/O basis...
>
> It's not that I hate it or something, it's just not clear to me how
> useful it is in this form...

Sagi,

It's surprising to me, since in my v1 two years ago I sent a pure 
nvmet/RDMA implementation with no srq_pool in it.

And you asked me to add an srq_pool in the review.

Also, I was asked back then to add another implementation with this API 
for another ULP, but I didn't have the capacity for it.

Now I've done both the NVMf and SRP target implementations with the SRQ 
pool.

I'm ok with removing/upgrading the pool in a way that makes everyone 
happy.

I'm ok with removing the SRP implementation if it's not needed.

I just want to add this feature to the NVMf target for the 5.7 release.

So please decide on the implementation and I'll send the patches.

-Max.


* Re: [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD
  2020-03-20  5:59   ` Sagi Grimberg
  2020-03-20 13:21     ` Max Gurtovoy
@ 2020-03-20 14:27     ` Leon Romanovsky
  1 sibling, 0 replies; 28+ messages in thread
From: Leon Romanovsky @ 2020-03-20 14:27 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: loberman, bvanassche, vladimirk, shlomin, linux-rdma, rgirase,
	linux-nvme, idanb, dledford, jgg, oren, kbusch, Max Gurtovoy,
	hch

On Thu, Mar 19, 2020 at 10:59:33PM -0700, Sagi Grimberg wrote:
>
> > ULPs can use this API to create/destroy SRQs with the same
> > characteristics, implementing logic aimed at saving resources without a
> > significant performance penalty (e.g. create an SRQ per completion
> > vector and use shared receive buffers for multiple controllers of the
> > ULP).
>
> There is almost no logic in here. Is there a real point in having it
> the way it is?
>
> What is the point of creating a pool, getting all the SRQs, managing
> them in the ULP (in an array), putting them back, and destroying them
> as a pool?
>
> I'd expect to have a refcount for each QP referencing an SRQ from the
> pool, and also that the pool would manage the SRQs themselves.
>
> SRQs are long-lived resources, unlike MRs, which are taken and restored
> to the pool on a per-I/O basis...
>
> It's not that I hate it or something, it's just not clear to me how
> useful it is in this form...

Sagi,

Why do we need such complex reference counting for the QPs?
This SRQ pool is intended for kernel users, whom we trust, and we don't
expect a QP to be destroyed before its SRQ.

Thanks

end of thread, back to index

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-18 15:02 [PATCH v2 0/5] nvmet-rdma/srpt: SRQ per completion vector Max Gurtovoy
2020-03-18 15:02 ` [PATCH v2 1/5] IB/core: add a simple SRQ pool per PD Max Gurtovoy
2020-03-20  5:59   ` Sagi Grimberg
2020-03-20 13:21     ` Max Gurtovoy
2020-03-20 14:27     ` Leon Romanovsky
2020-03-18 15:02 ` [PATCH v2 2/5] nvmet-rdma: add srq pointer to rdma_cmd Max Gurtovoy
2020-03-18 23:32   ` Jason Gunthorpe
2020-03-19  8:48     ` Max Gurtovoy
2020-03-19  9:14       ` Leon Romanovsky
2020-03-19 10:55         ` Max Gurtovoy
2020-03-19 11:54       ` Jason Gunthorpe
2020-03-19 14:08         ` Konstantin Ryabitsev
2020-03-19 21:58         ` Konstantin Ryabitsev
2020-03-19  4:05   ` Bart Van Assche
2020-03-18 15:02 ` [PATCH v2 3/5] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
2020-03-19  4:09   ` Bart Van Assche
2020-03-19  9:15     ` Max Gurtovoy
2020-03-19 11:56       ` Jason Gunthorpe
2020-03-19 12:48         ` Max Gurtovoy
2020-03-19 13:53           ` Jason Gunthorpe
2020-03-19 14:49             ` Bart Van Assche
     [not found]               ` <50dd8f5d-d092-54bc-236d-1e702fb95240@mellanox.com>
     [not found]                 ` <6e3cc1c4-b24e-f607-42b3-5b83dd8c312c@mellanox.com>
2020-03-19 16:27                   ` Max Gurtovoy
2020-03-20  5:47   ` Sagi Grimberg
2020-03-18 15:02 ` [PATCH v2 4/5] RDMA/srpt: use ib_alloc_cq instead of ib_alloc_cq_any Max Gurtovoy
2020-03-19  4:15   ` Bart Van Assche
2020-03-18 15:02 ` [PATCH v2 5/5] RDMA/srpt: use SRQ per completion vector Max Gurtovoy
2020-03-19  4:20   ` Bart Van Assche
2020-03-19  4:02 ` [PATCH v2 0/5] nvmet-rdma/srpt: " Bart Van Assche

Linux-NVME Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nvme/0 linux-nvme/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nvme linux-nvme/ https://lore.kernel.org/linux-nvme \
		linux-nvme@lists.infradead.org
	public-inbox-index linux-nvme

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.infradead.lists.linux-nvme


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git