All of lore.kernel.org
* [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
@ 2017-11-16 17:21 Max Gurtovoy
  2017-11-16 17:21 ` [PATCH 1/3] IB/core: add a simple SRQ pool per PD Max Gurtovoy
                   ` (4 more replies)
  0 siblings, 5 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-16 17:21 UTC (permalink / raw)


Since there is an active discussion regarding the CQ pool architecture, I decided to post
this feature separately (perhaps it can be merged before the CQ pool).

This is a new feature for the NVMEoF RDMA target that is intended to reduce resource allocation
(by sharing resources) and exploit the locality of completions to get the best performance with
Shared Receive Queues (SRQs). We create an SRQ per completion vector (rather than per device)
using a new API (an SRQ pool, also added in this patchset) and associate each created QP/CQ with
an appropriate SRQ. This also reduces the lock contention on the single per-device SRQ
(today's solution).

My testing environment included 4 initiators (CX5, CX5, CX4, CX3) connected through a switch
to 4 subsystems (1 namespace per subsystem) over 2 ports of the NVMEoF target (CX5); each
initiator was connected to a unique subsystem backed by a different null_blk device.
I used the RoCE link layer.

Configuration:
 - irqbalance stopped on each server
 - set_irq_affinity.sh run on each interface
 - 2 initiators run traffic through port 1
 - 2 initiators run traffic through port 2
 - register_always=N set on each initiator
 - fio with 12 jobs, iodepth 128

Memory consumption calculation for recv buffers (target):
 - Multiple SRQ: SRQ_size * comp_num * ib_devs_num * inline_buffer_size
 - Single SRQ: SRQ_size * 1 * ib_devs_num * inline_buffer_size
 - MQ: RQ_size * CPU_num * ctrl_num * inline_buffer_size

Cases:
 1. Multiple SRQs with 1024 entries:
    - Mem = 1024 * 24 * 2 * 4k = 192MiB (constant; does not depend on the number of initiators)
 2. Multiple SRQs with 256 entries:
    - Mem = 256 * 24 * 2 * 4k = 48MiB (constant; does not depend on the number of initiators)
 3. MQ:
    - Mem = 256 * 24 * 8 * 4k = 192MiB (memory grows with every newly created controller)
 4. Single SRQ (current implementation):
    - Mem = 4096 * 1 * 2 * 4k = 32MiB (constant; does not depend on the number of initiators)
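The four cases above follow directly from the formulas; they can be reproduced with a
small back-of-the-envelope sketch (24 completion vectors / CPUs, 2 IB devices, 8
controllers and 4k inline buffers, as in this setup):

```c
#include <assert.h>
#include <stdio.h>

#define MiB (1024LL * 1024)

/* Receive-buffer memory in bytes: queue size * queue count * scale
 * (number of IB devices for the SRQ cases, number of controllers for
 * MQ) * inline buffer size (4k, NVMET_RDMA_INLINE_DATA_SIZE). */
static long long recv_buffer_mem(long long queue_size, long long queue_count,
                                 long long scale)
{
	return queue_size * queue_count * scale * 4096;
}

/* Reproduce the four cases from the cover letter. */
static void print_cases(void)
{
	printf("1. multiple SRQs (1024): %lld MiB\n", recv_buffer_mem(1024, 24, 2) / MiB);
	printf("2. multiple SRQs (256):  %lld MiB\n", recv_buffer_mem(256, 24, 2) / MiB);
	printf("3. MQ:                   %lld MiB\n", recv_buffer_mem(256, 24, 8) / MiB);
	printf("4. single SRQ:           %lld MiB\n", recv_buffer_mem(4096, 1, 2) / MiB);
}
```

This reproduces the 192/48/192/32 MiB figures above.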

Results:

BS    1.read (target CPU)   2.read (target CPU)   3.read (target CPU)   4.read (target CPU)
---  --------------------- --------------------- --------------------- ----------------------
1k     5.88M (80%)            5.45M (72%)            6.77M (91%)          2.2M (72%)
2k     3.56M (65%)            3.45M (59%)            3.72M (64%)          2.12M (59%)
4k     1.8M (33%)             1.87M (32%)            1.88M (32%)          1.59M (34%)

BS    1.write (target CPU)  2.write (target CPU)  3.write (target CPU)  4.write (target CPU)
---  --------------------- --------------------- --------------------- ----------------------
1k     5.42M (63%)            5.14M (55%)            7.75M (82%)          2.14M (74%)
2k     4.15M (56%)            4.14M (51%)            4.16M (52%)          2.08M (73%)
4k     2.17M (28%)            2.17M (27%)            2.16M (28%)          1.62M (24%)


We can see the perf improvement between Case 2 and Case 4 (same order of resource consumption).
We can see the benefit in resource consumption (memory and CPU) with a small perf loss
between Cases 2 and 3.
There is still an open question about the perf difference at 1k between Case 1 and
Case 3, but I guess we can investigate and improve it incrementally.

Thanks to Idan Burstein and Oren Duer for suggesting this nice feature.

Changes from V1:
 - Added an SRQ pool per protection domain to IB/core
 - Addressed a few review comments from Christoph and Sagi

Max Gurtovoy (3):
  IB/core: add a simple SRQ pool per PD
  nvmet-rdma: use srq pointer in rdma_cmd
  nvmet-rdma: use SRQ per completion vector

 drivers/infiniband/core/Makefile   |   2 +-
 drivers/infiniband/core/srq_pool.c | 106 +++++++++++++++++++++
 drivers/infiniband/core/verbs.c    |   4 +
 drivers/nvme/target/rdma.c         | 190 +++++++++++++++++++++++++++----------
 include/rdma/ib_verbs.h            |   5 +
 include/rdma/srq_pool.h            |  46 +++++++++
 6 files changed, 301 insertions(+), 52 deletions(-)
 create mode 100644 drivers/infiniband/core/srq_pool.c
 create mode 100644 include/rdma/srq_pool.h

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] IB/core: add a simple SRQ pool per PD
  2017-11-16 17:21 [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector Max Gurtovoy
@ 2017-11-16 17:21 ` Max Gurtovoy
       [not found]   ` <1510852885-25519-2-git-send-email-maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-11-16 18:32   ` Sagi Grimberg
  2017-11-16 17:21 ` [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd Max Gurtovoy
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-16 17:21 UTC (permalink / raw)


Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/infiniband/core/Makefile   |   2 +-
 drivers/infiniband/core/srq_pool.c | 106 +++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/verbs.c    |   4 ++
 include/rdma/ib_verbs.h            |   5 ++
 include/rdma/srq_pool.h            |  46 ++++++++++++++++
 5 files changed, 162 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/srq_pool.c
 create mode 100644 include/rdma/srq_pool.h

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index b4df164..365da0c 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -11,7 +11,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
 				device.o fmr_pool.o cache.o netlink.o \
 				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
 				multicast.o mad.o smi.o agent.o mad_rmpp.o \
-				security.o nldev.o
+				security.o nldev.o srq_pool.o
 
 ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
 ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
diff --git a/drivers/infiniband/core/srq_pool.c b/drivers/infiniband/core/srq_pool.c
new file mode 100644
index 0000000..4f4a089
--- /dev/null
+++ b/drivers/infiniband/core/srq_pool.c
@@ -0,0 +1,106 @@
+/*
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#include <rdma/srq_pool.h>
+
+struct ib_srq *ib_srq_pool_get(struct ib_pd *pd)
+{
+	struct ib_srq *srq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pd->srq_lock, flags);
+	srq = list_first_entry_or_null(&pd->srqs, struct ib_srq, pd_entry);
+	if (srq) {
+		list_del(&srq->pd_entry);
+		pd->srqs_used++;
+	}
+	spin_unlock_irqrestore(&pd->srq_lock, flags);
+
+	return srq;
+}
+EXPORT_SYMBOL(ib_srq_pool_get);
+
+void ib_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&pd->srq_lock, flags);
+	list_add(&srq->pd_entry, &pd->srqs);
+	pd->srqs_used--;
+	spin_unlock_irqrestore(&pd->srq_lock, flags);
+}
+EXPORT_SYMBOL(ib_srq_pool_put);
+
+int ib_srq_pool_init(struct ib_pd *pd, int nr,
+		struct ib_srq_init_attr *srq_attr)
+{
+	struct ib_srq *srq;
+	unsigned long flags;
+	int ret, i;
+
+	for (i = 0; i < nr; i++) {
+		srq = ib_create_srq(pd, srq_attr);
+		if (IS_ERR(srq)) {
+			ret = PTR_ERR(srq);
+			goto out;
+		}
+
+		spin_lock_irqsave(&pd->srq_lock, flags);
+		list_add_tail(&srq->pd_entry, &pd->srqs);
+		spin_unlock_irqrestore(&pd->srq_lock, flags);
+	}
+
+	return 0;
+out:
+	ib_srq_pool_destroy(pd);
+	return ret;
+}
+EXPORT_SYMBOL(ib_srq_pool_init);
+
+void ib_srq_pool_destroy(struct ib_pd *pd)
+{
+	struct ib_srq *srq;
+	unsigned long flags;
+
+	spin_lock_irqsave(&pd->srq_lock, flags);
+	while (!list_empty(&pd->srqs)) {
+		srq = list_first_entry(&pd->srqs, struct ib_srq, pd_entry);
+		list_del(&srq->pd_entry);
+
+		spin_unlock_irqrestore(&pd->srq_lock, flags);
+		ib_destroy_srq(srq);
+		spin_lock_irqsave(&pd->srq_lock, flags);
+	}
+	spin_unlock_irqrestore(&pd->srq_lock, flags);
+}
+EXPORT_SYMBOL(ib_srq_pool_destroy);
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index de57d6c..74db405 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -233,6 +233,9 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
 	pd->__internal_mr = NULL;
 	atomic_set(&pd->usecnt, 0);
 	pd->flags = flags;
+	pd->srqs_used = 0;
+	spin_lock_init(&pd->srq_lock);
+	INIT_LIST_HEAD(&pd->srqs);
 
 	if (device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
 		pd->local_dma_lkey = device->local_dma_lkey;
@@ -289,6 +292,7 @@ void ib_dealloc_pd(struct ib_pd *pd)
 		pd->__internal_mr = NULL;
 	}
 
+	WARN_ON_ONCE(pd->srqs_used > 0);
 	/* uverbs manipulates usecnt with proper locking, while the kabi
 	   requires the caller to guarantee we can't race here. */
 	WARN_ON(atomic_read(&pd->usecnt));
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index bdb1279..fdc721f 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1512,6 +1512,10 @@ struct ib_pd {
 
 	u32			unsafe_global_rkey;
 
+	spinlock_t		srq_lock;
+	int			srqs_used;
+	struct list_head	srqs;
+
 	/*
 	 * Implementation details of the RDMA core, don't use in drivers:
 	 */
@@ -1566,6 +1570,7 @@ struct ib_srq {
 	void		       *srq_context;
 	enum ib_srq_type	srq_type;
 	atomic_t		usecnt;
+	struct list_head	pd_entry; /* srq pool entry */
 
 	struct {
 		struct ib_cq   *cq;
diff --git a/include/rdma/srq_pool.h b/include/rdma/srq_pool.h
new file mode 100644
index 0000000..04aa059
--- /dev/null
+++ b/include/rdma/srq_pool.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ */
+
+#ifndef _RDMA_SRQ_POOL_H
+#define _RDMA_SRQ_POOL_H 1
+
+#include <rdma/ib_verbs.h>
+
+struct ib_srq *ib_srq_pool_get(struct ib_pd *pd);
+void ib_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq);
+
+int ib_srq_pool_init(struct ib_pd *pd, int nr,
+		struct ib_srq_init_attr *srq_attr);
+void ib_srq_pool_destroy(struct ib_pd *pd);
+
+#endif /* _RDMA_SRQ_POOL_H */
-- 
1.8.3.1
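For readers evaluating the API: the pool is essentially a free-list of pre-created SRQs
hanging off the PD. A minimal userspace model of the get/put semantics (a sketch only;
`struct pd` and `struct srq` are stand-ins for `ib_pd`/`ib_srq`, and a singly-linked
list replaces the kernel's `list_head` and spinlock):

```c
#include <assert.h>
#include <stdlib.h>

struct srq {			/* stand-in for struct ib_srq */
	int id;
	struct srq *next;	/* stand-in for the pd_entry linkage */
};

struct pd {			/* stand-in for struct ib_pd */
	struct srq *srqs;	/* free-list head (pd->srqs) */
	int srqs_used;		/* outstanding SRQs (pd->srqs_used) */
};

/* Model of ib_srq_pool_init(): pre-create nr SRQs on the PD. */
static int srq_pool_init(struct pd *pd, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct srq *s = malloc(sizeof(*s));

		if (!s)
			return -1;	/* the real code unwinds via pool destroy */
		s->id = i;
		s->next = pd->srqs;
		pd->srqs = s;
	}
	return 0;
}

/* Model of ib_srq_pool_get(): pop a pre-created SRQ, or NULL if exhausted. */
static struct srq *srq_pool_get(struct pd *pd)
{
	struct srq *s = pd->srqs;

	if (s) {
		pd->srqs = s->next;
		pd->srqs_used++;
	}
	return s;
}

/* Model of ib_srq_pool_put(): return an SRQ to the pool. */
static void srq_pool_put(struct pd *pd, struct srq *s)
{
	s->next = pd->srqs;
	pd->srqs = s;
	pd->srqs_used--;
}
```

A consumer such as nvmet-rdma grabs one SRQ per completion vector at setup with get()
and releases them with put() at teardown; `srqs_used` is what lets ib_dealloc_pd()
warn if any SRQ is still outstanding.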

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd
  2017-11-16 17:21 [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector Max Gurtovoy
  2017-11-16 17:21 ` [PATCH 1/3] IB/core: add a simple SRQ pool per PD Max Gurtovoy
@ 2017-11-16 17:21 ` Max Gurtovoy
  2017-11-16 17:26   ` [Suspected-Phishing][PATCH " Max Gurtovoy
  2017-11-16 18:37   ` [PATCH " Sagi Grimberg
  2017-11-16 17:21 ` [PATCH 3/3] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-16 17:21 UTC (permalink / raw)


This is a preparation patch that creates the infrastructure in the
post-recv mechanism for the SRQ per completion vector feature.

Change-Id: I0b0130920580e6999f0517400a45a1078e9553c9
Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/target/rdma.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 4991290..3ecfc12 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -45,6 +45,7 @@ struct nvmet_rdma_cmd {
 	struct page		*inline_page;
 	struct nvme_command     *nvme_cmd;
 	struct nvmet_rdma_queue	*queue;
+	struct ib_srq		*srq;
 };
 
 enum {
@@ -442,8 +443,8 @@ static int nvmet_rdma_post_recv(struct nvmet_rdma_device *ndev,
 		cmd->sge[0].addr, cmd->sge[0].length,
 		DMA_FROM_DEVICE);
 
-	if (ndev->srq)
-		return ib_post_srq_recv(ndev->srq, &cmd->wr, &bad_wr);
+	if (cmd->srq)
+		return ib_post_srq_recv(cmd->srq, &cmd->wr, &bad_wr);
 	return ib_post_recv(cmd->queue->cm_id->qp, &cmd->wr, &bad_wr);
 }
 
@@ -821,8 +822,10 @@ static int nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
 	ndev->srq = srq;
 	ndev->srq_size = srq_size;
 
-	for (i = 0; i < srq_size; i++)
+	for (i = 0; i < srq_size; i++) {
+		ndev->srq_cmds[i].srq = srq;
 		nvmet_rdma_post_recv(ndev, &ndev->srq_cmds[i]);
+	}
 
 	return 0;
 
-- 
1.8.3.1
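The dispatch change in this patch can be modeled in a few lines (a sketch with
hypothetical cut-down types; the real code calls ib_post_srq_recv()/ib_post_recv()):

```c
#include <assert.h>
#include <stddef.h>

struct srq { int id; };		/* stand-in for struct ib_srq */

struct rdma_cmd {
	struct srq *srq;	/* set when the command's buffer lives on an SRQ */
};

enum recv_target { POST_SRQ_RECV, POST_QP_RECV };

/* Model of nvmet_rdma_post_recv() after this patch: dispatch on the
 * per-command SRQ pointer instead of the per-device one, so the same
 * path serves both SRQ mode and regular receive queues, and a later
 * patch can point different commands at different SRQs. */
static enum recv_target post_recv(const struct rdma_cmd *cmd)
{
	return cmd->srq ? POST_SRQ_RECV : POST_QP_RECV;
}
```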

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PATCH 3/3] nvmet-rdma: use SRQ per completion vector
  2017-11-16 17:21 [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector Max Gurtovoy
  2017-11-16 17:21 ` [PATCH 1/3] IB/core: add a simple SRQ pool per PD Max Gurtovoy
  2017-11-16 17:21 ` [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd Max Gurtovoy
@ 2017-11-16 17:21 ` Max Gurtovoy
       [not found] ` <1510852885-25519-1-git-send-email-maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  2017-11-16 18:36 ` Sagi Grimberg
  4 siblings, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-16 17:21 UTC (permalink / raw)


In order to reduce resource allocation and make better use of completion
locality (compared to an SRQ per device), allocate Shared Receive
Queues (SRQs) per completion vector rather than per device. Associate
each created QP/CQ with an appropriate SRQ according to the queue
index. This association reduces the lock contention in the fast path
(compared to the SRQ per device solution) and increases the locality in
memory buffers. Add a new module parameter for the SRQ size so it can be
adjusted to the expected load. Users should make sure the size is >= 256
to avoid running out of resources.

Signed-off-by: Max Gurtovoy <maxg at mellanox.com>
---
 drivers/nvme/target/rdma.c | 189 ++++++++++++++++++++++++++++++++-------------
 1 file changed, 137 insertions(+), 52 deletions(-)

diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c
index 3ecfc12..5ff6261 100644
--- a/drivers/nvme/target/rdma.c
+++ b/drivers/nvme/target/rdma.c
@@ -20,6 +20,7 @@
 #include <linux/module.h>
 #include <linux/nvme.h>
 #include <linux/slab.h>
+#include <rdma/srq_pool.h>
 #include <linux/string.h>
 #include <linux/wait.h>
 #include <linux/inet.h>
@@ -37,6 +38,8 @@
  */
 #define NVMET_RDMA_INLINE_DATA_SIZE	PAGE_SIZE
 
+struct nvmet_rdma_srq;
+
 struct nvmet_rdma_cmd {
 	struct ib_sge		sge[2];
 	struct ib_cqe		cqe;
@@ -45,7 +48,7 @@ struct nvmet_rdma_cmd {
 	struct page		*inline_page;
 	struct nvme_command     *nvme_cmd;
 	struct nvmet_rdma_queue	*queue;
-	struct ib_srq		*srq;
+	struct nvmet_rdma_srq   *nsrq;
 };
 
 enum {
@@ -87,6 +90,7 @@ struct nvmet_rdma_queue {
 	struct ib_cq		*cq;
 	atomic_t		sq_wr_avail;
 	struct nvmet_rdma_device *dev;
+	struct nvmet_rdma_srq   *nsrq;
 	spinlock_t		state_lock;
 	enum nvmet_rdma_queue_state state;
 	struct nvmet_cq		nvme_cq;
@@ -104,17 +108,24 @@ struct nvmet_rdma_queue {
 
 	int			idx;
 	int			host_qid;
+	int			comp_vector;
 	int			recv_queue_size;
 	int			send_queue_size;
 
 	struct list_head	queue_list;
 };
 
+struct nvmet_rdma_srq {
+	struct ib_srq            *srq;
+	struct nvmet_rdma_cmd    *cmds;
+	struct nvmet_rdma_device *ndev;
+};
+
 struct nvmet_rdma_device {
 	struct ib_device	*device;
 	struct ib_pd		*pd;
-	struct ib_srq		*srq;
-	struct nvmet_rdma_cmd	*srq_cmds;
+	struct nvmet_rdma_srq	**srqs;
+	int			srq_count;
 	size_t			srq_size;
 	struct kref		ref;
 	struct list_head	entry;
@@ -124,6 +135,16 @@ struct nvmet_rdma_device {
 module_param_named(use_srq, nvmet_rdma_use_srq, bool, 0444);
 MODULE_PARM_DESC(use_srq, "Use shared receive queue.");
 
+static int srq_size_set(const char *val, const struct kernel_param *kp);
+static const struct kernel_param_ops srq_size_ops = {
+	.set = srq_size_set,
+	.get = param_get_int,
+};
+
+static int nvmet_rdma_srq_size = 4095;
+module_param_cb(srq_size, &srq_size_ops, &nvmet_rdma_srq_size, 0644);
+MODULE_PARM_DESC(srq_size, "set Shared Receive Queue (SRQ) size, should >= 256 (default: 4095)");
+
 static DEFINE_IDA(nvmet_rdma_queue_ida);
 static LIST_HEAD(nvmet_rdma_queue_list);
 static DEFINE_MUTEX(nvmet_rdma_queue_mutex);
@@ -140,6 +161,17 @@ struct nvmet_rdma_device {
 
 static struct nvmet_fabrics_ops nvmet_rdma_ops;
 
+static int srq_size_set(const char *val, const struct kernel_param *kp)
+{
+	int n = 0, ret;
+
+	ret = kstrtoint(val, 10, &n);
+	if (ret != 0 || n < 256)
+		return -EINVAL;
+
+	return param_set_int(val, kp);
+}
+
 /* XXX: really should move to a generic header sooner or later.. */
 static inline u32 get_unaligned_le24(const u8 *p)
 {
@@ -443,8 +475,8 @@ static int nvmet_rdma_post_recv(struct nvmet_rdma_device *ndev,
 		cmd->sge[0].addr, cmd->sge[0].length,
 		DMA_FROM_DEVICE);
 
-	if (cmd->srq)
-		return ib_post_srq_recv(cmd->srq, &cmd->wr, &bad_wr);
+	if (cmd->nsrq)
+		return ib_post_srq_recv(cmd->nsrq->srq, &cmd->wr, &bad_wr);
 	return ib_post_recv(cmd->queue->cm_id->qp, &cmd->wr, &bad_wr);
 }
 
@@ -781,56 +813,107 @@ static void nvmet_rdma_recv_done(struct ib_cq *cq, struct ib_wc *wc)
 	nvmet_rdma_handle_command(queue, rsp);
 }
 
-static void nvmet_rdma_destroy_srq(struct nvmet_rdma_device *ndev)
+static void nvmet_rdma_destroy_srq(struct nvmet_rdma_srq *nsrq)
 {
-	if (!ndev->srq)
+	nvmet_rdma_free_cmds(nsrq->ndev, nsrq->cmds, nsrq->ndev->srq_size, false);
+	ib_srq_pool_put(nsrq->ndev->pd, nsrq->srq);
+
+	kfree(nsrq);
+}
+
+static void nvmet_rdma_destroy_srqs(struct nvmet_rdma_device *ndev)
+{
+	int i;
+
+	if (!ndev->srqs)
 		return;
 
-	nvmet_rdma_free_cmds(ndev, ndev->srq_cmds, ndev->srq_size, false);
-	ib_destroy_srq(ndev->srq);
+	for (i = 0; i < ndev->srq_count; i++)
+		nvmet_rdma_destroy_srq(ndev->srqs[i]);
+
+	ib_srq_pool_destroy(ndev->pd);
+	kfree(ndev->srqs);
 }
 
-static int nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
+static struct nvmet_rdma_srq *
+nvmet_rdma_init_srq(struct nvmet_rdma_device *ndev)
 {
-	struct ib_srq_init_attr srq_attr = { NULL, };
+	size_t srq_size = ndev->srq_size;
+	struct nvmet_rdma_srq *nsrq;
 	struct ib_srq *srq;
-	size_t srq_size;
 	int ret, i;
 
-	srq_size = 4095;	/* XXX: tune */
+	nsrq = kzalloc(sizeof(*nsrq), GFP_KERNEL);
+	if (!nsrq)
+		return ERR_PTR(-ENOMEM);
 
-	srq_attr.attr.max_wr = srq_size;
-	srq_attr.attr.max_sge = 2;
-	srq_attr.attr.srq_limit = 0;
-	srq_attr.srq_type = IB_SRQT_BASIC;
-	srq = ib_create_srq(ndev->pd, &srq_attr);
-	if (IS_ERR(srq)) {
-		/*
-		 * If SRQs aren't supported we just go ahead and use normal
-		 * non-shared receive queues.
-		 */
-		pr_info("SRQ requested but not supported.\n");
-		return 0;
+	srq = ib_srq_pool_get(ndev->pd);
+	if (!srq) {
+		ret = -EAGAIN;
+		goto out_free_nsrq;
 	}
 
-	ndev->srq_cmds = nvmet_rdma_alloc_cmds(ndev, srq_size, false);
-	if (IS_ERR(ndev->srq_cmds)) {
-		ret = PTR_ERR(ndev->srq_cmds);
-		goto out_destroy_srq;
+	nsrq->cmds = nvmet_rdma_alloc_cmds(ndev, srq_size, false);
+	if (IS_ERR(nsrq->cmds)) {
+		ret = PTR_ERR(nsrq->cmds);
+		goto out_put_srq;
 	}
 
-	ndev->srq = srq;
-	ndev->srq_size = srq_size;
+	nsrq->srq = srq;
+	nsrq->ndev = ndev;
 
 	for (i = 0; i < srq_size; i++) {
-		ndev->srq_cmds[i].srq = srq;
-		nvmet_rdma_post_recv(ndev, &ndev->srq_cmds[i]);
+		nsrq->cmds[i].nsrq = nsrq;
+		nvmet_rdma_post_recv(ndev, &nsrq->cmds[i]);
+	}
+
+	return nsrq;
+
+out_put_srq:
+	ib_srq_pool_put(ndev->pd, srq);
+out_free_nsrq:
+	kfree(nsrq);
+	return ERR_PTR(ret);
+}
+
+static int nvmet_rdma_init_srqs(struct nvmet_rdma_device *ndev)
+{
+	struct ib_srq_init_attr srq_attr = { NULL, };
+	int i, ret;
+
+	ndev->srq_size = min(ndev->device->attrs.max_srq_wr,
+			nvmet_rdma_srq_size);
+	ndev->srq_count = min(ndev->device->num_comp_vectors,
+			ndev->device->attrs.max_srq);
+
+	ndev->srqs = kcalloc(ndev->srq_count, sizeof(*ndev->srqs), GFP_KERNEL);
+	if (!ndev->srqs)
+		return -ENOMEM;
+
+	srq_attr.attr.max_wr = ndev->srq_size;
+	srq_attr.attr.max_sge = 2;
+	srq_attr.attr.srq_limit = 0;
+	srq_attr.srq_type = IB_SRQT_BASIC;
+	ret = ib_srq_pool_init(ndev->pd, ndev->srq_count, &srq_attr);
+	if (ret)
+		goto err_free;
+
+	for (i = 0; i < ndev->srq_count; i++) {
+		ndev->srqs[i] = nvmet_rdma_init_srq(ndev);
+		if (IS_ERR(ndev->srqs[i]))
+			goto err_srq;
 	}
 
 	return 0;
 
-out_destroy_srq:
-	ib_destroy_srq(srq);
+err_srq:
+	while (--i >= 0)
+		nvmet_rdma_destroy_srq(ndev->srqs[i]);
+	ib_srq_pool_destroy(ndev->pd);
+err_free:
+	kfree(ndev->srqs);
+	ndev->srq_count = 0;
+	ndev->srq_size = 0;
 	return ret;
 }
 
@@ -843,7 +926,7 @@ static void nvmet_rdma_free_dev(struct kref *ref)
 	list_del(&ndev->entry);
 	mutex_unlock(&device_list_mutex);
 
-	nvmet_rdma_destroy_srq(ndev);
+	nvmet_rdma_destroy_srqs(ndev);
 	ib_dealloc_pd(ndev->pd);
 
 	kfree(ndev);
@@ -874,7 +957,7 @@ static void nvmet_rdma_free_dev(struct kref *ref)
 		goto out_free_dev;
 
 	if (nvmet_rdma_use_srq) {
-		ret = nvmet_rdma_init_srq(ndev);
+		ret = nvmet_rdma_init_srqs(ndev);
 		if (ret)
 			goto out_free_pd;
 	}
@@ -898,14 +981,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 {
 	struct ib_qp_init_attr qp_attr;
 	struct nvmet_rdma_device *ndev = queue->dev;
-	int comp_vector, nr_cqe, ret, i;
-
-	/*
-	 * Spread the io queues across completion vectors,
-	 * but still keep all admin queues on vector 0.
-	 */
-	comp_vector = !queue->host_qid ? 0 :
-		queue->idx % ndev->device->num_comp_vectors;
+	int nr_cqe, ret, i;
 
 	/*
 	 * Reserve CQ slots for RECV + RDMA_READ/RDMA_WRITE + RDMA_SEND.
@@ -913,7 +989,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 	nr_cqe = queue->recv_queue_size + 2 * queue->send_queue_size;
 
 	queue->cq = ib_alloc_cq(ndev->device, queue,
-			nr_cqe + 1, comp_vector,
+			nr_cqe + 1, queue->comp_vector,
 			IB_POLL_WORKQUEUE);
 	if (IS_ERR(queue->cq)) {
 		ret = PTR_ERR(queue->cq);
@@ -935,8 +1011,8 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 	qp_attr.cap.max_send_sge = max(ndev->device->attrs.max_sge_rd,
 					ndev->device->attrs.max_sge);
 
-	if (ndev->srq) {
-		qp_attr.srq = ndev->srq;
+	if (queue->nsrq) {
+		qp_attr.srq = queue->nsrq->srq;
 	} else {
 		/* +1 for drain */
 		qp_attr.cap.max_recv_wr = 1 + queue->recv_queue_size;
@@ -955,7 +1031,7 @@ static int nvmet_rdma_create_queue_ib(struct nvmet_rdma_queue *queue)
 		 __func__, queue->cq->cqe, qp_attr.cap.max_send_sge,
 		 qp_attr.cap.max_send_wr, queue->cm_id);
 
-	if (!ndev->srq) {
+	if (!queue->nsrq) {
 		for (i = 0; i < queue->recv_queue_size; i++) {
 			queue->cmds[i].queue = queue;
 			nvmet_rdma_post_recv(ndev, &queue->cmds[i]);
@@ -984,7 +1060,7 @@ static void nvmet_rdma_free_queue(struct nvmet_rdma_queue *queue)
 	nvmet_sq_destroy(&queue->nvme_sq);
 
 	nvmet_rdma_destroy_queue_ib(queue);
-	if (!queue->dev->srq) {
+	if (!queue->nsrq) {
 		nvmet_rdma_free_cmds(queue->dev, queue->cmds,
 				queue->recv_queue_size,
 				!queue->host_qid);
@@ -1101,13 +1177,22 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 		goto out_destroy_sq;
 	}
 
+	/*
+	 * Spread the io queues across completion vectors,
+	 * but still keep all admin queues on vector 0.
+	 */
+	queue->comp_vector = !queue->host_qid ? 0 :
+		queue->idx % ndev->device->num_comp_vectors;
+
 	ret = nvmet_rdma_alloc_rsps(queue);
 	if (ret) {
 		ret = NVME_RDMA_CM_NO_RSC;
 		goto out_ida_remove;
 	}
 
-	if (!ndev->srq) {
+	if (ndev->srqs) {
+		queue->nsrq = ndev->srqs[queue->comp_vector % ndev->srq_count];
+	} else {
 		queue->cmds = nvmet_rdma_alloc_cmds(ndev,
 				queue->recv_queue_size,
 				!queue->host_qid);
@@ -1128,7 +1213,7 @@ static int nvmet_rdma_cm_reject(struct rdma_cm_id *cm_id,
 	return queue;
 
 out_free_cmds:
-	if (!ndev->srq) {
+	if (!queue->nsrq) {
 		nvmet_rdma_free_cmds(queue->dev, queue->cmds,
 				queue->recv_queue_size,
 				!queue->host_qid);
-- 
1.8.3.1
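The mapping this patch establishes — admin queues pinned to vector 0, I/O queues
spread round-robin, and each queue taking the SRQ of its completion vector — can
be sketched as:

```c
#include <assert.h>

/* Admin queues (host_qid == 0) stay on completion vector 0; I/O queues
 * are spread round-robin across the device's completion vectors. */
static int comp_vector(int host_qid, int queue_idx, int num_comp_vectors)
{
	return host_qid ? queue_idx % num_comp_vectors : 0;
}

/* Each queue uses the SRQ associated with its completion vector.
 * srq_count = min(num_comp_vectors, attrs.max_srq), so the modulo only
 * matters when the device supports fewer SRQs than vectors. */
static int srq_index(int vector, int srq_count)
{
	return vector % srq_count;
}
```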

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [Suspected-Phishing][PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd
  2017-11-16 17:21 ` [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd Max Gurtovoy
@ 2017-11-16 17:26   ` Max Gurtovoy
  2017-11-16 18:37   ` [PATCH " Sagi Grimberg
  1 sibling, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-16 17:26 UTC (permalink / raw)




On 11/16/2017 7:21 PM, Max Gurtovoy wrote:
> This is a preparation patch that creates the infrastructure in the
> post-recv mechanism for the SRQ per completion vector feature.
> 
> Change-Id: I0b0130920580e6999f0517400a45a1078e9553c9

please ignore the change-id, it's just a hook in my git.

> [...]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-16 17:21 [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector Max Gurtovoy
@ 2017-11-16 18:08     ` Leon Romanovsky
  2017-11-16 17:21 ` [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd Max Gurtovoy
                       ` (3 subsequent siblings)
  4 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-16 18:08 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: hch-jcswGhMUV9g, sagi-NQWnxTmZq1alnMjI0IkVqw,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list

On Thu, Nov 16, 2017 at 07:21:22PM +0200, Max Gurtovoy wrote:
> Since there is an active discussion regarding the CQ pool architecture, I decided to post
> this feature separately (perhaps it can be merged before the CQ pool).

Max,

Thanks for CCing me, can you please repost the series and CC linux-rdma too?

> [...]
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] IB/core: add a simple SRQ pool per PD
  2017-11-16 17:21 ` [PATCH 1/3] IB/core: add a simple SRQ pool per PD Max Gurtovoy
@ 2017-11-16 18:10       ` Leon Romanovsky
  2017-11-16 18:32   ` Sagi Grimberg
  1 sibling, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-16 18:10 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: hch-jcswGhMUV9g, sagi-NQWnxTmZq1alnMjI0IkVqw,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, RDMA mailing list

On Thu, Nov 16, 2017 at 07:21:23PM +0200, Max Gurtovoy wrote:
> Signed-off-by: Max Gurtovoy <maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
> ---
>  drivers/infiniband/core/Makefile   |   2 +-
>  drivers/infiniband/core/srq_pool.c | 106 +++++++++++++++++++++++++++++++++++++
>  drivers/infiniband/core/verbs.c    |   4 ++
>  include/rdma/ib_verbs.h            |   5 ++
>  include/rdma/srq_pool.h            |  46 ++++++++++++++++
>  5 files changed, 162 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/infiniband/core/srq_pool.c
>  create mode 100644 include/rdma/srq_pool.h
>
> diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
> index b4df164..365da0c 100644
> --- a/drivers/infiniband/core/Makefile
> +++ b/drivers/infiniband/core/Makefile
> @@ -11,7 +11,7 @@ ib_core-y :=			packer.o ud_header.o verbs.o cq.o rw.o sysfs.o \
>  				device.o fmr_pool.o cache.o netlink.o \
>  				roce_gid_mgmt.o mr_pool.o addr.o sa_query.o \
>  				multicast.o mad.o smi.o agent.o mad_rmpp.o \
> -				security.o nldev.o
> +				security.o nldev.o srq_pool.o
>
>  ib_core-$(CONFIG_INFINIBAND_USER_MEM) += umem.o
>  ib_core-$(CONFIG_INFINIBAND_ON_DEMAND_PAGING) += umem_odp.o umem_rbtree.o
> diff --git a/drivers/infiniband/core/srq_pool.c b/drivers/infiniband/core/srq_pool.c
> new file mode 100644
> index 0000000..4f4a089
> --- /dev/null
> +++ b/drivers/infiniband/core/srq_pool.c
> @@ -0,0 +1,106 @@
> +/*
> + * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + */
> +
> +#include <rdma/srq_pool.h>
> +
> +struct ib_srq *ib_srq_pool_get(struct ib_pd *pd)
> +{
> +	struct ib_srq *srq;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pd->srq_lock, flags);
> +	srq = list_first_entry_or_null(&pd->srqs, struct ib_srq, pd_entry);
> +	if (srq) {
> +		list_del(&srq->pd_entry);
> +		pd->srqs_used++;
> +	}
> +	spin_unlock_irqrestore(&pd->srq_lock, flags);
> +
> +	return srq;
> +}
> +EXPORT_SYMBOL(ib_srq_pool_get);
> +
> +void ib_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pd->srq_lock, flags);
> +	list_add(&srq->pd_entry, &pd->srqs);
> +	pd->srqs_used--;
> +	spin_unlock_irqrestore(&pd->srq_lock, flags);
> +}
> +EXPORT_SYMBOL(ib_srq_pool_put);
> +
> +int ib_srq_pool_init(struct ib_pd *pd, int nr,
> +		struct ib_srq_init_attr *srq_attr)
> +{
> +	struct ib_srq *srq;
> +	unsigned long flags;
> +	int ret, i;
> +
> +	for (i = 0; i < nr; i++) {
> +		srq = ib_create_srq(pd, srq_attr);
> +		if (IS_ERR(srq)) {
> +			ret = PTR_ERR(srq);
> +			goto out;
> +		}
> +
> +		spin_lock_irqsave(&pd->srq_lock, flags);
> +		list_add_tail(&srq->pd_entry, &pd->srqs);
> +		spin_unlock_irqrestore(&pd->srq_lock, flags);
> +	}
> +
> +	return 0;
> +out:
> +	ib_srq_pool_destroy(pd);
> +	return ret;
> +}
> +EXPORT_SYMBOL(ib_srq_pool_init);
> +
> +void ib_srq_pool_destroy(struct ib_pd *pd)
> +{
> +	struct ib_srq *srq;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pd->srq_lock, flags);
> +	while (!list_empty(&pd->srqs)) {
> +		srq = list_first_entry(&pd->srqs, struct ib_srq, pd_entry);
> +		list_del(&srq->pd_entry);
> +
> +		spin_unlock_irqrestore(&pd->srq_lock, flags);
> +		ib_destroy_srq(srq);
> +		spin_lock_irqsave(&pd->srq_lock, flags);
> +	}
> +	spin_unlock_irqrestore(&pd->srq_lock, flags);
> +}
> +EXPORT_SYMBOL(ib_srq_pool_destroy);
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index de57d6c..74db405 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -233,6 +233,9 @@ struct ib_pd *__ib_alloc_pd(struct ib_device *device, unsigned int flags,
>  	pd->__internal_mr = NULL;
>  	atomic_set(&pd->usecnt, 0);
>  	pd->flags = flags;
> +	pd->srqs_used = 0;
> +	spin_lock_init(&pd->srq_lock);
> +	INIT_LIST_HEAD(&pd->srqs);
>
>  	if (device->attrs.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
>  		pd->local_dma_lkey = device->local_dma_lkey;
> @@ -289,6 +292,7 @@ void ib_dealloc_pd(struct ib_pd *pd)
>  		pd->__internal_mr = NULL;
>  	}
>
> +	WARN_ON_ONCE(pd->srqs_used > 0);
>  	/* uverbs manipulates usecnt with proper locking, while the kabi
>  	   requires the caller to guarantee we can't race here. */
>  	WARN_ON(atomic_read(&pd->usecnt));
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index bdb1279..fdc721f 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1512,6 +1512,10 @@ struct ib_pd {
>
>  	u32			unsafe_global_rkey;
>
> +	spinlock_t		srq_lock;
> +	int			srqs_used;
> +	struct list_head	srqs;
> +
>  	/*
>  	 * Implementation details of the RDMA core, don't use in drivers:
>  	 */
> @@ -1566,6 +1570,7 @@ struct ib_srq {
>  	void		       *srq_context;
>  	enum ib_srq_type	srq_type;
>  	atomic_t		usecnt;
> +	struct list_head	pd_entry; /* srq pool entry */
>
>  	struct {
>  		struct ib_cq   *cq;
> diff --git a/include/rdma/srq_pool.h b/include/rdma/srq_pool.h
> new file mode 100644
> index 0000000..04aa059
> --- /dev/null
> +++ b/include/rdma/srq_pool.h
> @@ -0,0 +1,46 @@
> +/*
> + * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses.  You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + *     Redistribution and use in source and binary forms, with or
> + *     without modification, are permitted provided that the following
> + *     conditions are met:
> + *
> + *      - Redistributions of source code must retain the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer.
> + *
> + *      - Redistributions in binary form must reproduce the above
> + *        copyright notice, this list of conditions and the following
> + *        disclaimer in the documentation and/or other materials
> + *        provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + */
> +
> +#ifndef _RDMA_SRQ_POOL_H
> +#define _RDMA_SRQ_POOL_H 1
> +
> +#include <rdma/ib_verbs.h>
> +
> +struct ib_srq *ib_srq_pool_get(struct ib_pd *pd);
> +void ib_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq);
> +
> +int ib_srq_pool_init(struct ib_pd *pd, int nr,
> +		struct ib_srq_init_attr *srq_attr);
> +void ib_srq_pool_destroy(struct ib_pd *pd);

Can you please use rdma_ prefix instead of ib_ prefix?

Thanks

> +
> +#endif /* _RDMA_SRQ_POOL_H */
> --
> 1.8.3.1
>
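Read together, the three pool functions amount to a LIFO free list guarded by pd->srq_lock plus an in-use counter. Stripped of locking, the bookkeeping can be modeled in plain userspace C; this is an illustration of the patch's logic, not the kernel API itself, and the names are stand-ins.

```c
#include <assert.h>

/* Userspace model of the pool bookkeeping in this patch: a free list
 * of SRQ stand-ins plus a counter mirroring pd->srqs_used. Locking is
 * omitted; in the kernel every operation runs under pd->srq_lock. */
enum { POOL_SIZE = 4 };

struct pool {
	int free[POOL_SIZE]; /* stand-ins for struct ib_srq pointers */
	int nfree;
	int used;            /* mirrors pd->srqs_used */
};

static void pool_init(struct pool *p)
{
	for (int i = 0; i < POOL_SIZE; i++)
		p->free[i] = i;  /* "created" SRQs, as in ib_srq_pool_init() */
	p->nfree = POOL_SIZE;
	p->used = 0;
}

/* models ib_srq_pool_get(): hand out a free SRQ, or fail when exhausted */
static int pool_get(struct pool *p)
{
	if (!p->nfree)
		return -1;
	p->used++;
	return p->free[--p->nfree];
}

/* models ib_srq_pool_put(): return an SRQ to the free list */
static void pool_put(struct pool *p, int srq)
{
	p->free[p->nfree++] = srq;
	p->used--;
}
```

In nvmet-rdma, each queue would call get() at QP creation time and put() at teardown, so `used` going back to zero before ib_dealloc_pd() is exactly what the WARN_ON_ONCE in the patch checks.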

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] IB/core: add a simple SRQ pool per PD
  2017-11-16 17:21 ` [PATCH 1/3] IB/core: add a simple SRQ pool per PD Max Gurtovoy
       [not found]   ` <1510852885-25519-2-git-send-email-maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-11-16 18:32   ` Sagi Grimberg
  2017-11-17 19:17     ` Max Gurtovoy
  1 sibling, 1 reply; 32+ messages in thread
From: Sagi Grimberg @ 2017-11-16 18:32 UTC (permalink / raw)



> +EXPORT_SYMBOL(ib_srq_pool_init);
> +
> +void ib_srq_pool_destroy(struct ib_pd *pd)
> +{
> +	struct ib_srq *srq;
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&pd->srq_lock, flags);
> +	while (!list_empty(&pd->srqs)) {
> +		srq = list_first_entry(&pd->srqs, struct ib_srq, pd_entry);
> +		list_del(&srq->pd_entry);
> +
> +		spin_unlock_irqrestore(&pd->srq_lock, flags);
> +		ib_destroy_srq(srq);
> +		spin_lock_irqsave(&pd->srq_lock, flags);
> +	}
> +	spin_unlock_irqrestore(&pd->srq_lock, flags);

Why not just splice the list and iterate sanely?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-16 17:21 [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector Max Gurtovoy
                   ` (3 preceding siblings ...)
       [not found] ` <1510852885-25519-1-git-send-email-maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
@ 2017-11-16 18:36 ` Sagi Grimberg
  2017-11-17 19:32   ` Max Gurtovoy
  4 siblings, 1 reply; 32+ messages in thread
From: Sagi Grimberg @ 2017-11-16 18:36 UTC (permalink / raw)



> Since there is an active discussion regarding the CQ pool architecture, I decided to push
> this feature (maybe it can be pushed before CQ pool).
> 
> This is a new feature for NVMEoF RDMA target,

Any chance of having this for the rest: isert, srpt, svcrdma?

> results:
> 
> BS    1.read (target CPU)   2.read (target CPU)    3.read (target CPU)   4.read (target CPU)
> ---  --------------------- --------------------- --------------------- ----------------------
> 1k     5.88M (80%)            5.45M (72%)            6.77M (91%)          2.2M (72%)
> 
> 2k     3.56M (65%)            3.45M (59%)            3.72M (64%)          2.12M (59%)
> 
> 4k     1.8M (33%)             1.87M (32%)            1.88M (32%)          1.59M (34%)

I'm blown away by the fact that there is such a small difference for 4k
reads. How many cpu-cores did you have on the target system? Single
numa-node? Maybe if that is the case we can use fewer SRQs than
one per completion vector...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd
  2017-11-16 17:21 ` [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd Max Gurtovoy
  2017-11-16 17:26   ` [Suspected-Phishing][PATCH " Max Gurtovoy
@ 2017-11-16 18:37   ` Sagi Grimberg
  1 sibling, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-11-16 18:37 UTC (permalink / raw)



> This is a preparation patch that creates infrastructure in the post-recv
> mechanism for the SRQ per completion vector feature.
> 
> Change-Id: I0b0130920580e6999f0517400a45a1078e9553c9

No need for this Change-Id...

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] IB/core: add a simple SRQ pool per PD
  2017-11-16 18:10       ` Leon Romanovsky
@ 2017-11-17 19:12           ` Max Gurtovoy
  -1 siblings, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-17 19:12 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: hch-jcswGhMUV9g, sagi-NQWnxTmZq1alnMjI0IkVqw,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, RDMA mailing list

>> +#ifndef _RDMA_SRQ_POOL_H
>> +#define _RDMA_SRQ_POOL_H 1
>> +
>> +#include <rdma/ib_verbs.h>
>> +
>> +struct ib_srq *ib_srq_pool_get(struct ib_pd *pd);
>> +void ib_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq);
>> +
>> +int ib_srq_pool_init(struct ib_pd *pd, int nr,
>> +		struct ib_srq_init_attr *srq_attr);
>> +void ib_srq_pool_destroy(struct ib_pd *pd);
> 
> Can you please use rdma_ prefix instead of ib_ prefix?
> 
> Thanks
> 

Do you mean rdma_srq_pool_get instead of ib_srq_pool_get? Is this a new
convention? This is a pool that contains ib_srq objects, not rdma_srq
objects.

I can do it if needed.

>> +
>> +#endif /* _RDMA_SRQ_POOL_H */
>> --
>> 1.8.3.1
>>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH 1/3] IB/core: add a simple SRQ pool per PD
  2017-11-16 18:32   ` Sagi Grimberg
@ 2017-11-17 19:17     ` Max Gurtovoy
  0 siblings, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-17 19:17 UTC (permalink / raw)




On 11/16/2017 8:32 PM, Sagi Grimberg wrote:
> 
>> +EXPORT_SYMBOL(ib_srq_pool_init);
>> +
>> +void ib_srq_pool_destroy(struct ib_pd *pd)
>> +{
>> +	struct ib_srq *srq;
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(&pd->srq_lock, flags);
>> +	while (!list_empty(&pd->srqs)) {
>> +		srq = list_first_entry(&pd->srqs, struct ib_srq, pd_entry);
>> +		list_del(&srq->pd_entry);
>> +
>> +		spin_unlock_irqrestore(&pd->srq_lock, flags);
>> +		ib_destroy_srq(srq);
>> +		spin_lock_irqsave(&pd->srq_lock, flags);
>> +	}
>> +	spin_unlock_irqrestore(&pd->srq_lock, flags);
> 
> Why not just splicing the list and iterate sanely?

sure, I'll use list_splice_init(&pd->srqs, &tmp_list)
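A splice-based rework along those lines might look like the sketch below. This is an illustration of the suggestion only (the tmp_list name follows Max's reply); the final shape would depend on the v3 posting.

```c
/* Sketch: ib_srq_pool_destroy() reworked with list_splice_init(),
 * as discussed above. The whole list is moved out under the lock
 * once, so the SRQs are destroyed without re-taking pd->srq_lock
 * per entry. Not the code actually merged.
 */
void ib_srq_pool_destroy(struct ib_pd *pd)
{
	struct ib_srq *srq, *next;
	unsigned long flags;
	LIST_HEAD(tmp_list);

	spin_lock_irqsave(&pd->srq_lock, flags);
	list_splice_init(&pd->srqs, &tmp_list);
	spin_unlock_irqrestore(&pd->srq_lock, flags);

	list_for_each_entry_safe(srq, next, &tmp_list, pd_entry)
		ib_destroy_srq(srq);
}
```

Besides being simpler, this avoids the unlock/relock dance around every ib_destroy_srq() call in the original loop.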

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-16 18:08     ` Leon Romanovsky
@ 2017-11-17 19:18         ` Max Gurtovoy
  -1 siblings, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-17 19:18 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: hch-jcswGhMUV9g, sagi-NQWnxTmZq1alnMjI0IkVqw,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list



On 11/16/2017 8:08 PM, Leon Romanovsky wrote:
> On Thu, Nov 16, 2017 at 07:21:22PM +0200, Max Gurtovoy wrote:
>> Since there is an active discussion regarding the CQ pool architecture, I decided to push
>> this feature (maybe it can be pushed before CQ pool).
> 
> Max,
> 
> Thanks for CCing me, can you please repost the series and CC linux-rdma too?

Sure, I'll send V3 and CC linux-rdma too.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-16 18:36 ` Sagi Grimberg
@ 2017-11-17 19:32   ` Max Gurtovoy
       [not found]     ` <a8e2be0b-deb7-8cc1-92ce-fc5dea3e241a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
  0 siblings, 1 reply; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-17 19:32 UTC (permalink / raw)




On 11/16/2017 8:36 PM, Sagi Grimberg wrote:
> 
>> Since there is an active discussion regarding the CQ pool 
>> architecture, I decided to push
>> this feature (maybe it can be pushed before CQ pool).
>>
>> This is a new feature for NVMEoF RDMA target,
> 
> Any chance having this for the rest? isert, srpt, svcrdma?
> 

We can implement it for isert, but I think it's better to see how the CQ 
pool will be defined first.
It can bring a big benefit and improvement for ib_srpt (similar to the 
NVMEoF target), but I'm not sure I can commit to that one soon.

>> results:
>>
>> BS    1.read (target CPU)   2.read (target CPU)   3.read (target CPU)   4.read (target CPU)
>> ---   --------------------- --------------------- --------------------- ---------------------
>> 1k    5.88M (80%)           5.45M (72%)           6.77M (91%)           2.2M  (72%)
>>
>> 2k    3.56M (65%)           3.45M (59%)           3.72M (64%)           2.12M (59%)
>>
>> 4k    1.8M  (33%)           1.87M (32%)           1.88M (32%)           1.59M (34%)
> 
> I'm blown by the fact that there is such a small difference for 4k
> reads, how many cpu-cores did you have on the target-system? single
> numa-node? Maybe if that is the case we can use less srqs than 
> per-completion-vector...

These results were taken without the performance improvements we sent a 
few weeks ago, and on different and weaker servers.
Are you suggesting another module param for srq_count per device? I 
tried to avoid that.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH 1/3] IB/core: add a simple SRQ pool per PD
  2017-11-17 19:12           ` Max Gurtovoy
@ 2017-11-18  7:36               ` Leon Romanovsky
  -1 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-18  7:36 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: hch-jcswGhMUV9g, sagi-NQWnxTmZq1alnMjI0IkVqw,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, RDMA mailing list


On Fri, Nov 17, 2017 at 09:12:46PM +0200, Max Gurtovoy wrote:
> > > +#ifndef _RDMA_SRQ_POOL_H
> > > +#define _RDMA_SRQ_POOL_H 1
> > > +
> > > +#include <rdma/ib_verbs.h>
> > > +
> > > +struct ib_srq *ib_srq_pool_get(struct ib_pd *pd);
> > > +void ib_srq_pool_put(struct ib_pd *pd, struct ib_srq *srq);
> > > +
> > > +int ib_srq_pool_init(struct ib_pd *pd, int nr,
> > > +		struct ib_srq_init_attr *srq_attr);
> > > +void ib_srq_pool_destroy(struct ib_pd *pd);
> >
> > Can you please use rdma_ prefix instead of ib_ prefix?
> >
> > Thanks
> >
>
> Do you mean rdma_srq_pool_get instead of ib_srq_pool_get? Is this a new
> convention? This is a pool that contains ib_srq objects, not rdma_srq
> objects...

I want to follow a similar pattern to the one chosen for various popular
pool implementations, like dma_pool_*, mempool, etc.

IMHO rdma_srqpool_* looks cleaner than what we have now.

Thanks

>
> I can do it if needed.

>
> > > +
> > > +#endif /* _RDMA_SRQ_POOL_H */
> > > --
> > > 1.8.3.1
> > >


^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-17 19:32   ` Max Gurtovoy
@ 2017-11-18 12:52         ` Leon Romanovsky
  0 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-18 12:52 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Sagi Grimberg, hch-jcswGhMUV9g,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list

On Fri, Nov 17, 2017 at 09:32:42PM +0200, Max Gurtovoy wrote:
>
>
> On 11/16/2017 8:36 PM, Sagi Grimberg wrote:
> >
> > > Since there is an active discussion regarding the CQ pool
> > > architecture, I decided to push
> > > this feature (maybe it can be pushed before CQ pool).
> > >
> > > This is a new feature for NVMEoF RDMA target,
> >
> > Any chance having this for the rest? isert, srpt, svcrdma?
> >
>
> We can implement it for isert, but I think it's better to see how the CQ
> pool will be defined first.
> It can bring a big benefit and improvement for ib_srpt (similar to NVMEoF
> target) but I'm not sure if I can commit for that one soon..

Too bad, but I don't see inclusion of generic SRQ pool code in RDMA
subsystem without actual conversion of existing ULP clients.

Thanks

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-18 12:52         ` Leon Romanovsky
@ 2017-11-18 13:57             ` Max Gurtovoy
  -1 siblings, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-18 13:57 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sagi Grimberg, hch-jcswGhMUV9g,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list



On 11/18/2017 2:52 PM, Leon Romanovsky wrote:
> On Fri, Nov 17, 2017 at 09:32:42PM +0200, Max Gurtovoy wrote:
>>
>>
>> On 11/16/2017 8:36 PM, Sagi Grimberg wrote:
>>>
>>>> Since there is an active discussion regarding the CQ pool
>>>> architecture, I decided to push
>>>> this feature (maybe it can be pushed before CQ pool).
>>>>
>>>> This is a new feature for NVMEoF RDMA target,
>>>
>>> Any chance having this for the rest? isert, srpt, svcrdma?
>>>
>>
>> We can implement it for isert, but I think it's better to see how the CQ
>> pool will be defined first.
>> It can bring a big benefit and improvement for ib_srpt (similar to NVMEoF
>> target) but I'm not sure if I can commit for that one soon..
> 
> Too bad, but I don't see inclusion of generic SRQ pool code in RDMA
> subsystem without actual conversion of existing ULP clients.
> 
> Thanks
> 

This patchset adds this feature to the NVMEoF target, so there actually is 
a ULP that uses it. We have the same situation with mr_pool, which only 
RDMA rw.c uses (we're now adding it to the NVMEoF initiator too - in review).
I can add srq_pool to the iSER target code, but I don't want to rewrite it 
again in a few weeks, when the CQ pool is added.
Regarding other ULPs, we don't have a testing environment for them, so I 
prefer not to commit to their implementation in the near future.

I don't know why we can't add this feature "as is".
Other ULP maintainers might use it once it is merged.

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-18 13:57             ` Max Gurtovoy
@ 2017-11-18 14:40                 ` Leon Romanovsky
  -1 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-18 14:40 UTC (permalink / raw)
  To: Max Gurtovoy
  Cc: Sagi Grimberg, hch-jcswGhMUV9g,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list


On Sat, Nov 18, 2017 at 03:57:15PM +0200, Max Gurtovoy wrote:
>
>
> On 11/18/2017 2:52 PM, Leon Romanovsky wrote:
> > On Fri, Nov 17, 2017 at 09:32:42PM +0200, Max Gurtovoy wrote:
> > >
> > >
> > > On 11/16/2017 8:36 PM, Sagi Grimberg wrote:
> > > >
> > > > > Since there is an active discussion regarding the CQ pool
> > > > > architecture, I decided to push
> > > > > this feature (maybe it can be pushed before CQ pool).
> > > > >
> > > > > This is a new feature for NVMEoF RDMA target,
> > > >
> > > > Any chance having this for the rest? isert, srpt, svcrdma?
> > > >
> > >
> > > We can implement it for isert, but I think it's better to see how the CQ
> > > pool will be defined first.
> > > It can bring a big benefit and improvement for ib_srpt (similar to NVMEoF
> > > target) but I'm not sure if I can commit for that one soon..
> >
> > Too bad, but I don't see inclusion of generic SRQ pool code in RDMA
> > subsystem without actual conversion of existing ULP clients.
> >
> > Thanks
> >
>
> This patchset adds this feature to NVMEoF target so actually there are ULPs
> that use it. Same issue we have with mr_pool that only RDMA rw.c use it (Now
> we're adding it to NVMEoF initiators too - in review).

The difference between your code and mr_pool is that mr_pool is part of
RDMA/core and in use by RDMA/core (rw.c), which is in use by all ULPs.

However, if you insist, we can remove EXPORT_SYMBOL from the mr_pool
implementation, because it is part of RDMA/core and it bloats the
symbol map without need. Should I?

In your case, you are proposing a generic interface, which is supposed to
be a good fit for all ULPs, but without those ULPs.

> I can add srq_pool to iSER target code but I don't want to re-write it again
> in few weeks, when the CQ pool will be added.

So, please finalize the interface at the RFC stage and, once you are
ready, proceed to the actual patches.

> Regarding other ULPs, we don't have a testing environment for them so I
> prefer not to commit on their implementation in the near future.

You are not expected to have every testing environment; it is their (the
ULP maintainers') responsibility to test your conversion, because you are
doing a conversion to a generic interface.

>
> I don't know why we can't add this feature "as is".
> Other ULPs maintainers might use it once it will be pushed.

Sorry, but it is not how kernel development process works.
"You propose -> you do" and not "You propose -> they do".

Thanks



^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-18 14:40                 ` Leon Romanovsky
@ 2017-11-18 21:29                     ` Max Gurtovoy
  -1 siblings, 0 replies; 32+ messages in thread
From: Max Gurtovoy @ 2017-11-18 21:29 UTC (permalink / raw)
  To: Leon Romanovsky
  Cc: Sagi Grimberg, hch-jcswGhMUV9g,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list



On 11/18/2017 4:40 PM, Leon Romanovsky wrote:
> On Sat, Nov 18, 2017 at 03:57:15PM +0200, Max Gurtovoy wrote:
>>
>>
>> On 11/18/2017 2:52 PM, Leon Romanovsky wrote:
>>> On Fri, Nov 17, 2017 at 09:32:42PM +0200, Max Gurtovoy wrote:
>>>>
>>>>
>>>> On 11/16/2017 8:36 PM, Sagi Grimberg wrote:
>>>>>
>>>>>> Since there is an active discussion regarding the CQ pool
>>>>>> architecture, I decided to push
>>>>>> this feature (maybe it can be pushed before CQ pool).
>>>>>>
>>>>>> This is a new feature for NVMEoF RDMA target,
>>>>>
>>>>> Any chance having this for the rest? isert, srpt, svcrdma?
>>>>>
>>>>
>>>> We can implement it for isert, but I think it's better to see how the CQ
>>>> pool will be defined first.
>>>> It can bring a big benefit and improvement for ib_srpt (similar to NVMEoF
>>>> target) but I'm not sure if I can commit for that one soon..
>>>
>>> Too bad, but I don't see inclusion of generic SRQ pool code in RDMA
>>> subsystem without actual conversion of existing ULP clients.
>>>
>>> Thanks
>>>
>>
>> This patchset adds this feature to NVMEoF target so actually there are ULPs
>> that use it. Same issue we have with mr_pool that only RDMA rw.c use it (Now
>> we're adding it to NVMEoF initiators too - in review).
> 
> The difference between your code and mr_pool is that mr_pool is part of
> RDMA/core and in use by RDMA/core (rw.c), which in use by all ULPs.
> 
> However if you insist, we can remove EXPORT_SYMBOL from mr_pool
> implementation, because of being part of RDMA/core and it blows
> symbols map without need. Should I?

No, we'll use it in the NVMEoF host, as I mentioned earlier.
> 
> In your case, you are proposing generic interface, which supposed to be
> good fit for all ULPs but without those ULPs.
> 
>> I can add srq_pool to iSER target code but I don't want to re-write it again
>> in few weeks, when the CQ pool will be added.
> 
> So, please finalize interface in RFC stage and once you are ready, proceed to
> the actual patches.
> 
>> Regarding other ULPs, we don't have a testing environment for them so I
>> prefer not to commit on their implementation in the near future.
> 
> You are not expected to have all testing environment, it is their (ULPs
> maintainers) responsibility to test your conversion, because you are
> doing conversion to generic interface.
> 
>>
>> I don't know why we can't add this feature "as is".
>> Other ULPs maintainers might use it once it will be pushed.
> 
> Sorry, but it is not how kernel development process works.
> "You propose -> you do" and not "You propose -> they do".

I'm not changing an interface here, so all the other ULPs that currently 
use SRQ (ipoib and srpt) will continue using it.
I don't know why this patchset brought up the idea of adding SRQ pools to 
isert/svcrdma/etc., but knowing that there are patches (under 
discussion) that will have a big influence on these drivers (at least 
isert), it doesn't make sense to implement a *new* feature (SRQ usage) and 
change it a week afterwards.

I will send V3 in a few days with some fixes that I got, so it would be 
nice to have more comments on the code (I don't see a problem with the 
kernel development process in this patchset).


> 
> Thanks
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread


* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-18 21:29                     ` Max Gurtovoy
@ 2017-11-20 11:00                         ` Sagi Grimberg
  -1 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-11-20 11:00 UTC (permalink / raw)
  To: Max Gurtovoy, Leon Romanovsky
  Cc: hch-jcswGhMUV9g, linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list


>>>>> We can implement it for isert, but I think it's better to see how 
>>>>> the CQ
>>>>> pool will be defined first.
>>>>> It can bring a big benefit and improvement for ib_srpt (similar to 
>>>>> NVMEoF
>>>>> target) but I'm not sure if I can commit for that one soon..
>>>>
>>>> Too bad, but I don't see inclusion of generic SRQ pool code in RDMA
>>>> subsystem without actual conversion of existing ULP clients.
>>>>
>>>> Thanks
>>>>
>>>
>>> This patchset adds this feature to NVMEoF target so actually there 
>>> are ULPs
>>> that use it. Same issue we have with mr_pool that only RDMA rw.c use 
>>> it (Now
>>> we're adding it to NVMEoF initiators too - in review).
>>
>> The difference between your code and mr_pool is that mr_pool is part of
>> RDMA/core and in use by RDMA/core (rw.c), which in use by all ULPs.
>>
>> However if you insist, we can remove EXPORT_SYMBOL from mr_pool
>> implementation, because of being part of RDMA/core and it blows
>> symbols map without need. Should I?
> 
> No, we'll use it in NVMEoF host as I mentioned earlier.
>>
>> In your case, you are proposing generic interface, which supposed to be
>> good fit for all ULPs but without those ULPs.
>>
>>> I can add srq_pool to iSER target code but I don't want to re-write 
>>> it again
>>> in few weeks, when the CQ pool will be added.
>>
>> So, please finalize interface in RFC stage and once you are ready, 
>> proceed to
>> the actual patches.
>>
>>> Regarding other ULPs, we don't have a testing environment for them so I
>>> prefer not to commit on their implementation in the near future.
>>
>> You are not expected to have all testing environment, it is their (ULPs
>> maintainers) responsibility to test your conversion, because you are
>> doing conversion to generic interface.
>>
>>>
>>> I don't know why we can't add this feature "as is".
>>> Other ULPs maintainers might use it once it will be pushed.
>>
>> Sorry, but it is not how kernel development process works.
>> "You propose -> you do" and not "You propose -> they do".
> 
> I'm not changing an interface here. So all the other ULPs that use SRQ 
> (ipoib and srpt) currently will cuntinue using it.
> I don't know why this patchset brought up the idea to add SRQ pools to 
> isert/svcrdma/etc.., but knowing that there are patches (under 
> discussions) that will have a big enfluance on these drivers (at least 
> isert), it doesn't make sence to implement *new* feature (SRQ usage) and 
> chage it a week afterwards.

I'm almost sorry I asked :)

Max,

Leon's request, while adding more work for you, is valid as I see it. Leon
and Doug (like other kernel maintainers and others in our community) are
interested in improving the RDMA core subsystem in the sense of offering
useful interfaces and having the consumers implement as little as
possible. Making useful features/interfaces (like in your case)
available for (and adopted by) most of the common consumers helps
the community and the subsystem as a whole, rather than helping the
specific module we happen to be focused on at that specific time. The
long-term goal is to let the consumers do as much as possible while
implementing as little as possible.

Having said that, if it were up to me, I wouldn't call it a hard
requirement, but it is definitely encouraged (I try to do it for the core
interfaces I happen to offer, and I know others do too).
I think that Doug and others should really decide on the direction here.

What do others think?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
@ 2017-11-20 11:00                         ` Sagi Grimberg
  0 siblings, 0 replies; 32+ messages in thread
From: Sagi Grimberg @ 2017-11-20 11:00 UTC (permalink / raw)



>>>>> We can implement it for isert, but I think it's better to see how
>>>>> the CQ pool will be defined first.
>>>>> It can bring a big benefit and improvement for ib_srpt (similar to
>>>>> the NVMEoF target), but I'm not sure I can commit to that one soon.
>>>>
>>>> Too bad, but I don't see inclusion of generic SRQ pool code in the RDMA
>>>> subsystem without an actual conversion of existing ULP clients.
>>>>
>>>> Thanks
>>>>
>>>
>>> This patchset adds this feature to the NVMEoF target, so there actually
>>> are ULPs that use it. It's the same issue we have with mr_pool, which
>>> only RDMA rw.c uses (now we're adding it to the NVMEoF initiator too -
>>> in review).
>>
>> The difference between your code and mr_pool is that mr_pool is part of
>> RDMA/core and in use by RDMA/core (rw.c), which is in use by all ULPs.
>>
>> However, if you insist, we can remove EXPORT_SYMBOL from the mr_pool
>> implementation, because it is part of RDMA/core and it bloats the
>> symbol map without need. Should I?
> 
> No, we'll use it in the NVMEoF host as I mentioned earlier.
>>
>> In your case, you are proposing a generic interface, which is supposed
>> to be a good fit for all ULPs, but without those ULPs.
>>
>>> I can add srq_pool to the iSER target code, but I don't want to
>>> rewrite it again in a few weeks, when the CQ pool will be added.
>>
>> So, please finalize the interface in the RFC stage and, once you are
>> ready, proceed to the actual patches.
>>
>>> Regarding other ULPs, we don't have a testing environment for them, so
>>> I prefer not to commit to their implementation in the near future.
>>
>> You are not expected to have every testing environment; it is their (the
>> ULP maintainers') responsibility to test your conversion, because you
>> are doing a conversion to a generic interface.
>>
>>>
>>> I don't know why we can't add this feature "as is".
>>> Other ULP maintainers might use it once it is pushed.
>>
>> Sorry, but that is not how the kernel development process works.
>> "You propose -> you do" and not "You propose -> they do".
> 
> I'm not changing an interface here, so all the other ULPs that currently
> use SRQs (ipoib and srpt) will continue using them.
> I don't know why this patchset brought up the idea of adding SRQ pools to
> isert/svcrdma/etc., but knowing that there are patches (under discussion)
> that will have a big influence on these drivers (at least isert), it
> doesn't make sense to implement a *new* feature (SRQ usage) and change it
> a week afterwards.

I'm almost sorry I asked :)

Max,

Leon's request, while adding more work for you, is valid as I see it. Leon
and Doug (like other kernel maintainers and others in our community) are
interested in improving the RDMA core subsystem in the sense of offering
useful interfaces and having the consumers implement as little as
possible. Making useful features/interfaces (like in your case)
available for (and adopted by) most of the common consumers helps
the community and the subsystem as a whole, rather than helping only the
specific module we happen to be focused on at that specific time. The
long-term goal is to make the consumers do as much as possible while
implementing as little as possible.

Having said that, if it were up to me, I wouldn't say it's a hard
requirement, but it is definitely encouraged (I try to do it for the core
interfaces I happen to offer and I know others have too).
I think that Doug and others should really decide on the direction here.

What do others think?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
  2017-11-20 11:00                         ` Sagi Grimberg
@ 2017-11-20 11:34                             ` Leon Romanovsky
  -1 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-20 11:34 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Max Gurtovoy, hch-jcswGhMUV9g,
	linux-nvme-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	vladimirk-VPRAkNaXOzVWk0Htik3J/w, idanb-VPRAkNaXOzVWk0Htik3J/w,
	RDMA mailing list


On Mon, Nov 20, 2017 at 01:00:28PM +0200, Sagi Grimberg wrote:
>
> > > > > > We can implement it for isert, but I think it's better to see
> > > > > > how the CQ pool will be defined first.
> > > > > > It can bring a big benefit and improvement for ib_srpt (similar
> > > > > > to the NVMEoF target), but I'm not sure I can commit to that
> > > > > > one soon.
> > > > >
> > > > > Too bad, but I don't see inclusion of generic SRQ pool code in the
> > > > > RDMA subsystem without an actual conversion of existing ULP clients.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > > This patchset adds this feature to the NVMEoF target, so there
> > > > actually are ULPs that use it. It's the same issue we have with
> > > > mr_pool, which only RDMA rw.c uses (now we're adding it to the
> > > > NVMEoF initiator too - in review).
> > >
> > > The difference between your code and mr_pool is that mr_pool is part of
> > > RDMA/core and in use by RDMA/core (rw.c), which is in use by all ULPs.
> > >
> > > However, if you insist, we can remove EXPORT_SYMBOL from the mr_pool
> > > implementation, because it is part of RDMA/core and it bloats the
> > > symbol map without need. Should I?
> >
> > No, we'll use it in the NVMEoF host as I mentioned earlier.
> > >
> > > In your case, you are proposing a generic interface, which is supposed
> > > to be a good fit for all ULPs, but without those ULPs.
> > >
> > > > I can add srq_pool to the iSER target code, but I don't want to
> > > > rewrite it again in a few weeks, when the CQ pool will be added.
> > >
> > > So, please finalize the interface in the RFC stage and, once you are
> > > ready, proceed to the actual patches.
> > >
> > > > Regarding other ULPs, we don't have a testing environment for them,
> > > > so I prefer not to commit to their implementation in the near future.
> > >
> > > You are not expected to have every testing environment; it is their
> > > (the ULP maintainers') responsibility to test your conversion, because
> > > you are doing a conversion to a generic interface.
> > >
> > > >
> > > > I don't know why we can't add this feature "as is".
> > > > Other ULP maintainers might use it once it is pushed.
> > >
> > > Sorry, but that is not how the kernel development process works.
> > > "You propose -> you do" and not "You propose -> they do".
> >
> > I'm not changing an interface here, so all the other ULPs that currently
> > use SRQs (ipoib and srpt) will continue using them.
> > I don't know why this patchset brought up the idea of adding SRQ pools
> > to isert/svcrdma/etc., but knowing that there are patches (under
> > discussion) that will have a big influence on these drivers (at least
> > isert), it doesn't make sense to implement a *new* feature (SRQ usage)
> > and change it a week afterwards.
>
> I'm almost sorry I asked :)
>

There is nothing to be sorry about. It was just a matter of time before
we would see it.

The reason why I'm so direct in this case is that resource pools are
really beneficial when they are used for the create/destroy of resources
in (massive) dynamic operations.

In the proposed code (nvmet-rdma), that is not the case and the resources
(SRQs) are statically allocated, hence it makes no sense to me to propose
a new interface without seeing any benefit from it.

This is why I'm asking Max to either implement real users of this
interface or not provide that interface at all.

> Max,
>
> Leon's request, while adding more work for you, is valid as I see it. Leon
> and Doug (like other kernel maintainers and others in our community) are
> interested in improving the RDMA core subsystem in the sense of offering
> useful interfaces and having the consumers implement as little as
> possible. Making useful features/interfaces (like in your case)
> available for (and adopted by) most of the common consumers helps
> the community and the subsystem as a whole, rather than helping only the
> specific module we happen to be focused on at that specific time. The
> long-term goal is to make the consumers do as much as possible while
> implementing as little as possible.
>
> Having said that, if it were up to me, I wouldn't say it's a hard
> requirement, but it is definitely encouraged (I try to do it for the core
> interfaces I happen to offer and I know others have too).
> I think that Doug and others should really decide on the direction here.
>
> What do others think?

Right, I'm not alone here.

I already said my position on this mailing list, but I will repeat it:
core work is a must to move forward and to make this community better.

Thanks



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector
@ 2017-11-20 11:34                             ` Leon Romanovsky
  0 siblings, 0 replies; 32+ messages in thread
From: Leon Romanovsky @ 2017-11-20 11:34 UTC (permalink / raw)


On Mon, Nov 20, 2017 at 01:00:28PM +0200, Sagi Grimberg wrote:
>
> > > > > > We can implement it for isert, but I think it's better to see
> > > > > > how the CQ pool will be defined first.
> > > > > > It can bring a big benefit and improvement for ib_srpt (similar
> > > > > > to the NVMEoF target), but I'm not sure I can commit to that
> > > > > > one soon.
> > > > >
> > > > > Too bad, but I don't see inclusion of generic SRQ pool code in the
> > > > > RDMA subsystem without an actual conversion of existing ULP clients.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > > > This patchset adds this feature to the NVMEoF target, so there
> > > > actually are ULPs that use it. It's the same issue we have with
> > > > mr_pool, which only RDMA rw.c uses (now we're adding it to the
> > > > NVMEoF initiator too - in review).
> > >
> > > The difference between your code and mr_pool is that mr_pool is part of
> > > RDMA/core and in use by RDMA/core (rw.c), which is in use by all ULPs.
> > >
> > > However, if you insist, we can remove EXPORT_SYMBOL from the mr_pool
> > > implementation, because it is part of RDMA/core and it bloats the
> > > symbol map without need. Should I?
> >
> > No, we'll use it in the NVMEoF host as I mentioned earlier.
> > >
> > > In your case, you are proposing a generic interface, which is supposed
> > > to be a good fit for all ULPs, but without those ULPs.
> > >
> > > > I can add srq_pool to the iSER target code, but I don't want to
> > > > rewrite it again in a few weeks, when the CQ pool will be added.
> > >
> > > So, please finalize the interface in the RFC stage and, once you are
> > > ready, proceed to the actual patches.
> > >
> > > > Regarding other ULPs, we don't have a testing environment for them,
> > > > so I prefer not to commit to their implementation in the near future.
> > >
> > > You are not expected to have every testing environment; it is their
> > > (the ULP maintainers') responsibility to test your conversion, because
> > > you are doing a conversion to a generic interface.
> > >
> > > >
> > > > I don't know why we can't add this feature "as is".
> > > > Other ULP maintainers might use it once it is pushed.
> > >
> > > Sorry, but that is not how the kernel development process works.
> > > "You propose -> you do" and not "You propose -> they do".
> >
> > I'm not changing an interface here, so all the other ULPs that currently
> > use SRQs (ipoib and srpt) will continue using them.
> > I don't know why this patchset brought up the idea of adding SRQ pools
> > to isert/svcrdma/etc., but knowing that there are patches (under
> > discussion) that will have a big influence on these drivers (at least
> > isert), it doesn't make sense to implement a *new* feature (SRQ usage)
> > and change it a week afterwards.
>
> I'm almost sorry I asked :)
>

There is nothing to be sorry about. It was just a matter of time before
we would see it.

The reason why I'm so direct in this case is that resource pools are
really beneficial when they are used for the create/destroy of resources
in (massive) dynamic operations.

In the proposed code (nvmet-rdma), that is not the case and the resources
(SRQs) are statically allocated, hence it makes no sense to me to propose
a new interface without seeing any benefit from it.

This is why I'm asking Max to either implement real users of this
interface or not provide that interface at all.

> Max,
>
> Leon's request, while adding more work for you, is valid as I see it. Leon
> and Doug (like other kernel maintainers and others in our community) are
> interested in improving the RDMA core subsystem in the sense of offering
> useful interfaces and having the consumers implement as little as
> possible. Making useful features/interfaces (like in your case)
> available for (and adopted by) most of the common consumers helps
> the community and the subsystem as a whole, rather than helping only the
> specific module we happen to be focused on at that specific time. The
> long-term goal is to make the consumers do as much as possible while
> implementing as little as possible.
>
> Having said that, if it were up to me, I wouldn't say it's a hard
> requirement, but it is definitely encouraged (I try to do it for the core
> interfaces I happen to offer and I know others have too).
> I think that Doug and others should really decide on the direction here.
>
> What do others think?

Right, I'm not alone here.

I already said my position on this mailing list, but I will repeat it:
core work is a must to move forward and to make this community better.

Thanks


^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-11-20 11:34 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-16 17:21 [PATCH v2 0/3] nvmet-rdma: SRQ per completion vector Max Gurtovoy
2017-11-16 17:21 ` [PATCH 1/3] IB/core: add a simple SRQ pool per PD Max Gurtovoy
     [not found]   ` <1510852885-25519-2-git-send-email-maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-11-16 18:10     ` Leon Romanovsky
2017-11-16 18:10       ` Leon Romanovsky
     [not found]       ` <20171116181049.GO18825-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-11-17 19:12         ` Max Gurtovoy
2017-11-17 19:12           ` Max Gurtovoy
     [not found]           ` <c94d2e03-dd32-ff0a-789f-aae4d65f3955-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-11-18  7:36             ` Leon Romanovsky
2017-11-18  7:36               ` Leon Romanovsky
2017-11-16 18:32   ` Sagi Grimberg
2017-11-17 19:17     ` Max Gurtovoy
2017-11-16 17:21 ` [PATCH 2/3] nvmet-rdma: use srq pointer in rdma_cmd Max Gurtovoy
2017-11-16 17:26   ` [Suspected-Phishing][PATCH " Max Gurtovoy
2017-11-16 18:37   ` [PATCH " Sagi Grimberg
2017-11-16 17:21 ` [PATCH 3/3] nvmet-rdma: use SRQ per completion vector Max Gurtovoy
     [not found] ` <1510852885-25519-1-git-send-email-maxg-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-11-16 18:08   ` [PATCH v2 0/3] nvmet-rdma: " Leon Romanovsky
2017-11-16 18:08     ` Leon Romanovsky
     [not found]     ` <20171116180853.GN18825-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-11-17 19:18       ` Max Gurtovoy
2017-11-17 19:18         ` Max Gurtovoy
2017-11-16 18:36 ` Sagi Grimberg
2017-11-17 19:32   ` Max Gurtovoy
     [not found]     ` <a8e2be0b-deb7-8cc1-92ce-fc5dea3e241a-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-11-18 12:52       ` Leon Romanovsky
2017-11-18 12:52         ` Leon Romanovsky
     [not found]         ` <20171118125229.GT18825-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-11-18 13:57           ` Max Gurtovoy
2017-11-18 13:57             ` Max Gurtovoy
     [not found]             ` <935af437-d69e-e258-c00a-8bf9a04f9988-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-11-18 14:40               ` Leon Romanovsky
2017-11-18 14:40                 ` Leon Romanovsky
     [not found]                 ` <20171118144042.GU18825-U/DQcQFIOTAAJjI8aNfphQ@public.gmane.org>
2017-11-18 21:29                   ` Max Gurtovoy
2017-11-18 21:29                     ` Max Gurtovoy
     [not found]                     ` <d6579bd6-053e-1214-ea95-ff72e6191cb0-VPRAkNaXOzVWk0Htik3J/w@public.gmane.org>
2017-11-20 11:00                       ` Sagi Grimberg
2017-11-20 11:00                         ` Sagi Grimberg
     [not found]                         ` <14682e66-de0d-042b-434b-0eb40fb79f0c-NQWnxTmZq1alnMjI0IkVqw@public.gmane.org>
2017-11-20 11:34                           ` Leon Romanovsky
2017-11-20 11:34                             ` Leon Romanovsky
