* [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-01 21:06 Chuck Lever
From: Chuck Lever @ 2017-08-01 21:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Since I converted NFSD to use the new rdma_rw API, I've been
struggling with how NFSD determines the size of its send CQ, and how
it provisions its send queue limit (i.e., when it waits for more send
queue entries to become available).

The problem arises because rdma_rw hides the exact number of send
WQEs it has provisioned. It determines this number based on the MR
registration mode (FRWR for iWARP devices, local DMA rkey for
others), but the mode itself is no longer exposed to ULPs.

Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is no
longer aware of this, does not provision a larger CQ, and does not
raise its send queue accounting limit. That means the extra WQEs are
never really used.

At LSF this year, Christoph and Sagi proposed a simple API that
exposes information about how many CQ entries are needed so that ULPs
can provision their CQs more accurately.

The first patch here is a prerequisite for the third. The second
patch adds the new API. The third patch changes NFSD to use the new
API to provision its CQ and send queue accounting accurately.

This approach was tested using the (fixed) force_mr core module
parameter.
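
To illustrate, here is a rough sketch of how a ULP might consume the
new helper to size its Send Queue and Send Completion Queue. The
function below is a made-up example rather than part of this series;
it simply restates the arithmetic that patch 3/3 performs in
svc_rdma_accept():

#include <rdma/ib_verbs.h>
#include <rdma/rw.h>

/* Illustrative only: compute a Send Queue depth for a transport that
 * posts up to @max_requests RPC Sends, each of which may move an
 * RDMA Read/Write payload of up to @maxpages pages.
 */
static unsigned int example_sq_depth(struct ib_device *dev, u8 port_num,
				     unsigned int max_requests,
				     unsigned int maxpages,
				     unsigned int *rdma_ctxs)
{
	unsigned int ctxts, sq_depth;

	/* rdma_rw contexts needed to move the largest payload */
	ctxts = rdma_rw_mr_factor(dev, port_num, maxpages) * max_requests;

	/* one SQE per RPC Send, plus the rdma_rw Work Requests */
	sq_depth = max_requests + ctxts;
	if (sq_depth > dev->attrs.max_qp_wr)
		sq_depth = dev->attrs.max_qp_wr;

	*rdma_ctxs = ctxts;
	return sq_depth;
}

The Send CQ is then allocated with sq_depth entries, max_rdma_ctxs is
set to ctxts, and max_send_wr to sq_depth - ctxts, since
rdma_create_qp() adds the rdma_rw Work Requests back on top of
max_send_wr.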


---

Chuck Lever (3):
      svcrdma: Limit RQ depth
      rdma core: Add rdma_rw_mr_payload()
      svcrdma: Estimate Send Queue depth properly


 drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
 include/rdma/rw.h                        |    2 ++
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
 3 files changed, 51 insertions(+), 11 deletions(-)

--
Chuck Lever

* [PATCH RFC 1/3] svcrdma: Limit RQ depth
@ 2017-08-01 21:06 Chuck Lever
From: Chuck Lever @ 2017-08-01 21:06 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

Ensure that the chosen Receive Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 2aa8473..cdb04f8 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -167,8 +167,8 @@ static bool svc_rdma_prealloc_ctxts(struct svcxprt_rdma *xprt)
 {
 	unsigned int i;
 
-	/* Each RPC/RDMA credit can consume a number of send
-	 * and receive WQEs. One ctxt is allocated for each.
+	/* Each RPC/RDMA credit can consume one Receive and
+	 * one Send WQE at the same time.
 	 */
 	i = xprt->sc_sq_depth + xprt->sc_rq_depth;
 
@@ -742,13 +742,18 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	newxprt->sc_max_sge = min((size_t)dev->attrs.max_sge,
 				  (size_t)RPCSVC_MAXPAGES);
 	newxprt->sc_max_req_size = svcrdma_max_req_size;
-	newxprt->sc_max_requests = min_t(u32, dev->attrs.max_qp_wr,
-					 svcrdma_max_requests);
-	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
-	newxprt->sc_max_bc_requests = min_t(u32, dev->attrs.max_qp_wr,
-					    svcrdma_max_bc_requests);
+	newxprt->sc_max_requests = svcrdma_max_requests;
+	newxprt->sc_max_bc_requests = svcrdma_max_bc_requests;
 	newxprt->sc_rq_depth = newxprt->sc_max_requests +
 			       newxprt->sc_max_bc_requests;
+	if (newxprt->sc_rq_depth > dev->attrs.max_qp_wr) {
+		pr_warn("svcrdma: reducing receive depth to %d\n",
+			dev->attrs.max_qp_wr);
+		newxprt->sc_rq_depth = dev->attrs.max_qp_wr;
+		newxprt->sc_max_requests = newxprt->sc_rq_depth - 2;
+		newxprt->sc_max_bc_requests = 2;
+	}
+	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
 	newxprt->sc_sq_depth = newxprt->sc_rq_depth;
 	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
 


* [PATCH RFC 2/3] rdma core: Add rdma_rw_mr_payload()
@ 2017-08-01 21:07 Chuck Lever
From: Chuck Lever @ 2017-08-01 21:07 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

The amount of payload per MR depends on device capabilities and
the memory registration mode in use. The new rdma_rw API hides both,
making it difficult for ULPs to determine how large their transport
send queues need to be.

Expose the MR payload information via a new API.
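
As a worked example (the device numbers are made up for illustration):
if the ULP's largest payload is 260 pages, a device that registers
with FRWR and advertises a fast-register page list length of 256
needs DIV_ROUND_UP(260, 256) = 2 MRs per I/O, so the call below
returns 2; for a device using the local DMA rkey with max_sge_rd of
260 or more, it returns 1.

	ctxts_per_rpc = rdma_rw_mr_factor(dev, port_num, 260);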

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 drivers/infiniband/core/rw.c |   24 ++++++++++++++++++++++++
 include/rdma/rw.h            |    2 ++
 2 files changed, 26 insertions(+)

diff --git a/drivers/infiniband/core/rw.c b/drivers/infiniband/core/rw.c
index dbfd854..6ca607e 100644
--- a/drivers/infiniband/core/rw.c
+++ b/drivers/infiniband/core/rw.c
@@ -643,6 +643,30 @@ void rdma_rw_ctx_destroy_signature(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 }
 EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
 
+/**
+ * rdma_rw_mr_factor - return number of MRs required for a payload
+ * @device:	device handling the connection
+ * @port_num:	port num to which the connection is bound
+ * @maxpages:	maximum payload pages per rdma_rw_ctx
+ *
+ * Returns the number of MRs the device requires to move @maxpages
+ * pages of payload. The returned value is used during transport
+ * creation to compute max_rdma_ctxs and the size of the transport's
+ * Send and Send Completion Queues.
+ */
+unsigned int rdma_rw_mr_factor(struct ib_device *device, u8 port_num,
+			       unsigned int maxpages)
+{
+	unsigned int mr_pages;
+
+	if (rdma_rw_can_use_mr(device, port_num))
+		mr_pages = rdma_rw_fr_page_list_len(device);
+	else
+		mr_pages = device->attrs.max_sge_rd;
+	return DIV_ROUND_UP(maxpages, mr_pages);
+}
+EXPORT_SYMBOL(rdma_rw_mr_factor);
+
 void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr)
 {
 	u32 factor;
diff --git a/include/rdma/rw.h b/include/rdma/rw.h
index 377d865..a3cbbc7 100644
--- a/include/rdma/rw.h
+++ b/include/rdma/rw.h
@@ -81,6 +81,8 @@ struct ib_send_wr *rdma_rw_ctx_wrs(struct rdma_rw_ctx *ctx, struct ib_qp *qp,
 int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8 port_num,
 		struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
 
+unsigned int rdma_rw_mr_factor(struct ib_device *device, u8 port_num,
+		unsigned int maxpages);
 void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr *attr);
 int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr *attr);
 void rdma_rw_cleanup_mrs(struct ib_qp *qp);


* [PATCH RFC 3/3] svcrdma: Estimate Send Queue depth properly
@ 2017-08-01 21:07 Chuck Lever
From: Chuck Lever @ 2017-08-01 21:07 UTC (permalink / raw)
  To: linux-rdma-u79uwXL29TY76Z2rM5mHXA

The rdma_rw API adjusts max_send_wr upwards during the
rdma_create_qp() call. If the ULP actually wants to take advantage
of these extra resources, it must increase the size of its send
completion queue (created before rdma_create_qp is called) and
increase its send queue accounting limit.

Use the new rdma_rw_mr_factor API to figure out the correct value
to use for the Send Queue and Send Completion Queue depths.

And, ensure that the chosen Send Queue depth for a newly created
transport does not overrun the QP WR limit of the underlying device.

Lastly, there's no longer a need to carry the Send Queue depth in
struct svcxprt_rdma, since the value is used only in the
svc_rdma_accept() path.
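
For context, the adjustment rdma_rw applies at QP creation time looks
roughly like the following (a paraphrase of rdma_rw_init_qp() in
drivers/infiniband/core/rw.c, not a verbatim copy): each rdma_rw
context consumes one RDMA Read or Write WR, plus registration and
invalidation WRs when an MR-based registration mode such as FRWR is
in effect.

	factor = 1;
	if (rdma_rw_can_use_mr(dev, attr->port_num))
		factor += 2;	/* FRWR: REG WR + LOCAL_INV WR */
	attr->cap.max_send_wr += factor * attr->cap.max_rdma_ctxs;

Those extra Work Requests consume Send Queue slots and can generate
Send CQ entries, which is why the accounting below is now based on
sc_sq_depth rather than on the RPC Send count alone.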

Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index cdb04f8..5caf8e7 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -51,6 +51,7 @@
 #include <linux/workqueue.h>
 #include <rdma/ib_verbs.h>
 #include <rdma/rdma_cm.h>
+#include <rdma/rw.h>
 #include <linux/sunrpc/svc_rdma.h>
 #include <linux/export.h>
 #include "xprt_rdma.h"
@@ -713,7 +714,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	struct ib_qp_init_attr qp_attr;
 	struct ib_device *dev;
 	struct sockaddr *sap;
-	unsigned int i;
+	unsigned int i, ctxts;
 	int ret = 0;
 
 	listen_rdma = container_of(xprt, struct svcxprt_rdma, sc_xprt);
@@ -754,7 +755,14 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 		newxprt->sc_max_bc_requests = 2;
 	}
 	newxprt->sc_fc_credits = cpu_to_be32(newxprt->sc_max_requests);
-	newxprt->sc_sq_depth = newxprt->sc_rq_depth;
+	ctxts = rdma_rw_mr_factor(dev, newxprt->sc_port_num, RPCSVC_MAXPAGES);
+	ctxts *= newxprt->sc_max_requests;
+	newxprt->sc_sq_depth = newxprt->sc_rq_depth + ctxts;
+	if (newxprt->sc_sq_depth > dev->attrs.max_qp_wr) {
+		pr_warn("svcrdma: reducing send depth to %d\n",
+			dev->attrs.max_qp_wr);
+		newxprt->sc_sq_depth = dev->attrs.max_qp_wr;
+	}
 	atomic_set(&newxprt->sc_sq_avail, newxprt->sc_sq_depth);
 
 	if (!svc_rdma_prealloc_ctxts(newxprt))
@@ -789,8 +797,8 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	qp_attr.event_handler = qp_event_handler;
 	qp_attr.qp_context = &newxprt->sc_xprt;
 	qp_attr.port_num = newxprt->sc_port_num;
-	qp_attr.cap.max_rdma_ctxs = newxprt->sc_max_requests;
-	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth;
+	qp_attr.cap.max_rdma_ctxs = ctxts;
+	qp_attr.cap.max_send_wr = newxprt->sc_sq_depth - ctxts;
 	qp_attr.cap.max_recv_wr = newxprt->sc_rq_depth;
 	qp_attr.cap.max_send_sge = newxprt->sc_max_sge;
 	qp_attr.cap.max_recv_sge = newxprt->sc_max_sge;
@@ -858,6 +866,7 @@ static struct svc_xprt *svc_rdma_accept(struct svc_xprt *xprt)
 	dprintk("    remote address  : %pIS:%u\n", sap, rpc_get_port(sap));
 	dprintk("    max_sge         : %d\n", newxprt->sc_max_sge);
 	dprintk("    sq_depth        : %d\n", newxprt->sc_sq_depth);
+	dprintk("    rdma_rw_ctxs    : %d\n", ctxts);
 	dprintk("    max_requests    : %d\n", newxprt->sc_max_requests);
 	dprintk("    ord             : %d\n", newxprt->sc_ord);
 


* Re: [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-02  7:20 Leon Romanovsky
From: Leon Romanovsky @ 2017-08-02  7:20 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, Aug 01, 2017 at 05:06:51PM -0400, Chuck Lever wrote:
> Since I converted NFSD to use the new rdma_rw API, I've been
> struggling with how NFSD determines the size of its send CQ, and how
> it provisions its send queue limit (ie, when it waits for more send
> queue entries to become available).
>
> The problem arises because rdma_rw hides the exact number of send
> WQEs it has provisioned. It determines this number based on the MR
> registration mode (FRWR for iWARP devices, local DMA rkey for
> others), but the mode itself is no longer exposed to ULPs.
>
> Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is no
> longer aware of this, does not provision a larger CQ, and does not
> raise its send queue accounting limit. That means the extra WQEs are
> never really used.
>
> At LSF this year, Christoph and Sagi proposed a simple API that
> expose information about how many CQ entries are needed so that ULPs
> can provision their CQs more accurately.

Indeed clean API. Does it make sense to get rid of sc_max_requests and
use this exported value directly?

>
> The first patch here is a pre-requisite for the third. The second
> patch adds the new API. The third patch changes NFSD to use the new
> API to provision its CQ and send queue accounting accurately.
>
> This approach was tested using the (fixed) force_mr core module
> parameter.

Good, I liked the word "tested" in the sentence :).

>
>
> ---
>
> Chuck Lever (3):
>       svcrdma: Limit RQ depth
>       rdma core: Add rdma_rw_mr_payload()
>       svcrdma: Estimate Send Queue depth properly
>
>
>  drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
>  include/rdma/rw.h                        |    2 ++
>  net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
>  3 files changed, 51 insertions(+), 11 deletions(-)
>
> --
> Chuck Lever


* Re: [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-02 18:06 Chuck Lever
From: Chuck Lever @ 2017-08-02 18:06 UTC (permalink / raw)
  To: Leon Romanovsky; +Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA


> On Aug 2, 2017, at 3:20 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> 
> On Tue, Aug 01, 2017 at 05:06:51PM -0400, Chuck Lever wrote:
>> Since I converted NFSD to use the new rdma_rw API, I've been
>> struggling with how NFSD determines the size of its send CQ, and how
>> it provisions its send queue limit (ie, when it waits for more send
>> queue entries to become available).
>> 
>> The problem arises because rdma_rw hides the exact number of send
>> WQEs it has provisioned. It determines this number based on the MR
>> registration mode (FRWR for iWARP devices, local DMA rkey for
>> others), but the mode itself is no longer exposed to ULPs.
>> 
>> Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is no
>> longer aware of this, does not provision a larger CQ, and does not
>> raise its send queue accounting limit. That means the extra WQEs are
>> never really used.
>> 
>> At LSF this year, Christoph and Sagi proposed a simple API that
>> expose information about how many CQ entries are needed so that ULPs
>> can provision their CQs more accurately.
> 
> Indeed clean API. Does it make sense to get rid of sc_max_requests and
> use this exported value directly?

That and sc_[rs]q_depth could now be made local variables
in svc_rdma_accept, with a little clean-up work.

Eventually I was hoping to make sc_max_requests a true
variable so that it could be dynamically reduced to
create back pressure on noisy clients. So I hesitate
to remove it outright at the moment.


>> The first patch here is a pre-requisite for the third. The second
>> patch adds the new API. The third patch changes NFSD to use the new
>> API to provision its CQ and send queue accounting accurately.
>> 
>> This approach was tested using the (fixed) force_mr core module
>> parameter.
> 
> Good, I liked the word "tested" in the sentence :).
> 
>> 
>> 
>> ---
>> 
>> Chuck Lever (3):
>>      svcrdma: Limit RQ depth
>>      rdma core: Add rdma_rw_mr_payload()
>>      svcrdma: Estimate Send Queue depth properly
>> 
>> 
>> drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
>> include/rdma/rw.h                        |    2 ++
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
>> 3 files changed, 51 insertions(+), 11 deletions(-)
>> 
>> --
>> Chuck Lever

--
Chuck Lever




* Re: [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-08 18:28 Chuck Lever
From: Chuck Lever @ 2017-08-08 18:28 UTC (permalink / raw)
  To: linux-rdma, Doug Ledford


> On Aug 2, 2017, at 3:20 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
> 
> On Tue, Aug 01, 2017 at 05:06:51PM -0400, Chuck Lever wrote:
>> Since I converted NFSD to use the new rdma_rw API, I've been
>> struggling with how NFSD determines the size of its send CQ, and how
>> it provisions its send queue limit (ie, when it waits for more send
>> queue entries to become available).
>> 
>> The problem arises because rdma_rw hides the exact number of send
>> WQEs it has provisioned. It determines this number based on the MR
>> registration mode (FRWR for iWARP devices, local DMA rkey for
>> others), but the mode itself is no longer exposed to ULPs.
>> 
>> Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is no
>> longer aware of this, does not provision a larger CQ, and does not
>> raise its send queue accounting limit. That means the extra WQEs are
>> never really used.
>> 
>> At LSF this year, Christoph and Sagi proposed a simple API that
>> expose information about how many CQ entries are needed so that ULPs
>> can provision their CQs more accurately.
> 
> Indeed clean API. Does it make sense to get rid of sc_max_requests and
> use this exported value directly?

I'm going to consider this clean up for a later patch.

So, how shall I submit this new API for upstream? I propose that
I submit all three of these via Bruce's tree, with an Acked-by
from Doug on the "Add rdma_rw_mr_payload()" patch, if Doug
approves.


>> The first patch here is a pre-requisite for the third. The second
>> patch adds the new API. The third patch changes NFSD to use the new
>> API to provision its CQ and send queue accounting accurately.
>> 
>> This approach was tested using the (fixed) force_mr core module
>> parameter.
> 
> Good, I liked the word "tested" in the sentence :).
> 
>> 
>> 
>> ---
>> 
>> Chuck Lever (3):
>>      svcrdma: Limit RQ depth
>>      rdma core: Add rdma_rw_mr_payload()
>>      svcrdma: Estimate Send Queue depth properly
>> 
>> 
>> drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
>> include/rdma/rw.h                        |    2 ++
>> net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
>> 3 files changed, 51 insertions(+), 11 deletions(-)
>> 
>> --
>> Chuck Lever

--
Chuck Lever




* Re: [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-16 15:50 Chuck Lever
From: Chuck Lever @ 2017-08-16 15:50 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma


> On Aug 8, 2017, at 2:28 PM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org> wrote:
> 
> 
>> On Aug 2, 2017, at 3:20 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org> wrote:
>> 
>> On Tue, Aug 01, 2017 at 05:06:51PM -0400, Chuck Lever wrote:
>>> Since I converted NFSD to use the new rdma_rw API, I've been
>>> struggling with how NFSD determines the size of its send CQ, and how
>>> it provisions its send queue limit (ie, when it waits for more send
>>> queue entries to become available).
>>> 
>>> The problem arises because rdma_rw hides the exact number of send
>>> WQEs it has provisioned. It determines this number based on the MR
>>> registration mode (FRWR for iWARP devices, local DMA rkey for
>>> others), but the mode itself is no longer exposed to ULPs.
>>> 
>>> Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is no
>>> longer aware of this, does not provision a larger CQ, and does not
>>> raise its send queue accounting limit. That means the extra WQEs are
>>> never really used.
>>> 
>>> At LSF this year, Christoph and Sagi proposed a simple API that
>>> expose information about how many CQ entries are needed so that ULPs
>>> can provision their CQs more accurately.
>> 
>> Indeed clean API. Does it make sense to get rid of sc_max_requests and
>> use this exported value directly?
> 
> I'm going to consider this clean up for a later patch.
> 
> So, how shall I submit this new API for upstream? I propose that
> I submit all three of these via Bruce's tree, with an Acked-by
> from Doug on the "Add rdma_rw_mr_payload()" patch, if Doug
> approves.

Doug?


>>> The first patch here is a pre-requisite for the third. The second
>>> patch adds the new API. The third patch changes NFSD to use the new
>>> API to provision its CQ and send queue accounting accurately.
>>> 
>>> This approach was tested using the (fixed) force_mr core module
>>> parameter.
>> 
>> Good, I liked the word "tested" in the sentence :).
>> 
>>> 
>>> 
>>> ---
>>> 
>>> Chuck Lever (3):
>>>     svcrdma: Limit RQ depth
>>>     rdma core: Add rdma_rw_mr_payload()
>>>     svcrdma: Estimate Send Queue depth properly
>>> 
>>> 
>>> drivers/infiniband/core/rw.c             |   24 ++++++++++++++++++++
>>> include/rdma/rw.h                        |    2 ++
>>> net/sunrpc/xprtrdma/svc_rdma_transport.c |   36 +++++++++++++++++++++---------
>>> 3 files changed, 51 insertions(+), 11 deletions(-)
>>> 
>>> --
>>> Chuck Lever
> 
> --
> Chuck Lever
> 
> 
> 

--
Chuck Lever




* Re: [PATCH RFC 2/3] rdma core: Add rdma_rw_mr_payload()
@ 2017-08-18 16:51 Doug Ledford
From: Doug Ledford @ 2017-08-18 16:51 UTC (permalink / raw)
  To: Chuck Lever, linux-rdma-u79uwXL29TY76Z2rM5mHXA

On Tue, 2017-08-01 at 17:07 -0400, Chuck Lever wrote:
> The amount of payload per MR depends on device capabilities and
> the memory registration mode in use. The new rdma_rw API hides both,
> making it difficult for ULPs to determine how large their transport
> send queues need to be.
> 
> Expose the MR payload information via a new API.
> 
> Signed-off-by: Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>

Acked-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>

> ---
>  drivers/infiniband/core/rw.c |   24 ++++++++++++++++++++++++
>  include/rdma/rw.h            |    2 ++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/drivers/infiniband/core/rw.c
> b/drivers/infiniband/core/rw.c
> index dbfd854..6ca607e 100644
> --- a/drivers/infiniband/core/rw.c
> +++ b/drivers/infiniband/core/rw.c
> @@ -643,6 +643,30 @@ void rdma_rw_ctx_destroy_signature(struct
> rdma_rw_ctx *ctx, struct ib_qp *qp,
>  }
>  EXPORT_SYMBOL(rdma_rw_ctx_destroy_signature);
>  
> +/**
> + * rdma_rw_mr_factor - return number of MRs required for a payload
> + * @device:	device handling the connection
> + * @port_num:	port num to which the connection is bound
> + * @maxpages:	maximum payload pages per rdma_rw_ctx
> + *
> + * Returns the number of MRs the device requires to move @maxpayload
> + * bytes. The returned value is used during transport creation to
> + * compute max_rdma_ctxts and the size of the transport's Send and
> + * Send Completion Queues.
> + */
> +unsigned int rdma_rw_mr_factor(struct ib_device *device, u8
> port_num,
> +			       unsigned int maxpages)
> +{
> +	unsigned int mr_pages;
> +
> +	if (rdma_rw_can_use_mr(device, port_num))
> +		mr_pages = rdma_rw_fr_page_list_len(device);
> +	else
> +		mr_pages = device->attrs.max_sge_rd;
> +	return DIV_ROUND_UP(maxpages, mr_pages);
> +}
> +EXPORT_SYMBOL(rdma_rw_mr_factor);
> +
>  void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr
> *attr)
>  {
>  	u32 factor;
> diff --git a/include/rdma/rw.h b/include/rdma/rw.h
> index 377d865..a3cbbc7 100644
> --- a/include/rdma/rw.h
> +++ b/include/rdma/rw.h
> @@ -81,6 +81,8 @@ struct ib_send_wr *rdma_rw_ctx_wrs(struct
> rdma_rw_ctx *ctx, struct ib_qp *qp,
>  int rdma_rw_ctx_post(struct rdma_rw_ctx *ctx, struct ib_qp *qp, u8
> port_num,
>  		struct ib_cqe *cqe, struct ib_send_wr *chain_wr);
>  
> +unsigned int rdma_rw_mr_factor(struct ib_device *device, u8
> port_num,
> +		unsigned int maxpages);
>  void rdma_rw_init_qp(struct ib_device *dev, struct ib_qp_init_attr
> *attr);
>  int rdma_rw_init_mrs(struct ib_qp *qp, struct ib_qp_init_attr
> *attr);
>  void rdma_rw_cleanup_mrs(struct ib_qp *qp);
> 
-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


* Re: [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-18 16:52 Doug Ledford
From: Doug Ledford @ 2017-08-18 16:52 UTC (permalink / raw)
  To: Chuck Lever; +Cc: linux-rdma

On Wed, 2017-08-16 at 11:50 -0400, Chuck Lever wrote:
> > On Aug 8, 2017, at 2:28 PM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
> > wrote:
> > 
> > 
> > > On Aug 2, 2017, at 3:20 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> > > wrote:
> > > 
> > > On Tue, Aug 01, 2017 at 05:06:51PM -0400, Chuck Lever wrote:
> > > > Since I converted NFSD to use the new rdma_rw API, I've been
> > > > struggling with how NFSD determines the size of its send CQ,
> > > > and how
> > > > it provisions its send queue limit (ie, when it waits for more
> > > > send
> > > > queue entries to become available).
> > > > 
> > > > The problem arises because rdma_rw hides the exact number of
> > > > send
> > > > WQEs it has provisioned. It determines this number based on the
> > > > MR
> > > > registration mode (FRWR for iWARP devices, local DMA rkey for
> > > > others), but the mode itself is no longer exposed to ULPs.
> > > > 
> > > > Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is
> > > > no
> > > > longer aware of this, does not provision a larger CQ, and does
> > > > not
> > > > raise its send queue accounting limit. That means the extra
> > > > WQEs are
> > > > never really used.
> > > > 
> > > > At LSF this year, Christoph and Sagi proposed a simple API that
> > > > expose information about how many CQ entries are needed so that
> > > > ULPs
> > > > can provision their CQs more accurately.
> > > 
> > > Indeed clean API. Does it make sense to get rid of
> > > sc_max_requests and
> > > use this exported value directly?
> > 
> > I'm going to consider this clean up for a later patch.
> > 
> > So, how shall I submit this new API for upstream? I propose that
> > I submit all three of these via Bruce's tree, with an Acked-by
> > from Doug on the "Add rdma_rw_mr_payload()" patch, if Doug
> > approves.
> 
> Doug?

Your proposal is acceptable.  Ack given for 2nd patch to rdma core.

-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


* Re: [PATCH RFC 0/3] Proposal for exposing rdma_rw MR factor
@ 2017-08-18 16:53 Chuck Lever
From: Chuck Lever @ 2017-08-18 16:53 UTC (permalink / raw)
  To: Doug Ledford; +Cc: linux-rdma


> On Aug 18, 2017, at 12:52 PM, Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:
> 
> On Wed, 2017-08-16 at 11:50 -0400, Chuck Lever wrote:
>>> On Aug 8, 2017, at 2:28 PM, Chuck Lever <chuck.lever-QHcLZuEGTsvQT0dZR+AlfA@public.gmane.org>
>>> wrote:
>>> 
>>> 
>>>> On Aug 2, 2017, at 3:20 AM, Leon Romanovsky <leon-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
>>>> wrote:
>>>> 
>>>> On Tue, Aug 01, 2017 at 05:06:51PM -0400, Chuck Lever wrote:
>>>>> Since I converted NFSD to use the new rdma_rw API, I've been
>>>>> struggling with how NFSD determines the size of its send CQ,
>>>>> and how
>>>>> it provisions its send queue limit (ie, when it waits for more
>>>>> send
>>>>> queue entries to become available).
>>>>> 
>>>>> The problem arises because rdma_rw hides the exact number of
>>>>> send
>>>>> WQEs it has provisioned. It determines this number based on the
>>>>> MR
>>>>> registration mode (FRWR for iWARP devices, local DMA rkey for
>>>>> others), but the mode itself is no longer exposed to ULPs.
>>>>> 
>>>>> Thus when FRWR is in use, rdma_rw adds more WQEs, but NFSD is
>>>>> no
>>>>> longer aware of this, does not provision a larger CQ, and does
>>>>> not
>>>>> raise its send queue accounting limit. That means the extra
>>>>> WQEs are
>>>>> never really used.
>>>>> 
>>>>> At LSF this year, Christoph and Sagi proposed a simple API that
>>>>> expose information about how many CQ entries are needed so that
>>>>> ULPs
>>>>> can provision their CQs more accurately.
>>>> 
>>>> Indeed clean API. Does it make sense to get rid of
>>>> sc_max_requests and
>>>> use this exported value directly?
>>> 
>>> I'm going to consider this clean up for a later patch.
>>> 
>>> So, how shall I submit this new API for upstream? I propose that
>>> I submit all three of these via Bruce's tree, with an Acked-by
>>> from Doug on the "Add rdma_rw_mr_payload()" patch, if Doug
>>> approves.
>> 
>> Doug?
> 
> Your proposal is acceptable.  Ack given for 2nd patch to rdma core.

Thanks!

--
Chuck Lever



