* [PATCH v1 0/3] Optimize NFSD Send completion processing
@ 2021-07-26 14:46 Chuck Lever
  2021-07-26 14:46 ` [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler Chuck Lever
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Chuck Lever @ 2021-07-26 14:46 UTC
  To: linux-nfs, linux-rdma

The following series improves the efficiency of NFSD's Send
completion processing by removing costly operations from the svcrdma
Send completion handlers. Each of these patches reduces the CPU
utilized by Send completion, per RPC, by an average of 2-3%.

The goal is to improve the rate of RPCs that can be retired for a
single-transport workload, thus increasing the server's scalability.

These patches are also available for testing:

https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=for-next

---

Chuck Lever (3):
      svcrdma: Fewer calls to wake_up() in Send completion handler
      svcrdma: Relieve contention on sc_send_lock.
      svcrdma: Convert rdma->sc_rw_ctxts to llist


 include/linux/sunrpc/svc_rdma.h          |  7 +--
 net/sunrpc/xprtrdma/svc_rdma_rw.c        | 56 ++++++++++++++++--------
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    | 41 +++++++++--------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |  4 +-
 4 files changed, 66 insertions(+), 42 deletions(-)

--
Chuck Lever



* [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler
  2021-07-26 14:46 [PATCH v1 0/3] Optimize NFSD Send completion processing Chuck Lever
@ 2021-07-26 14:46 ` Chuck Lever
  2021-07-26 16:50   ` Tom Talpey
  2021-07-26 14:47 ` [PATCH v1 2/3] svcrdma: Relieve contention on sc_send_lock Chuck Lever
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever @ 2021-07-26 14:46 UTC
  To: linux-nfs, linux-rdma

Because wake_up() takes an IRQ-safe lock, it can be expensive,
especially when called inside a single-threaded completion handler.
What's more, the Send wait queue almost never has waiters, so
most of the time, this is an expensive no-op.

As always, the goal is to reduce the average overhead of each
completion, because a transport's completion handlers are single-
threaded on one CPU core. This change reduces CPU utilization of
the Send completion thread by 2-3% on my server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h       |    1 +
 net/sunrpc/xprtrdma/svc_rdma_rw.c     |    7 ++-----
 net/sunrpc/xprtrdma/svc_rdma_sendto.c |   18 +++++++++++++++---
 3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 3184465de3a0..57c60ffe76dd 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -207,6 +207,7 @@ extern void svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
 				    struct svc_rdma_send_ctxt *sctxt,
 				    struct svc_rdma_recv_ctxt *rctxt,
 				    int status);
+extern void svc_rdma_wake_send_waiters(struct svcxprt_rdma *rdma, int avail);
 extern int svc_rdma_sendto(struct svc_rqst *);
 extern int svc_rdma_result_payload(struct svc_rqst *rqstp, unsigned int offset,
 				   unsigned int length);
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 1e651447dc4e..3d1b119f6e3e 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -248,8 +248,7 @@ static void svc_rdma_write_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	trace_svcrdma_wc_write(wc, &cc->cc_cid);
 
-	atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
-	wake_up(&rdma->sc_send_wait);
+	svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount);
 
 	if (unlikely(wc->status != IB_WC_SUCCESS))
 		svc_xprt_deferred_close(&rdma->sc_xprt);
@@ -304,9 +303,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc)
 
 	trace_svcrdma_wc_read(wc, &cc->cc_cid);
 
-	atomic_add(cc->cc_sqecount, &rdma->sc_sq_avail);
-	wake_up(&rdma->sc_send_wait);
-
+	svc_rdma_wake_send_waiters(rdma, cc->cc_sqecount);
 	cc->cc_status = wc->status;
 	complete(&cc->cc_done);
 	return;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index d6bbafb773e1..fba2ee4eb607 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -258,6 +258,20 @@ void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
 	spin_unlock(&rdma->sc_send_lock);
 }
 
+/**
+ * svc_rdma_wake_send_waiters - manage Send Queue accounting
+ * @rdma: controlling transport
+ * @avail: Number of additional SQEs that are now available
+ *
+ */
+void svc_rdma_wake_send_waiters(struct svcxprt_rdma *rdma, int avail)
+{
+	atomic_add(avail, &rdma->sc_sq_avail);
+	smp_mb__after_atomic();
+	if (unlikely(waitqueue_active(&rdma->sc_send_wait)))
+		wake_up(&rdma->sc_send_wait);
+}
+
 /**
  * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
  * @cq: Completion Queue context
@@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
 
 	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
 
+	svc_rdma_wake_send_waiters(rdma, 1);
 	complete(&ctxt->sc_done);
 
-	atomic_inc(&rdma->sc_sq_avail);
-	wake_up(&rdma->sc_send_wait);
-
 	if (unlikely(wc->status != IB_WC_SUCCESS))
 		svc_xprt_deferred_close(&rdma->sc_xprt);
 }
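
For context on the idiom introduced above: waitqueue_active() is safe
only when both sides order their memory accesses carefully. Below is a
minimal sketch of the pairing, using illustrative stand-in names
(sq_avail, send_wait), not the transport fields themselves:

    #include <linux/atomic.h>
    #include <linux/wait.h>

    static atomic_t sq_avail;
    static DECLARE_WAIT_QUEUE_HEAD(send_wait);

    /* Waker: publish the freed SQEs, then wake only if a task is
     * actually asleep. smp_mb__after_atomic() orders the atomic_add()
     * before the waitqueue_active() load, pairing with the barrier
     * implied by set_current_state() inside wait_event().
     */
    static void wake_send_waiters(int avail)
    {
            atomic_add(avail, &sq_avail);
            smp_mb__after_atomic();
            if (unlikely(waitqueue_active(&send_wait)))
                    wake_up(&send_wait);
    }

    /* Waiter: wait_event() queues the task before re-testing the
     * condition, so it cannot miss a wakeup that races with the
     * atomic_add() above.
     */
    static void wait_for_sqes(int needed)
    {
            wait_event(send_wait, atomic_read(&sq_avail) >= needed);
    }

Without the barrier, the waker could observe an empty wait queue
before the waiter's enqueue becomes visible, and the wakeup would be
lost.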




* [PATCH v1 2/3] svcrdma: Relieve contention on sc_send_lock.
  2021-07-26 14:46 [PATCH v1 0/3] Optimize NFSD Send completion processing Chuck Lever
  2021-07-26 14:46 ` [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler Chuck Lever
@ 2021-07-26 14:47 ` Chuck Lever
  2021-07-26 14:47 ` [PATCH v1 3/3] svcrdma: Convert rdma->sc_rw_ctxts to llist Chuck Lever
  2021-07-26 16:52 ` [PATCH v1 0/3] Optimize NFSD Send completion processing Tom Talpey
  3 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2021-07-26 14:47 UTC
  To: linux-nfs, linux-rdma

/proc/lock_stat indicates that the sc_send_lock is heavily
contended when the server is under load from a single client.

To address this, convert the send_ctxt free list to an llist.
Returning an item to the send_ctxt cache is now waitless, which
reduces the instruction path length in the single-threaded Send
handler (svc_rdma_wc_send).

The goal is to enable the ib_comp_wq worker to handle a higher
RPC/RDMA Send completion rate given the same CPU resources. This
change reduces CPU utilization of Send completion by 2-3% on my
server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    4 ++--
 net/sunrpc/xprtrdma/svc_rdma_sendto.c    |   23 ++++++++---------------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 +-
 3 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 57c60ffe76dd..5f8d5af6556c 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -90,7 +90,7 @@ struct svcxprt_rdma {
 	struct ib_pd         *sc_pd;
 
 	spinlock_t	     sc_send_lock;
-	struct list_head     sc_send_ctxts;
+	struct llist_head    sc_send_ctxts;
 	spinlock_t	     sc_rw_ctxt_lock;
 	struct list_head     sc_rw_ctxts;
 
@@ -150,7 +150,7 @@ struct svc_rdma_recv_ctxt {
 };
 
 struct svc_rdma_send_ctxt {
-	struct list_head	sc_list;
+	struct llist_node	sc_node;
 	struct rpc_rdma_cid	sc_cid;
 
 	struct ib_send_wr	sc_send_wr;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_sendto.c b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
index fba2ee4eb607..599021b2391d 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_sendto.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -113,13 +113,6 @@
 
 static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc);
 
-static inline struct svc_rdma_send_ctxt *
-svc_rdma_next_send_ctxt(struct list_head *list)
-{
-	return list_first_entry_or_null(list, struct svc_rdma_send_ctxt,
-					sc_list);
-}
-
 static void svc_rdma_send_cid_init(struct svcxprt_rdma *rdma,
 				   struct rpc_rdma_cid *cid)
 {
@@ -182,9 +175,10 @@ svc_rdma_send_ctxt_alloc(struct svcxprt_rdma *rdma)
 void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma)
 {
 	struct svc_rdma_send_ctxt *ctxt;
+	struct llist_node *node;
 
-	while ((ctxt = svc_rdma_next_send_ctxt(&rdma->sc_send_ctxts))) {
-		list_del(&ctxt->sc_list);
+	while ((node = llist_del_first(&rdma->sc_send_ctxts)) != NULL) {
+		ctxt = llist_entry(node, struct svc_rdma_send_ctxt, sc_node);
 		ib_dma_unmap_single(rdma->sc_pd->device,
 				    ctxt->sc_sges[0].addr,
 				    rdma->sc_max_req_size,
@@ -204,12 +198,13 @@ void svc_rdma_send_ctxts_destroy(struct svcxprt_rdma *rdma)
 struct svc_rdma_send_ctxt *svc_rdma_send_ctxt_get(struct svcxprt_rdma *rdma)
 {
 	struct svc_rdma_send_ctxt *ctxt;
+	struct llist_node *node;
 
 	spin_lock(&rdma->sc_send_lock);
-	ctxt = svc_rdma_next_send_ctxt(&rdma->sc_send_ctxts);
-	if (!ctxt)
+	node = llist_del_first(&rdma->sc_send_ctxts);
+	if (!node)
 		goto out_empty;
-	list_del(&ctxt->sc_list);
+	ctxt = llist_entry(node, struct svc_rdma_send_ctxt, sc_node);
 	spin_unlock(&rdma->sc_send_lock);
 
 out:
@@ -253,9 +248,7 @@ void svc_rdma_send_ctxt_put(struct svcxprt_rdma *rdma,
 					     ctxt->sc_sges[i].length);
 	}
 
-	spin_lock(&rdma->sc_send_lock);
-	list_add(&ctxt->sc_list, &rdma->sc_send_ctxts);
-	spin_unlock(&rdma->sc_send_lock);
+	llist_add(&ctxt->sc_node, &rdma->sc_send_ctxts);
 }
 
 /**
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index d94b7759ada1..99474078c304 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -136,7 +136,7 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 	svc_xprt_init(net, &svc_rdma_class, &cma_xprt->sc_xprt, serv);
 	INIT_LIST_HEAD(&cma_xprt->sc_accept_q);
 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
-	INIT_LIST_HEAD(&cma_xprt->sc_send_ctxts);
+	init_llist_head(&cma_xprt->sc_send_ctxts);
 	init_llist_head(&cma_xprt->sc_recv_ctxts);
 	INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
 	init_waitqueue_head(&cma_xprt->sc_send_wait);
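
The asymmetry above (lock-free put, locked get) follows from the
llist contract: llist_add() may run concurrently with anything, but
multiple llist_del_first() callers must be serialized against each
other, which is why svc_rdma_send_ctxt_get() still takes
sc_send_lock. A minimal sketch of the pattern, with illustrative
names:

    #include <linux/llist.h>
    #include <linux/spinlock.h>

    struct send_ctxt {
            struct llist_node free_node;
            /* ... per-send state ... */
    };

    static LLIST_HEAD(free_ctxts);
    static DEFINE_SPINLOCK(get_lock);

    /* Put: a single lock-free cmpxchg; safe to call from the Send
     * completion handler concurrently with other putters and getters.
     */
    static void ctxt_put(struct send_ctxt *ctxt)
    {
            llist_add(&ctxt->free_node, &free_ctxts);
    }

    /* Get: llist_del_first() may race with llist_add() but not with
     * other deleters, so consumers still serialize on a spinlock.
     */
    static struct send_ctxt *ctxt_get(void)
    {
            struct llist_node *node;

            spin_lock(&get_lock);
            node = llist_del_first(&free_ctxts);
            spin_unlock(&get_lock);
            if (!node)
                    return NULL;
            return llist_entry(node, struct send_ctxt, free_node);
    }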




* [PATCH v1 3/3] svcrdma: Convert rdma->sc_rw_ctxts to llist
  2021-07-26 14:46 [PATCH v1 0/3] Optimize NFSD Send completion processing Chuck Lever
  2021-07-26 14:46 ` [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler Chuck Lever
  2021-07-26 14:47 ` [PATCH v1 2/3] svcrdma: Relieve contention on sc_send_lock Chuck Lever
@ 2021-07-26 14:47 ` Chuck Lever
  2021-07-26 16:52 ` [PATCH v1 0/3] Optimize NFSD Send completion processing Tom Talpey
  3 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever @ 2021-07-26 14:47 UTC
  To: linux-nfs, linux-rdma

Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts
to an llist.

The goal is to reduce the average overhead of Send completions,
because a transport's completion handlers are single-threaded on
one CPU core. This change reduces CPU utilization of each Send
completion by 2-3% on my server.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 include/linux/sunrpc/svc_rdma.h          |    2 +
 net/sunrpc/xprtrdma/svc_rdma_rw.c        |   49 +++++++++++++++++++++---------
 net/sunrpc/xprtrdma/svc_rdma_transport.c |    2 +
 3 files changed, 37 insertions(+), 16 deletions(-)

diff --git a/include/linux/sunrpc/svc_rdma.h b/include/linux/sunrpc/svc_rdma.h
index 5f8d5af6556c..24aa159d29a7 100644
--- a/include/linux/sunrpc/svc_rdma.h
+++ b/include/linux/sunrpc/svc_rdma.h
@@ -92,7 +92,7 @@ struct svcxprt_rdma {
 	spinlock_t	     sc_send_lock;
 	struct llist_head    sc_send_ctxts;
 	spinlock_t	     sc_rw_ctxt_lock;
-	struct list_head     sc_rw_ctxts;
+	struct llist_head    sc_rw_ctxts;
 
 	u32		     sc_pending_recvs;
 	u32		     sc_recv_batch;
diff --git a/net/sunrpc/xprtrdma/svc_rdma_rw.c b/net/sunrpc/xprtrdma/svc_rdma_rw.c
index 3d1b119f6e3e..e27433f08ca7 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_rw.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_rw.c
@@ -35,6 +35,7 @@ static void svc_rdma_wc_read_done(struct ib_cq *cq, struct ib_wc *wc);
  * controlling svcxprt_rdma is destroyed.
  */
 struct svc_rdma_rw_ctxt {
+	struct llist_node	rw_node;
 	struct list_head	rw_list;
 	struct rdma_rw_ctx	rw_ctx;
 	unsigned int		rw_nents;
@@ -53,19 +54,19 @@ static struct svc_rdma_rw_ctxt *
 svc_rdma_get_rw_ctxt(struct svcxprt_rdma *rdma, unsigned int sges)
 {
 	struct svc_rdma_rw_ctxt *ctxt;
+	struct llist_node *node;
 
 	spin_lock(&rdma->sc_rw_ctxt_lock);
-
-	ctxt = svc_rdma_next_ctxt(&rdma->sc_rw_ctxts);
-	if (ctxt) {
-		list_del(&ctxt->rw_list);
-		spin_unlock(&rdma->sc_rw_ctxt_lock);
+	node = llist_del_first(&rdma->sc_rw_ctxts);
+	spin_unlock(&rdma->sc_rw_ctxt_lock);
+	if (node) {
+		ctxt = llist_entry(node, struct svc_rdma_rw_ctxt, rw_node);
 	} else {
-		spin_unlock(&rdma->sc_rw_ctxt_lock);
 		ctxt = kmalloc(struct_size(ctxt, rw_first_sgl, SG_CHUNK_SIZE),
 			       GFP_KERNEL);
 		if (!ctxt)
 			goto out_noctx;
+
 		INIT_LIST_HEAD(&ctxt->rw_list);
 	}
 
@@ -83,14 +84,18 @@ svc_rdma_get_rw_ctxt(struct svcxprt_rdma *rdma, unsigned int sges)
 	return NULL;
 }
 
-static void svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
-				 struct svc_rdma_rw_ctxt *ctxt)
+static void __svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
+				   struct svc_rdma_rw_ctxt *ctxt,
+				   struct llist_head *list)
 {
 	sg_free_table_chained(&ctxt->rw_sg_table, SG_CHUNK_SIZE);
+	llist_add(&ctxt->rw_node, list);
+}
 
-	spin_lock(&rdma->sc_rw_ctxt_lock);
-	list_add(&ctxt->rw_list, &rdma->sc_rw_ctxts);
-	spin_unlock(&rdma->sc_rw_ctxt_lock);
+static void svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
+				 struct svc_rdma_rw_ctxt *ctxt)
+{
+	__svc_rdma_put_rw_ctxt(rdma, ctxt, &rdma->sc_rw_ctxts);
 }
 
 /**
@@ -101,9 +106,10 @@ static void svc_rdma_put_rw_ctxt(struct svcxprt_rdma *rdma,
 void svc_rdma_destroy_rw_ctxts(struct svcxprt_rdma *rdma)
 {
 	struct svc_rdma_rw_ctxt *ctxt;
+	struct llist_node *node;
 
-	while ((ctxt = svc_rdma_next_ctxt(&rdma->sc_rw_ctxts)) != NULL) {
-		list_del(&ctxt->rw_list);
+	while ((node = llist_del_first(&rdma->sc_rw_ctxts)) != NULL) {
+		ctxt = llist_entry(node, struct svc_rdma_rw_ctxt, rw_node);
 		kfree(ctxt);
 	}
 }
@@ -171,20 +177,35 @@ static void svc_rdma_cc_init(struct svcxprt_rdma *rdma,
 	cc->cc_sqecount = 0;
 }
 
+/*
+ * The consumed rw_ctxts are cleaned and placed on a local llist so
+ * that only one atomic llist operation is needed to put them all
+ * back on the free list.
+ */
 static void svc_rdma_cc_release(struct svc_rdma_chunk_ctxt *cc,
 				enum dma_data_direction dir)
 {
 	struct svcxprt_rdma *rdma = cc->cc_rdma;
+	struct llist_node *first, *last;
 	struct svc_rdma_rw_ctxt *ctxt;
+	LLIST_HEAD(free);
 
+	first = last = NULL;
 	while ((ctxt = svc_rdma_next_ctxt(&cc->cc_rwctxts)) != NULL) {
 		list_del(&ctxt->rw_list);
 
 		rdma_rw_ctx_destroy(&ctxt->rw_ctx, rdma->sc_qp,
 				    rdma->sc_port_num, ctxt->rw_sg_table.sgl,
 				    ctxt->rw_nents, dir);
-		svc_rdma_put_rw_ctxt(rdma, ctxt);
+		__svc_rdma_put_rw_ctxt(rdma, ctxt, &free);
+
+		ctxt->rw_node.next = first;
+		first = &ctxt->rw_node;
+		if (!last)
+			last = first;
 	}
+	if (first)
+		llist_add_batch(first, last, &rdma->sc_rw_ctxts);
 }
 
 /* State for sending a Write or Reply chunk.
diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index 99474078c304..d1faa522c3dd 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -138,7 +138,7 @@ static struct svcxprt_rdma *svc_rdma_create_xprt(struct svc_serv *serv,
 	INIT_LIST_HEAD(&cma_xprt->sc_rq_dto_q);
 	init_llist_head(&cma_xprt->sc_send_ctxts);
 	init_llist_head(&cma_xprt->sc_recv_ctxts);
-	INIT_LIST_HEAD(&cma_xprt->sc_rw_ctxts);
+	init_llist_head(&cma_xprt->sc_rw_ctxts);
 	init_waitqueue_head(&cma_xprt->sc_send_wait);
 
 	spin_lock_init(&cma_xprt->sc_lock);
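
The release path in svc_rdma_cc_release() above amortizes the atomic
cost: the chain is built with plain pointer stores, and one
llist_add_batch() splices it onto the shared free list. A condensed
sketch of that shape (function and parameter names are illustrative):

    #include <linux/llist.h>

    /* Return 'count' nodes to a shared free list with a single
     * atomic operation. Only the final splice touches the contended
     * list head; the per-node chaining is ordinary, uncontended
     * stores.
     */
    static void release_batch(struct llist_node **nodes, int count,
                              struct llist_head *free_list)
    {
            struct llist_node *first = NULL, *last = NULL;
            int i;

            for (i = 0; i < count; i++) {
                    nodes[i]->next = first;
                    first = nodes[i];
                    if (!last)
                            last = first;
            }
            if (first)
                    llist_add_batch(first, last, free_list);
    }

One atomic splice per release, regardless of how many rw_ctxts the
I/O consumed.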




* Re: [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler
  2021-07-26 14:46 ` [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler Chuck Lever
@ 2021-07-26 16:50   ` Tom Talpey
  2021-07-26 17:53     ` Chuck Lever III
  0 siblings, 1 reply; 9+ messages in thread
From: Tom Talpey @ 2021-07-26 16:50 UTC
  To: Chuck Lever, linux-nfs, linux-rdma

On 7/26/2021 10:46 AM, Chuck Lever wrote:
>   /**
>    * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>    * @cq: Completion Queue context
> @@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
>   
>   	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
>   
> +	svc_rdma_wake_send_waiters(rdma, 1);
>   	complete(&ctxt->sc_done);
>   
> -	atomic_inc(&rdma->sc_sq_avail);
> -	wake_up(&rdma->sc_send_wait);

This appears to change the order of wake_up() vs complete().
Is that intentional? Is there any possibility of a false
scheduler activation, later leading to a second wakeup or poll?

Tom.


* Re: [PATCH v1 0/3] Optimize NFSD Send completion processing
  2021-07-26 14:46 [PATCH v1 0/3] Optimize NFSD Send completion processing Chuck Lever
                   ` (2 preceding siblings ...)
  2021-07-26 14:47 ` [PATCH v1 3/3] svcrdma: Convert rdma->sc_rw_ctxts to llist Chuck Lever
@ 2021-07-26 16:52 ` Tom Talpey
  2021-07-26 17:47   ` Chuck Lever III
  3 siblings, 1 reply; 9+ messages in thread
From: Tom Talpey @ 2021-07-26 16:52 UTC
  To: Chuck Lever, linux-nfs, linux-rdma

On 7/26/2021 10:46 AM, Chuck Lever wrote:
> The following series improves the efficiency of NFSD's Send
> completion processing by removing costly operations from the svcrdma
> Send completion handlers. Each of these patches reduces the CPU
> utilized by Send completion, per RPC, by an average of 2-3%.

Nice gain. Are the improvements additive, i.e. 5-10% CPU reduction
for all three together?

Tom


* Re: [PATCH v1 0/3] Optimize NFSD Send completion processing
  2021-07-26 16:52 ` [PATCH v1 0/3] Optimize NFSD Send completion processing Tom Talpey
@ 2021-07-26 17:47   ` Chuck Lever III
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever III @ 2021-07-26 17:47 UTC
  To: Tom Talpey; +Cc: Linux NFS Mailing List, linux-rdma



> On Jul 26, 2021, at 12:52 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 7/26/2021 10:46 AM, Chuck Lever wrote:
>> The following series improves the efficiency of NFSD's Send
>> completion processing by removing costly operations from the svcrdma
>> Send completion handlers. Each of these patches reduces the CPU
>> utilized by Send completion, per RPC, by an average of 2-3%.
> 
> Nice gain. Are the improvements additive, i.e. 5-10% CPU reduction
> for all three together?

Yes, the improvements are additive.

--
Chuck Lever





* Re: [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler
  2021-07-26 16:50   ` Tom Talpey
@ 2021-07-26 17:53     ` Chuck Lever III
  2021-07-26 20:23       ` Tom Talpey
  0 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever III @ 2021-07-26 17:53 UTC
  To: Tom Talpey; +Cc: Linux NFS Mailing List, linux-rdma



> On Jul 26, 2021, at 12:50 PM, Tom Talpey <tom@talpey.com> wrote:
> 
> On 7/26/2021 10:46 AM, Chuck Lever wrote:
>>  /**
>>   * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>>   * @cq: Completion Queue context
>> @@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
>>    	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
>>  +	svc_rdma_wake_send_waiters(rdma, 1);
>>  	complete(&ctxt->sc_done);
>>  -	atomic_inc(&rdma->sc_sq_avail);
>> -	wake_up(&rdma->sc_send_wait);
> 
> This appears to change the order of wake_up() vs complete().
> Is that intentional?

IIRC I reversed the order here to be consistent with the other
Send completion handlers.


> Is there any possibility of a false
> scheduler activation, later leading to a second wakeup or poll?

The two "wake-ups" here are not related to each other, and RPC
Replies are transmitted already so this shouldn't have a direct
impact on server latency. But I might not understand your
question.


--
Chuck Lever





* Re: [PATCH v1 1/3] svcrdma: Fewer calls to wake_up() in Send completion handler
  2021-07-26 17:53     ` Chuck Lever III
@ 2021-07-26 20:23       ` Tom Talpey
  0 siblings, 0 replies; 9+ messages in thread
From: Tom Talpey @ 2021-07-26 20:23 UTC
  To: Chuck Lever III; +Cc: Linux NFS Mailing List, linux-rdma

On 7/26/2021 1:53 PM, Chuck Lever III wrote:
> 
> 
>> On Jul 26, 2021, at 12:50 PM, Tom Talpey <tom@talpey.com> wrote:
>>
>> On 7/26/2021 10:46 AM, Chuck Lever wrote:
>>>   /**
>>>    * svc_rdma_wc_send - Invoked by RDMA provider for each polled Send WC
>>>    * @cq: Completion Queue context
>>> @@ -275,11 +289,9 @@ static void svc_rdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
>>>     	trace_svcrdma_wc_send(wc, &ctxt->sc_cid);
>>>   +	svc_rdma_wake_send_waiters(rdma, 1);
>>>   	complete(&ctxt->sc_done);
>>>   -	atomic_inc(&rdma->sc_sq_avail);
>>> -	wake_up(&rdma->sc_send_wait);
>>
>> This appears to change the order of wake_up() vs complete().
>> Is that intentional?
> 
> IIRC I reversed the order here to be consistent with the other
> Send completion handlers.
> 
> 
>> Is there any possibility of a false
>> scheduler activation, later leading to a second wakeup or poll?
> 
> The two "wake-ups" here are not related to each other, and RPC
> Replies are transmitted already so this shouldn't have a direct
> impact on server latency. But I might not understand your
> question.

IIUC, you're saying that the thread which is awaiting the
completion of ctxt->sc_done is not also waiting to send
anything, therefore no thread is going to experience a
fire drill. Ok.

Feel free to add my
   Reviewed-By: Tom Talpey <tom@talpey.com>
to the series.

Tom.

