From: Trond Myklebust <trondmy@hammerspace.com>
To: "chuck.lever@oracle.com" <chuck.lever@oracle.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"bfields@redhat.com" <bfields@redhat.com>
Subject: Re: [PATCH] SUNRPC: Use TCP_CORK to optimise send performance on the server
Date: Sun, 14 Feb 2021 16:21:31 +0000
Message-ID: <8ce015080f2f4eb57a81e10041947ad7e6f31499.camel@hammerspace.com>
In-Reply-To: <4CD2739A-D39B-48C9-BCCF-A9DF1047D507@oracle.com>

On Sat, 2021-02-13 at 23:30 +0000, Chuck Lever wrote:
> 
> 
> > On Feb 13, 2021, at 5:10 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > 
> > On Sat, 2021-02-13 at 21:53 +0000, Chuck Lever wrote:
> > > Hi Trond-
> > > 
> > > > On Feb 13, 2021, at 3:25 PM, trondmy@kernel.org wrote:
> > > > 
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > 
> > > > Use a counter to keep track of how many requests are queued
> > > > behind the xprt->xpt_mutex, and keep TCP_CORK set until the
> > > > queue is empty.
> > > 
> > > I'm intrigued, but IMO, the patch description needs to explain
> > > why this change should be made. Why abandon Nagle?
> > > 
> > 
> > This doesn't change the Nagle/TCP_NODELAY settings. It just
> > switches to using the new documented kernel interface.
> > 
> > The only change is to use TCP_CORK so that we don't send out
> > partially filled TCP frames when we can see that there are other
> > RPC replies queued up for transmission.
> > 
> > Note the combination TCP_CORK+TCP_NODELAY is common, and the main
> > effect of the latter is that when we turn off the TCP_CORK, there
> > is an immediate forced push of the TCP queue.
> 
> The description above suggests the patch is just a
> clean-up, but a forced push has potential to change
> the server's behavior.

Well, yes. That's very much the point.

Right now, the TCP_NODELAY/Nagle setting means that we're doing that
forced push at the end of _every_ RPC reply, whether or not there is
more stuff that can be queued up in the socket. The MSG_MORE flag is
the only thing that keeps us from doing the forced push on every
sendpage() call.

So the TCP_CORK is there to _further delay_ that forced push until we
think the queue is empty.

IOW: it attempts to optimise the scheduling of that push until we're
actually done pushing more stuff into the socket.
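
To illustrate the semantics, here is a minimal userspace analogue of
what the patch does with the in-kernel tcp_sock_set_cork() /
tcp_sock_set_nodelay() helpers. This is only a sketch of the pattern,
not the patch itself, and the two queue helpers
(have_queued_replies(), send_one_reply()) are hypothetical:

  #include <netinet/in.h>
  #include <netinet/tcp.h>
  #include <sys/socket.h>

  /* Hypothetical stand-ins for the RPC reply queue. */
  extern int have_queued_replies(void);
  extern void send_one_reply(int sock);

  static void send_reply_batch(int sock)
  {
          int one = 1, zero = 0;

          /* TCP_NODELAY is set once, for the life of the socket. */
          setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

          /* Cork: let the stack coalesce replies into full-sized frames. */
          setsockopt(sock, IPPROTO_TCP, TCP_CORK, &one, sizeof(one));

          while (have_queued_replies())
                  send_one_reply(sock);

          /*
           * Uncork: with TCP_NODELAY set, this is the point at which
           * whatever partial frame is left gets pushed out immediately.
           */
          setsockopt(sock, IPPROTO_TCP, TCP_CORK, &zero, sizeof(zero));
  }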

> I'm trying to assess what kind of additional testing
> would be valuable.
> 
> 
> > > If you expect a performance impact, the description should
> > > provide metrics to show it.
> > 
> > I don't have a performance system to measure the improvement
> > accurately.
> 
> Then let's have Daire try it out, if possible.
> 
> 
> > However, I am seeing a notable improvement with the equivalent
> > client change. Specifically, xfstests generic/127 shows a ~20%
> > improvement on my VM-based test rig.
> 
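
(For anyone who wants to try to reproduce that number: generic/127
is, as I recall, one of the fsx-based tests, so assuming an fstests
checkout that is already configured to run against an NFS mount, it
can be run standalone with

  ./check generic/127

and the before/after wall-clock time is the figure of interest.)
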
> > > (We should have Daire try this change with his multi-client
> > > setup).
> > > 
> > > 
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@hammerspace.com>
> > > > ---
> > > > include/linux/sunrpc/svcsock.h | 2 ++
> > > > net/sunrpc/svcsock.c           | 8 +++++++-
> > > > 2 files changed, 9 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> > > > index b7ac7fe68306..bcc555c7ae9c 100644
> > > > --- a/include/linux/sunrpc/svcsock.h
> > > > +++ b/include/linux/sunrpc/svcsock.h
> > > > @@ -35,6 +35,8 @@ struct svc_sock {
> > > >         /* Total length of the data (not including fragment headers)
> > > >          * received so far in the fragments making up this rpc: */
> > > >         u32                     sk_datalen;
> > > > +       /* Number of queued send requests */
> > > > +       atomic_t                sk_sendqlen;
> > > 
> > > Can you take advantage of xpt_mutex: update this field
> > > only in the critical section, and make it a simple
> > > integer type?
> 
> I see why atomic_inc(&sk_sendqlen) has to happen outside
> the mutex. However, it seems a little crazy/costly to have
> both of these serialization primitives.
> 
> I haven't found a mutex_unlock() that will report that there
> are no waiters to wake up, though.
> 
> 
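
Right, and note the ordering: the counter has to be bumped before we
contend for the lock, so that the current lock holder can see that
another reply is on its way and leave the cork on. Stripped to its
essentials, the pattern the patch implements is:

  atomic_inc(&svsk->sk_sendqlen);         /* announce before contending */
  mutex_lock(&xprt->xpt_mutex);
  tcp_sock_set_cork(svsk->sk_sk, true);
  /* ... transmit this reply ... */
  if (atomic_dec_and_test(&svsk->sk_sendqlen))
          tcp_sock_set_cork(svsk->sk_sk, false); /* last sender flushes */
  mutex_unlock(&xprt->xpt_mutex);

If the increment happened inside the critical section instead, a
sender waiting on the mutex would be invisible to the holder, which
would then uncork and push out a partial frame for nothing.
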
> > > >         struct page *           sk_pages[RPCSVC_MAXPAGES];     
> > > > /*
> > > > received data */
> > > > };
> > > > diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> > > > index 5a809c64dc7b..231f510a4830 100644
> > > > --- a/net/sunrpc/svcsock.c
> > > > +++ b/net/sunrpc/svcsock.c
> > > > @@ -1171,18 +1171,23 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > > 
> > > >         svc_tcp_release_rqst(rqstp);
> > > > 
> > > > +       atomic_inc(&svsk->sk_sendqlen);
> > > >         mutex_lock(&xprt->xpt_mutex);
> > > >         if (svc_xprt_is_dead(xprt))
> > > >                 goto out_notconn;
> > > > +       tcp_sock_set_cork(svsk->sk_sk, true);
> > > >         err = svc_tcp_sendmsg(svsk->sk_sock, &msg, xdr, marker, &sent);
> > > >         xdr_free_bvec(xdr);
> > > >         trace_svcsock_tcp_send(xprt, err < 0 ? err : sent);
> > > >         if (err < 0 || sent != (xdr->len + sizeof(marker)))
> > > >                 goto out_close;
> > > > +       if (atomic_dec_and_test(&svsk->sk_sendqlen))
> > > > +               tcp_sock_set_cork(svsk->sk_sk, false);
> > > >         mutex_unlock(&xprt->xpt_mutex);
> > > >         return sent;
> > > > 
> > > > out_notconn:
> > > > +       atomic_dec(&svsk->sk_sendqlen);
> > > >         mutex_unlock(&xprt->xpt_mutex);
> > > >         return -ENOTCONN;
> > > > out_close:
> > > > @@ -1192,6 +1197,7 @@ static int svc_tcp_sendto(struct svc_rqst *rqstp)
> > > >                   (err < 0) ? err : sent, xdr->len);
> > > >         set_bit(XPT_CLOSE, &xprt->xpt_flags);
> > > >         svc_xprt_enqueue(xprt);
> > > > +       atomic_dec(&svsk->sk_sendqlen);
> > > >         mutex_unlock(&xprt->xpt_mutex);
> > > >         return -EAGAIN;
> > > > }
> > > > @@ -1261,7 +1267,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
> > > >                 svsk->sk_datalen = 0;
> > > >                 memset(&svsk->sk_pages[0], 0, sizeof(svsk->sk_pages));
> > > > 
> > > > -               tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
> > > > +               tcp_sock_set_nodelay(sk);
> > > > 
> > > >                 set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> > > >                 switch (sk->sk_state) {
> > > > -- 
> > > > 2.29.2
> > > > 
> > > 
> > > --
> > > Chuck Lever
> > > 
> > > 
> > > 
> > 
> > -- 
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022
www.hammer.space


Thread overview: 18+ messages
2021-02-13 20:25 [PATCH] SUNRPC: Use TCP_CORK to optimise send performance on the server trondmy
2021-02-13 21:53 ` Chuck Lever
2021-02-13 22:10   ` Trond Myklebust
2021-02-13 23:30     ` Chuck Lever
2021-02-14 14:13       ` Daire Byrne
2021-02-14 16:24         ` Trond Myklebust
2021-02-14 16:59         ` Chuck Lever
2021-02-15 14:08           ` Daire Byrne
2021-02-15 18:22             ` Chuck Lever
2021-02-14 16:21       ` Trond Myklebust [this message]
2021-02-14 17:27         ` Chuck Lever
2021-02-14 17:41           ` Trond Myklebust
2021-02-14 17:58             ` Chuck Lever
2021-02-14 18:18               ` Trond Myklebust
2021-02-14 19:43                 ` Chuck Lever
2021-02-16 16:27 ` Chuck Lever
2021-02-16 17:10   ` Trond Myklebust
2021-02-16 17:12     ` Chuck Lever
