linux-nfs.vger.kernel.org archive mirror
From: Trond Myklebust <trondmy@hammerspace.com>
To: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: [PATCH RFC] SUNRPC: Use zero-copy to perform socket send operations
Date: Mon, 9 Nov 2020 17:08:56 +0000
Message-ID: <9ce015245c916b2c90de72440a22f801142f2c6e.camel@hammerspace.com>
In-Reply-To: <160493771006.15633.8524084764848931537.stgit@klimt.1015granger.net>

On Mon, 2020-11-09 at 11:03 -0500, Chuck Lever wrote:
> Daire Byrne reports a ~50% aggregate throughput regression on his
> Linux NFS server after commit da1661b93bf4 ("SUNRPC: Teach server to
> use xprt_sock_sendmsg for socket sends"), which replaced
> kernel_sendpage() calls in NFSD's socket send path with calls to
> sock_sendmsg() using iov_iter.
> 
> Investigation showed that tcp_sendmsg() was not using zero-copy to
> send the xdr_buf's bvec pages, but instead was relying on memcpy.
> 
> Set up the socket and each msghdr that bears bvec pages to use the
> zero-copy mechanism in tcp_sendmsg.
> 
> Reported-by: Daire Byrne <daire@dneg.com>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
> Fixes: da1661b93bf4 ("SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  net/sunrpc/socklib.c  |    5 ++++-
>  net/sunrpc/svcsock.c  |    1 +
>  net/sunrpc/xprtsock.c |    1 +
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> This patch does not fully resolve the issue. Daire reports high
> softIRQ activity after the patch is applied, and this activity
> seems to prevent full restoration of previous performance.
> 
> 
> diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
> index d52313af82bc..af47596a7bdd 100644
> --- a/net/sunrpc/socklib.c
> +++ b/net/sunrpc/socklib.c
> @@ -226,9 +226,12 @@ static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
>         if (err < 0)
>                 return err;
>  
> +       msg->msg_flags |= MSG_ZEROCOPY;
>         iov_iter_bvec(&msg->msg_iter, WRITE, xdr->bvec, xdr_buf_pagecount(xdr),
>                       xdr->page_len + xdr->page_base);
> -       return xprt_sendmsg(sock, msg, base + xdr->page_base);
> +       err = xprt_sendmsg(sock, msg, base + xdr->page_base);
> +       msg->msg_flags &= ~MSG_ZEROCOPY;
> +       return err;
>  }
>  
>  /* Common case:
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index c2752e2b9ce3..c814b4953b15 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1176,6 +1176,7 @@ static void svc_tcp_init(struct svc_sock *svsk, struct svc_serv *serv)
>                 svsk->sk_datalen = 0;
>                 memset(&svsk->sk_pages[0], 0, sizeof(svsk->sk_pages));
>  
> +               sock_set_flag(sk, SOCK_ZEROCOPY);
>                 tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>  
>                 set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags);
> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> index 7090bbee0ec5..343c6396b297 100644
> --- a/net/sunrpc/xprtsock.c
> +++ b/net/sunrpc/xprtsock.c
> @@ -2175,6 +2175,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
>  
>                 /* socket options */
>                 sock_reset_flag(sk, SOCK_LINGER);
> +               sock_set_flag(sk, SOCK_ZEROCOPY);
>                 tcp_sk(sk)->nonagle |= TCP_NAGLE_OFF;
>  
>                 xprt_clear_connected(xprt);
> 
> 
I'm thinking we are not really allowed to do that here. The pages we
pass in to the RPC layer are not guaranteed to contain stable data
since they include unlocked page cache pages as well as O_DIRECT pages.
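
To illustrate the concern, here is a minimal userspace sketch, not kernel
code. It assumes a toy model in which a queued send either snapshots the
payload (the memcpy path) or only keeps a pointer to the caller's page
(the zero-copy path). The names queued_send, queue_copy, queue_zerocopy,
transmit and payload_is_stable are hypothetical and exist only for this
example; nothing here is a SUNRPC or TCP API.

```c
#include <assert.h>
#include <string.h>

#define PAGE_LEN 16

/* Hypothetical model of one queued send (not a kernel structure). */
struct queued_send {
	const char *data;	/* what gets read at (re)transmit time */
	char copy[PAGE_LEN];	/* private snapshot, used by the copy path */
};

/* Copy path: snapshot the page contents when the send is queued. */
static void queue_copy(struct queued_send *q, const char *page)
{
	assert(q && page);
	memcpy(q->copy, page, PAGE_LEN);
	q->data = q->copy;
}

/* Zero-copy path: keep only a reference to the caller's page. */
static void queue_zerocopy(struct queued_send *q, const char *page)
{
	assert(q && page);
	q->data = page;
}

/* Transmit or retransmit: read whatever the entry points at right now. */
static void transmit(const struct queued_send *q, char *wire)
{
	memcpy(wire, q->data, PAGE_LEN);
}

/*
 * Returns 1 if the bytes on the "wire" match the page contents at the
 * time the send was queued, 0 if a later write leaked into the payload.
 */
static int payload_is_stable(int zerocopy)
{
	char page[PAGE_LEN] = "old file data";
	char wire[PAGE_LEN];
	struct queued_send q;

	if (zerocopy)
		queue_zerocopy(&q, page);
	else
		queue_copy(&q, page);

	/* An unlocked page can be rewritten before transmission happens. */
	memcpy(page, "new file data", sizeof("new file data"));

	transmit(&q, wire);
	return strcmp(wire, "old file data") == 0;
}
```

In this model payload_is_stable(0) returns 1 and payload_is_stable(1)
returns 0: once the send only references the page, a later write to that
page changes what is transmitted.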

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



