From: "NeilBrown" <neilb@suse.de>
To: "Trond Myklebust" <trondmy@hammerspace.com>
Cc: "david@fromorbit.com" <david@fromorbit.com>,
	"bfields@fieldses.org" <bfields@fieldses.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"chuck.lever@oracle.com" <chuck.lever@oracle.com>
Subject: Re: sporadic hangs on generic/186
Date: Thu, 07 Apr 2022 14:11:28 +1000
Message-ID: <164930468885.10985.9905950866720150663@noble.neil.brown.name>
In-Reply-To: <b282c5b98c4518952f62662ea3ba1d4e6ef85f26.camel@hammerspace.com>

On Thu, 07 Apr 2022, Trond Myklebust wrote:
> On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> > On Thu, 07 Apr 2022, NeilBrown wrote:
> > > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > > In the last couple days I've started getting hangs on xfstests
> > > > > generic/186 on upstream.  I also notice the test completes after
> > > > > 10+ hours (usually it takes about 5 minutes).  Sometimes this is
> > > > > accompanied by "nfs: RPC call returned error 12" on the client.
> > > > 
> > > > #define ENOMEM          12      /* Out of memory */
> > > > 
> > > > So either the client or the server is running out of memory
> > > > somewhere?
> > > 
> > > Probably the client.  There are a bunch of changes recently which
> > > add __GFP_NORETRY to memory allocations from PF_WQ_WORKERs, because
> > > those allocations can result in deadlocks when swapping over NFS.
> > > This means that kmalloc requests that previously never failed
> > > (because GFP_KERNEL never fails for kernel threads, I think) can
> > > now fail.  This has tickled one bug that I know of.  There are
> > > likely to be more.
> > > 
> > > The RPC code should simply retry these allocations after a short
> > > delay.  HZ/4 is the number that is used in a couple of places.
> > > Possibly there are more places that need to handle -ENOMEM with
> > > rpc_delay().
> > 
> > I had a look through the various places where alloc can now fail.
> > 
> > I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> > cause of a problem here.  I don't think an -ENOMEM from there is
> > caught, so it could likely filter up to NFS and result in the message
> > you got.
> > 
> > I don't think we can easily handle failure there.  We need to stay
> > with GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> > swap-over-NFS.
> > 
> > Bruce: can you change that one line back to GFP_KERNEL and see if the
> > problem goes away?
> > 
> 
> Can we please just move the call to xdr_alloc_bvec() out of
> xprt_send_pagedata()? Move the client side allocation into
> xs_stream_prepare_request() and xs_udp_send_request(), then move the
> server side allocation into svc_udp_sendto().
> 
> That makes it possible to handle errors.

Like the below I guess.  Seems sensible, but I don't know the code well
enough to really review it.
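
(For reference, the failable mask in play here -- spelled out explicitly
in the second hunk below, and wrapped up as rpc_task_gfp_mask() in the
line the first hunk removes -- is, as I read it, roughly the following.
This is a sketch of my understanding, not a quote of the actual helper:
only allocations made from a workqueue worker such as rpciod give up
instead of retrying forever.)

	static gfp_t rpc_task_gfp_mask(void)
	{
		/* rpciod runs as a PF_WQ_WORKER, so its allocations must
		 * be allowed to fail rather than block memory reclaim */
		if (current->flags & PF_WQ_WORKER)
			return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
		return GFP_KERNEL;
	}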

> 
> > The other problem I found is that rpc_alloc_task() can now fail, but
> > rpc_new_task() assumes that it never will.  If it does, then we get
> > a NULL deref.
> > 
> > I don't think rpc_new_task() can ever be called from the rpciod work
> > queue, so it is safe to just use a mempool with GFP_KERNEL like we
> > did before.
> > 
> No. We shouldn't ever use mempools with GFP_KERNEL.

Why not?  mempools with GFP_KERNEL make perfect sense outside of the
rpciod and nfsiod threads.
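
To illustrate what I mean (a sketch, not a patch): allocating the task
from a mempool with a sleepable mask means mempool_alloc() waits for an
element to be returned to the pool rather than returning NULL, e.g.

	static struct rpc_task *rpc_alloc_task(void)
	{
		/* may sleep until a task is freed back to the pool,
		 * but never fails */
		return mempool_alloc(rpc_task_mempool, GFP_KERNEL);
	}

That only deadlocks if the waiter is itself the thread that is expected
to free tasks back into the pool -- i.e. rpciod or nfsiod -- which is
why it is only those threads that need a mask that can fail.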

> 
> Most, if not all of the callers of rpc_run_task() are still capable of
> dealing with errors, and ditto for the callers of rpc_run_bc_task().

Yes, they can deal with errors.  But in many cases they do so by passing
the error up the call stack, so we could start getting ENOMEM from
system calls like stat().  I don't think that is a good idea.
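
Where the allocation happens deep inside the state machine, I would much
rather see the failure absorbed and retried, along the lines of (a rough
sketch only -- the exact state to restart from depends on the call site):

	switch (task->tk_status) {
	case -ENOMEM:
		rpc_delay(task, HZ >> 2);	/* back off for HZ/4 */
		task->tk_status = 0;
		task->tk_action = call_encode;	/* retry the failed step */
		return;
	}

rather than handing -ENOMEM back to userspace.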

Thanks,
NeilBrown


> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 
> 

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 05b38bf68316..71ba4cf513bc 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -221,12 +221,6 @@ static int xprt_send_kvec(struct socket *sock, struct msghdr *msg,
 static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
 			      struct xdr_buf *xdr, size_t base)
 {
-	int err;
-
-	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
-	if (err < 0)
-		return err;
-
 	iov_iter_bvec(&msg->msg_iter, WRITE, xdr->bvec, xdr_buf_pagecount(xdr),
 		      xdr->page_len + xdr->page_base);
 	return xprt_sendmsg(sock, msg, base + xdr->page_base);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 78af7518f263..2661828f7a85 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -828,6 +828,9 @@ xs_stream_prepare_request(struct rpc_rqst *req)
 	xdr_free_bvec(&req->rq_rcv_buf);
 	req->rq_task->tk_status = xdr_alloc_bvec(
 		&req->rq_rcv_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (req->rq_task->tk_status == 0)
+		req->rq_task->tk_status = xdr_alloc_bvec(
+			&req->rq_snd_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 }
 
 /*
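
(Not part of the patch above -- just a rough sketch of what I imagine
the corresponding change in the UDP paths Trond mentioned would look
like: do the bvec allocation at the top of the send routine and fail the
request cleanly if it fails.  Exact placement and error handling would
need checking against the real code.)

	/* in xs_udp_send_request(), before any data hits the wire: */
	status = xdr_alloc_bvec(&req->rq_snd_buf, rpc_task_gfp_mask());
	if (status < 0)
		return status;

	/* and in svc_udp_sendto(), where plain GFP_KERNEL is fine because
	 * nfsd threads are not part of the memory-reclaim path: */
	err = xdr_alloc_bvec(&rqstp->rq_res, GFP_KERNEL);
	if (err < 0)
		return err;	/* after undoing any locking taken above */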
