* sporadic hangs on generic/186
@ 2022-04-06 19:54 J. Bruce Fields
  2022-04-06 19:58 ` Chuck Lever III
  2022-04-07  0:14 ` Dave Chinner
  0 siblings, 2 replies; 15+ messages in thread

From: J. Bruce Fields @ 2022-04-06 19:54 UTC (permalink / raw)
To: Chuck Lever; +Cc: linux-nfs, linux-fsdevel

In the last couple days I've started getting hangs on xfstests
generic/186 on upstream.  I also notice the test completes after 10+
hours (usually it takes about 5 minutes).  Sometimes this is accompanied
by "nfs: RPC call returned error 12" on the client.

Test description is:

    # Ensuring that copy on write in buffered mode works when free space
    # is heavily fragmented.
    # - Create two files
    # - Reflink the odd blocks of the first file into a third file.
    # - Reflink the even blocks of the second file into the third file.
    # - Try to fragment the free space by allocating a huge file and
    #   punching out every other block.
    # - CoW across the halfway mark.
    # - Check that the files are now different where we say they're
    #   different.

so maybe it's really some xfs change, I don't know.  Or maybe it's a
problem with my particular test filesystem (which doesn't get recreated
for each test run).

The problem doesn't reproduce easily enough to bisect.

I may just turn off that test for now.

--b.

^ permalink raw reply	[flat|nested] 15+ messages in thread
* Re: sporadic hangs on generic/186

From: Chuck Lever III @ 2022-04-06 19:58 UTC (permalink / raw)
To: Bruce Fields; +Cc: Linux NFS Mailing List, linux-fsdevel

> On Apr 6, 2022, at 3:54 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>
> In the last couple days I've started getting hangs on xfstests
> generic/186 on upstream.  I also notice the test completes after 10+
> hours (usually it takes about 5 minutes).  Sometimes this is accompanied
> by "nfs: RPC call returned error 12" on the client.
>
> Test description is:
>
>     # Ensuring that copy on write in buffered mode works when free space
>     # is heavily fragmented.
>     # - Create two files
>     # - Reflink the odd blocks of the first file into a third file.
>     # - Reflink the even blocks of the second file into the third file.
>     # - Try to fragment the free space by allocating a huge file and
>     #   punching out every other block.
>     # - CoW across the halfway mark.
>     # - Check that the files are now different where we say they're
>     #   different.
>
> so maybe it's really some xfs change, I don't know.  Or maybe it's a
> problem with my particular test filesystem (which doesn't get recreated
> for each test run).
>
> The problem doesn't reproduce easily enough to bisect.
>
> I may just turn off that test for now.

Thanks for the report. I don't have a SCRATCH_DEV so generic/186 isn't
run on my systems.

--
Chuck Lever
* Re: sporadic hangs on generic/186

From: Dave Chinner @ 2022-04-07  0:14 UTC (permalink / raw)
To: J. Bruce Fields; +Cc: Chuck Lever, linux-nfs, linux-fsdevel

On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> In the last couple days I've started getting hangs on xfstests
> generic/186 on upstream.  I also notice the test completes after 10+
> hours (usually it takes about 5 minutes).  Sometimes this is accompanied
> by "nfs: RPC call returned error 12" on the client.

#define ENOMEM          12      /* Out of memory */

So either the client or the server is running out of memory somewhere?

> Test description is:
>
>     # Ensuring that copy on write in buffered mode works when free space
>     # is heavily fragmented.
>     # - Create two files
>     # - Reflink the odd blocks of the first file into a third file.
>     # - Reflink the even blocks of the second file into the third file.
>     # - Try to fragment the free space by allocating a huge file and
>     #   punching out every other block.
>     # - CoW across the halfway mark.
>     # - Check that the files are now different where we say they're
>     #   different.

It's using clone_file_range and hole punch, so I'm assuming that this is
nfsv4 client side testing?  In which case, the behaviour is going to be
largely dependent on the server side implementation of these functions?

Does reverting the client or the server to an older kernel resolve the
issue?

Cheers,

Dave.
--
Dave Chinner
david@fromorbit.com
* Re: sporadic hangs on generic/186

From: NeilBrown @ 2022-04-07  0:27 UTC (permalink / raw)
To: Dave Chinner, Trond Myklebust
Cc: J. Bruce Fields, Chuck Lever, linux-nfs, linux-fsdevel

On Thu, 07 Apr 2022, Dave Chinner wrote:
> On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > In the last couple days I've started getting hangs on xfstests
> > generic/186 on upstream.  I also notice the test completes after 10+
> > hours (usually it takes about 5 minutes).  Sometimes this is
> > accompanied by "nfs: RPC call returned error 12" on the client.
>
> #define ENOMEM          12      /* Out of memory */
>
> So either the client or the server is running out of memory
> somewhere?

Probably the client.  There are a bunch of recent changes which add
__GFP_NORETRY to memory allocations from PF_WQ_WORKERs, because those
allocations can result in deadlocks when swapping over NFS.
This means that kmalloc requests that previously never failed (because
GFP_KERNEL never fails for kernel threads, I think) can now fail.  This
has tickled one bug that I know of.  There are likely to be more.

The RPC code should simply retry these allocations after a short delay.
HZ/4 is the number that is used in a couple of places.  Possibly there
are more places that need to handle -ENOMEM with rpc_delay().

NeilBrown
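The retry pattern described above (catching a transient allocation
failure and re-driving the request after a short delay, instead of
returning ENOMEM to the caller) can be sketched in plain userspace C.
All of the names below (`flaky_alloc`, `fake_rpc_delay`,
`alloc_with_retry`) are invented stand-ins for illustration; the real
RPC code would call `rpc_delay(task, HZ / 4)` and return to the task
state machine rather than looping in place:

```c
#include <assert.h>
#include <stdlib.h>

/* Counts stand-in delays.  In the kernel this would be
 * rpc_delay(task, HZ / 4), which requeues the rpc_task rather
 * than sleeping inline. */
static int delays_taken;

static void fake_rpc_delay(void)
{
	delays_taken++;
}

/* Mimics a GFP_KERNEL | __GFP_NORETRY allocation under memory
 * pressure: fails a couple of times, then succeeds. */
static int failures_remaining = 2;

static void *flaky_alloc(size_t len)
{
	if (failures_remaining > 0) {
		failures_remaining--;
		return NULL;
	}
	return malloc(len);
}

/* Instead of surfacing ENOMEM to the caller, retry the allocation
 * after a short delay, giving reclaim a chance to make progress. */
static void *alloc_with_retry(size_t len, int max_tries)
{
	while (max_tries-- > 0) {
		void *p = flaky_alloc(len);
		if (p)
			return p;
		fake_rpc_delay();
	}
	return NULL;	/* genuine, persistent allocation failure */
}
```

With two injected failures, `alloc_with_retry()` takes two delays and
then returns a valid pointer, rather than propagating an error up to
NFS.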
* Re: sporadic hangs on generic/186

From: NeilBrown @ 2022-04-07  1:19 UTC (permalink / raw)
To: Dave Chinner, Trond Myklebust
Cc: J. Bruce Fields, Chuck Lever, linux-nfs, linux-fsdevel

On Thu, 07 Apr 2022, NeilBrown wrote:
> On Thu, 07 Apr 2022, Dave Chinner wrote:
> > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > In the last couple days I've started getting hangs on xfstests
> > > generic/186 on upstream.  I also notice the test completes after 10+
> > > hours (usually it takes about 5 minutes).  Sometimes this is
> > > accompanied by "nfs: RPC call returned error 12" on the client.
> >
> > #define ENOMEM          12      /* Out of memory */
> >
> > So either the client or the server is running out of memory
> > somewhere?
>
> Probably the client.  There are a bunch of changes recently which add
> __GFP_NORETRY to memory allocations from PF_WQ_WORKERs because that can
> result in deadlocks when swapping over NFS.
> This means that kmalloc requests that previously never failed (because
> GFP_KERNEL never fails for kernel threads I think) can now fail.  This
> has tickled one bug that I know of.  There are likely to be more.
>
> The RPC code should simply retry these allocations after a short delay.
> HZ/4 is the number that is used in a couple of places.  Possibly there
> are more places that need to handle -ENOMEM with rpc_delay().

I had a look through the various places where alloc can now fail.

I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
cause of a problem here.  I don't think an -ENOMEM from there is caught,
so it could likely filter up to NFS and result in the message you got.

I don't think we can easily handle failure there.  We need to stay with
GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
swap-over-NFS.

Bruce: can you change that one line back to GFP_KERNEL and see if the
problem goes away?

The other problem I found is that rpc_alloc_task() can now fail, but
rpc_new_task() assumes that it never will.  If it does, then we get a
NULL deref.

I don't think rpc_new_task() can ever be called from the rpciod work
queue, so it is safe to just use a mempool with GFP_KERNEL like we did
before.

NeilBrown
* Re: sporadic hangs on generic/186

From: J. Bruce Fields @ 2022-04-07  1:49 UTC (permalink / raw)
To: NeilBrown
Cc: Dave Chinner, Trond Myklebust, Chuck Lever, linux-nfs, linux-fsdevel

On Thu, Apr 07, 2022 at 11:19:34AM +1000, NeilBrown wrote:
> I had a look through the various places where alloc can now fail.
>
> I think xdr_alloc_bvec() in xprt_sent_pagedata() is the most likely
> cause of a problem here.  I don't think an -ENOMEM from there is caught,
> so it could likely filter up to NFS and result in the message you got.
>
> I don't think we can easily handle failure there.  We need to stay with
> GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> swap-over-NFS.
>
> Bruce: can you change that one line back to GFP_KERNEL and see if the
> problem goes away?

Like this?  Sure--might take me a day or two to run the tests and get
results back.

--b.

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 05b38bf68316..506627dc9a0f 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -223,7 +223,7 @@ static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
 {
 	int err;
 
-	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
+	err = xdr_alloc_bvec(xdr, GFP_KERNEL);
 	if (err < 0)
 		return err;
* Re: sporadic hangs on generic/186

From: NeilBrown @ 2022-04-07  4:23 UTC (permalink / raw)
To: J. Bruce Fields
Cc: Dave Chinner, Trond Myklebust, Chuck Lever, linux-nfs, linux-fsdevel

On Thu, 07 Apr 2022, J. Bruce Fields wrote:
> On Thu, Apr 07, 2022 at 11:19:34AM +1000, NeilBrown wrote:
> > I had a look through the various places where alloc can now fail.
> >
> > I think xdr_alloc_bvec() in xprt_sent_pagedata() is the most likely
> > cause of a problem here.  I don't think an -ENOMEM from there is
> > caught, so it could likely filter up to NFS and result in the message
> > you got.
> >
> > I don't think we can easily handle failure there.  We need to stay
> > with GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> > swap-over-NFS.
> >
> > Bruce: can you change that one line back to GFP_KERNEL and see if the
> > problem goes away?
>
> Like this?  Sure--might take me a day or two to run the tests and get
> results back.
>
> --b.
>
> diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
> index 05b38bf68316..506627dc9a0f 100644
> --- a/net/sunrpc/socklib.c
> +++ b/net/sunrpc/socklib.c
> @@ -223,7 +223,7 @@ static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
>  {
>  	int err;
>  
> -	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
> +	err = xdr_alloc_bvec(xdr, GFP_KERNEL);
>  	if (err < 0)
>  		return err;

That looks right.

I instrumented my kernel to deliberately fail 10% of the time, and I got
lots of

    nfs: RPC call returned error 12

so I'm fairly sure this explains that message.  But you say the hangs
were only occasionally accompanied by the message, so it probably
doesn't explain the hangs.

NeilBrown
* Re: sporadic hangs on generic/186

From: Trond Myklebust @ 2022-04-07  1:54 UTC (permalink / raw)
To: david, neilb; +Cc: bfields, linux-nfs, linux-fsdevel, chuck.lever

On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> On Thu, 07 Apr 2022, NeilBrown wrote:
> > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > In the last couple days I've started getting hangs on xfstests
> > > > generic/186 on upstream.  I also notice the test completes after
> > > > 10+ hours (usually it takes about 5 minutes).  Sometimes this is
> > > > accompanied by "nfs: RPC call returned error 12" on the client.
> > >
> > > #define ENOMEM          12      /* Out of memory */
> > >
> > > So either the client or the server is running out of memory
> > > somewhere?
> >
> > Probably the client.  There are a bunch of changes recently which add
> > __GFP_NORETRY to memory allocations from PF_WQ_WORKERs because that
> > can result in deadlocks when swapping over NFS.
> > This means that kmalloc requests that previously never failed
> > (because GFP_KERNEL never fails for kernel threads I think) can now
> > fail.  This has tickled one bug that I know of.  There are likely to
> > be more.
> >
> > The RPC code should simply retry these allocations after a short
> > delay.  HZ/4 is the number that is used in a couple of places.
> > Possibly there are more places that need to handle -ENOMEM with
> > rpc_delay().
>
> I had a look through the various places where alloc can now fail.
>
> I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> cause of a problem here.  I don't think an -ENOMEM from there is
> caught, so it could likely filter up to NFS and result in the message
> you got.
>
> I don't think we can easily handle failure there.  We need to stay
> with GFP_KERNEL and rely on PF_MEMALLOC to make forward progress for
> swap-over-NFS.
>
> Bruce: can you change that one line back to GFP_KERNEL and see if the
> problem goes away?

Can we please just move the call to xdr_alloc_bvec() out of
xprt_send_pagedata()? Move the client side allocation into
xs_stream_prepare_request() and xs_udp_send_request(), then move the
server side allocation into svc_udp_sendto().

That makes it possible to handle errors.

> The other problem I found is that rpc_alloc_task() can now fail, but
> rpc_new_task() assumes that it never will.  If it does, then we get a
> NULL deref.
>
> I don't think rpc_new_task() can ever be called from the rpciod work
> queue, so it is safe to just use a mempool with GFP_KERNEL like we did
> before.

No. We shouldn't ever use mempools with GFP_KERNEL.

Most, if not all of the callers of rpc_run_task() are still capable of
dealing with errors, and ditto for the callers of rpc_run_bc_task().

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: sporadic hangs on generic/186

From: NeilBrown @ 2022-04-07  4:11 UTC (permalink / raw)
To: Trond Myklebust; +Cc: david, bfields, linux-nfs, linux-fsdevel, chuck.lever

On Thu, 07 Apr 2022, Trond Myklebust wrote:
> On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> > On Thu, 07 Apr 2022, NeilBrown wrote:
> > > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > > In the last couple days I've started getting hangs on xfstests
> > > > > generic/186 on upstream.  I also notice the test completes
> > > > > after 10+ hours (usually it takes about 5 minutes).  Sometimes
> > > > > this is accompanied by "nfs: RPC call returned error 12" on
> > > > > the client.
> > > >
> > > > #define ENOMEM          12      /* Out of memory */
> > > >
> > > > So either the client or the server is running out of memory
> > > > somewhere?
> > >
> > > Probably the client.  There are a bunch of changes recently which
> > > add __GFP_NORETRY to memory allocations from PF_WQ_WORKERs because
> > > that can result in deadlocks when swapping over NFS.
> > > This means that kmalloc requests that previously never failed
> > > (because GFP_KERNEL never fails for kernel threads I think) can
> > > now fail.  This has tickled one bug that I know of.  There are
> > > likely to be more.
> > >
> > > The RPC code should simply retry these allocations after a short
> > > delay.  HZ/4 is the number that is used in a couple of places.
> > > Possibly there are more places that need to handle -ENOMEM with
> > > rpc_delay().
> >
> > I had a look through the various places where alloc can now fail.
> >
> > I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> > cause of a problem here.  I don't think an -ENOMEM from there is
> > caught, so it could likely filter up to NFS and result in the
> > message you got.
> >
> > I don't think we can easily handle failure there.  We need to stay
> > with GFP_KERNEL and rely on PF_MEMALLOC to make forward progress
> > for swap-over-NFS.
> >
> > Bruce: can you change that one line back to GFP_KERNEL and see if
> > the problem goes away?
>
> Can we please just move the call to xdr_alloc_bvec() out of
> xprt_send_pagedata()? Move the client side allocation into
> xs_stream_prepare_request() and xs_udp_send_request(), then move the
> server side allocation into svc_udp_sendto().
>
> That makes it possible to handle errors.

Like the below I guess.  Seems sensible, but I don't know the code well
enough to really review it.

> > The other problem I found is that rpc_alloc_task() can now fail, but
> > rpc_new_task() assumes that it never will.  If it does, then we get
> > a NULL deref.
> >
> > I don't think rpc_new_task() can ever be called from the rpciod work
> > queue, so it is safe to just use a mempool with GFP_KERNEL like we
> > did before.
>
> No. We shouldn't ever use mempools with GFP_KERNEL.

Why not?  mempools with GFP_KERNEL make perfect sense outside of the
rpciod and nfsiod threads.

> Most, if not all of the callers of rpc_run_task() are still capable of
> dealing with errors, and ditto for the callers of rpc_run_bc_task().

Yes, they can deal with errors.  But in many cases they do so by passing
the error up the call stack, so we could start getting ENOMEM for system
calls like stat().  I don't think that is a good idea.

Thanks,
NeilBrown

diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index 05b38bf68316..71ba4cf513bc 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -221,12 +221,6 @@ static int xprt_send_kvec(struct socket *sock, struct msghdr *msg,
 static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
 			      struct xdr_buf *xdr, size_t base)
 {
-	int err;
-
-	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
-	if (err < 0)
-		return err;
-
 	iov_iter_bvec(&msg->msg_iter, WRITE, xdr->bvec, xdr_buf_pagecount(xdr),
 		      xdr->page_len + xdr->page_base);
 	return xprt_sendmsg(sock, msg, base + xdr->page_base);
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 78af7518f263..2661828f7a85 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -828,6 +828,9 @@ xs_stream_prepare_request(struct rpc_rqst *req)
 	xdr_free_bvec(&req->rq_rcv_buf);
 	req->rq_task->tk_status = xdr_alloc_bvec(
 		&req->rq_rcv_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
+	if (req->rq_task->tk_status == 0)
+		req->rq_task->tk_status = xdr_alloc_bvec(
+			&req->rq_snd_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 }
* Re: sporadic hangs on generic/186

From: Trond Myklebust @ 2022-04-07 13:01 UTC (permalink / raw)
To: neilb; +Cc: bfields, david, linux-nfs, linux-fsdevel, chuck.lever

On Thu, 2022-04-07 at 14:11 +1000, NeilBrown wrote:
> On Thu, 07 Apr 2022, Trond Myklebust wrote:
> > On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> > > On Thu, 07 Apr 2022, NeilBrown wrote:
> > > > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > > > In the last couple days I've started getting hangs on
> > > > > > xfstests generic/186 on upstream.  I also notice the test
> > > > > > completes after 10+ hours (usually it takes about 5 minutes).
> > > > > > Sometimes this is accompanied by "nfs: RPC call returned
> > > > > > error 12" on the client.
> > > > >
> > > > > #define ENOMEM          12      /* Out of memory */
> > > > >
> > > > > So either the client or the server is running out of memory
> > > > > somewhere?
> > > >
> > > > Probably the client.  There are a bunch of changes recently which
> > > > add __GFP_NORETRY to memory allocations from PF_WQ_WORKERs
> > > > because that can result in deadlocks when swapping over NFS.
> > > > This means that kmalloc requests that previously never failed
> > > > (because GFP_KERNEL never fails for kernel threads I think) can
> > > > now fail.  This has tickled one bug that I know of.  There are
> > > > likely to be more.
> > > >
> > > > The RPC code should simply retry these allocations after a short
> > > > delay.  HZ/4 is the number that is used in a couple of places.
> > > > Possibly there are more places that need to handle -ENOMEM with
> > > > rpc_delay().
> > >
> > > I had a look through the various places where alloc can now fail.
> > >
> > > I think xdr_alloc_bvec() in xprt_send_pagedata() is the most likely
> > > cause of a problem here.  I don't think an -ENOMEM from there is
> > > caught, so it could likely filter up to NFS and result in the
> > > message you got.
> > >
> > > I don't think we can easily handle failure there.  We need to stay
> > > with GFP_KERNEL and rely on PF_MEMALLOC to make forward progress
> > > for swap-over-NFS.
> > >
> > > Bruce: can you change that one line back to GFP_KERNEL and see if
> > > the problem goes away?
> >
> > Can we please just move the call to xdr_alloc_bvec() out of
> > xprt_send_pagedata()? Move the client side allocation into
> > xs_stream_prepare_request() and xs_udp_send_request(), then move the
> > server side allocation into svc_udp_sendto().
> >
> > That makes it possible to handle errors.
>
> Like the below I guess.  Seems sensible, but I don't know the code well
> enough to really review it.
>
> > > The other problem I found is that rpc_alloc_task() can now fail,
> > > but rpc_new_task() assumes that it never will.  If it does, then we
> > > get a NULL deref.
> > >
> > > I don't think rpc_new_task() can ever be called from the rpciod
> > > work queue, so it is safe to just use a mempool with GFP_KERNEL
> > > like we did before.
> >
> > No. We shouldn't ever use mempools with GFP_KERNEL.
>
> Why not?  mempools with GFP_KERNEL make perfect sense outside of the
> rpciod and nfsiod threads.

If you can afford to make it an infinite wait, there is __GFP_NOFAIL,
so why waste the resources of an emergency pool? In my opinion, however,
an infinite uninterruptible sleep is bad policy for almost all cases
because someone will want to break out at some point.

> > Most, if not all of the callers of rpc_run_task() are still capable
> > of dealing with errors, and ditto for the callers of
> > rpc_run_bc_task().
>
> Yes, they can deal with errors.  But in many cases they do so by
> passing the error up the call stack, so we could start getting ENOMEM
> for system calls like stat().  I don't think that is a good idea.

stat() has always been capable of returning ENOMEM if, for instance,
inode allocation fails. There are almost no calls in NFS (or most other
filesystems, for that matter) that can't fail somehow when memory starts
to get really scarce.

The bottom line is that we use ordinary GFP_KERNEL memory allocations
where we can. The new code follows that rule, breaking it only in cases
where the specific rules of rpciod/xprtiod/nfsiod make it impossible to
wait forever in the memory manager.

I am preparing a set of patches to address the issues that you've
identified, plus a case in call_transmit_status/call_bc_transmit_status
where we're not handling ENOMEM. There are also patches to fix up two
cases in the NFS code itself where we're not currently handling errors
from rpc_run_task.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
* Re: sporadic hangs on generic/186

From: NeilBrown @ 2022-04-08  2:46 UTC (permalink / raw)
To: Trond Myklebust; +Cc: bfields, david, linux-nfs, linux-fsdevel, chuck.lever

On Thu, 07 Apr 2022, Trond Myklebust wrote:
> On Thu, 2022-04-07 at 14:11 +1000, NeilBrown wrote:
> > On Thu, 07 Apr 2022, Trond Myklebust wrote:
> > > On Thu, 2022-04-07 at 11:19 +1000, NeilBrown wrote:
> > > > On Thu, 07 Apr 2022, NeilBrown wrote:
> > > > > On Thu, 07 Apr 2022, Dave Chinner wrote:
> > > > > > On Wed, Apr 06, 2022 at 03:54:24PM -0400, J. Bruce Fields wrote:
> > > > > > > In the last couple days I've started getting hangs on
> > > > > > > xfstests generic/186 on upstream.  I also notice the test
> > > > > > > completes after 10+ hours (usually it takes about 5
> > > > > > > minutes).  Sometimes this is accompanied by "nfs: RPC call
> > > > > > > returned error 12" on the client.
> > > > > >
> > > > > > #define ENOMEM          12      /* Out of memory */
> > > > > >
> > > > > > So either the client or the server is running out of memory
> > > > > > somewhere?
> > > > >
> > > > > Probably the client.  There are a bunch of changes recently
> > > > > which add __GFP_NORETRY to memory allocations from
> > > > > PF_WQ_WORKERs because that can result in deadlocks when
> > > > > swapping over NFS.
> > > > > This means that kmalloc requests that previously never failed
> > > > > (because GFP_KERNEL never fails for kernel threads I think) can
> > > > > now fail.  This has tickled one bug that I know of.  There are
> > > > > likely to be more.
> > > > >
> > > > > The RPC code should simply retry these allocations after a
> > > > > short delay.  HZ/4 is the number that is used in a couple of
> > > > > places.  Possibly there are more places that need to handle
> > > > > -ENOMEM with rpc_delay().
> > > >
> > > > I had a look through the various places where alloc can now fail.
> > > >
> > > > I think xdr_alloc_bvec() in xprt_send_pagedata() is the most
> > > > likely cause of a problem here.  I don't think an -ENOMEM from
> > > > there is caught, so it could likely filter up to NFS and result
> > > > in the message you got.
> > > >
> > > > I don't think we can easily handle failure there.  We need to
> > > > stay with GFP_KERNEL and rely on PF_MEMALLOC to make forward
> > > > progress for swap-over-NFS.
> > > >
> > > > Bruce: can you change that one line back to GFP_KERNEL and see if
> > > > the problem goes away?
> > >
> > > Can we please just move the call to xdr_alloc_bvec() out of
> > > xprt_send_pagedata()? Move the client side allocation into
> > > xs_stream_prepare_request() and xs_udp_send_request(), then move
> > > the server side allocation into svc_udp_sendto().
> > >
> > > That makes it possible to handle errors.
> >
> > Like the below I guess.  Seems sensible, but I don't know the code
> > well enough to really review it.
> >
> > > > The other problem I found is that rpc_alloc_task() can now fail,
> > > > but rpc_new_task() assumes that it never will.  If it does, then
> > > > we get a NULL deref.
> > > >
> > > > I don't think rpc_new_task() can ever be called from the rpciod
> > > > work queue, so it is safe to just use a mempool with GFP_KERNEL
> > > > like we did before.
> > >
> > > No. We shouldn't ever use mempools with GFP_KERNEL.
> >
> > Why not?  mempools with GFP_KERNEL make perfect sense outside of the
> > rpciod and nfsiod threads.
>
> If you can afford to make it an infinite wait, there is __GFP_NOFAIL,
> so why waste the resources of an emergency pool? In my opinion,
> however, an infinite uninterruptible sleep is bad policy for almost
> all cases because someone will want to break out at some point.

"infinite" isn't a useful description.  The important question is "what
will allow the allocation to complete?".

For __GFP_NOFAIL there is no clear answer beyond "reduction of memory
pressure", and sometimes that is enough.

For a mempool we have a much more specific answer: memory becomes
available as previous requests complete.  rpc_task_mempool has a size of
8, so there can always be 8 requests in flight.  Waiting on the mempool
will wait at most until there are seven requests in flight, and then
will return a task.  This is a much better guarantee than for
__GFP_NOFAIL.

If you ever need an rpc task to relieve memory pressure, then
__GFP_NOFAIL could deadlock, while using the mempool won't.  If you are
never going to block waiting on a mempool and would rather fail instead,
then there seems little point in having the mempool.

> > > Most, if not all of the callers of rpc_run_task() are still
> > > capable of dealing with errors, and ditto for the callers of
> > > rpc_run_bc_task().
> >
> > Yes, they can deal with errors.  But in many cases they do so by
> > passing the error up the call stack, so we could start getting
> > ENOMEM for system calls like stat().  I don't think that is a good
> > idea.
>
> stat() has always been capable of returning ENOMEM if, for instance,
> inode allocation fails. There are almost no calls in NFS (or most
> other filesystems for that matter) that can't fail somehow when memory
> starts to get really scarce.

Fair enough.  It is writes that are really important.

> The bottom line is that we use ordinary GFP_KERNEL memory allocations
> where we can. The new code follows that rule, breaking it only in
> cases where the specific rules of rpciod/xprtiod/nfsiod make it
> impossible to wait forever in the memory manager.

It is not safe to use GFP_KERNEL for an allocation that is needed in
order to free memory - and so for any allocation that is needed to write
out data from the page cache.

> I am preparing a set of patches to address the issues that you've
> identified, plus a case in call_transmit_status/call_bc_transmit_status
> where we're not handling ENOMEM. There are also patches to fix up two
> cases in the NFS code itself where we're not currently handling errors
> from rpc_run_task.

I look forward to reviewing them.

Thanks,
NeilBrown
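The forward-progress guarantee argued for above can be made concrete
with a toy pool.  This is a userspace sketch, not the real
`<linux/mempool.h>` API (the names `toy_mempool_*` and the
`under_pressure` parameter are invented for illustration): the point is
only that, unlike __GFP_NOFAIL, an allocation from the reserve is
guaranteed to succeed whenever fewer than the pool minimum of objects
are outstanding, because frees refill the reserve rather than depending
on general reclaim.

```c
#include <assert.h>
#include <stdlib.h>

/* A toy fixed-size emergency pool, loosely modelled on mempool_t:
 * ordinary allocation is tried first, and a preallocated reserve
 * guarantees that up to POOL_MIN objects can always be in flight. */
#define POOL_MIN 8

struct toy_mempool {
	void *reserve[POOL_MIN];
	int nr_free;		/* slots currently available in the reserve */
};

static int toy_mempool_init(struct toy_mempool *p, size_t objsize)
{
	p->nr_free = 0;
	for (int i = 0; i < POOL_MIN; i++) {
		void *obj = malloc(objsize);
		if (!obj)
			return -1;
		p->reserve[p->nr_free++] = obj;
	}
	return 0;
}

/* "under_pressure" stands in for the ordinary allocation failing. */
static void *toy_mempool_alloc(struct toy_mempool *p, size_t objsize,
			       int under_pressure)
{
	if (!under_pressure) {
		void *obj = malloc(objsize);
		if (obj)
			return obj;
	}
	/* Fall back to the reserve: succeeds as long as fewer than
	 * POOL_MIN objects are outstanding. */
	if (p->nr_free > 0)
		return p->reserve[--p->nr_free];
	return NULL;	/* caller waits for a free, not for reclaim */
}

static void toy_mempool_free(struct toy_mempool *p, void *obj)
{
	if (p->nr_free < POOL_MIN)
		p->reserve[p->nr_free++] = obj;	/* refill the reserve */
	else
		free(obj);
}
```

Even with every ordinary allocation failing, POOL_MIN allocations
succeed from the reserve, the next one fails until an object is freed,
and a single free makes the next allocation succeed again.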
* Re: sporadic hangs on generic/186 2022-04-08 2:46 ` NeilBrown @ 2022-04-08 5:03 ` Dave Chinner 2022-04-08 5:32 ` NeilBrown 0 siblings, 1 reply; 15+ messages in thread From: Dave Chinner @ 2022-04-08 5:03 UTC (permalink / raw) To: NeilBrown; +Cc: Trond Myklebust, bfields, linux-nfs, linux-fsdevel, chuck.lever On Fri, Apr 08, 2022 at 12:46:08PM +1000, NeilBrown wrote: > On Thu, 07 Apr 2022, Trond Myklebust wrote: > > The bottom line is that we use ordinary GFP_KERNEL memory allocations > > where we can. The new code follows that rule, breaking it only in cases > > where the specific rules of rpciod/xprtiod/nfsiod make it impossible to > > wait forever in the memory manager. > > It is not safe to use GFP_KERNEL for an allocation that is needed in > order to free memory - and so any allocation that is needed to write out > data from the page cache. Except that same page cache writeback path can be called from syscall context (e.g. fsync()) which has nothing to do with memory reclaim. In that case GFP_KERNEL is the correct allocation context to use because there are no constraints on what memory reclaim can be performed from this path. IOWs, if the context initiating data writeback doesn't allow GFP_KERNEL allocations, then it should be calling memalloc_nofs_save() or memalloc_noio_save() to constrain all allocations to the required context. We should not be requiring the filesystem (or any other subsystem) to magically infer that the IO is being done in a constrained allocation context and modify the context they use appropriately. If we this, then all filesystems would simply use GFP_NOIO everywhere because the loop device layers the entire filesystem IO path under block device context (i.e. requiring GFP_NOIO allocation context). We don't do this - the loop device sets PF_MEMALLOC_NOIO instead so all allocations in that path run with at least GFP_NOIO constraints and filesystems are none the wiser about the constraints of the calling context. 
IOWs, GFP_KERNEL is generally the right context to be using in
filesystem IO paths, and callers need to restrict allocation contexts
via task flags if they cannot allow certain types of reclaim recursion
to occur...

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: sporadic hangs on generic/186
From: NeilBrown @ 2022-04-08  5:32 UTC
To: Dave Chinner
Cc: Trond Myklebust, bfields, linux-nfs, linux-fsdevel, chuck.lever

On Fri, 08 Apr 2022, Dave Chinner wrote:
> On Fri, Apr 08, 2022 at 12:46:08PM +1000, NeilBrown wrote:
> > On Thu, 07 Apr 2022, Trond Myklebust wrote:
> > > The bottom line is that we use ordinary GFP_KERNEL memory allocations
> > > where we can. The new code follows that rule, breaking it only in cases
> > > where the specific rules of rpciod/xprtiod/nfsiod make it impossible to
> > > wait forever in the memory manager.
> >
> > It is not safe to use GFP_KERNEL for an allocation that is needed in
> > order to free memory - and so any allocation that is needed to write out
> > data from the page cache.
>
> Except that same page cache writeback path can be called from
> syscall context (e.g. fsync()) which has nothing to do with memory
> reclaim. In that case GFP_KERNEL is the correct allocation context
> to use because there are no constraints on what memory reclaim can
> be performed from this path.
>
> IOWs, if the context initiating data writeback doesn't allow
> GFP_KERNEL allocations, then it should be calling
> memalloc_nofs_save() or memalloc_noio_save() to constrain all
> allocations to the required context. We should not be requiring the
> filesystem (or any other subsystem) to magically infer that the IO
> is being done in a constrained allocation context and modify the
> context they use appropriately.
>
> If we did this, then all filesystems would simply use GFP_NOIO
> everywhere because the loop device layers the entire filesystem IO
> path under block device context (i.e. requiring GFP_NOIO allocation
> context).
> We don't do this - the loop device sets PF_MEMALLOC_NOIO
> instead so all allocations in that path run with at least GFP_NOIO
> constraints and filesystems are none the wiser about the constraints
> of the calling context.
>
> IOWs, GFP_KERNEL is generally the right context to be using in
> filesystem IO paths and callers need to restrict allocation contexts
> via task flags if they cannot allow certain types of reclaim
> recursion to occur...

NOIO and NOFS are not the issue here. We all agree that
memalloc_noXX_save() is the right thing to do.

The issue is that memalloc can block indefinitely in low-memory
situations, and any code that has to make progress in low-memory
situations - like writeout - needs to be careful. This is why the block
layer uses mempools for request headers etc - so that progress is
guaranteed without depending on alloc_page() to succeed.

File systems do often get away with using GFP_KERNEL because the
important paths have PF_MEMALLOC, and hence __GFP_MEMALLOC in effect,
and that provides access to some shared reserves. Shared reserves are
risky - the other users you are sharing with might steal it all. File
systems tend to survive anyway because there is a limit on the amount of
dirty filesystem data - so there is lots of non-filesystem data around,
and a good chance that some of that can be freed. I say "tend to"
because I believe there is no real guarantee. It seems to actually work
99.999% of the time, and maybe that is enough. I suspect you might be
able to deadlock filesystem writeout by memlocking lots of memory while
there are lots of dirty pages. It probably wouldn't be easy though.

swap-out is different. There is no limit to the amount of dirty anon
data, so it is fairly easy to get a situation where you absolutely must
write out some anon pages before alloc_page() can succeed. Most
filesystems don't handle swap-out directly - they just tell the MM which
disk addresses to use, and submit_bio() is used for writing.
The bio is allocated from a mempool, and nothing below submit_bio() uses
GFP_KERNEL to alloc_page() - they all use mempools (or accept failure in
some other way). A separate mempool at each level - they aren't shared
(so they are quite different from __GFP_MEMALLOC).

NFS is different. NFS handles swap using the same paths as other writes,
so it is much more likely to face indefinite waits in alloc_page() - at
least when handling swap. __GFP_MEMALLOC helps to a degree, but there
are lots of levels, and the more levels that have local reserves (even
if the mempool only reserves a single element), the better.

The networking people refuse to use mempools (or at least, they did once
some years ago), and I cannot entirely blame them as there are lots of
moving parts - lots of different things that might need to be allocated
(arp table entries?) but usually aren't. So for everything in the socket
layer and below we rely on __GFP_MEMALLOC (and recommend increasing the
reserves a bit above the default, just in case).

But in NFS and particularly in SUNRPC we already have the mempool, and
there is only a small number of things that we need to allocate to
ensure forward progress. So using a mempool as designed, rather than
relying on MEMALLOC reserves, is the sensible thing to do, and leaves
more of the reserves for the networking layer.

In fact, the allocation that SUNRPC now does before trying a mempool
should really be using __GFP_NOMEMALLOC so that it doesn't take from the
shared reserves (even when PF_MEMALLOC is set). As it has a private
reserve (the mempool), it should leave the common reserve for other
layers (sockets etc).

NeilBrown
* Re: sporadic hangs on generic/186
From: Dave Chinner @ 2022-04-10 23:34 UTC
To: NeilBrown
Cc: Trond Myklebust, bfields, linux-nfs, linux-fsdevel, chuck.lever

On Fri, Apr 08, 2022 at 03:32:38PM +1000, NeilBrown wrote:
> On Fri, 08 Apr 2022, Dave Chinner wrote:
> > On Fri, Apr 08, 2022 at 12:46:08PM +1000, NeilBrown wrote:
> > > On Thu, 07 Apr 2022, Trond Myklebust wrote:
> > > > The bottom line is that we use ordinary GFP_KERNEL memory allocations
> > > > where we can. The new code follows that rule, breaking it only in cases
> > > > where the specific rules of rpciod/xprtiod/nfsiod make it impossible to
> > > > wait forever in the memory manager.
> > >
> > > It is not safe to use GFP_KERNEL for an allocation that is needed in
> > > order to free memory - and so any allocation that is needed to write out
> > > data from the page cache.
> >
> > Except that same page cache writeback path can be called from
> > syscall context (e.g. fsync()) which has nothing to do with memory
> > reclaim. In that case GFP_KERNEL is the correct allocation context
> > to use because there are no constraints on what memory reclaim can
> > be performed from this path.
> >
> > IOWs, if the context initiating data writeback doesn't allow
> > GFP_KERNEL allocations, then it should be calling
> > memalloc_nofs_save() or memalloc_noio_save() to constrain all
> > allocations to the required context. We should not be requiring the
> > filesystem (or any other subsystem) to magically infer that the IO
> > is being done in a constrained allocation context and modify the
> > context they use appropriately.
> >
> > If we did this, then all filesystems would simply use GFP_NOIO
> > everywhere because the loop device layers the entire filesystem IO
> > path under block device context (i.e. requiring GFP_NOIO allocation
> > context).
> > We don't do this - the loop device sets PF_MEMALLOC_NOIO
> > instead so all allocations in that path run with at least GFP_NOIO
> > constraints and filesystems are none the wiser about the constraints
> > of the calling context.
> >
> > IOWs, GFP_KERNEL is generally the right context to be using in
> > filesystem IO paths and callers need to restrict allocation contexts
> > via task flags if they cannot allow certain types of reclaim
> > recursion to occur...
>
> NOIO and NOFS are not the issue here. We all agree that
> memalloc_noXX_save() is the right thing to do.
>
> The issue is that memalloc can block indefinitely in low-memory
> situations, and any code that has to make progress in low-memory
> situations - like writeout - needs to be careful.

Yup, and you've missed my point entirely, then explained exactly why
high level memory allocation context needs to be set by mempool based
allocations...

> The bio is allocated from a mempool, and nothing below submit_bio() uses
> GFP_KERNEL to alloc_page() - they all use mempools (or accept failure in
> some other way). A separate mempool at each level - they aren't shared
> (so they are quite different from __GFP_MEMALLOC).

.... because this house of cards using mempools only works if it is
nested mempools all the way down.

> The networking people refuse to use mempools (or at least, they did once
> some years ago)

Same as many filesystem people refuse to use mempools, because the page
writeback IO path in a filesystem can have *unbound* memory demand.
mempools just don't work in filesystems that need to run transactions or
read metadata (e.g. for block allocation transactions). We've been
through this many times before; it's why filesystems like XFS and Btrfs
do not allow ->writepage from memory reclaim contexts.

> But in NFS and particularly in SUNRPC we already have the mempool, and
> there is only a small number of things that we need to allocate to
> ensure forward progress.
> So using a mempool as designed, rather than
> relying on MEMALLOC reserves is the sensible thing to do, and leaves
> more of the reserves for the networking layer.

Except the mempool now requires everything in the path below it not to
block on memory allocation for it to guarantee forwards progress, i.e.
nested allocations need to succeed or error out - either guarantees
*mempool* forwards progress. That's not necessarily forwards progress
cleaning memory, but it is only needed for the paths where we actually
need to provide a forwards progress guarantee.

The RPC code puts the entire network stack under a mempool, and as you
say the network stack is resistant to using mempools. At which point,
NFS needs memalloc_noreclaim_save() context to be set explicitly in the
path nested inside the mempool allocation so that non-mempool
allocations don't block trying to reclaim memory that might never come
available....

That's the point I was trying to make - GFP_KERNEL is not the problem
here - it's that mempools only work when it's mempools all the way down.
Individual code paths and/or allocations may have no idea that they are
running in a nested mempool allocation context, and so all allocations
in that path - GFP_KERNEL or otherwise - need to be automatically
converted to fail-instead-of-block semantics by the high level code that
uses a mempool based allocation...

> In fact, the allocation that SUNRPC now does before trying a mempool
> should really be using __GFP_NOMEMALLOC so that they don't take from the
> shared reserves (even when PF_MEMALLOC is set). As it has a private
> reserve (the mempool) it should leave the common reserve for other
> layers (sockets etc).

Yup, but that's a high level, pre-mempool allocation optimisation which
is not useful to future allocations done inside the mempool forwards
progress guarantee context....

Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: sporadic hangs on generic/186
From: NeilBrown @ 2022-04-12  3:13 UTC
To: Dave Chinner
Cc: Trond Myklebust, bfields, linux-nfs, linux-fsdevel, chuck.lever

On Mon, 11 Apr 2022, Dave Chinner wrote:
> Yup, and you've missed my point entirely,

Hmm... It seems that I did. Sorry.

Your point is that the flags passed to kmalloc et al should be relevant,
and that we should be focusing on process context - including for access
to reserves in memalloc context (using memalloc_noreclaim_save() as
appropriate). I agree.

My point is that memalloc_noreclaim_save() isn't necessarily sufficient.
It provides access to a shared pool of reserves, which is hard to reason
about. It *might* provide a guarantee that every allocation will succeed
without deadlock - but only if there is a reasonable limit on the number
of parallel allocation chains - and we don't have any mechanism for
tracking those.

So there is still a place for mempools - they complement __GFP_MEMALLOC
reserves, and both can be used together. mempools are particularly
appropriate when the allocation commonly (or always) occurs in
PF_MEMALLOC context, as using the mempool provides a better guarantee
and removes a burden from the shared pool.

This suggests to me that a different interface to mempools could be
useful - one that uses the mempool preallocation instead of the shared
preallocation. If PF_MEMALLOC or GFP_MEMALLOC is set, then the mempool
is used, which avoids the shared reserves. Otherwise (or if
__GFP_NOMEMALLOC is set), follow the normal allocation path and don't
use the mempool reserves.

Something like the following. Do you agree?
Thanks,
NeilBrown

diff --git a/include/linux/mempool.h b/include/linux/mempool.h
index 0c964ac107c2..4414644c49d5 100644
--- a/include/linux/mempool.h
+++ b/include/linux/mempool.h
@@ -46,6 +46,7 @@ extern mempool_t *mempool_create_node(int min_nr, mempool_alloc_t *alloc_fn,
 extern int mempool_resize(mempool_t *pool, int new_min_nr);
 extern void mempool_destroy(mempool_t *pool);
 extern void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
+extern void *mempool_memalloc(mempool_t *pool, gfp_t gfp_mask) __malloc;
 extern void mempool_free(void *element, mempool_t *pool);
 
 /*
diff --git a/mm/mempool.c b/mm/mempool.c
index b933d0fc21b8..1182bd3443cc 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -417,8 +417,12 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 		goto repeat_alloc;
 	}
 
-	/* We must not sleep if !__GFP_DIRECT_RECLAIM */
-	if (!(gfp_mask & __GFP_DIRECT_RECLAIM)) {
+	/*
+	 * We must not sleep if !__GFP_DIRECT_RECLAIM
+	 * and must not retry if __GFP_NORETRY
+	 */
+	if (!(gfp_mask & __GFP_DIRECT_RECLAIM) ||
+	    (gfp_mask & __GFP_NORETRY)) {
 		spin_unlock_irqrestore(&pool->lock, flags);
 		return NULL;
 	}
@@ -440,6 +444,30 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(mempool_alloc);
 
+/**
+ * mempool_memalloc - allocate, using mempool rather than shared reserves
+ * @pool: mempool to allocate from
+ * @gfp_mask: GFP allocation flags
+ *
+ * If the process context or gfp flags permit allocation from reserves
+ * (i.e. PF_MEMALLOC or GFP_MEMALLOC), then use the mempool for allocation,
+ * otherwise allocate directly using the underlying allocator.
+ *
+ * Return: pointer to allocated element, or %NULL on failure.
+ */
+void *mempool_memalloc(mempool_t *pool, gfp_t gfp_mask)
+{
+	if (gfp_mask & __GFP_NOMEMALLOC ||
+	    (!(current->flags & PF_MEMALLOC) &&
+	     !(gfp_mask & __GFP_MEMALLOC)))
+		/* No reserves requested */
+		return pool->alloc(gfp_mask, pool->pool_data);
+	else
+		return mempool_alloc(pool, gfp_mask);
+}
+
 /**
  * mempool_free - return an element to the pool.
  * @element: pool element pointer.
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 682fcd24bf43..2ecbe6b89fbb 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -615,8 +615,6 @@ rpcauth_bind_root_cred(struct rpc_task *task, int lookupflags)
 	};
 	struct rpc_cred *ret;
 
-	if (RPC_IS_ASYNC(task))
-		lookupflags |= RPCAUTH_LOOKUP_ASYNC;
 	ret = auth->au_ops->lookup_cred(auth, &acred, lookupflags);
 	put_cred(acred.cred);
 	return ret;
@@ -633,8 +631,6 @@ rpcauth_bind_machine_cred(struct rpc_task *task, int lookupflags)
 	if (!acred.principal)
 		return NULL;
-	if (RPC_IS_ASYNC(task))
-		lookupflags |= RPCAUTH_LOOKUP_ASYNC;
 	return auth->au_ops->lookup_cred(auth, &acred, lookupflags);
 }
@@ -658,7 +654,7 @@ rpcauth_bindcred(struct rpc_task *task, const struct cred *cred, int flags)
 	};
 
 	if (flags & RPC_TASK_ASYNC)
-		lookupflags |= RPCAUTH_LOOKUP_NEW | RPCAUTH_LOOKUP_ASYNC;
+		lookupflags |= RPCAUTH_LOOKUP_NEW;
 	if (task->tk_op_cred)
 		/* Task must use exactly this rpc_cred */
 		new = get_rpccred(task->tk_op_cred);
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index de7e5b41ab8f..4f68934dbeb5 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -1343,11 +1343,7 @@ gss_hash_cred(struct auth_cred *acred, unsigned int hashbits)
 static struct rpc_cred *
 gss_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
 {
-	gfp_t gfp = GFP_KERNEL;
-
-	if (flags & RPCAUTH_LOOKUP_ASYNC)
-		gfp = GFP_NOWAIT | __GFP_NOWARN;
-	return rpcauth_lookup_credcache(auth, acred, flags, gfp);
+	return rpcauth_lookup_credcache(auth, acred, flags, GFP_KERNEL);
 }
 
 static struct rpc_cred *
diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index 1e091d3fa607..6170d4d34687 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -45,14 +45,10 @@ static struct rpc_cred *unx_lookup_cred(struct rpc_auth *auth,
 {
 	struct rpc_cred *ret;
 
-	ret = kmalloc(sizeof(*ret), rpc_task_gfp_mask());
-	if (!ret) {
-		if (!(flags & RPCAUTH_LOOKUP_ASYNC))
-			return ERR_PTR(-ENOMEM);
-		ret = mempool_alloc(unix_pool, GFP_NOWAIT);
-		if (!ret)
-			return ERR_PTR(-ENOMEM);
-	}
+	ret = mempool_memalloc(unix_pool, rpc_task_gfp_mask());
+	if (!ret)
+		return ERR_PTR(-ENOMEM);
+
 	rpcauth_init_cred(ret, acred, auth, &unix_credops);
 	ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE;
 	return ret;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 7f70c1e608b7..4138aa62d3f3 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -1040,12 +1040,9 @@ int rpc_malloc(struct rpc_task *task)
 	gfp_t gfp = rpc_task_gfp_mask();
 
 	size += sizeof(struct rpc_buffer);
-	if (size <= RPC_BUFFER_MAXSIZE) {
-		buf = kmem_cache_alloc(rpc_buffer_slabp, gfp);
-		/* Reach for the mempool if dynamic allocation fails */
-		if (!buf && RPC_IS_ASYNC(task))
-			buf = mempool_alloc(rpc_buffer_mempool, GFP_NOWAIT);
-	} else
+	if (size <= RPC_BUFFER_MAXSIZE)
+		buf = mempool_memalloc(rpc_buffer_mempool, gfp);
+	else
 		buf = kmalloc(size, gfp);
 
 	if (!buf)
@@ -1110,12 +1107,7 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
 
 static struct rpc_task *rpc_alloc_task(void)
 {
-	struct rpc_task *task;
-
-	task = kmem_cache_alloc(rpc_task_slabp, rpc_task_gfp_mask());
-	if (task)
-		return task;
-	return mempool_alloc(rpc_task_mempool, GFP_NOWAIT);
+	return mempool_memalloc(rpc_task_mempool, rpc_task_gfp_mask());
 }
 
 /*
diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 3e6ce288a7fc..98da816b5fc2 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -99,7 +99,6 @@ struct rpc_auth_create_args {
 
 /* Flags for rpcauth_lookupcred() */
 #define RPCAUTH_LOOKUP_NEW		0x01	/* Accept an uninitialised cred */
-#define RPCAUTH_LOOKUP_ASYNC		0x02	/* Don't block waiting for memory */
 
 /*
  * Client authentication ops