linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/9] Crossing our fingers is not a strategy
@ 2022-03-22  1:16 trondmy
  2022-03-22  1:16 ` [PATCH v2 1/9] NFS: Fix memory allocation in rpc_malloc() trondmy
  2022-03-23 22:57 ` [PATCH v2 0/9] Crossing our fingers is not a strategy NeilBrown
  0 siblings, 2 replies; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

We'd like to avoid GFP_NOWAIT whenever possible, because it has no fall-
back reclaim strategy for dealing with a failure of the initial
allocation.
At the same time, memory pools appear to be suboptimal as an allocation
strategy when used with anything other than GFP_NOWAIT, since they cause
threads to hang upon failure.

This patch series therefore aims to demote the mempools to being a
strategy of last resort, with the primary allocation strategy being to
use the underlying slabs.
While we're at it, we want to ensure that rpciod, xprtiod and nfsiod all
use the same allocation strategy, so that the latter two don't thwart
our ability to complete writeback RPC calls by getting blocked in the mm
layer.

Trond Myklebust (9):
  NFS: Fix memory allocation in rpc_malloc()
  NFS: Fix memory allocation in rpc_alloc_task()
  SUNRPC: Fix unx_lookup_cred() allocation
  SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  NFS: nfsiod should not block forever in mempool_alloc()
  NFS: Avoid writeback threads getting stuck in mempool_alloc()
  NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  pNFS/flexfiles: Ensure pNFS allocation modes are consistent with
    nfsiod
  pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod

 fs/nfs/filelayout/filelayout.c         |  2 +-
 fs/nfs/flexfilelayout/flexfilelayout.c | 50 +++++++++++---------------
 fs/nfs/internal.h                      |  7 ++++
 fs/nfs/nfs42proc.c                     |  2 +-
 fs/nfs/pagelist.c                      | 10 +++---
 fs/nfs/pnfs.c                          | 39 +++++++++-----------
 fs/nfs/pnfs_nfs.c                      |  8 +++--
 fs/nfs/write.c                         | 34 +++++++++---------
 include/linux/nfs_fs.h                 |  2 +-
 include/linux/sunrpc/sched.h           |  1 +
 net/sunrpc/auth_unix.c                 | 18 +++++-----
 net/sunrpc/backchannel_rqst.c          |  8 ++---
 net/sunrpc/rpcb_clnt.c                 |  4 +--
 net/sunrpc/sched.c                     | 31 ++++++++++------
 net/sunrpc/socklib.c                   |  3 +-
 net/sunrpc/xprt.c                      |  5 +--
 net/sunrpc/xprtsock.c                  | 11 +++---
 17 files changed, 123 insertions(+), 112 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 1/9] NFS: Fix memory allocation in rpc_malloc()
  2022-03-22  1:16 [PATCH v2 0/9] Crossing our fingers is not a strategy trondmy
@ 2022-03-22  1:16 ` trondmy
  2022-03-22  1:16   ` [PATCH v2 2/9] NFS: Fix memory allocation in rpc_alloc_task() trondmy
  2022-03-23 22:57 ` [PATCH v2 0/9] Crossing our fingers is not a strategy NeilBrown
  1 sibling, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When in a low memory situation, we do want rpciod to kick off direct
reclaim in the case where that helps, however we don't want it looping
forever in mempool_alloc().
So first try allocating from the slab using GFP_KERNEL | __GFP_NORETRY,
and then fall back to a GFP_NOWAIT allocation from the mempool.

Ditto for rpc_alloc_task()

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/sched.h |  1 +
 net/sunrpc/sched.c           | 21 ++++++++++++++-------
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 56710f8056d3..1d7a3e51b795 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -262,6 +262,7 @@ void		rpc_destroy_mempool(void);
 extern struct workqueue_struct *rpciod_workqueue;
 extern struct workqueue_struct *xprtiod_workqueue;
 void		rpc_prepare_task(struct rpc_task *task);
+gfp_t		rpc_task_gfp_mask(void);
 
 static inline int rpc_wait_for_completion_task(struct rpc_task *task)
 {
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 7c8f87ebdbc0..d59a033820be 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -57,6 +57,13 @@ struct workqueue_struct *rpciod_workqueue __read_mostly;
 struct workqueue_struct *xprtiod_workqueue __read_mostly;
 EXPORT_SYMBOL_GPL(xprtiod_workqueue);
 
+gfp_t rpc_task_gfp_mask(void)
+{
+	if (current->flags & PF_WQ_WORKER)
+		return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
+	return GFP_KERNEL;
+}
+
 unsigned long
 rpc_task_timeout(const struct rpc_task *task)
 {
@@ -1030,15 +1037,15 @@ int rpc_malloc(struct rpc_task *task)
 	struct rpc_rqst *rqst = task->tk_rqstp;
 	size_t size = rqst->rq_callsize + rqst->rq_rcvsize;
 	struct rpc_buffer *buf;
-	gfp_t gfp = GFP_KERNEL;
-
-	if (RPC_IS_ASYNC(task))
-		gfp = GFP_NOWAIT | __GFP_NOWARN;
+	gfp_t gfp = rpc_task_gfp_mask();
 
 	size += sizeof(struct rpc_buffer);
-	if (size <= RPC_BUFFER_MAXSIZE)
-		buf = mempool_alloc(rpc_buffer_mempool, gfp);
-	else
+	if (size <= RPC_BUFFER_MAXSIZE) {
+		buf = kmem_cache_alloc(rpc_buffer_slabp, gfp);
+		/* Reach for the mempool if dynamic allocation fails */
+		if (!buf && RPC_IS_ASYNC(task))
+			buf = mempool_alloc(rpc_buffer_mempool, GFP_NOWAIT);
+	} else
 		buf = kmalloc(size, gfp);
 
 	if (!buf)
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 2/9] NFS: Fix memory allocation in rpc_alloc_task()
  2022-03-22  1:16 ` [PATCH v2 1/9] NFS: Fix memory allocation in rpc_malloc() trondmy
@ 2022-03-22  1:16   ` trondmy
  2022-03-22  1:16     ` [PATCH v2 3/9] SUNRPC: Fix unx_lookup_cred() allocation trondmy
  0 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

As for rpc_malloc(), we first try allocating from the slab, then fall
back to a non-waiting allocation from the mempool.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/sched.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index d59a033820be..b258b87a3ec2 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -1108,10 +1108,14 @@ static void rpc_init_task(struct rpc_task *task, const struct rpc_task_setup *ta
 	rpc_init_task_statistics(task);
 }
 
-static struct rpc_task *
-rpc_alloc_task(void)
+static struct rpc_task *rpc_alloc_task(void)
 {
-	return (struct rpc_task *)mempool_alloc(rpc_task_mempool, GFP_KERNEL);
+	struct rpc_task *task;
+
+	task = kmem_cache_alloc(rpc_task_slabp, rpc_task_gfp_mask());
+	if (task)
+		return task;
+	return mempool_alloc(rpc_task_mempool, GFP_NOWAIT);
 }
 
 /*
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 3/9] SUNRPC: Fix unx_lookup_cred() allocation
  2022-03-22  1:16   ` [PATCH v2 2/9] NFS: Fix memory allocation in rpc_alloc_task() trondmy
@ 2022-03-22  1:16     ` trondmy
  2022-03-22  1:16       ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent trondmy
  0 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Default to the same mempool allocation strategy as for rpc_malloc().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/auth_unix.c | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/auth_unix.c b/net/sunrpc/auth_unix.c
index c629d366030e..1e091d3fa607 100644
--- a/net/sunrpc/auth_unix.c
+++ b/net/sunrpc/auth_unix.c
@@ -40,17 +40,19 @@ unx_destroy(struct rpc_auth *auth)
 /*
  * Lookup AUTH_UNIX creds for current process
  */
-static struct rpc_cred *
-unx_lookup_cred(struct rpc_auth *auth, struct auth_cred *acred, int flags)
+static struct rpc_cred *unx_lookup_cred(struct rpc_auth *auth,
+					struct auth_cred *acred, int flags)
 {
-	gfp_t gfp = GFP_KERNEL;
 	struct rpc_cred *ret;
 
-	if (flags & RPCAUTH_LOOKUP_ASYNC)
-		gfp = GFP_NOWAIT | __GFP_NOWARN;
-	ret = mempool_alloc(unix_pool, gfp);
-	if (!ret)
-		return ERR_PTR(-ENOMEM);
+	ret = kmalloc(sizeof(*ret), rpc_task_gfp_mask());
+	if (!ret) {
+		if (!(flags & RPCAUTH_LOOKUP_ASYNC))
+			return ERR_PTR(-ENOMEM);
+		ret = mempool_alloc(unix_pool, GFP_NOWAIT);
+		if (!ret)
+			return ERR_PTR(-ENOMEM);
+	}
 	rpcauth_init_cred(ret, acred, auth, &unix_credops);
 	ret->cr_flags = 1UL << RPCAUTH_CRED_UPTODATE;
 	return ret;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  2022-03-22  1:16     ` [PATCH v2 3/9] SUNRPC: Fix unx_lookup_cred() allocation trondmy
@ 2022-03-22  1:16       ` trondmy
  2022-03-22  1:16         ` [PATCH v2 5/9] NFS: nfsiod should not block forever in mempool_alloc() trondmy
  2022-03-25  1:49         ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent NeilBrown
  0 siblings, 2 replies; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Make sure that rpciod and xprtiod are always using the same slab
allocation modes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/backchannel_rqst.c |  8 ++++----
 net/sunrpc/rpcb_clnt.c        |  4 ++--
 net/sunrpc/socklib.c          |  3 ++-
 net/sunrpc/xprt.c             |  5 +----
 net/sunrpc/xprtsock.c         | 11 ++++++-----
 5 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/backchannel_rqst.c b/net/sunrpc/backchannel_rqst.c
index 22a2c235abf1..5a6b61dcdf2d 100644
--- a/net/sunrpc/backchannel_rqst.c
+++ b/net/sunrpc/backchannel_rqst.c
@@ -75,9 +75,9 @@ static int xprt_alloc_xdr_buf(struct xdr_buf *buf, gfp_t gfp_flags)
 	return 0;
 }
 
-static
-struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt, gfp_t gfp_flags)
+static struct rpc_rqst *xprt_alloc_bc_req(struct rpc_xprt *xprt)
 {
+	gfp_t gfp_flags = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
 	struct rpc_rqst *req;
 
 	/* Pre-allocate one backchannel rpc_rqst */
@@ -154,7 +154,7 @@ int xprt_setup_bc(struct rpc_xprt *xprt, unsigned int min_reqs)
 	INIT_LIST_HEAD(&tmp_list);
 	for (i = 0; i < min_reqs; i++) {
 		/* Pre-allocate one backchannel rpc_rqst */
-		req = xprt_alloc_bc_req(xprt, GFP_KERNEL);
+		req = xprt_alloc_bc_req(xprt);
 		if (req == NULL) {
 			printk(KERN_ERR "Failed to create bc rpc_rqst\n");
 			goto out_free;
@@ -343,7 +343,7 @@ struct rpc_rqst *xprt_lookup_bc_request(struct rpc_xprt *xprt, __be32 xid)
 			break;
 		} else if (req)
 			break;
-		new = xprt_alloc_bc_req(xprt, GFP_KERNEL);
+		new = xprt_alloc_bc_req(xprt);
 	} while (new);
 	return req;
 }
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 0fdeb8666bfd..5a8e6d46809a 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -714,7 +714,7 @@ void rpcb_getport_async(struct rpc_task *task)
 		goto bailout_nofree;
 	}
 
-	map = kzalloc(sizeof(struct rpcbind_args), GFP_KERNEL);
+	map = kzalloc(sizeof(struct rpcbind_args), rpc_task_gfp_mask());
 	if (!map) {
 		status = -ENOMEM;
 		goto bailout_release_client;
@@ -730,7 +730,7 @@ void rpcb_getport_async(struct rpc_task *task)
 	case RPCBVERS_4:
 	case RPCBVERS_3:
 		map->r_netid = xprt->address_strings[RPC_DISPLAY_NETID];
-		map->r_addr = rpc_sockaddr2uaddr(sap, GFP_KERNEL);
+		map->r_addr = rpc_sockaddr2uaddr(sap, rpc_task_gfp_mask());
 		if (!map->r_addr) {
 			status = -ENOMEM;
 			goto bailout_free_args;
diff --git a/net/sunrpc/socklib.c b/net/sunrpc/socklib.c
index d52313af82bc..05b38bf68316 100644
--- a/net/sunrpc/socklib.c
+++ b/net/sunrpc/socklib.c
@@ -15,6 +15,7 @@
 #include <linux/pagemap.h>
 #include <linux/udp.h>
 #include <linux/sunrpc/msg_prot.h>
+#include <linux/sunrpc/sched.h>
 #include <linux/sunrpc/xdr.h>
 #include <linux/export.h>
 
@@ -222,7 +223,7 @@ static int xprt_send_pagedata(struct socket *sock, struct msghdr *msg,
 {
 	int err;
 
-	err = xdr_alloc_bvec(xdr, GFP_KERNEL);
+	err = xdr_alloc_bvec(xdr, rpc_task_gfp_mask());
 	if (err < 0)
 		return err;
 
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index bbe913121f43..744c6c1d536f 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1679,15 +1679,12 @@ static bool xprt_throttle_congested(struct rpc_xprt *xprt, struct rpc_task *task
 static struct rpc_rqst *xprt_dynamic_alloc_slot(struct rpc_xprt *xprt)
 {
 	struct rpc_rqst *req = ERR_PTR(-EAGAIN);
-	gfp_t gfp_mask = GFP_KERNEL;
 
 	if (xprt->num_reqs >= xprt->max_reqs)
 		goto out;
 	++xprt->num_reqs;
 	spin_unlock(&xprt->reserve_lock);
-	if (current->flags & PF_WQ_WORKER)
-		gfp_mask |= __GFP_NORETRY | __GFP_NOWARN;
-	req = kzalloc(sizeof(*req), gfp_mask);
+	req = kzalloc(sizeof(*req), rpc_task_gfp_mask());
 	spin_lock(&xprt->reserve_lock);
 	if (req != NULL)
 		goto out;
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 8909c768fe71..b52eaa8a0cda 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -428,9 +428,9 @@ xs_read_xdr_buf(struct socket *sock, struct msghdr *msg, int flags,
 		offset += want;
 	}
 
-	want = xs_alloc_sparse_pages(buf,
-			min_t(size_t, count - offset, buf->page_len),
-			GFP_KERNEL);
+	want = xs_alloc_sparse_pages(
+		buf, min_t(size_t, count - offset, buf->page_len),
+		GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 	if (seek < want) {
 		ret = xs_read_bvec(sock, msg, flags, buf->bvec,
 				xdr_buf_pagecount(buf),
@@ -826,7 +826,8 @@ static void
 xs_stream_prepare_request(struct rpc_rqst *req)
 {
 	xdr_free_bvec(&req->rq_rcv_buf);
-	req->rq_task->tk_status = xdr_alloc_bvec(&req->rq_rcv_buf, GFP_KERNEL);
+	req->rq_task->tk_status = xdr_alloc_bvec(
+		&req->rq_rcv_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 }
 
 /*
@@ -2487,7 +2488,7 @@ static int bc_malloc(struct rpc_task *task)
 		return -EINVAL;
 	}
 
-	page = alloc_page(GFP_KERNEL);
+	page = alloc_page(GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
 	if (!page)
 		return -ENOMEM;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 5/9] NFS: nfsiod should not block forever in mempool_alloc()
  2022-03-22  1:16       ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent trondmy
@ 2022-03-22  1:16         ` trondmy
  2022-03-22  1:16           ` [PATCH v2 6/9] NFS: Avoid writeback threads getting stuck " trondmy
  2022-03-25  1:49         ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent NeilBrown
  1 sibling, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The concern is that since nfsiod is sometimes required to kick off a
commit, it can get locked up waiting forever in mempool_alloc() instead
of failing gracefully and leaving the commit until later.

Try to allocate from the slab first, with GFP_KERNEL | __GFP_NORETRY,
then fall back to a non-blocking attempt to allocate from the memory
pool.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/internal.h      |  7 +++++++
 fs/nfs/pnfs_nfs.c      |  8 ++++++--
 fs/nfs/write.c         | 24 +++++++++---------------
 include/linux/nfs_fs.h |  2 +-
 4 files changed, 23 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 194840a97e3a..57b0497105c8 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -587,6 +587,13 @@ nfs_write_match_verf(const struct nfs_writeverf *verf,
 		!nfs_write_verifier_cmp(&req->wb_verf, &verf->verifier);
 }
 
+static inline gfp_t nfs_io_gfp_mask(void)
+{
+	if (current->flags & PF_WQ_WORKER)
+		return GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN;
+	return GFP_KERNEL;
+}
+
 /* unlink.c */
 extern struct rpc_task *
 nfs_async_rename(struct inode *old_dir, struct inode *new_dir,
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 316f68f96e57..657c242a18ff 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -419,7 +419,7 @@ static struct nfs_commit_data *
 pnfs_bucket_fetch_commitdata(struct pnfs_commit_bucket *bucket,
 			     struct nfs_commit_info *cinfo)
 {
-	struct nfs_commit_data *data = nfs_commitdata_alloc(false);
+	struct nfs_commit_data *data = nfs_commitdata_alloc();
 
 	if (!data)
 		return NULL;
@@ -515,7 +515,11 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
 	unsigned int nreq = 0;
 
 	if (!list_empty(mds_pages)) {
-		data = nfs_commitdata_alloc(true);
+		data = nfs_commitdata_alloc();
+		if (!data) {
+			nfs_retry_commit(mds_pages, NULL, cinfo, -1);
+			return -ENOMEM;
+		}
 		data->ds_commit_index = -1;
 		list_splice_init(mds_pages, &data->pages);
 		list_add_tail(&data->list, &list);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 599a82406d38..ef47e3700e4b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -70,27 +70,17 @@ static mempool_t *nfs_wdata_mempool;
 static struct kmem_cache *nfs_cdata_cachep;
 static mempool_t *nfs_commit_mempool;
 
-struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail)
+struct nfs_commit_data *nfs_commitdata_alloc(void)
 {
 	struct nfs_commit_data *p;
 
-	if (never_fail)
-		p = mempool_alloc(nfs_commit_mempool, GFP_NOIO);
-	else {
-		/* It is OK to do some reclaim, not no safe to wait
-		 * for anything to be returned to the pool.
-		 * mempool_alloc() cannot handle that particular combination,
-		 * so we need two separate attempts.
-		 */
+	p = kmem_cache_zalloc(nfs_cdata_cachep, nfs_io_gfp_mask());
+	if (!p) {
 		p = mempool_alloc(nfs_commit_mempool, GFP_NOWAIT);
-		if (!p)
-			p = kmem_cache_alloc(nfs_cdata_cachep, GFP_NOIO |
-					     __GFP_NOWARN | __GFP_NORETRY);
 		if (!p)
 			return NULL;
+		memset(p, 0, sizeof(*p));
 	}
-
-	memset(p, 0, sizeof(*p));
 	INIT_LIST_HEAD(&p->pages);
 	return p;
 }
@@ -1826,7 +1816,11 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
 	if (list_empty(head))
 		return 0;
 
-	data = nfs_commitdata_alloc(true);
+	data = nfs_commitdata_alloc();
+	if (!data) {
+		nfs_retry_commit(head, NULL, cinfo, -1);
+		return -ENOMEM;
+	}
 
 	/* Set up the argument struct */
 	nfs_init_commit(data, head, NULL, cinfo);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index c47c448befc8..db305abafc9e 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -580,7 +580,7 @@ extern int nfs_wb_all(struct inode *inode);
 extern int nfs_wb_page(struct inode *inode, struct page *page);
 extern int nfs_wb_page_cancel(struct inode *inode, struct page* page);
 extern int  nfs_commit_inode(struct inode *, int);
-extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail);
+extern struct nfs_commit_data *nfs_commitdata_alloc(void);
 extern void nfs_commit_free(struct nfs_commit_data *data);
 bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 6/9] NFS: Avoid writeback threads getting stuck in mempool_alloc()
  2022-03-22  1:16         ` [PATCH v2 5/9] NFS: nfsiod should not block forever in mempool_alloc() trondmy
@ 2022-03-22  1:16           ` trondmy
  2022-03-22  1:16             ` [PATCH v2 7/9] NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod trondmy
  0 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

In a low memory situation, allow the NFS writeback code to fail without
getting stuck in infinite loops in mempool_alloc().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pagelist.c | 10 +++++-----
 fs/nfs/write.c    | 10 ++++++++--
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ad7f83dc9a2d..3156db526cc4 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -90,10 +90,10 @@ void nfs_set_pgio_error(struct nfs_pgio_header *hdr, int error, loff_t pos)
 	}
 }
 
-static inline struct nfs_page *
-nfs_page_alloc(void)
+static inline struct nfs_page *nfs_page_alloc(void)
 {
-	struct nfs_page	*p = kmem_cache_zalloc(nfs_page_cachep, GFP_KERNEL);
+	struct nfs_page *p =
+		kmem_cache_zalloc(nfs_page_cachep, nfs_io_gfp_mask());
 	if (p)
 		INIT_LIST_HEAD(&p->wb_list);
 	return p;
@@ -892,7 +892,7 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *desc,
 	struct nfs_commit_info cinfo;
 	struct nfs_page_array *pg_array = &hdr->page_array;
 	unsigned int pagecount, pageused;
-	gfp_t gfp_flags = GFP_KERNEL;
+	gfp_t gfp_flags = nfs_io_gfp_mask();
 
 	pagecount = nfs_page_array_len(mirror->pg_base, mirror->pg_count);
 	pg_array->npages = pagecount;
@@ -979,7 +979,7 @@ nfs_pageio_alloc_mirrors(struct nfs_pageio_descriptor *desc,
 	desc->pg_mirrors_dynamic = NULL;
 	if (mirror_count == 1)
 		return desc->pg_mirrors_static;
-	ret = kmalloc_array(mirror_count, sizeof(*ret), GFP_KERNEL);
+	ret = kmalloc_array(mirror_count, sizeof(*ret), nfs_io_gfp_mask());
 	if (ret != NULL) {
 		for (i = 0; i < mirror_count; i++)
 			nfs_pageio_mirror_init(&ret[i], desc->pg_bsize);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index ef47e3700e4b..e864ac836237 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -94,9 +94,15 @@ EXPORT_SYMBOL_GPL(nfs_commit_free);
 
 static struct nfs_pgio_header *nfs_writehdr_alloc(void)
 {
-	struct nfs_pgio_header *p = mempool_alloc(nfs_wdata_mempool, GFP_KERNEL);
+	struct nfs_pgio_header *p;
 
-	memset(p, 0, sizeof(*p));
+	p = kmem_cache_zalloc(nfs_wdata_cachep, nfs_io_gfp_mask());
+	if (!p) {
+		p = mempool_alloc(nfs_wdata_mempool, GFP_NOWAIT);
+		if (!p)
+			return NULL;
+		memset(p, 0, sizeof(*p));
+	}
 	p->rw_mode = FMODE_WRITE;
 	return p;
 }
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 7/9] NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  2022-03-22  1:16           ` [PATCH v2 6/9] NFS: Avoid writeback threads getting stuck " trondmy
@ 2022-03-22  1:16             ` trondmy
  2022-03-22  1:16               ` [PATCH v2 8/9] pNFS/flexfiles: " trondmy
  0 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that pNFS allocations that can be called from rpciod/nfsiod
callback can fail in low memory mode, so that the threads don't block
and loop forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs42proc.c |  2 +-
 fs/nfs/pnfs.c      | 39 +++++++++++++++++----------------------
 2 files changed, 18 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 882bf84484ac..b841e267b764 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1017,7 +1017,7 @@ int nfs42_proc_layouterror(struct pnfs_layout_segment *lseg,
 		return -EOPNOTSUPP;
 	if (n > NFS42_LAYOUTERROR_MAX)
 		return -EINVAL;
-	data = nfs42_alloc_layouterror_data(lseg, GFP_KERNEL);
+	data = nfs42_alloc_layouterror_data(lseg, nfs_io_gfp_mask());
 	if (!data)
 		return -ENOMEM;
 	for (i = 0; i < n; i++) {
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index f089e11fd001..de318bb5d349 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1233,7 +1233,7 @@ pnfs_send_layoutreturn(struct pnfs_layout_hdr *lo,
 	int status = 0;
 
 	*pcred = NULL;
-	lrp = kzalloc(sizeof(*lrp), GFP_KERNEL);
+	lrp = kzalloc(sizeof(*lrp), nfs_io_gfp_mask());
 	if (unlikely(lrp == NULL)) {
 		status = -ENOMEM;
 		spin_lock(&ino->i_lock);
@@ -2206,7 +2206,7 @@ _pnfs_grab_empty_layout(struct inode *ino, struct nfs_open_context *ctx)
 	struct pnfs_layout_hdr *lo;
 
 	spin_lock(&ino->i_lock);
-	lo = pnfs_find_alloc_layout(ino, ctx, GFP_KERNEL);
+	lo = pnfs_find_alloc_layout(ino, ctx, nfs_io_gfp_mask());
 	if (!lo)
 		goto out_unlock;
 	if (!test_bit(NFS_LAYOUT_INVALID_STID, &lo->plh_flags))
@@ -2249,8 +2249,8 @@ static void _lgopen_prepare_attached(struct nfs4_opendata *data,
 	lo = _pnfs_grab_empty_layout(ino, ctx);
 	if (!lo)
 		return;
-	lgp = pnfs_alloc_init_layoutget_args(ino, ctx, &current_stateid,
-					     &rng, GFP_KERNEL);
+	lgp = pnfs_alloc_init_layoutget_args(ino, ctx, &current_stateid, &rng,
+					     nfs_io_gfp_mask());
 	if (!lgp) {
 		pnfs_clear_first_layoutget(lo);
 		nfs_layoutget_end(lo);
@@ -2275,8 +2275,8 @@ static void _lgopen_prepare_floating(struct nfs4_opendata *data,
 	};
 	struct nfs4_layoutget *lgp;
 
-	lgp = pnfs_alloc_init_layoutget_args(ino, ctx, &current_stateid,
-					     &rng, GFP_KERNEL);
+	lgp = pnfs_alloc_init_layoutget_args(ino, ctx, &current_stateid, &rng,
+					     nfs_io_gfp_mask());
 	if (!lgp)
 		return;
 	data->lgp = lgp;
@@ -2691,13 +2691,11 @@ pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *r
 		else
 			rd_size = nfs_dreq_bytes_left(pgio->pg_dreq);
 
-		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   nfs_req_openctx(req),
-						   req_offset(req),
-						   rd_size,
-						   IOMODE_READ,
-						   false,
-						   GFP_KERNEL);
+		pgio->pg_lseg =
+			pnfs_update_layout(pgio->pg_inode, nfs_req_openctx(req),
+					   req_offset(req), rd_size,
+					   IOMODE_READ, false,
+					   nfs_io_gfp_mask());
 		if (IS_ERR(pgio->pg_lseg)) {
 			pgio->pg_error = PTR_ERR(pgio->pg_lseg);
 			pgio->pg_lseg = NULL;
@@ -2718,13 +2716,10 @@ pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
 	pnfs_generic_pg_check_layout(pgio);
 	pnfs_generic_pg_check_range(pgio, req);
 	if (pgio->pg_lseg == NULL) {
-		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   nfs_req_openctx(req),
-						   req_offset(req),
-						   wb_size,
-						   IOMODE_RW,
-						   false,
-						   GFP_KERNEL);
+		pgio->pg_lseg =
+			pnfs_update_layout(pgio->pg_inode, nfs_req_openctx(req),
+					   req_offset(req), wb_size, IOMODE_RW,
+					   false, nfs_io_gfp_mask());
 		if (IS_ERR(pgio->pg_lseg)) {
 			pgio->pg_error = PTR_ERR(pgio->pg_lseg);
 			pgio->pg_lseg = NULL;
@@ -3183,7 +3178,7 @@ pnfs_layoutcommit_inode(struct inode *inode, bool sync)
 
 	status = -ENOMEM;
 	/* Note kzalloc ensures data->res.seq_res.sr_slot == NULL */
-	data = kzalloc(sizeof(*data), GFP_NOFS);
+	data = kzalloc(sizeof(*data), nfs_io_gfp_mask());
 	if (!data)
 		goto clear_layoutcommitting;
 
@@ -3250,7 +3245,7 @@ struct nfs4_threshold *pnfs_mdsthreshold_alloc(void)
 {
 	struct nfs4_threshold *thp;
 
-	thp = kzalloc(sizeof(*thp), GFP_KERNEL);
+	thp = kzalloc(sizeof(*thp), nfs_io_gfp_mask());
 	if (!thp) {
 		dprintk("%s mdsthreshold allocation failed\n", __func__);
 		return NULL;
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 8/9] pNFS/flexfiles: Ensure pNFS allocation modes are consistent with nfsiod
  2022-03-22  1:16             ` [PATCH v2 7/9] NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod trondmy
@ 2022-03-22  1:16               ` trondmy
  2022-03-22  1:16                 ` [PATCH v2 9/9] pNFS/files: " trondmy
  0 siblings, 1 reply; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that pNFS flexfile allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/flexfilelayout/flexfilelayout.c | 50 +++++++++++---------------
 1 file changed, 21 insertions(+), 29 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index e28f2177afb7..604be402ae13 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -663,7 +663,7 @@ nfs4_ff_layout_stat_io_start_read(struct inode *inode,
 	spin_unlock(&mirror->lock);
 
 	if (report)
-		pnfs_report_layoutstat(inode, GFP_KERNEL);
+		pnfs_report_layoutstat(inode, nfs_io_gfp_mask());
 }
 
 static void
@@ -694,7 +694,7 @@ nfs4_ff_layout_stat_io_start_write(struct inode *inode,
 	spin_unlock(&mirror->lock);
 
 	if (report)
-		pnfs_report_layoutstat(inode, GFP_KERNEL);
+		pnfs_report_layoutstat(inode, nfs_io_gfp_mask());
 }
 
 static void
@@ -806,13 +806,10 @@ ff_layout_pg_get_read(struct nfs_pageio_descriptor *pgio,
 		      bool strict_iomode)
 {
 	pnfs_put_lseg(pgio->pg_lseg);
-	pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-					   nfs_req_openctx(req),
-					   req_offset(req),
-					   req->wb_bytes,
-					   IOMODE_READ,
-					   strict_iomode,
-					   GFP_KERNEL);
+	pgio->pg_lseg =
+		pnfs_update_layout(pgio->pg_inode, nfs_req_openctx(req),
+				   req_offset(req), req->wb_bytes, IOMODE_READ,
+				   strict_iomode, nfs_io_gfp_mask());
 	if (IS_ERR(pgio->pg_lseg)) {
 		pgio->pg_error = PTR_ERR(pgio->pg_lseg);
 		pgio->pg_lseg = NULL;
@@ -894,13 +891,10 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
 retry:
 	ff_layout_pg_check_layout(pgio, req);
 	if (!pgio->pg_lseg) {
-		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   nfs_req_openctx(req),
-						   req_offset(req),
-						   req->wb_bytes,
-						   IOMODE_RW,
-						   false,
-						   GFP_KERNEL);
+		pgio->pg_lseg =
+			pnfs_update_layout(pgio->pg_inode, nfs_req_openctx(req),
+					   req_offset(req), req->wb_bytes,
+					   IOMODE_RW, false, nfs_io_gfp_mask());
 		if (IS_ERR(pgio->pg_lseg)) {
 			pgio->pg_error = PTR_ERR(pgio->pg_lseg);
 			pgio->pg_lseg = NULL;
@@ -953,13 +947,10 @@ ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
 				    struct nfs_page *req)
 {
 	if (!pgio->pg_lseg) {
-		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   nfs_req_openctx(req),
-						   req_offset(req),
-						   req->wb_bytes,
-						   IOMODE_RW,
-						   false,
-						   GFP_KERNEL);
+		pgio->pg_lseg =
+			pnfs_update_layout(pgio->pg_inode, nfs_req_openctx(req),
+					   req_offset(req), req->wb_bytes,
+					   IOMODE_RW, false, nfs_io_gfp_mask());
 		if (IS_ERR(pgio->pg_lseg)) {
 			pgio->pg_error = PTR_ERR(pgio->pg_lseg);
 			pgio->pg_lseg = NULL;
@@ -1258,7 +1249,7 @@ static void ff_layout_io_track_ds_error(struct pnfs_layout_segment *lseg,
 	mirror = FF_LAYOUT_COMP(lseg, idx);
 	err = ff_layout_track_ds_error(FF_LAYOUT_FROM_HDR(lseg->pls_layout),
 				       mirror, offset, length, status, opnum,
-				       GFP_KERNEL);
+				       nfs_io_gfp_mask());
 
 	switch (status) {
 	case NFS4ERR_DELAY:
@@ -1973,7 +1964,8 @@ ff_layout_setup_ds_info(struct pnfs_ds_commit_info *fl_cinfo,
 	struct inode *inode = lseg->pls_layout->plh_inode;
 	struct pnfs_commit_array *array, *new;
 
-	new = pnfs_alloc_commit_array(flseg->mirror_array_cnt, GFP_KERNEL);
+	new = pnfs_alloc_commit_array(flseg->mirror_array_cnt,
+				      nfs_io_gfp_mask());
 	if (new) {
 		spin_lock(&inode->i_lock);
 		array = pnfs_add_commit_array(fl_cinfo, new, lseg);
@@ -2152,10 +2144,10 @@ ff_layout_prepare_layoutreturn(struct nfs4_layoutreturn_args *args)
 	struct nfs4_flexfile_layoutreturn_args *ff_args;
 	struct nfs4_flexfile_layout *ff_layout = FF_LAYOUT_FROM_HDR(args->layout);
 
-	ff_args = kmalloc(sizeof(*ff_args), GFP_KERNEL);
+	ff_args = kmalloc(sizeof(*ff_args), nfs_io_gfp_mask());
 	if (!ff_args)
 		goto out_nomem;
-	ff_args->pages[0] = alloc_page(GFP_KERNEL);
+	ff_args->pages[0] = alloc_page(nfs_io_gfp_mask());
 	if (!ff_args->pages[0])
 		goto out_nomem_free;
 
@@ -2193,7 +2185,7 @@ ff_layout_send_layouterror(struct pnfs_layout_segment *lseg)
 		return;
 
 	errors = kmalloc_array(NFS42_LAYOUTERROR_MAX, sizeof(*errors),
-			       GFP_KERNEL);
+			       nfs_io_gfp_mask());
 	if (errors != NULL) {
 		const struct nfs4_ff_layout_ds_err *pos;
 		size_t n = 0;
@@ -2445,7 +2437,7 @@ ff_layout_prepare_layoutstats(struct nfs42_layoutstat_args *args)
 
 	/* For now, send at most PNFS_LAYOUTSTATS_MAXDEV statistics */
 	args->devinfo = kmalloc_array(dev_count, sizeof(*args->devinfo),
-				      GFP_KERNEL);
+				      nfs_io_gfp_mask());
 	if (!args->devinfo)
 		return -ENOMEM;
 
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v2 9/9] pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod
  2022-03-22  1:16               ` [PATCH v2 8/9] pNFS/flexfiles: " trondmy
@ 2022-03-22  1:16                 ` trondmy
  0 siblings, 0 replies; 14+ messages in thread
From: trondmy @ 2022-03-22  1:16 UTC (permalink / raw)
  To: linux-nfs; +Cc: Neil Brown

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that pNFS file commit allocations in rpciod/nfsiod callbacks can
fail in low memory mode, so that the threads don't block and loop
forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/filelayout/filelayout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 9c96e3e5ed35..76deddab0a8f 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -1075,7 +1075,7 @@ filelayout_setup_ds_info(struct pnfs_ds_commit_info *fl_cinfo,
 	unsigned int size = (fl->stripe_type == STRIPE_SPARSE) ?
 		fl->dsaddr->ds_num : fl->dsaddr->stripe_count;
 
-	new = pnfs_alloc_commit_array(size, GFP_NOIO);
+	new = pnfs_alloc_commit_array(size, nfs_io_gfp_mask());
 	if (new) {
 		spin_lock(&inode->i_lock);
 		array = pnfs_add_commit_array(fl_cinfo, new, lseg);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/9] Crossing our fingers is not a strategy
  2022-03-22  1:16 [PATCH v2 0/9] Crossing our fingers is not a strategy trondmy
  2022-03-22  1:16 ` [PATCH v2 1/9] NFS: Fix memory allocation in rpc_malloc() trondmy
@ 2022-03-23 22:57 ` NeilBrown
  2022-03-23 23:09   ` Trond Myklebust
  1 sibling, 1 reply; 14+ messages in thread
From: NeilBrown @ 2022-03-23 22:57 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On Tue, 22 Mar 2022, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> We'd like to avoid GFP_NOWAIT whenever possible, because it has no fall-
> back reclaim strategy for dealing with a failure of the initial
> allocation.

I'm not sure I entirely agree with that.  GFP_NOWAIT will ensure kswapd
runs on failure, so waiting briefly and retrying (which sunrpc does on
-ENOMEM (at least ni call_refreshresult) is a valid fallback.

However, I do like the new rpc_task_gfp_mask() and that fact that you
have used it quite widely.

So: looks good to me.  I haven't carefully reviewed each patch enough to
say Reviewed-by, but I did see an easy problems.

Thanks,
NeilBrown


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 0/9] Crossing our fingers is not a strategy
  2022-03-23 22:57 ` [PATCH v2 0/9] Crossing our fingers is not a strategy NeilBrown
@ 2022-03-23 23:09   ` Trond Myklebust
  0 siblings, 0 replies; 14+ messages in thread
From: Trond Myklebust @ 2022-03-23 23:09 UTC (permalink / raw)
  To: neilb; +Cc: linux-nfs

On Thu, 2022-03-24 at 09:57 +1100, NeilBrown wrote:
> On Tue, 22 Mar 2022, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > We'd like to avoid GFP_NOWAIT whenever possible, because it has no
> > fall-
> > back reclaim strategy for dealing with a failure of the initial
> > allocation.
> 
> I'm not sure I entirely agree with that.  GFP_NOWAIT will ensure
> kswapd
> runs on failure, so waiting briefly and retrying (which sunrpc does
> on
> -ENOMEM (at least ni call_refreshresult) is a valid fallback.
> 
> However, I do like the new rpc_task_gfp_mask() and that fact that you
> have used it quite widely.
> 
> So: looks good to me.  I haven't carefully reviewed each patch enough
> to
> say Reviewed-by, but I did see an easy problems.

Thanks Neil!

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  2022-03-22  1:16       ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent trondmy
  2022-03-22  1:16         ` [PATCH v2 5/9] NFS: nfsiod should not block forever in mempool_alloc() trondmy
@ 2022-03-25  1:49         ` NeilBrown
  1 sibling, 0 replies; 14+ messages in thread
From: NeilBrown @ 2022-03-25  1:49 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On Tue, 22 Mar 2022, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Make sure that rpciod and xprtiod are always using the same slab
> allocation modes.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
....
>  xs_stream_prepare_request(struct rpc_rqst *req)
>  {
>  	xdr_free_bvec(&req->rq_rcv_buf);
> -	req->rq_task->tk_status = xdr_alloc_bvec(&req->rq_rcv_buf, GFP_KERNEL);
> +	req->rq_task->tk_status = xdr_alloc_bvec(
> +		&req->rq_rcv_buf, GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN);
>  }

I did some testing of swap-over-NFS, and got a crash quite quickly, due
to this change.
The problem is that GFP_KERNEL allocations almost never fail.
Multi-page allocations might occasionally fail, and others might fail
for a process that has been killed by the OOM killer (or maybe just has
a fatal signal pending), but in general GFP_KERNEL is more likely to
wait (and wait and wait) than to fail.
So the failure paths haven't been tested.

xs_stream_prepare_request() is called from xprt_request_prepare(), which
is called from xprt_request_enqueue_receive() which is called in
call_encode() *after* ->tk_status has been tested.
So when the above code sets ->tk_status to -ENOMEM - which is now more
likely - that fact is ignored and we get a crash

[  298.911356] Workqueue: xprtiod xs_stream_data_receive_workfn
[  298.911696] RIP: 0010:_copy_to_iter+0x1cc/0x435
..
[  298.918259]  __skb_datagram_iter+0x64/0x225
[  298.918507]  skb_copy_datagram_iter+0xe9/0xf2
[  298.918767]  tcp_recvmsg_locked+0x653/0x77e
[  298.919015]  tcp_recvmsg+0x100/0x188
[  298.919226]  inet_recvmsg+0x5d/0x86
[  298.919431]  xs_read_stream_request.constprop.0+0x247/0x378
[  298.919754]  xs_read_stream.constprop.0+0x1c2/0x39b
[  298.920038]  xs_stream_data_receive_workfn+0x50/0x160
[  298.920331]  process_one_work+0x267/0x422
[  298.920568]  worker_thread+0x193/0x234

So we really need to audit all these places where we add __GFP_NORETRY
and ensure errors are actually handled.

For call_encode(), it might be easiest to move
	/* Add task to reply queue before transmission to avoid races */
	if (rpc_reply_expected(task))
		xprt_request_enqueue_receive(task);

up before the
	/* Did the encode result in an error condition? */
	if (task->tk_status != 0) {

and change it to

	/* Add task to reply queue before transmission to avoid races */
	if (task->tk_status == 0 && rpc_reply_expected(task))
		xprt_request_enqueue_receive(task);

I'll try a bit more testing and auditing.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v2 0/9] Crossing our fingers is not a strategy
@ 2022-03-22  1:47 trondmy
  0 siblings, 0 replies; 14+ messages in thread
From: trondmy @ 2022-03-22  1:47 UTC (permalink / raw)
  To: linux-nfs; +Cc: Chuck Lever, Enrico Scholz

From: Trond Myklebust <trond.myklebust@hammerspace.com>

We'd like to avoid GFP_NOWAIT whenever possible, because it has no fall-
back reclaim strategy for dealing with a failure of the initial
allocation.
At the same time, memory pools appear to be suboptimal as an allocation
strategy when used with anything other than GFP_NOWAIT, since they cause
threads to hang upon failure.

This patch series therefore aims to demote the mempools to being a
strategy of last resort, with the primary allocation strategy being to
use the underlying slabs.
While we're at it, we want to ensure that rpciod, xprtiod and nfsiod all
use the same allocation strategy, so that the latter two don't thwart
our ability to complete writeback RPC calls by getting blocked in the mm
layer.

Trond Myklebust (9):
  NFS: Fix memory allocation in rpc_malloc()
  NFS: Fix memory allocation in rpc_alloc_task()
  SUNRPC: Fix unx_lookup_cred() allocation
  SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent
  NFS: nfsiod should not block forever in mempool_alloc()
  NFS: Avoid writeback threads getting stuck in mempool_alloc()
  NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod
  pNFS/flexfiles: Ensure pNFS allocation modes are consistent with
    nfsiod
  pNFS/files: Ensure pNFS allocation modes are consistent with nfsiod

 fs/nfs/filelayout/filelayout.c         |  2 +-
 fs/nfs/flexfilelayout/flexfilelayout.c | 50 +++++++++++---------------
 fs/nfs/internal.h                      |  7 ++++
 fs/nfs/nfs42proc.c                     |  2 +-
 fs/nfs/pagelist.c                      | 10 +++---
 fs/nfs/pnfs.c                          | 39 +++++++++-----------
 fs/nfs/pnfs_nfs.c                      |  8 +++--
 fs/nfs/write.c                         | 34 +++++++++---------
 include/linux/nfs_fs.h                 |  2 +-
 include/linux/sunrpc/sched.h           |  1 +
 net/sunrpc/auth_unix.c                 | 18 +++++-----
 net/sunrpc/backchannel_rqst.c          |  8 ++---
 net/sunrpc/rpcb_clnt.c                 |  4 +--
 net/sunrpc/sched.c                     | 31 ++++++++++------
 net/sunrpc/socklib.c                   |  3 +-
 net/sunrpc/xprt.c                      |  5 +--
 net/sunrpc/xprtsock.c                  | 11 +++---
 17 files changed, 123 insertions(+), 112 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2022-03-25  1:49 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-22  1:16 [PATCH v2 0/9] Crossing our fingers is not a strategy trondmy
2022-03-22  1:16 ` [PATCH v2 1/9] NFS: Fix memory allocation in rpc_malloc() trondmy
2022-03-22  1:16   ` [PATCH v2 2/9] NFS: Fix memory allocation in rpc_alloc_task() trondmy
2022-03-22  1:16     ` [PATCH v2 3/9] SUNRPC: Fix unx_lookup_cred() allocation trondmy
2022-03-22  1:16       ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent trondmy
2022-03-22  1:16         ` [PATCH v2 5/9] NFS: nfsiod should not block forever in mempool_alloc() trondmy
2022-03-22  1:16           ` [PATCH v2 6/9] NFS: Avoid writeback threads getting stuck " trondmy
2022-03-22  1:16             ` [PATCH v2 7/9] NFSv4/pnfs: Ensure pNFS allocation modes are consistent with nfsiod trondmy
2022-03-22  1:16               ` [PATCH v2 8/9] pNFS/flexfiles: " trondmy
2022-03-22  1:16                 ` [PATCH v2 9/9] pNFS/files: " trondmy
2022-03-25  1:49         ` [PATCH v2 4/9] SUNRPC: Make the rpciod and xprtiod slab allocation modes consistent NeilBrown
2022-03-23 22:57 ` [PATCH v2 0/9] Crossing our fingers is not a strategy NeilBrown
2022-03-23 23:09   ` Trond Myklebust
2022-03-22  1:47 trondmy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).