[PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting

linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting
@ 2023-09-11 18:50 trondmy
  2023-09-11 18:50 ` [PATCH 1/2] NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY trondmy
  2023-09-12  9:40 ` [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting Daire Byrne
  0 siblings, 2 replies; 4+ messages in thread
From: trondmy @ 2023-09-11 18:50 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When re-exporting a NFSv4.x filesystem through knfsd, we want to ensure
that the individual knfsd threads don't get stuck waiting for the server
in a NFS4ERR_DELAY loop. While it may make sense to have the re-exported
client retry a few times, particularly when the clients are using NFSv3,
ultimately we want to just punt a EAGAIN back to knfsd, so that it can
return NFS4ERR_DELAY/NFS3ERR_JUKEBOX, and free up the thread.

With that in mind, add a client module parameter, 'delay_retrans', that
specifies how many times a 'softerr' mounted NFSv4 filesystem should
retry before returning EAGAIN.
In order to avoid disrupting existing setups, the feature is disabled by
default, however it can be enabled by specifying a positive value for
the new parameter.

Trond Myklebust (2):
  NFSv4: Add a parameter to limit the number of retries after
    NFS4ERR_DELAY
  NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts

 .../admin-guide/kernel-parameters.txt         |  7 +++
 fs/nfs/nfs4_fs.h                              |  2 +
 fs/nfs/nfs4proc.c                             | 43 +++++++++++++++----
 fs/nfs/pnfs.c                                 |  8 +++-
 fs/nfs/pnfs.h                                 |  5 ++-
 fs/nfs/super.c                                |  8 +++-
 fs/nfs/write.c                                |  2 +
 7 files changed, 63 insertions(+), 12 deletions(-)

-- 
2.41.0

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH 1/2] NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY
  2023-09-11 18:50 [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting trondmy
@ 2023-09-11 18:50 ` trondmy
  2023-09-11 18:50   ` [PATCH 2/2] NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts trondmy
  2023-09-12  9:40 ` [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting Daire Byrne
  1 sibling, 1 reply; 4+ messages in thread
From: trondmy @ 2023-09-11 18:50 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When using a 'softerr' mount, the NFSv4 client can get stuck waiting
forever while the server just returns NFS4ERR_DELAY. Among other things,
this causes the knfsd server threads to busy wait.
Add a parameter that tells the NFSv4 client how many times to retry
before giving up.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 .../admin-guide/kernel-parameters.txt         |  7 ++++++
 fs/nfs/nfs4_fs.h                              |  2 ++
 fs/nfs/nfs4proc.c                             | 25 +++++++++++++++++++
 fs/nfs/super.c                                |  8 +++++-
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 722b6eca2e93..62af591ee2ad 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3517,6 +3517,13 @@
 			[NFS] set the TCP port on which the NFSv4 callback
 			channel should listen.
 
+	nfs.delay_retrans=
+			[NFS] specifies the number of times the NFSv4 client
+			retries the request before returning an EAGAIN error,
+			after a reply of NFS4ERR_DELAY from the server.
+			Only applies if the softerr mount option is enabled,
+			and the specified value is >= 0.
+
 	nfs.enable_ino64=
 			[NFS] enable 64-bit inode numbers.
 			If zero, the NFS client will fake up a 32-bit inode
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 4c9f8bd866ab..29b91482e5f4 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -209,6 +209,7 @@ struct nfs4_exception {
 	struct inode *inode;
 	nfs4_stateid *stateid;
 	long timeout;
+	unsigned short retrans;
 	unsigned char task_is_privileged : 1;
 	unsigned char delay : 1,
 		      recovering : 1,
@@ -546,6 +547,7 @@ extern unsigned short max_session_slots;
 extern unsigned short max_session_cb_slots;
 extern unsigned short send_implementation_id;
 extern bool recover_lost_locks;
+extern short nfs_delay_retrans;
 
 #define NFS4_CLIENT_ID_UNIQ_LEN		(64)
 extern char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN];
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 832fa226b8f2..99c054a6c7b5 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -585,6 +585,21 @@ static int nfs4_do_handle_exception(struct nfs_server *server,
 	return 0;
 }
 
+/*
+ * Track the number of NFS4ERR_DELAY related retransmissions and return
+ * EAGAIN if the 'softerr' mount option is set, and we've exceeded the limit
+ * set by 'nfs_delay_retrans'.
+ */
+static int nfs4_exception_should_retrans(const struct nfs_server *server,
+					 struct nfs4_exception *exception)
+{
+	if (server->flags & NFS_MOUNT_SOFTERR && nfs_delay_retrans >= 0) {
+		if (exception->retrans++ >= (unsigned short)nfs_delay_retrans)
+			return -EAGAIN;
+	}
+	return 0;
+}
+
 /* This is the error handling routine for processes that are allowed
  * to sleep.
  */
@@ -595,6 +610,11 @@ int nfs4_handle_exception(struct nfs_server *server, int errorcode, struct nfs4_
 
 	ret = nfs4_do_handle_exception(server, errorcode, exception);
 	if (exception->delay) {
+		int ret2 = nfs4_exception_should_retrans(server, exception);
+		if (ret2 < 0) {
+			exception->retry = 0;
+			return ret2;
+		}
 		ret = nfs4_delay(&exception->timeout,
 				exception->interruptible);
 		goto out_retry;
@@ -623,6 +643,11 @@ nfs4_async_handle_exception(struct rpc_task *task, struct nfs_server *server,
 
 	ret = nfs4_do_handle_exception(server, errorcode, exception);
 	if (exception->delay) {
+		int ret2 = nfs4_exception_should_retrans(server, exception);
+		if (ret2 < 0) {
+			exception->retry = 0;
+			return ret2;
+		}
 		rpc_delay(task, nfs4_update_delay(&exception->timeout));
 		goto out_retry;
 	}
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 2284f749d892..929e122ff34b 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -1368,6 +1368,7 @@ unsigned short max_session_cb_slots = NFS4_DEF_CB_SLOT_TABLE_SIZE;
 unsigned short send_implementation_id = 1;
 char nfs4_client_id_uniquifier[NFS4_CLIENT_ID_UNIQ_LEN] = "";
 bool recover_lost_locks = false;
+short nfs_delay_retrans = -1;
 
 EXPORT_SYMBOL_GPL(nfs_callback_nr_threads);
 EXPORT_SYMBOL_GPL(nfs_callback_set_tcpport);
@@ -1378,6 +1379,7 @@ EXPORT_SYMBOL_GPL(max_session_cb_slots);
 EXPORT_SYMBOL_GPL(send_implementation_id);
 EXPORT_SYMBOL_GPL(nfs4_client_id_uniquifier);
 EXPORT_SYMBOL_GPL(recover_lost_locks);
+EXPORT_SYMBOL_GPL(nfs_delay_retrans);
 
 #define NFS_CALLBACK_MAXPORTNR (65535U)
 
@@ -1426,5 +1428,9 @@ MODULE_PARM_DESC(recover_lost_locks,
 		 "If the server reports that a lock might be lost, "
 		 "try to recover it risking data corruption.");
 
-
+module_param_named(delay_retrans, nfs_delay_retrans, short, 0644);
+MODULE_PARM_DESC(delay_retrans,
+		 "Unless negative, specifies the number of times the NFSv4 "
+		 "client retries a request before returning an EAGAIN error, "
+		 "after a reply of NFS4ERR_DELAY from the server.");
 #endif /* CONFIG_NFS_V4 */
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* [PATCH 2/2] NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
  2023-09-11 18:50 ` [PATCH 1/2] NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY trondmy
@ 2023-09-11 18:50   ` trondmy
  0 siblings, 0 replies; 4+ messages in thread
From: trondmy @ 2023-09-11 18:50 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If we're using the 'softerr' mount option, we may want to allow
layoutget to return EAGAIN to allow knfsd server threads to return a
JUKEBOX/DELAY error to the client instead of busy waiting.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c | 18 ++++++++++--------
 fs/nfs/pnfs.c     |  8 ++++++--
 fs/nfs/pnfs.h     |  5 ++++-
 fs/nfs/write.c    |  2 ++
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 99c054a6c7b5..5deeaea8026e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9651,6 +9651,9 @@ nfs4_layoutget_handle_exception(struct rpc_task *task,
 
 	nfs4_sequence_free_slot(&lgp->res.seq_res);
 
+	exception->state = NULL;
+	exception->stateid = NULL;
+
 	switch (nfs4err) {
 	case 0:
 		goto out;
@@ -9746,7 +9749,8 @@ static const struct rpc_call_ops nfs4_layoutget_call_ops = {
 };
 
 struct pnfs_layout_segment *
-nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout)
+nfs4_proc_layoutget(struct nfs4_layoutget *lgp,
+		    struct nfs4_exception *exception)
 {
 	struct inode *inode = lgp->args.inode;
 	struct nfs_server *server = NFS_SERVER(inode);
@@ -9766,13 +9770,10 @@ nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout)
 			 RPC_TASK_MOVEABLE,
 	};
 	struct pnfs_layout_segment *lseg = NULL;
-	struct nfs4_exception exception = {
-		.inode = inode,
-		.timeout = *timeout,
-	};
 	int status = 0;
 
 	nfs4_init_sequence(&lgp->args.seq_args, &lgp->res.seq_res, 0, 0);
+	exception->retry = 0;
 
 	task = rpc_run_task(&task_setup_data);
 	if (IS_ERR(task))
@@ -9783,11 +9784,12 @@ nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout)
 		goto out;
 
 	if (task->tk_status < 0) {
-		status = nfs4_layoutget_handle_exception(task, lgp, &exception);
-		*timeout = exception.timeout;
+		exception->retry = 1;
+		status = nfs4_layoutget_handle_exception(task, lgp, exception);
 	} else if (lgp->res.layoutp->len == 0) {
+		exception->retry = 1;
 		status = -EAGAIN;
-		*timeout = nfs4_update_delay(&exception.timeout);
+		nfs4_update_delay(&exception->timeout);
 	} else
 		lseg = pnfs_layout_process(lgp);
 out:
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 306cba0b9e69..63904a372b2f 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1980,7 +1980,9 @@ pnfs_update_layout(struct inode *ino,
 	struct pnfs_layout_segment *lseg = NULL;
 	struct nfs4_layoutget *lgp;
 	nfs4_stateid stateid;
-	long timeout = 0;
+	struct nfs4_exception exception = {
+		.inode = ino,
+	};
 	unsigned long giveup = jiffies + (clp->cl_lease_time << 1);
 	bool first;
 
@@ -2144,7 +2146,7 @@ pnfs_update_layout(struct inode *ino,
 	lgp->lo = lo;
 	pnfs_get_layout_hdr(lo);
 
-	lseg = nfs4_proc_layoutget(lgp, &timeout);
+	lseg = nfs4_proc_layoutget(lgp, &exception);
 	trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
 				 PNFS_UPDATE_LAYOUT_SEND_LAYOUTGET);
 	nfs_layoutget_end(lo);
@@ -2171,6 +2173,8 @@ pnfs_update_layout(struct inode *ino,
 			goto out_put_layout_hdr;
 		}
 		if (lseg) {
+			if (!exception.retry)
+				goto out_put_layout_hdr;
 			if (first)
 				pnfs_clear_first_layoutget(lo);
 			trace_pnfs_update_layout(ino, pos, count,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index d886c8226d8f..db57a85500ee 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -35,6 +35,7 @@
 #include <linux/nfs_page.h>
 #include <linux/workqueue.h>
 
+struct nfs4_exception;
 struct nfs4_opendata;
 
 enum {
@@ -245,7 +246,9 @@ extern size_t max_response_pages(struct nfs_server *server);
 extern int nfs4_proc_getdeviceinfo(struct nfs_server *server,
 				   struct pnfs_device *dev,
 				   const struct cred *cred);
-extern struct pnfs_layout_segment* nfs4_proc_layoutget(struct nfs4_layoutget *lgp, long *timeout);
+extern struct pnfs_layout_segment *
+nfs4_proc_layoutget(struct nfs4_layoutget *lgp,
+		    struct nfs4_exception *exception);
 extern int nfs4_proc_layoutreturn(struct nfs4_layoutreturn *lrp, bool sync);
 
 /* pnfs.c */
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8c1ee1a1a28f..59478a32e19f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -739,6 +739,8 @@ int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
 					&pgio);
 		pgio.pg_error = 0;
 		nfs_pageio_complete(&pgio);
+		if (err == -EAGAIN && mntflags & NFS_MOUNT_SOFTERR)
+			break;
 	} while (err < 0 && !nfs_error_is_fatal(err));
 	nfs_io_completion_put(ioc);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting
  2023-09-11 18:50 [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting trondmy
  2023-09-11 18:50 ` [PATCH 1/2] NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY trondmy
@ 2023-09-12  9:40 ` Daire Byrne
  1 sibling, 0 replies; 4+ messages in thread
From: Daire Byrne @ 2023-09-12  9:40 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

Just to say, we see similar problems with NFSv3 servers re-exported to
NFSv3 clients.

In our case we have a single server re-exporting multiple NFSv3 remote
server mounts. If one of those re-exported mounts goes "bad" (network
loss, network congestion, server load), the knfsd threads slowly get
consumed by eager clients of that (hung) mount until there are no
threads left to serve the clients of all the other mounts/servers
being re-exported by that single server (that are still good).

The "softerr" mount option on the re-export server does not help with
this and the svc_rpc_per_connection_limit can make this much worse by
allowing a handful of clients to lock up all the knfsd threads very
quickly.

Even when the conditions of that "bad" server improve, there seems to
be a feedback loop of both the re-export servers retrans and the
clients of the re-export servers retrans that means many duplicate
lookups occur for a long time - it is often quicker to just reboot
that re-export server. Even worse, these duplicate lookups can
themselves cause high ops load on the original server and so the
requests timeout and retrans etc etc.

The only thing we have found to make this a little more bearable is to
increase the timeo (>30 mins) to minimise retrans and set the
svc_rpc_per_connection_limit=4. This at least reduces the chance that
a single re-export server that is serving multiple mounts can remain
responsive for all other mounts it serves. The other option would be
to just have a unique re-export server for a single mountpoint but
there are resource constraints when you have 30+ servers and mounts to
deal with.

We are still unable to use NFSv4 for our workloads because they often
involve high latency re-export servers 150+ms away and NFSv4 re-export
server performance is still limited by parallel metadata ops:

https://lore.kernel.org/all/CAPt2mGMZh9=Vwcqjh0J4XoTu3stOnKwswdzApL4wCA_usOFV_g@mail.gmail.com/#t
https://bugzilla.linux-nfs.org/show_bug.cgi?id=375

Daire

On Mon, 11 Sept 2023 at 23:01, <trondmy@gmail.com> wrote:
>
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> When re-exporting a NFSv4.x filesystem through knfsd, we want to ensure
> that the individual knfsd threads don't get stuck waiting for the server
> in a NFS4ERR_DELAY loop. While it may make sense to have the re-exported
> client retry a few times, particularly when the clients are using NFSv3,
> ultimately we want to just punt a EAGAIN back to knfsd, so that it can
> return NFS4ERR_DELAY/NFS3ERR_JUKEBOX, and free up the thread.
>
> With that in mind, add a client module parameter, 'delay_retrans', that
> specifies how many times a 'softerr' mounted NFSv4 filesystem should
> retry before returning EAGAIN.
> In order to avoid disrupting existing setups, the feature is disabled by
> default, however it can be enabled by specifying a positive value for
> the new parameter.
>
> Trond Myklebust (2):
>   NFSv4: Add a parameter to limit the number of retries after
>     NFS4ERR_DELAY
>   NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts
>
>  .../admin-guide/kernel-parameters.txt         |  7 +++
>  fs/nfs/nfs4_fs.h                              |  2 +
>  fs/nfs/nfs4proc.c                             | 43 +++++++++++++++----
>  fs/nfs/pnfs.c                                 |  8 +++-
>  fs/nfs/pnfs.h                                 |  5 ++-
>  fs/nfs/super.c                                |  8 +++-
>  fs/nfs/write.c                                |  2 +
>  7 files changed, 63 insertions(+), 12 deletions(-)
>
> --
> 2.41.0
>

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-09-12  9:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11 18:50 [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting trondmy
2023-09-11 18:50 ` [PATCH 1/2] NFSv4: Add a parameter to limit the number of retries after NFS4ERR_DELAY trondmy
2023-09-11 18:50   ` [PATCH 2/2] NFSv4/pnfs: Allow layoutget to return EAGAIN for softerr mounts trondmy
2023-09-12  9:40 ` [PATCH 0/2] NFSv4.x client improvements for knfsd re-exporting Daire Byrne

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).