linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
@ 2019-03-29 21:59 Trond Myklebust
  2019-03-29 21:59 ` [PATCH v2 01/28] SUNRPC: Fix up task signalling Trond Myklebust
  2019-04-01 16:54 ` [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Olga Kornievskaia
  0 siblings, 2 replies; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

This patchset aims to make soft mounts a viable option for NFSv4 clients
by minimising the risk of false positive timeouts, while allowing for
faster failover of reads and writes once a timeout is actually observed.

The patches rely on the NFS server correctly implementing the contract
specified in RFC7530 section 3.1.1 with respect to not dropping requests
while the transport connection is up. When this is the case, the client
can safely assume that if the request has not received a reply after
transmitting a RPC request, it is not because the request was dropped,
but rather is due to congestion, or slow processing on the server.
IOW: as long as the connection remains up, there is no need for requests
to time out.

The patches break down roughly as follows:
- A set of patches to clean up the RPC engine timeouts, and ensure they
  are accurate.
- A set of patches to change the 'soft' mount semantics for NFSv4.x.
- A set of patches to add a new 'softerr' mount option that works like
  soft, but explicitly signals timeouts using the ETIMEDOUT error code
  rather than using EIO. This allows applications to tune their
  behaviour (e.g. by failing over to a different server) if a timeout
  occurs.
- A set of patches to change the NFS error reporting so that it matches
  that of local filesystems w.r.t. guarantees that filesystem errors are
  seen once and once only.
- A patch to ensure the safe interruption of NFS4ERR_DELAYed operations
- A patch to ensure that pNFS operations can be forced to break out
  of layout error cycles after a certain number of retries.
- A few cleanups...

-------
Changes since v1:
- Change NFSv4 soft timeout condition to prevent all requests from
  timing out when the connection is still up, instead of just the
  ones that have been sent.
- RPC queue timer cleanups
- Ratelimit the "server not responding" messages


*** BLURB HERE ***

Trond Myklebust (28):
  SUNRPC: Fix up task signalling
  SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare
  SUNRPC: Refactor xprt_request_wait_receive()
  SUNRPC: Refactor rpc_sleep_on()
  SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority()
  SUNRPC: Add function rpc_sleep_on_timeout()
  SUNRPC: Fix up tracking of timeouts
  SUNRPC: Simplify queue timeouts using timer_reduce()
  SUNRPC: Declare RPC timers as TIMER_DEFERRABLE
  SUNRPC: Ensure that the transport layer respect major timeouts
  SUNRPC: Add tracking of RPC level errors
  SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for
    timeouts
  SUNRPC: Start the first major timeout calculation at task creation
  SUNRPC: Ensure to ratelimit the "server not responding" syslog
    messages
  SUNRPC: Add the 'softerr' rpc_client flag
  NFS: Consider ETIMEDOUT to be a fatal error
  NFS: Move internal constants out of uapi/linux/nfs_mount.h
  NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT
    errors
  NFS: Don't interrupt file writeout due to fatal errors
  NFS: Don't call generic_error_remove_page() while holding locks
  NFS: Don't inadvertently clear writeback errors
  NFS: Replace custom error reporting mechanism with generic one
  NFS: Fix up NFS I/O subrequest creation
  NFS: Remove unused argument from nfs_create_request()
  pNFS: Add tracking to limit the number of pNFS retries
  NFS: Allow signal interruption of NFS4ERR_DELAYed operations
  NFS: Add a helper to return a pointer to the open context of a struct
    nfs_page
  NFS: Remove redundant open context from nfs_page

 fs/lockd/clntproc.c                        |   4 +-
 fs/nfs/client.c                            |   2 +
 fs/nfs/direct.c                            |  11 +-
 fs/nfs/file.c                              |  31 +---
 fs/nfs/filelayout/filelayout.c             |   4 +-
 fs/nfs/flexfilelayout/flexfilelayout.c     |  14 +-
 fs/nfs/internal.h                          |   7 +-
 fs/nfs/nfs4_fs.h                           |   1 +
 fs/nfs/nfs4file.c                          |   2 +-
 fs/nfs/nfs4proc.c                          | 159 +++++++++++++++------
 fs/nfs/pagelist.c                          | 122 +++++++++-------
 fs/nfs/pnfs.c                              |   4 +-
 fs/nfs/pnfs.h                              |   4 +-
 fs/nfs/read.c                              |   6 +-
 fs/nfs/super.c                             |  15 +-
 fs/nfs/write.c                             |  67 +++++----
 fs/nfsd/nfs4callback.c                     |   4 +-
 include/linux/nfs_fs.h                     |   1 -
 include/linux/nfs_fs_sb.h                  |  10 ++
 include/linux/nfs_page.h                   |  12 +-
 include/linux/sunrpc/clnt.h                |   2 +
 include/linux/sunrpc/sched.h               |  20 ++-
 include/linux/sunrpc/xprt.h                |   6 +-
 include/trace/events/sunrpc.h              |   8 +-
 include/uapi/linux/nfs_mount.h             |   9 --
 net/sunrpc/auth_gss/auth_gss.c             |   5 +-
 net/sunrpc/clnt.c                          | 116 +++++++++------
 net/sunrpc/debugfs.c                       |   2 +-
 net/sunrpc/rpcb_clnt.c                     |   3 +-
 net/sunrpc/sched.c                         | 158 +++++++++++++++-----
 net/sunrpc/xprt.c                          | 150 ++++++++++++-------
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   2 +-
 net/sunrpc/xprtrdma/transport.c            |   2 +-
 net/sunrpc/xprtsock.c                      |   9 +-
 34 files changed, 631 insertions(+), 341 deletions(-)

-- 
2.20.1


^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH v2 01/28] SUNRPC: Fix up task signalling
  2019-03-29 21:59 [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Trond Myklebust
@ 2019-03-29 21:59 ` Trond Myklebust
  2019-03-29 21:59   ` [PATCH v2 02/28] SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare Trond Myklebust
  2019-04-01 16:54 ` [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Olga Kornievskaia
  1 sibling, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

The RPC_TASK_KILLED flag should really not be set from another context
because it can clobber data in the struct task when task->tk_flags is
changed non-atomically.
Let's therefore swap out RPC_TASK_KILLED with an atomic flag, and add
a function to set that flag and safely wake up the task.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/lockd/clntproc.c           |  4 ++--
 fs/nfsd/nfs4callback.c        |  4 ++--
 include/linux/sunrpc/sched.h  |  6 ++++--
 include/trace/events/sunrpc.h |  6 +++---
 net/sunrpc/clnt.c             | 14 ++------------
 net/sunrpc/sched.c            | 28 +++++++++++++++++++++++-----
 net/sunrpc/xprt.c             |  4 ++++
 7 files changed, 40 insertions(+), 26 deletions(-)

diff --git a/fs/lockd/clntproc.c b/fs/lockd/clntproc.c
index e8a004097d18..d9c32d1a20c0 100644
--- a/fs/lockd/clntproc.c
+++ b/fs/lockd/clntproc.c
@@ -715,7 +715,7 @@ static void nlmclnt_unlock_callback(struct rpc_task *task, void *data)
 	struct nlm_rqst	*req = data;
 	u32 status = ntohl(req->a_res.status);
 
-	if (RPC_ASSASSINATED(task))
+	if (RPC_SIGNALLED(task))
 		goto die;
 
 	if (task->tk_status < 0) {
@@ -783,7 +783,7 @@ static void nlmclnt_cancel_callback(struct rpc_task *task, void *data)
 	struct nlm_rqst	*req = data;
 	u32 status = ntohl(req->a_res.status);
 
-	if (RPC_ASSASSINATED(task))
+	if (RPC_SIGNALLED(task))
 		goto die;
 
 	if (task->tk_status < 0) {
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c
index d219159b98af..f7494be8dbe2 100644
--- a/fs/nfsd/nfs4callback.c
+++ b/fs/nfsd/nfs4callback.c
@@ -1032,7 +1032,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 		 * the submission code will error out, so we don't need to
 		 * handle that case here.
 		 */
-		if (task->tk_flags & RPC_TASK_KILLED)
+		if (RPC_SIGNALLED(task))
 			goto need_restart;
 
 		return true;
@@ -1081,7 +1081,7 @@ static bool nfsd4_cb_sequence_done(struct rpc_task *task, struct nfsd4_callback
 	dprintk("%s: freed slot, new seqid=%d\n", __func__,
 		clp->cl_cb_session->se_cb_seq_nr);
 
-	if (task->tk_flags & RPC_TASK_KILLED)
+	if (RPC_SIGNALLED(task))
 		goto need_restart;
 out:
 	return ret;
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index ec861cd0cfe8..852ca0f2c56c 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -125,7 +125,6 @@ struct rpc_task_setup {
 #define RPC_CALL_MAJORSEEN	0x0020		/* major timeout seen */
 #define RPC_TASK_ROOTCREDS	0x0040		/* force root creds */
 #define RPC_TASK_DYNAMIC	0x0080		/* task was kmalloc'ed */
-#define RPC_TASK_KILLED		0x0100		/* task was killed */
 #define RPC_TASK_SOFT		0x0200		/* Use soft timeouts */
 #define RPC_TASK_SOFTCONN	0x0400		/* Fail if can't connect */
 #define RPC_TASK_SENT		0x0800		/* message was sent */
@@ -135,7 +134,6 @@ struct rpc_task_setup {
 
 #define RPC_IS_ASYNC(t)		((t)->tk_flags & RPC_TASK_ASYNC)
 #define RPC_IS_SWAPPER(t)	((t)->tk_flags & RPC_TASK_SWAPPER)
-#define RPC_ASSASSINATED(t)	((t)->tk_flags & RPC_TASK_KILLED)
 #define RPC_IS_SOFT(t)		((t)->tk_flags & (RPC_TASK_SOFT|RPC_TASK_TIMEOUT))
 #define RPC_IS_SOFTCONN(t)	((t)->tk_flags & RPC_TASK_SOFTCONN)
 #define RPC_WAS_SENT(t)		((t)->tk_flags & RPC_TASK_SENT)
@@ -146,6 +144,7 @@ struct rpc_task_setup {
 #define RPC_TASK_NEED_XMIT	3
 #define RPC_TASK_NEED_RECV	4
 #define RPC_TASK_MSG_PIN_WAIT	5
+#define RPC_TASK_SIGNALLED	6
 
 #define RPC_IS_RUNNING(t)	test_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
 #define rpc_set_running(t)	set_bit(RPC_TASK_RUNNING, &(t)->tk_runstate)
@@ -169,6 +168,8 @@ struct rpc_task_setup {
 
 #define RPC_IS_ACTIVATED(t)	test_bit(RPC_TASK_ACTIVE, &(t)->tk_runstate)
 
+#define RPC_SIGNALLED(t)	test_bit(RPC_TASK_SIGNALLED, &(t)->tk_runstate)
+
 /*
  * Task priorities.
  * Note: if you change these, you must also change
@@ -217,6 +218,7 @@ struct rpc_task *rpc_run_task(const struct rpc_task_setup *);
 struct rpc_task *rpc_run_bc_task(struct rpc_rqst *req);
 void		rpc_put_task(struct rpc_task *);
 void		rpc_put_task_async(struct rpc_task *);
+void		rpc_signal_task(struct rpc_task *);
 void		rpc_exit_task(struct rpc_task *);
 void		rpc_exit(struct rpc_task *, int);
 void		rpc_release_calldata(const struct rpc_call_ops *, void *);
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 7e899e635d33..5e3b77d9daa7 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -82,7 +82,6 @@ TRACE_DEFINE_ENUM(RPC_TASK_SWAPPER);
 TRACE_DEFINE_ENUM(RPC_CALL_MAJORSEEN);
 TRACE_DEFINE_ENUM(RPC_TASK_ROOTCREDS);
 TRACE_DEFINE_ENUM(RPC_TASK_DYNAMIC);
-TRACE_DEFINE_ENUM(RPC_TASK_KILLED);
 TRACE_DEFINE_ENUM(RPC_TASK_SOFT);
 TRACE_DEFINE_ENUM(RPC_TASK_SOFTCONN);
 TRACE_DEFINE_ENUM(RPC_TASK_SENT);
@@ -97,7 +96,6 @@ TRACE_DEFINE_ENUM(RPC_TASK_NO_RETRANS_TIMEOUT);
 		{ RPC_CALL_MAJORSEEN, "MAJORSEEN" },			\
 		{ RPC_TASK_ROOTCREDS, "ROOTCREDS" },			\
 		{ RPC_TASK_DYNAMIC, "DYNAMIC" },			\
-		{ RPC_TASK_KILLED, "KILLED" },				\
 		{ RPC_TASK_SOFT, "SOFT" },				\
 		{ RPC_TASK_SOFTCONN, "SOFTCONN" },			\
 		{ RPC_TASK_SENT, "SENT" },				\
@@ -111,6 +109,7 @@ TRACE_DEFINE_ENUM(RPC_TASK_ACTIVE);
 TRACE_DEFINE_ENUM(RPC_TASK_NEED_XMIT);
 TRACE_DEFINE_ENUM(RPC_TASK_NEED_RECV);
 TRACE_DEFINE_ENUM(RPC_TASK_MSG_PIN_WAIT);
+TRACE_DEFINE_ENUM(RPC_TASK_SIGNALLED);
 
 #define rpc_show_runstate(flags)					\
 	__print_flags(flags, "|",					\
@@ -119,7 +118,8 @@ TRACE_DEFINE_ENUM(RPC_TASK_MSG_PIN_WAIT);
 		{ (1UL << RPC_TASK_ACTIVE), "ACTIVE" },			\
 		{ (1UL << RPC_TASK_NEED_XMIT), "NEED_XMIT" },		\
 		{ (1UL << RPC_TASK_NEED_RECV), "NEED_RECV" },		\
-		{ (1UL << RPC_TASK_MSG_PIN_WAIT), "MSG_PIN_WAIT" })
+		{ (1UL << RPC_TASK_MSG_PIN_WAIT), "MSG_PIN_WAIT" },	\
+		{ (1UL << RPC_TASK_SIGNALLED), "SIGNALLED" })
 
 DECLARE_EVENT_CLASS(rpc_task_running,
 
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 187d10443a15..30f5995c6a68 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -827,14 +827,8 @@ void rpc_killall_tasks(struct rpc_clnt *clnt)
 	 * Spin lock all_tasks to prevent changes...
 	 */
 	spin_lock(&clnt->cl_lock);
-	list_for_each_entry(rovr, &clnt->cl_tasks, tk_task) {
-		if (!RPC_IS_ACTIVATED(rovr))
-			continue;
-		if (!(rovr->tk_flags & RPC_TASK_KILLED)) {
-			rovr->tk_flags |= RPC_TASK_KILLED;
-			rpc_exit(rovr, -EIO);
-		}
-	}
+	list_for_each_entry(rovr, &clnt->cl_tasks, tk_task)
+		rpc_signal_task(rovr);
 	spin_unlock(&clnt->cl_lock);
 }
 EXPORT_SYMBOL_GPL(rpc_killall_tasks);
@@ -1477,8 +1471,6 @@ EXPORT_SYMBOL_GPL(rpc_force_rebind);
 int
 rpc_restart_call_prepare(struct rpc_task *task)
 {
-	if (RPC_ASSASSINATED(task))
-		return 0;
 	task->tk_action = call_start;
 	task->tk_status = 0;
 	if (task->tk_ops->rpc_call_prepare != NULL)
@@ -1494,8 +1486,6 @@ EXPORT_SYMBOL_GPL(rpc_restart_call_prepare);
 int
 rpc_restart_call(struct rpc_task *task)
 {
-	if (RPC_ASSASSINATED(task))
-		return 0;
 	task->tk_action = call_start;
 	task->tk_status = 0;
 	return 1;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 28956c70100a..3d6cb91ba598 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -759,8 +759,7 @@ static void
 rpc_reset_task_statistics(struct rpc_task *task)
 {
 	task->tk_timeouts = 0;
-	task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_KILLED|RPC_TASK_SENT);
-
+	task->tk_flags &= ~(RPC_CALL_MAJORSEEN|RPC_TASK_SENT);
 	rpc_init_task_statistics(task);
 }
 
@@ -773,7 +772,6 @@ void rpc_exit_task(struct rpc_task *task)
 	if (task->tk_ops->rpc_call_done != NULL) {
 		task->tk_ops->rpc_call_done(task, task->tk_calldata);
 		if (task->tk_action != NULL) {
-			WARN_ON(RPC_ASSASSINATED(task));
 			/* Always release the RPC slot and buffer memory */
 			xprt_release(task);
 			rpc_reset_task_statistics(task);
@@ -781,6 +779,19 @@ void rpc_exit_task(struct rpc_task *task)
 	}
 }
 
+void rpc_signal_task(struct rpc_task *task)
+{
+	struct rpc_wait_queue *queue;
+
+	if (!RPC_IS_ACTIVATED(task))
+		return;
+	set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);
+	smp_mb__after_atomic();
+	queue = READ_ONCE(task->tk_waitqueue);
+	if (queue)
+		rpc_wake_up_queued_task_set_status(queue, task, -ERESTARTSYS);
+}
+
 void rpc_exit(struct rpc_task *task, int status)
 {
 	task->tk_status = status;
@@ -836,6 +847,13 @@ static void __rpc_execute(struct rpc_task *task)
 		 */
 		if (!RPC_IS_QUEUED(task))
 			continue;
+
+		/*
+		 * Signalled tasks should exit rather than sleep.
+		 */
+		if (RPC_SIGNALLED(task))
+			rpc_exit(task, -ERESTARTSYS);
+
 		/*
 		 * The queue->lock protects against races with
 		 * rpc_make_runnable().
@@ -861,7 +879,7 @@ static void __rpc_execute(struct rpc_task *task)
 		status = out_of_line_wait_on_bit(&task->tk_runstate,
 				RPC_TASK_QUEUED, rpc_wait_bit_killable,
 				TASK_KILLABLE);
-		if (status == -ERESTARTSYS) {
+		if (status < 0) {
 			/*
 			 * When a sync task receives a signal, it exits with
 			 * -ERESTARTSYS. In order to catch any callbacks that
@@ -869,7 +887,7 @@ static void __rpc_execute(struct rpc_task *task)
 			 * break the loop here, but go around once more.
 			 */
 			dprintk("RPC: %5u got signal\n", task->tk_pid);
-			task->tk_flags |= RPC_TASK_KILLED;
+			set_bit(RPC_TASK_SIGNALLED, &task->tk_runstate);
 			rpc_exit(task, -ERESTARTSYS);
 		}
 		dprintk("RPC: %5u sync task resuming\n", task->tk_pid);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index d7117d241460..3a4156cb0134 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1337,6 +1337,10 @@ xprt_request_transmit(struct rpc_rqst *req, struct rpc_task *snd_task)
 			if (status < 0)
 				goto out_dequeue;
 		}
+		if (RPC_SIGNALLED(task)) {
+			status = -ERESTARTSYS;
+			goto out_dequeue;
+		}
 	}
 
 	/*
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 02/28] SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare
  2019-03-29 21:59 ` [PATCH v2 01/28] SUNRPC: Fix up task signalling Trond Myklebust
@ 2019-03-29 21:59   ` Trond Myklebust
  2019-03-29 21:59     ` [PATCH v2 03/28] SUNRPC: Refactor xprt_request_wait_receive() Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Clean up.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 30f5995c6a68..88b4685e7977 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1464,20 +1464,13 @@ void rpc_force_rebind(struct rpc_clnt *clnt)
 }
 EXPORT_SYMBOL_GPL(rpc_force_rebind);
 
-/*
- * Restart an (async) RPC call from the call_prepare state.
- * Usually called from within the exit handler.
- */
-int
-rpc_restart_call_prepare(struct rpc_task *task)
+static int
+__rpc_restart_call(struct rpc_task *task, void (*action)(struct rpc_task *))
 {
-	task->tk_action = call_start;
 	task->tk_status = 0;
-	if (task->tk_ops->rpc_call_prepare != NULL)
-		task->tk_action = rpc_prepare_task;
+	task->tk_action = action;
 	return 1;
 }
-EXPORT_SYMBOL_GPL(rpc_restart_call_prepare);
 
 /*
  * Restart an (async) RPC call. Usually called from within the
@@ -1486,12 +1479,23 @@ EXPORT_SYMBOL_GPL(rpc_restart_call_prepare);
 int
 rpc_restart_call(struct rpc_task *task)
 {
-	task->tk_action = call_start;
-	task->tk_status = 0;
-	return 1;
+	return __rpc_restart_call(task, call_start);
 }
 EXPORT_SYMBOL_GPL(rpc_restart_call);
 
+/*
+ * Restart an (async) RPC call from the call_prepare state.
+ * Usually called from within the exit handler.
+ */
+int
+rpc_restart_call_prepare(struct rpc_task *task)
+{
+	if (task->tk_ops->rpc_call_prepare != NULL)
+		return __rpc_restart_call(task, rpc_prepare_task);
+	return rpc_restart_call(task);
+}
+EXPORT_SYMBOL_GPL(rpc_restart_call_prepare);
+
 const char
 *rpc_proc_name(const struct rpc_task *task)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 03/28] SUNRPC: Refactor xprt_request_wait_receive()
  2019-03-29 21:59   ` [PATCH v2 02/28] SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare Trond Myklebust
@ 2019-03-29 21:59     ` Trond Myklebust
  2019-03-29 21:59       ` [PATCH v2 04/28] SUNRPC: Refactor rpc_sleep_on() Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Convert the transport callback to actually put the request to sleep
instead of just setting a timeout. This is in preparation for
rpc_sleep_on_timeout().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/xprt.h                |  6 +-
 net/sunrpc/xprt.c                          | 79 ++++++++++++----------
 net/sunrpc/xprtrdma/svc_rdma_backchannel.c |  2 +-
 net/sunrpc/xprtrdma/transport.c            |  2 +-
 net/sunrpc/xprtsock.c                      |  8 +--
 5 files changed, 51 insertions(+), 46 deletions(-)

diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index 3a391544299e..a6d9fce7f20e 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -143,7 +143,7 @@ struct rpc_xprt_ops {
 	void		(*buf_free)(struct rpc_task *task);
 	void		(*prepare_request)(struct rpc_rqst *req);
 	int		(*send_request)(struct rpc_rqst *req);
-	void		(*set_retrans_timeout)(struct rpc_task *task);
+	void		(*wait_for_reply_request)(struct rpc_task *task);
 	void		(*timer)(struct rpc_xprt *xprt, struct rpc_task *task);
 	void		(*release_request)(struct rpc_task *task);
 	void		(*close)(struct rpc_xprt *xprt);
@@ -378,8 +378,8 @@ xprt_disable_swap(struct rpc_xprt *xprt)
 int			xprt_register_transport(struct xprt_class *type);
 int			xprt_unregister_transport(struct xprt_class *type);
 int			xprt_load_transport(const char *);
-void			xprt_set_retrans_timeout_def(struct rpc_task *task);
-void			xprt_set_retrans_timeout_rtt(struct rpc_task *task);
+void			xprt_wait_for_reply_request_def(struct rpc_task *task);
+void			xprt_wait_for_reply_request_rtt(struct rpc_task *task);
 void			xprt_wake_pending_tasks(struct rpc_xprt *xprt, int status);
 void			xprt_wait_for_buffer_space(struct rpc_xprt *xprt);
 bool			xprt_write_space(struct rpc_xprt *xprt);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 3a4156cb0134..5afffa669d04 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -554,41 +554,6 @@ bool xprt_write_space(struct rpc_xprt *xprt)
 }
 EXPORT_SYMBOL_GPL(xprt_write_space);
 
-/**
- * xprt_set_retrans_timeout_def - set a request's retransmit timeout
- * @task: task whose timeout is to be set
- *
- * Set a request's retransmit timeout based on the transport's
- * default timeout parameters.  Used by transports that don't adjust
- * the retransmit timeout based on round-trip time estimation.
- */
-void xprt_set_retrans_timeout_def(struct rpc_task *task)
-{
-	task->tk_timeout = task->tk_rqstp->rq_timeout;
-}
-EXPORT_SYMBOL_GPL(xprt_set_retrans_timeout_def);
-
-/**
- * xprt_set_retrans_timeout_rtt - set a request's retransmit timeout
- * @task: task whose timeout is to be set
- *
- * Set a request's retransmit timeout using the RTT estimator.
- */
-void xprt_set_retrans_timeout_rtt(struct rpc_task *task)
-{
-	int timer = task->tk_msg.rpc_proc->p_timer;
-	struct rpc_clnt *clnt = task->tk_client;
-	struct rpc_rtt *rtt = clnt->cl_rtt;
-	struct rpc_rqst *req = task->tk_rqstp;
-	unsigned long max_timeout = clnt->cl_timeout->to_maxval;
-
-	task->tk_timeout = rpc_calc_rto(rtt, timer);
-	task->tk_timeout <<= rpc_ntimeo(rtt, timer) + req->rq_retries;
-	if (task->tk_timeout > max_timeout || task->tk_timeout == 0)
-		task->tk_timeout = max_timeout;
-}
-EXPORT_SYMBOL_GPL(xprt_set_retrans_timeout_rtt);
-
 static void xprt_reset_majortimeo(struct rpc_rqst *req)
 {
 	const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
@@ -1102,6 +1067,47 @@ static void xprt_timer(struct rpc_task *task)
 		task->tk_status = 0;
 }
 
+/**
+ * xprt_wait_for_reply_request_def - wait for reply
+ * @task: pointer to rpc_task
+ *
+ * Set a request's retransmit timeout based on the transport's
+ * default timeout parameters.  Used by transports that don't adjust
+ * the retransmit timeout based on round-trip time estimation,
+ * and put the task to sleep on the pending queue.
+ */
+void xprt_wait_for_reply_request_def(struct rpc_task *task)
+{
+	struct rpc_rqst *req = task->tk_rqstp;
+
+	task->tk_timeout = req->rq_timeout;
+	rpc_sleep_on(&req->rq_xprt->pending, task, xprt_timer);
+}
+EXPORT_SYMBOL_GPL(xprt_wait_for_reply_request_def);
+
+/**
+ * xprt_wait_for_reply_request_rtt - wait for reply using RTT estimator
+ * @task: pointer to rpc_task
+ *
+ * Set a request's retransmit timeout using the RTT estimator,
+ * and put the task to sleep on the pending queue.
+ */
+void xprt_wait_for_reply_request_rtt(struct rpc_task *task)
+{
+	int timer = task->tk_msg.rpc_proc->p_timer;
+	struct rpc_clnt *clnt = task->tk_client;
+	struct rpc_rtt *rtt = clnt->cl_rtt;
+	struct rpc_rqst *req = task->tk_rqstp;
+	unsigned long max_timeout = clnt->cl_timeout->to_maxval;
+
+	task->tk_timeout = rpc_calc_rto(rtt, timer);
+	task->tk_timeout <<= rpc_ntimeo(rtt, timer) + req->rq_retries;
+	if (task->tk_timeout > max_timeout || task->tk_timeout == 0)
+		task->tk_timeout = max_timeout;
+	rpc_sleep_on(&req->rq_xprt->pending, task, xprt_timer);
+}
+EXPORT_SYMBOL_GPL(xprt_wait_for_reply_request_rtt);
+
 /**
  * xprt_request_wait_receive - wait for the reply to an RPC request
  * @task: RPC task about to send a request
@@ -1121,8 +1127,7 @@ void xprt_request_wait_receive(struct rpc_task *task)
 	 */
 	spin_lock(&xprt->queue_lock);
 	if (test_bit(RPC_TASK_NEED_RECV, &task->tk_runstate)) {
-		xprt->ops->set_retrans_timeout(task);
-		rpc_sleep_on(&xprt->pending, task, xprt_timer);
+		xprt->ops->wait_for_reply_request(task);
 		/*
 		 * Send an extra queue wakeup call if the
 		 * connection was dropped in case the call to
diff --git a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
index 907464c2a9f0..bed57d8b5c19 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_backchannel.c
@@ -261,7 +261,7 @@ static const struct rpc_xprt_ops xprt_rdma_bc_procs = {
 	.buf_alloc		= xprt_rdma_bc_allocate,
 	.buf_free		= xprt_rdma_bc_free,
 	.send_request		= xprt_rdma_bc_send_request,
-	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.wait_for_reply_request	= xprt_wait_for_reply_request_def,
 	.close			= xprt_rdma_bc_close,
 	.destroy		= xprt_rdma_bc_put,
 	.print_stats		= xprt_rdma_print_stats
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 5d261353bd90..7e73abe01cfe 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -815,7 +815,7 @@ static const struct rpc_xprt_ops xprt_rdma_procs = {
 	.alloc_slot		= xprt_rdma_alloc_slot,
 	.free_slot		= xprt_rdma_free_slot,
 	.release_request	= xprt_release_rqst_cong,       /* ditto */
-	.set_retrans_timeout	= xprt_set_retrans_timeout_def, /* ditto */
+	.wait_for_reply_request	= xprt_wait_for_reply_request_def, /* ditto */
 	.timer			= xprt_rdma_timer,
 	.rpcbind		= rpcb_getport_async,	/* sunrpc/rpcb_clnt.c */
 	.set_port		= xprt_rdma_set_port,
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 732d4b57411a..b4b4b8db143c 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2690,7 +2690,7 @@ static const struct rpc_xprt_ops xs_local_ops = {
 	.buf_free		= rpc_free,
 	.prepare_request	= xs_stream_prepare_request,
 	.send_request		= xs_local_send_request,
-	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.wait_for_reply_request	= xprt_wait_for_reply_request_def,
 	.close			= xs_close,
 	.destroy		= xs_destroy,
 	.print_stats		= xs_local_print_stats,
@@ -2710,7 +2710,7 @@ static const struct rpc_xprt_ops xs_udp_ops = {
 	.buf_alloc		= rpc_malloc,
 	.buf_free		= rpc_free,
 	.send_request		= xs_udp_send_request,
-	.set_retrans_timeout	= xprt_set_retrans_timeout_rtt,
+	.wait_for_reply_request	= xprt_wait_for_reply_request_rtt,
 	.timer			= xs_udp_timer,
 	.release_request	= xprt_release_rqst_cong,
 	.close			= xs_close,
@@ -2733,7 +2733,7 @@ static const struct rpc_xprt_ops xs_tcp_ops = {
 	.buf_free		= rpc_free,
 	.prepare_request	= xs_stream_prepare_request,
 	.send_request		= xs_tcp_send_request,
-	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.wait_for_reply_request	= xprt_wait_for_reply_request_def,
 	.close			= xs_tcp_shutdown,
 	.destroy		= xs_destroy,
 	.set_connect_timeout	= xs_tcp_set_connect_timeout,
@@ -2761,7 +2761,7 @@ static const struct rpc_xprt_ops bc_tcp_ops = {
 	.buf_alloc		= bc_malloc,
 	.buf_free		= bc_free,
 	.send_request		= bc_send_request,
-	.set_retrans_timeout	= xprt_set_retrans_timeout_def,
+	.wait_for_reply_request	= xprt_wait_for_reply_request_def,
 	.close			= bc_close,
 	.destroy		= bc_destroy,
 	.print_stats		= xs_tcp_print_stats,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 04/28] SUNRPC: Refactor rpc_sleep_on()
  2019-03-29 21:59     ` [PATCH v2 03/28] SUNRPC: Refactor xprt_request_wait_receive() Trond Myklebust
@ 2019-03-29 21:59       ` Trond Myklebust
  2019-03-29 21:59         ` [PATCH v2 05/28] SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority() Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

rpc_sleep_on() does not need to set the task->tk_callback under the
queue lock, so move that out.
Also refactor the check for whether the task is active.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/sched.c | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 3d6cb91ba598..8e96a841dd11 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -362,7 +362,6 @@ static void rpc_make_runnable(struct workqueue_struct *wq,
  */
 static void __rpc_sleep_on_priority(struct rpc_wait_queue *q,
 		struct rpc_task *task,
-		rpc_action action,
 		unsigned char queue_priority)
 {
 	dprintk("RPC: %5u sleep_on(queue \"%s\" time %lu)\n",
@@ -372,27 +371,39 @@ static void __rpc_sleep_on_priority(struct rpc_wait_queue *q,
 
 	__rpc_add_wait_queue(q, task, queue_priority);
 
-	WARN_ON_ONCE(task->tk_callback != NULL);
-	task->tk_callback = action;
 	__rpc_add_timer(q, task);
 }
 
-void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
-				rpc_action action)
+static void rpc_set_tk_callback(struct rpc_task *task, rpc_action action)
+{
+	if (action && !WARN_ON_ONCE(task->tk_callback != NULL))
+		task->tk_callback = action;
+}
+
+static bool rpc_sleep_check_activated(struct rpc_task *task)
 {
 	/* We shouldn't ever put an inactive task to sleep */
-	WARN_ON_ONCE(!RPC_IS_ACTIVATED(task));
-	if (!RPC_IS_ACTIVATED(task)) {
+	if (WARN_ON_ONCE(!RPC_IS_ACTIVATED(task))) {
 		task->tk_status = -EIO;
 		rpc_put_task_async(task);
-		return;
+		return false;
 	}
+	return true;
+}
+
+void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
+				rpc_action action)
+{
+	if (!rpc_sleep_check_activated(task))
+		return;
+
+	rpc_set_tk_callback(task, action);
 
 	/*
 	 * Protect the queue operations.
 	 */
 	spin_lock_bh(&q->lock);
-	__rpc_sleep_on_priority(q, task, action, task->tk_priority);
+	__rpc_sleep_on_priority(q, task, task->tk_priority);
 	spin_unlock_bh(&q->lock);
 }
 EXPORT_SYMBOL_GPL(rpc_sleep_on);
@@ -400,19 +411,16 @@ EXPORT_SYMBOL_GPL(rpc_sleep_on);
 void rpc_sleep_on_priority(struct rpc_wait_queue *q, struct rpc_task *task,
 		rpc_action action, int priority)
 {
-	/* We shouldn't ever put an inactive task to sleep */
-	WARN_ON_ONCE(!RPC_IS_ACTIVATED(task));
-	if (!RPC_IS_ACTIVATED(task)) {
-		task->tk_status = -EIO;
-		rpc_put_task_async(task);
+	if (!rpc_sleep_check_activated(task))
 		return;
-	}
+
+	rpc_set_tk_callback(task, action);
 
 	/*
 	 * Protect the queue operations.
 	 */
 	spin_lock_bh(&q->lock);
-	__rpc_sleep_on_priority(q, task, action, priority - RPC_PRIORITY_LOW);
+	__rpc_sleep_on_priority(q, task, priority - RPC_PRIORITY_LOW);
 	spin_unlock_bh(&q->lock);
 }
 EXPORT_SYMBOL_GPL(rpc_sleep_on_priority);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 05/28] SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority()
  2019-03-29 21:59       ` [PATCH v2 04/28] SUNRPC: Refactor rpc_sleep_on() Trond Myklebust
@ 2019-03-29 21:59         ` Trond Myklebust
  2019-03-29 21:59           ` [PATCH v2 06/28] SUNRPC: Add function rpc_sleep_on_timeout() Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

None of the callers set the 'action' argument, so let's just remove it.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c            | 2 +-
 include/linux/sunrpc/sched.h | 1 -
 net/sunrpc/sched.c           | 7 +++----
 3 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 741ff8c9c6ed..222eccd8a715 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1008,7 +1008,7 @@ int nfs4_setup_sequence(struct nfs_client *client,
 out_sleep:
 	if (args->sa_privileged)
 		rpc_sleep_on_priority(&tbl->slot_tbl_waitq, task,
-				NULL, RPC_PRIORITY_PRIVILEGED);
+				RPC_PRIORITY_PRIVILEGED);
 	else
 		rpc_sleep_on(&tbl->slot_tbl_waitq, task, NULL);
 	spin_unlock(&tbl->slot_tbl_lock);
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 852ca0f2c56c..541b89d41afb 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -231,7 +231,6 @@ void		rpc_sleep_on(struct rpc_wait_queue *, struct rpc_task *,
 					rpc_action action);
 void		rpc_sleep_on_priority(struct rpc_wait_queue *,
 					struct rpc_task *,
-					rpc_action action,
 					int priority);
 void rpc_wake_up_queued_task_on_wq(struct workqueue_struct *wq,
 		struct rpc_wait_queue *queue,
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 8e96a841dd11..04170c08b2cf 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -409,18 +409,17 @@ void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
 EXPORT_SYMBOL_GPL(rpc_sleep_on);
 
 void rpc_sleep_on_priority(struct rpc_wait_queue *q, struct rpc_task *task,
-		rpc_action action, int priority)
+		int priority)
 {
 	if (!rpc_sleep_check_activated(task))
 		return;
 
-	rpc_set_tk_callback(task, action);
-
+	priority -= RPC_PRIORITY_LOW;
 	/*
 	 * Protect the queue operations.
 	 */
 	spin_lock_bh(&q->lock);
-	__rpc_sleep_on_priority(q, task, priority - RPC_PRIORITY_LOW);
+	__rpc_sleep_on_priority(q, task, priority);
 	spin_unlock_bh(&q->lock);
 }
 EXPORT_SYMBOL_GPL(rpc_sleep_on_priority);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 06/28] SUNRPC: Add function rpc_sleep_on_timeout()
  2019-03-29 21:59         ` [PATCH v2 05/28] SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority() Trond Myklebust
@ 2019-03-29 21:59           ` Trond Myklebust
  2019-03-29 21:59             ` [PATCH v2 07/28] SUNRPC: Fix up tracking of timeouts Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Clean up the RPC task sleep interfaces by replacing the task->tk_timeout
'hidden parameter' to rpc_sleep_on() with a new function that takes an
absolute timeout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c              | 18 ++++++---
 include/linux/sunrpc/sched.h   |  9 ++++-
 net/sunrpc/auth_gss/auth_gss.c |  5 +--
 net/sunrpc/clnt.c              |  1 -
 net/sunrpc/rpcb_clnt.c         |  3 +-
 net/sunrpc/sched.c             | 69 +++++++++++++++++++++++++++-------
 net/sunrpc/xprt.c              | 36 ++++++++++--------
 7 files changed, 101 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 222eccd8a715..96b988689f0e 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -978,10 +978,8 @@ int nfs4_setup_sequence(struct nfs_client *client,
 	if (res->sr_slot != NULL)
 		goto out_start;
 
-	if (session) {
+	if (session)
 		tbl = &session->fc_slot_table;
-		task->tk_timeout = 0;
-	}
 
 	spin_lock(&tbl->slot_tbl_lock);
 	/* The state manager will wait until the slot table is empty */
@@ -990,9 +988,8 @@ int nfs4_setup_sequence(struct nfs_client *client,
 
 	slot = nfs4_alloc_slot(tbl);
 	if (IS_ERR(slot)) {
-		/* Try again in 1/4 second */
 		if (slot == ERR_PTR(-ENOMEM))
-			task->tk_timeout = HZ >> 2;
+			goto out_sleep_timeout;
 		goto out_sleep;
 	}
 	spin_unlock(&tbl->slot_tbl_lock);
@@ -1004,7 +1001,16 @@ int nfs4_setup_sequence(struct nfs_client *client,
 	nfs41_sequence_res_init(res);
 	rpc_call_start(task);
 	return 0;
-
+out_sleep_timeout:
+	/* Try again in 1/4 second */
+	if (args->sa_privileged)
+		rpc_sleep_on_priority_timeout(&tbl->slot_tbl_waitq, task,
+				jiffies + (HZ >> 2), RPC_PRIORITY_PRIVILEGED);
+	else
+		rpc_sleep_on_timeout(&tbl->slot_tbl_waitq, task,
+				NULL, jiffies + (HZ >> 2));
+	spin_unlock(&tbl->slot_tbl_lock);
+	return -EAGAIN;
 out_sleep:
 	if (args->sa_privileged)
 		rpc_sleep_on_priority(&tbl->slot_tbl_waitq, task,
diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 541b89d41afb..81f5d93f7109 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -35,7 +35,6 @@ struct rpc_wait {
 	struct list_head	list;		/* wait queue links */
 	struct list_head	links;		/* Links to related tasks */
 	struct list_head	timer_list;	/* Timer list */
-	unsigned long		expires;
 };
 
 /*
@@ -227,8 +226,16 @@ void		rpc_execute(struct rpc_task *);
 void		rpc_init_priority_wait_queue(struct rpc_wait_queue *, const char *);
 void		rpc_init_wait_queue(struct rpc_wait_queue *, const char *);
 void		rpc_destroy_wait_queue(struct rpc_wait_queue *);
+void		rpc_sleep_on_timeout(struct rpc_wait_queue *queue,
+					struct rpc_task *task,
+					rpc_action action,
+					unsigned long timeout);
 void		rpc_sleep_on(struct rpc_wait_queue *, struct rpc_task *,
 					rpc_action action);
+void		rpc_sleep_on_priority_timeout(struct rpc_wait_queue *queue,
+					struct rpc_task *task,
+					unsigned long timeout,
+					int priority);
 void		rpc_sleep_on_priority(struct rpc_wait_queue *,
 					struct rpc_task *,
 					int priority);
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
index 3fd56c0c90ae..c055edfec55e 100644
--- a/net/sunrpc/auth_gss/auth_gss.c
+++ b/net/sunrpc/auth_gss/auth_gss.c
@@ -581,8 +581,8 @@ gss_refresh_upcall(struct rpc_task *task)
 		/* XXX: warning on the first, under the assumption we
 		 * shouldn't normally hit this case on a refresh. */
 		warn_gssd();
-		task->tk_timeout = 15*HZ;
-		rpc_sleep_on(&pipe_version_rpc_waitqueue, task, NULL);
+		rpc_sleep_on_timeout(&pipe_version_rpc_waitqueue,
+				task, NULL, jiffies + (15 * HZ));
 		err = -EAGAIN;
 		goto out;
 	}
@@ -595,7 +595,6 @@ gss_refresh_upcall(struct rpc_task *task)
 	if (gss_cred->gc_upcall != NULL)
 		rpc_sleep_on(&gss_cred->gc_upcall->rpc_waitqueue, task, NULL);
 	else if (gss_msg->ctx == NULL && gss_msg->msg.errno >= 0) {
-		task->tk_timeout = 0;
 		gss_cred->gc_upcall = gss_msg;
 		/* gss_upcall_callback will release the reference to gss_upcall_msg */
 		refcount_inc(&gss_msg->count);
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 88b4685e7977..4d853b5420c8 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1869,7 +1869,6 @@ call_bind(struct rpc_task *task)
 	if (!xprt_prepare_transmit(task))
 		return;
 
-	task->tk_timeout = xprt->bind_timeout;
 	xprt->ops->rpcbind(task);
 }
 
diff --git a/net/sunrpc/rpcb_clnt.c b/net/sunrpc/rpcb_clnt.c
index 41a971ac1c63..18b0cf2a923f 100644
--- a/net/sunrpc/rpcb_clnt.c
+++ b/net/sunrpc/rpcb_clnt.c
@@ -694,7 +694,8 @@ void rpcb_getport_async(struct rpc_task *task)
 
 	/* Put self on the wait queue to ensure we get notified if
 	 * some other task is already attempting to bind the port */
-	rpc_sleep_on(&xprt->binding, task, NULL);
+	rpc_sleep_on_timeout(&xprt->binding, task,
+			NULL, jiffies + xprt->bind_timeout);
 
 	if (xprt_test_and_set_binding(xprt)) {
 		dprintk("RPC: %5u %s: waiting for another binder\n",
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 04170c08b2cf..7e0f7b83262f 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -66,7 +66,7 @@ struct workqueue_struct *xprtiod_workqueue __read_mostly;
 static void
 __rpc_disable_timer(struct rpc_wait_queue *queue, struct rpc_task *task)
 {
-	if (task->tk_timeout == 0)
+	if (list_empty(&task->u.tk_wait.timer_list))
 		return;
 	dprintk("RPC: %5u disabling timer\n", task->tk_pid);
 	task->tk_timeout = 0;
@@ -86,17 +86,15 @@ rpc_set_queue_timer(struct rpc_wait_queue *queue, unsigned long expires)
  * Set up a timer for the current task.
  */
 static void
-__rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task)
+__rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task,
+		unsigned long timeout)
 {
-	if (!task->tk_timeout)
-		return;
-
 	dprintk("RPC: %5u setting alarm for %u ms\n",
-		task->tk_pid, jiffies_to_msecs(task->tk_timeout));
+		task->tk_pid, jiffies_to_msecs(timeout - jiffies));
 
-	task->u.tk_wait.expires = jiffies + task->tk_timeout;
-	if (list_empty(&queue->timer_list.list) || time_before(task->u.tk_wait.expires, queue->timer_list.expires))
-		rpc_set_queue_timer(queue, task->u.tk_wait.expires);
+	task->tk_timeout = timeout;
+	if (list_empty(&queue->timer_list.list) || time_before(timeout, queue->timer_list.expires))
+		rpc_set_queue_timer(queue, timeout);
 	list_add(&task->u.tk_wait.timer_list, &queue->timer_list.list);
 }
 
@@ -188,6 +186,7 @@ static void __rpc_add_wait_queue(struct rpc_wait_queue *queue,
 	if (RPC_IS_QUEUED(task))
 		return;
 
+	INIT_LIST_HEAD(&task->u.tk_wait.timer_list);
 	if (RPC_IS_PRIORITY(queue))
 		__rpc_add_wait_queue_priority(queue, task, queue_priority);
 	else if (RPC_IS_SWAPPER(task))
@@ -371,7 +370,17 @@ static void __rpc_sleep_on_priority(struct rpc_wait_queue *q,
 
 	__rpc_add_wait_queue(q, task, queue_priority);
 
-	__rpc_add_timer(q, task);
+}
+
+static void __rpc_sleep_on_priority_timeout(struct rpc_wait_queue *q,
+		struct rpc_task *task, unsigned long timeout,
+		unsigned char queue_priority)
+{
+	if (time_is_after_jiffies(timeout)) {
+		__rpc_sleep_on_priority(q, task, queue_priority);
+		__rpc_add_timer(q, task, timeout);
+	} else
+		task->tk_status = -ETIMEDOUT;
 }
 
 static void rpc_set_tk_callback(struct rpc_task *task, rpc_action action)
@@ -391,6 +400,23 @@ static bool rpc_sleep_check_activated(struct rpc_task *task)
 	return true;
 }
 
+void rpc_sleep_on_timeout(struct rpc_wait_queue *q, struct rpc_task *task,
+				rpc_action action, unsigned long timeout)
+{
+	if (!rpc_sleep_check_activated(task))
+		return;
+
+	rpc_set_tk_callback(task, action);
+
+	/*
+	 * Protect the queue operations.
+	 */
+	spin_lock_bh(&q->lock);
+	__rpc_sleep_on_priority_timeout(q, task, timeout, task->tk_priority);
+	spin_unlock_bh(&q->lock);
+}
+EXPORT_SYMBOL_GPL(rpc_sleep_on_timeout);
+
 void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
 				rpc_action action)
 {
@@ -399,6 +425,7 @@ void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
 
 	rpc_set_tk_callback(task, action);
 
+	WARN_ON_ONCE(task->tk_timeout != 0);
 	/*
 	 * Protect the queue operations.
 	 */
@@ -408,12 +435,29 @@ void rpc_sleep_on(struct rpc_wait_queue *q, struct rpc_task *task,
 }
 EXPORT_SYMBOL_GPL(rpc_sleep_on);
 
+void rpc_sleep_on_priority_timeout(struct rpc_wait_queue *q,
+		struct rpc_task *task, unsigned long timeout, int priority)
+{
+	if (!rpc_sleep_check_activated(task))
+		return;
+
+	priority -= RPC_PRIORITY_LOW;
+	/*
+	 * Protect the queue operations.
+	 */
+	spin_lock_bh(&q->lock);
+	__rpc_sleep_on_priority_timeout(q, task, timeout, priority);
+	spin_unlock_bh(&q->lock);
+}
+EXPORT_SYMBOL_GPL(rpc_sleep_on_priority_timeout);
+
 void rpc_sleep_on_priority(struct rpc_wait_queue *q, struct rpc_task *task,
 		int priority)
 {
 	if (!rpc_sleep_check_activated(task))
 		return;
 
+	WARN_ON_ONCE(task->tk_timeout != 0);
 	priority -= RPC_PRIORITY_LOW;
 	/*
 	 * Protect the queue operations.
@@ -711,7 +755,7 @@ static void __rpc_queue_timer_fn(struct timer_list *t)
 	spin_lock(&queue->lock);
 	expires = now = jiffies;
 	list_for_each_entry_safe(task, n, &queue->timer_list.list, u.tk_wait.timer_list) {
-		timeo = task->u.tk_wait.expires;
+		timeo = task->tk_timeout;
 		if (time_after_eq(now, timeo)) {
 			dprintk("RPC: %5u timeout\n", task->tk_pid);
 			task->tk_status = -ETIMEDOUT;
@@ -737,8 +781,7 @@ static void __rpc_atrun(struct rpc_task *task)
  */
 void rpc_delay(struct rpc_task *task, unsigned long delay)
 {
-	task->tk_timeout = delay;
-	rpc_sleep_on(&delay_queue, task, __rpc_atrun);
+	rpc_sleep_on_timeout(&delay_queue, task, __rpc_atrun, jiffies + delay);
 }
 EXPORT_SYMBOL_GPL(rpc_delay);
 
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 5afffa669d04..7c3623b17493 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -209,9 +209,12 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
 out_sleep:
 	dprintk("RPC: %5u failed to lock transport %p\n",
 			task->tk_pid, xprt);
-	task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
 	task->tk_status = -EAGAIN;
-	rpc_sleep_on(&xprt->sending, task, NULL);
+	if  (RPC_IS_SOFT(task))
+		rpc_sleep_on_timeout(&xprt->sending, task, NULL,
+				jiffies + req->rq_timeout);
+	else
+		rpc_sleep_on(&xprt->sending, task, NULL);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(xprt_reserve_xprt);
@@ -273,9 +276,12 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
 	xprt_clear_locked(xprt);
 out_sleep:
 	dprintk("RPC: %5u failed to lock transport %p\n", task->tk_pid, xprt);
-	task->tk_timeout = RPC_IS_SOFT(task) ? req->rq_timeout : 0;
 	task->tk_status = -EAGAIN;
-	rpc_sleep_on(&xprt->sending, task, NULL);
+	if (RPC_IS_SOFT(task))
+		rpc_sleep_on_timeout(&xprt->sending, task, NULL,
+				jiffies + req->rq_timeout);
+	else
+		rpc_sleep_on(&xprt->sending, task, NULL);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(xprt_reserve_xprt_cong);
@@ -787,9 +793,9 @@ void xprt_connect(struct rpc_task *task)
 		xprt->ops->close(xprt);
 
 	if (!xprt_connected(xprt)) {
-		task->tk_timeout = task->tk_rqstp->rq_timeout;
 		task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie;
-		rpc_sleep_on(&xprt->pending, task, NULL);
+		rpc_sleep_on_timeout(&xprt->pending, task, NULL,
+				jiffies + task->tk_rqstp->rq_timeout);
 
 		if (test_bit(XPRT_CLOSING, &xprt->state))
 			return;
@@ -1080,8 +1086,8 @@ void xprt_wait_for_reply_request_def(struct rpc_task *task)
 {
 	struct rpc_rqst *req = task->tk_rqstp;
 
-	task->tk_timeout = req->rq_timeout;
-	rpc_sleep_on(&req->rq_xprt->pending, task, xprt_timer);
+	rpc_sleep_on_timeout(&req->rq_xprt->pending, task, xprt_timer,
+			jiffies + req->rq_timeout);
 }
 EXPORT_SYMBOL_GPL(xprt_wait_for_reply_request_def);
 
@@ -1099,12 +1105,14 @@ void xprt_wait_for_reply_request_rtt(struct rpc_task *task)
 	struct rpc_rtt *rtt = clnt->cl_rtt;
 	struct rpc_rqst *req = task->tk_rqstp;
 	unsigned long max_timeout = clnt->cl_timeout->to_maxval;
+	unsigned long timeout;
 
-	task->tk_timeout = rpc_calc_rto(rtt, timer);
-	task->tk_timeout <<= rpc_ntimeo(rtt, timer) + req->rq_retries;
-	if (task->tk_timeout > max_timeout || task->tk_timeout == 0)
-		task->tk_timeout = max_timeout;
-	rpc_sleep_on(&req->rq_xprt->pending, task, xprt_timer);
+	timeout = rpc_calc_rto(rtt, timer);
+	timeout <<= rpc_ntimeo(rtt, timer) + req->rq_retries;
+	if (timeout > max_timeout || timeout == 0)
+		timeout = max_timeout;
+	rpc_sleep_on_timeout(&req->rq_xprt->pending, task, xprt_timer,
+			jiffies + timeout);
 }
 EXPORT_SYMBOL_GPL(xprt_wait_for_reply_request_rtt);
 
@@ -1656,7 +1664,6 @@ void xprt_reserve(struct rpc_task *task)
 	if (task->tk_rqstp != NULL)
 		return;
 
-	task->tk_timeout = 0;
 	task->tk_status = -EAGAIN;
 	if (!xprt_throttle_congested(xprt, task))
 		xprt_do_reserve(xprt, task);
@@ -1679,7 +1686,6 @@ void xprt_retry_reserve(struct rpc_task *task)
 	if (task->tk_rqstp != NULL)
 		return;
 
-	task->tk_timeout = 0;
 	task->tk_status = -EAGAIN;
 	xprt_do_reserve(xprt, task);
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 07/28] SUNRPC: Fix up tracking of timeouts
  2019-03-29 21:59           ` [PATCH v2 06/28] SUNRPC: Add function rpc_sleep_on_timeout() Trond Myklebust
@ 2019-03-29 21:59             ` Trond Myklebust
  2019-03-29 21:59               ` [PATCH v2 08/28] SUNRPC: Simplify queue timeouts using timer_reduce() Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Add a helper to ensure that debugfs and friends print out the
correct current task timeout value.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/sched.h  |  1 +
 include/trace/events/sunrpc.h |  2 +-
 net/sunrpc/clnt.c             |  2 +-
 net/sunrpc/debugfs.c          |  2 +-
 net/sunrpc/sched.c            | 14 ++++++++++++++
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 81f5d93f7109..0b7d333f624e 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -226,6 +226,7 @@ void		rpc_execute(struct rpc_task *);
 void		rpc_init_priority_wait_queue(struct rpc_wait_queue *, const char *);
 void		rpc_init_wait_queue(struct rpc_wait_queue *, const char *);
 void		rpc_destroy_wait_queue(struct rpc_wait_queue *);
+unsigned long	rpc_task_timeout(const struct rpc_task *task);
 void		rpc_sleep_on_timeout(struct rpc_wait_queue *queue,
 					struct rpc_task *task,
 					rpc_action action,
diff --git a/include/trace/events/sunrpc.h b/include/trace/events/sunrpc.h
index 5e3b77d9daa7..dd301db64521 100644
--- a/include/trace/events/sunrpc.h
+++ b/include/trace/events/sunrpc.h
@@ -186,7 +186,7 @@ DECLARE_EVENT_CLASS(rpc_task_queued,
 		__entry->client_id = task->tk_client ?
 				     task->tk_client->cl_clid : -1;
 		__entry->task_id = task->tk_pid;
-		__entry->timeout = task->tk_timeout;
+		__entry->timeout = rpc_task_timeout(task);
 		__entry->runstate = task->tk_runstate;
 		__entry->status = task->tk_status;
 		__entry->flags = task->tk_flags;
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 4d853b5420c8..3ab743c0e888 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2900,7 +2900,7 @@ static void rpc_show_task(const struct rpc_clnt *clnt,
 
 	printk(KERN_INFO "%5u %04x %6d %8p %8p %8ld %8p %sv%u %s a:%ps q:%s\n",
 		task->tk_pid, task->tk_flags, task->tk_status,
-		clnt, task->tk_rqstp, task->tk_timeout, task->tk_ops,
+		clnt, task->tk_rqstp, rpc_task_timeout(task), task->tk_ops,
 		clnt->cl_program->name, clnt->cl_vers, rpc_proc_name(task),
 		task->tk_action, rpc_waitq);
 }
diff --git a/net/sunrpc/debugfs.c b/net/sunrpc/debugfs.c
index 19bb356230ed..95ebd76b132d 100644
--- a/net/sunrpc/debugfs.c
+++ b/net/sunrpc/debugfs.c
@@ -33,7 +33,7 @@ tasks_show(struct seq_file *f, void *v)
 
 	seq_printf(f, "%5u %04x %6d 0x%x 0x%x %8ld %ps %sv%u %s a:%ps q:%s\n",
 		task->tk_pid, task->tk_flags, task->tk_status,
-		clnt->cl_clid, xid, task->tk_timeout, task->tk_ops,
+		clnt->cl_clid, xid, rpc_task_timeout(task), task->tk_ops,
 		clnt->cl_program->name, clnt->cl_vers, rpc_proc_name(task),
 		task->tk_action, rpc_waitq);
 	return 0;
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 7e0f7b83262f..40944c34a9e4 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -58,6 +58,20 @@ static struct rpc_wait_queue delay_queue;
 struct workqueue_struct *rpciod_workqueue __read_mostly;
 struct workqueue_struct *xprtiod_workqueue __read_mostly;
 
+unsigned long
+rpc_task_timeout(const struct rpc_task *task)
+{
+	unsigned long timeout = READ_ONCE(task->tk_timeout);
+
+	if (timeout != 0) {
+		unsigned long now = jiffies;
+		if (time_before(now, timeout))
+			return timeout - now;
+	}
+	return 0;
+}
+EXPORT_SYMBOL_GPL(rpc_task_timeout);
+
 /*
  * Disable the timer for a given RPC task. Should be called with
  * queue->lock and bh_disabled in order to avoid races within
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 08/28] SUNRPC: Simplify queue timeouts using timer_reduce()
  2019-03-29 21:59             ` [PATCH v2 07/28] SUNRPC: Fix up tracking of timeouts Trond Myklebust
@ 2019-03-29 21:59               ` Trond Myklebust
  2019-03-29 21:59                 ` [PATCH v2 09/28] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Simplify the setting of queue timeouts by using the timer_reduce()
function.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/sched.h | 1 -
 net/sunrpc/sched.c           | 6 ++----
 2 files changed, 2 insertions(+), 5 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index 0b7d333f624e..c41f5f498b3b 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -183,7 +183,6 @@ struct rpc_task_setup {
 struct rpc_timer {
 	struct timer_list timer;
 	struct list_head list;
-	unsigned long expires;
 };
 
 /*
diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 40944c34a9e4..301e0f7f1dc9 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -92,8 +92,7 @@ __rpc_disable_timer(struct rpc_wait_queue *queue, struct rpc_task *task)
 static void
 rpc_set_queue_timer(struct rpc_wait_queue *queue, unsigned long expires)
 {
-	queue->timer_list.expires = expires;
-	mod_timer(&queue->timer_list.timer, expires);
+	timer_reduce(&queue->timer_list.timer, expires);
 }
 
 /*
@@ -107,8 +106,7 @@ __rpc_add_timer(struct rpc_wait_queue *queue, struct rpc_task *task,
 		task->tk_pid, jiffies_to_msecs(timeout - jiffies));
 
 	task->tk_timeout = timeout;
-	if (list_empty(&queue->timer_list.list) || time_before(timeout, queue->timer_list.expires))
-		rpc_set_queue_timer(queue, timeout);
+	rpc_set_queue_timer(queue, timeout);
 	list_add(&task->u.tk_wait.timer_list, &queue->timer_list.list);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 09/28] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE
  2019-03-29 21:59               ` [PATCH v2 08/28] SUNRPC: Simplify queue timeouts using timer_reduce() Trond Myklebust
@ 2019-03-29 21:59                 ` Trond Myklebust
  2019-03-29 21:59                   ` [PATCH v2 10/28] SUNRPC: Ensure that the transport layer respect major timeouts Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Don't wake idle CPUs only for the purpose of servicing an RPC
queue timeout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/sched.c | 4 +++-
 net/sunrpc/xprt.c  | 4 +++-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index 301e0f7f1dc9..1a12fb03e611 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -249,7 +249,9 @@ static void __rpc_init_priority_wait_queue(struct rpc_wait_queue *queue, const c
 	queue->maxpriority = nr_queues - 1;
 	rpc_reset_waitqueue_priority(queue);
 	queue->qlen = 0;
-	timer_setup(&queue->timer_list.timer, __rpc_queue_timer_fn, 0);
+	timer_setup(&queue->timer_list.timer,
+			__rpc_queue_timer_fn,
+			TIMER_DEFERRABLE);
 	INIT_LIST_HEAD(&queue->timer_list.list);
 	rpc_assign_waitqueue_name(queue, qname);
 }
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 7c3623b17493..36af1a1929af 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1842,7 +1842,9 @@ struct rpc_xprt *xprt_create_transport(struct xprt_create *args)
 		xprt->idle_timeout = 0;
 	INIT_WORK(&xprt->task_cleanup, xprt_autoclose);
 	if (xprt_has_timer(xprt))
-		timer_setup(&xprt->timer, xprt_init_autodisconnect, 0);
+		timer_setup(&xprt->timer,
+				xprt_init_autodisconnect,
+				TIMER_DEFERRABLE);
 	else
 		timer_setup(&xprt->timer, NULL, 0);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 10/28] SUNRPC: Ensure that the transport layer respect major timeouts
  2019-03-29 21:59                 ` [PATCH v2 09/28] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE Trond Myklebust
@ 2019-03-29 21:59                   ` Trond Myklebust
  2019-03-29 21:59                     ` [PATCH v2 11/28] SUNRPC: Add tracking of RPC level errors Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Ensure that when in the transport layer, we don't sleep past
a major timeout.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprt.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 36af1a1929af..642cc0f64e44 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -73,6 +73,15 @@ static void	 xprt_destroy(struct rpc_xprt *xprt);
 static DEFINE_SPINLOCK(xprt_list_lock);
 static LIST_HEAD(xprt_list);
 
+static unsigned long xprt_request_timeout(const struct rpc_rqst *req)
+{
+	unsigned long timeout = jiffies + req->rq_timeout;
+
+	if (time_before(timeout, req->rq_majortimeo))
+		return timeout;
+	return req->rq_majortimeo;
+}
+
 /**
  * xprt_register_transport - register a transport implementation
  * @transport: transport to register
@@ -212,7 +221,7 @@ int xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task)
 	task->tk_status = -EAGAIN;
 	if  (RPC_IS_SOFT(task))
 		rpc_sleep_on_timeout(&xprt->sending, task, NULL,
-				jiffies + req->rq_timeout);
+				xprt_request_timeout(req));
 	else
 		rpc_sleep_on(&xprt->sending, task, NULL);
 	return 0;
@@ -279,7 +288,7 @@ int xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task)
 	task->tk_status = -EAGAIN;
 	if (RPC_IS_SOFT(task))
 		rpc_sleep_on_timeout(&xprt->sending, task, NULL,
-				jiffies + req->rq_timeout);
+				xprt_request_timeout(req));
 	else
 		rpc_sleep_on(&xprt->sending, task, NULL);
 	return 0;
@@ -795,7 +804,7 @@ void xprt_connect(struct rpc_task *task)
 	if (!xprt_connected(xprt)) {
 		task->tk_rqstp->rq_connect_cookie = xprt->connect_cookie;
 		rpc_sleep_on_timeout(&xprt->pending, task, NULL,
-				jiffies + task->tk_rqstp->rq_timeout);
+				xprt_request_timeout(task->tk_rqstp));
 
 		if (test_bit(XPRT_CLOSING, &xprt->state))
 			return;
@@ -1087,7 +1096,7 @@ void xprt_wait_for_reply_request_def(struct rpc_task *task)
 	struct rpc_rqst *req = task->tk_rqstp;
 
 	rpc_sleep_on_timeout(&req->rq_xprt->pending, task, xprt_timer,
-			jiffies + req->rq_timeout);
+			xprt_request_timeout(req));
 }
 EXPORT_SYMBOL_GPL(xprt_wait_for_reply_request_def);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 11/28] SUNRPC: Add tracking of RPC level errors
  2019-03-29 21:59                   ` [PATCH v2 10/28] SUNRPC: Ensure that the transport layer respect major timeouts Trond Myklebust
@ 2019-03-29 21:59                     ` Trond Myklebust
  2019-03-29 21:59                       ` [PATCH v2 12/28] SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for timeouts Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Add variables to track RPC level errors so that we can distinguish
between issue that arose in the RPC transport layer as opposed to
those arising from the reply message.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/sched.h |  2 ++
 net/sunrpc/clnt.c            | 40 ++++++++++++++++++++++++------------
 net/sunrpc/xprtsock.c        |  1 +
 3 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
index c41f5f498b3b..f01990c6423d 100644
--- a/include/linux/sunrpc/sched.h
+++ b/include/linux/sunrpc/sched.h
@@ -61,6 +61,8 @@ struct rpc_task {
 		struct rpc_wait		tk_wait;	/* RPC wait */
 	} u;
 
+	int			tk_rpc_status;	/* Result of last RPC operation */
+
 	/*
 	 * RPC call state
 	 */
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 3ab743c0e888..a9e5e81c67be 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1468,6 +1468,7 @@ static int
 __rpc_restart_call(struct rpc_task *task, void (*action)(struct rpc_task *))
 {
 	task->tk_status = 0;
+	task->tk_rpc_status = 0;
 	task->tk_action = action;
 	return 1;
 }
@@ -1510,6 +1511,19 @@ const char
 		return "no proc";
 }
 
+static void
+__rpc_call_rpcerror(struct rpc_task *task, int tk_status, int rpc_status)
+{
+	task->tk_rpc_status = rpc_status;
+	rpc_exit(task, tk_status);
+}
+
+static void
+rpc_call_rpcerror(struct rpc_task *task, int status)
+{
+	__rpc_call_rpcerror(task, status, status);
+}
+
 /*
  * 0.  Initial state
  *
@@ -1579,7 +1593,7 @@ call_reserveresult(struct rpc_task *task)
 
 		printk(KERN_ERR "%s: status=%d, but no request slot, exiting\n",
 				__func__, status);
-		rpc_exit(task, -EIO);
+		rpc_call_rpcerror(task, -EIO);
 		return;
 	}
 
@@ -1608,7 +1622,7 @@ call_reserveresult(struct rpc_task *task)
 				__func__, status);
 		break;
 	}
-	rpc_exit(task, status);
+	rpc_call_rpcerror(task, status);
 }
 
 /*
@@ -1684,7 +1698,7 @@ call_refreshresult(struct rpc_task *task)
 	}
 	dprintk("RPC: %5u %s: refresh creds failed with error %d\n",
 				task->tk_pid, __func__, status);
-	rpc_exit(task, status);
+	rpc_call_rpcerror(task, status);
 }
 
 /*
@@ -1741,7 +1755,7 @@ call_allocate(struct rpc_task *task)
 		return;
 	}
 	if (status != -ENOMEM) {
-		rpc_exit(task, status);
+		rpc_call_rpcerror(task, status);
 		return;
 	}
 
@@ -1810,7 +1824,7 @@ call_encode(struct rpc_task *task)
 			task->tk_action = call_refresh;
 			break;
 		default:
-			rpc_exit(task, task->tk_status);
+			rpc_call_rpcerror(task, task->tk_status);
 		}
 		return;
 	} else {
@@ -1950,7 +1964,7 @@ call_bind_status(struct rpc_task *task)
 				task->tk_pid, -task->tk_status);
 	}
 
-	rpc_exit(task, status);
+	rpc_call_rpcerror(task, status);
 	return;
 
 retry_timeout:
@@ -1986,7 +2000,7 @@ call_connect(struct rpc_task *task)
 	if (task->tk_status < 0)
 		return;
 	if (task->tk_flags & RPC_TASK_NOCONNECT) {
-		rpc_exit(task, -ENOTCONN);
+		rpc_call_rpcerror(task, -ENOTCONN);
 		return;
 	}
 	if (!xprt_prepare_transmit(task))
@@ -2047,7 +2061,7 @@ call_connect_status(struct rpc_task *task)
 		call_transmit(task);
 		return;
 	}
-	rpc_exit(task, status);
+	rpc_call_rpcerror(task, status);
 	return;
 out_retry:
 	/* Check for timeouts before looping back to call_bind */
@@ -2138,7 +2152,7 @@ call_transmit_status(struct rpc_task *task)
 			if (!task->tk_msg.rpc_proc->p_proc)
 				trace_xprt_ping(task->tk_xprt,
 						task->tk_status);
-			rpc_exit(task, task->tk_status);
+			rpc_call_rpcerror(task, task->tk_status);
 			return;
 		}
 		/* fall through */
@@ -2301,7 +2315,7 @@ call_status(struct rpc_task *task)
 	rpc_check_timeout(task);
 	return;
 out_exit:
-	rpc_exit(task, status);
+	rpc_call_rpcerror(task, status);
 }
 
 static bool
@@ -2325,7 +2339,7 @@ rpc_check_timeout(struct rpc_task *task)
 	task->tk_timeouts++;
 
 	if (RPC_IS_SOFTCONN(task) && !rpc_check_connected(task->tk_rqstp)) {
-		rpc_exit(task, -ETIMEDOUT);
+		rpc_call_rpcerror(task, -ETIMEDOUT);
 		return;
 	}
 
@@ -2336,9 +2350,9 @@ rpc_check_timeout(struct rpc_task *task)
 				task->tk_xprt->servername);
 		}
 		if (task->tk_flags & RPC_TASK_TIMEOUT)
-			rpc_exit(task, -ETIMEDOUT);
+			rpc_call_rpcerror(task, -ETIMEDOUT);
 		else
-			rpc_exit(task, -EIO);
+			__rpc_call_rpcerror(task, -EIO, -ETIMEDOUT);
 		return;
 	}
 
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index b4b4b8db143c..c69951ed2ebc 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2017,6 +2017,7 @@ static void xs_local_connect(struct rpc_xprt *xprt, struct rpc_task *task)
 		 * we'll need to figure out how to pass a namespace to
 		 * connect.
 		 */
+		task->tk_rpc_status = -ENOTCONN;
 		rpc_exit(task, -ENOTCONN);
 		return;
 	}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 12/28] SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for timeouts
  2019-03-29 21:59                     ` [PATCH v2 11/28] SUNRPC: Add tracking of RPC level errors Trond Myklebust
@ 2019-03-29 21:59                       ` Trond Myklebust
  2019-03-29 21:59                         ` [PATCH v2 13/28] SUNRPC: Start the first major timeout calculation at task creation Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

If a soft NFSv4 request is sent, then we don't need it to time out unless
the connection breaks. The reason is that as long as the connection is
unbroken, the protocol states that the server is not allowed to drop the
request.  IOW: as long as the connection remains unbroken, the client may
assume that all transmitted RPC requests are being processed by the server,
and that retransmissions and timeouts of those requests are unwarranted.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index a9e5e81c67be..c921a5aff790 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2344,6 +2344,15 @@ rpc_check_timeout(struct rpc_task *task)
 	}
 
 	if (RPC_IS_SOFT(task)) {
+		/*
+		 * Once a "no retrans timeout" soft tasks (a.k.a NFSv4) has
+		 * been sent, it should time out only if the transport
+		 * connection gets terminally broken.
+		 */
+		if ((task->tk_flags & RPC_TASK_NO_RETRANS_TIMEOUT) &&
+		    rpc_check_connected(task->tk_rqstp))
+			return;
+
 		if (clnt->cl_chatty) {
 			printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
 				clnt->cl_program->name,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 13/28] SUNRPC: Start the first major timeout calculation at task creation
  2019-03-29 21:59                       ` [PATCH v2 12/28] SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for timeouts Trond Myklebust
@ 2019-03-29 21:59                         ` Trond Myklebust
  2019-03-29 21:59                           ` [PATCH v2 14/28] SUNRPC: Ensure to ratelimit the "server not responding" syslog messages Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

When calculating the major timeout for a new task, when we know that the
connection has been broken, use the task->tk_start to ensure that we also
take into account the time spent waiting for a slot or session slot. This
ensures that we fail over soft requests relatively quickly once the
connection has actually been broken, and the first requests have
started to fail.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/xprt.c | 44 ++++++++++++++++++++++++++++++++++----------
 1 file changed, 34 insertions(+), 10 deletions(-)

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index 642cc0f64e44..bc1c8247750d 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -569,18 +569,44 @@ bool xprt_write_space(struct rpc_xprt *xprt)
 }
 EXPORT_SYMBOL_GPL(xprt_write_space);
 
-static void xprt_reset_majortimeo(struct rpc_rqst *req)
+static unsigned long xprt_abs_ktime_to_jiffies(ktime_t abstime)
+{
+	s64 delta = ktime_to_ns(ktime_get() - abstime);
+	return likely(delta >= 0) ?
+		jiffies - nsecs_to_jiffies(delta) :
+		jiffies + nsecs_to_jiffies(-delta);
+}
+
+static unsigned long xprt_calc_majortimeo(struct rpc_rqst *req)
 {
 	const struct rpc_timeout *to = req->rq_task->tk_client->cl_timeout;
+	unsigned long majortimeo = req->rq_timeout;
 
-	req->rq_majortimeo = req->rq_timeout;
 	if (to->to_exponential)
-		req->rq_majortimeo <<= to->to_retries;
+		majortimeo <<= to->to_retries;
+	else
+		majortimeo += to->to_increment * to->to_retries;
+	if (majortimeo > to->to_maxval || majortimeo == 0)
+		majortimeo = to->to_maxval;
+	return majortimeo;
+}
+
+static void xprt_reset_majortimeo(struct rpc_rqst *req)
+{
+	req->rq_majortimeo += xprt_calc_majortimeo(req);
+}
+
+static void xprt_init_majortimeo(struct rpc_task *task, struct rpc_rqst *req)
+{
+	unsigned long time_init;
+	struct rpc_xprt *xprt = req->rq_xprt;
+
+	if (likely(xprt && xprt_connected(xprt)))
+		time_init = jiffies;
 	else
-		req->rq_majortimeo += to->to_increment * to->to_retries;
-	if (req->rq_majortimeo > to->to_maxval || req->rq_majortimeo == 0)
-		req->rq_majortimeo = to->to_maxval;
-	req->rq_majortimeo += jiffies;
+		time_init = xprt_abs_ktime_to_jiffies(task->tk_start);
+	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
+	req->rq_majortimeo = time_init + xprt_calc_majortimeo(req);
 }
 
 /**
@@ -997,7 +1023,6 @@ xprt_request_enqueue_receive(struct rpc_task *task)
 	set_bit(RPC_TASK_NEED_RECV, &task->tk_runstate);
 	spin_unlock(&xprt->queue_lock);
 
-	xprt_reset_majortimeo(req);
 	/* Turn off autodisconnect */
 	del_singleshot_timer_sync(&xprt->timer);
 }
@@ -1631,7 +1656,6 @@ xprt_request_init(struct rpc_task *task)
 	struct rpc_xprt *xprt = task->tk_xprt;
 	struct rpc_rqst	*req = task->tk_rqstp;
 
-	req->rq_timeout = task->tk_client->cl_timeout->to_initval;
 	req->rq_task	= task;
 	req->rq_xprt    = xprt;
 	req->rq_buffer  = NULL;
@@ -1644,7 +1668,7 @@ xprt_request_init(struct rpc_task *task)
 	req->rq_snd_buf.bvec = NULL;
 	req->rq_rcv_buf.bvec = NULL;
 	req->rq_release_snd_buf = NULL;
-	xprt_reset_majortimeo(req);
+	xprt_init_majortimeo(task, req);
 	dprintk("RPC: %5u reserved req %p xid %08x\n", task->tk_pid,
 			req, ntohl(req->rq_xid));
 }
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 14/28] SUNRPC: Ensure to ratelimit the "server not responding" syslog messages
  2019-03-29 21:59                         ` [PATCH v2 13/28] SUNRPC: Start the first major timeout calculation at task creation Trond Myklebust
@ 2019-03-29 21:59                           ` Trond Myklebust
  2019-03-29 21:59                             ` [PATCH v2 15/28] SUNRPC: Add the 'softerr' rpc_client flag Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

In particular, the timeout messages can be very noisy, so we ought to
ratelimit them in order to avoid spamming the syslog.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 net/sunrpc/clnt.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c921a5aff790..c0b79d307579 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -2354,7 +2354,8 @@ rpc_check_timeout(struct rpc_task *task)
 			return;
 
 		if (clnt->cl_chatty) {
-			printk(KERN_NOTICE "%s: server %s not responding, timed out\n",
+			pr_notice_ratelimited(
+				"%s: server %s not responding, timed out\n",
 				clnt->cl_program->name,
 				task->tk_xprt->servername);
 		}
@@ -2368,9 +2369,10 @@ rpc_check_timeout(struct rpc_task *task)
 	if (!(task->tk_flags & RPC_CALL_MAJORSEEN)) {
 		task->tk_flags |= RPC_CALL_MAJORSEEN;
 		if (clnt->cl_chatty) {
-			printk(KERN_NOTICE "%s: server %s not responding, still trying\n",
-			clnt->cl_program->name,
-			task->tk_xprt->servername);
+			pr_notice_ratelimited(
+				"%s: server %s not responding, still trying\n",
+				clnt->cl_program->name,
+				task->tk_xprt->servername);
 		}
 	}
 	rpc_force_rebind(clnt);
@@ -2400,7 +2402,7 @@ call_decode(struct rpc_task *task)
 
 	if (task->tk_flags & RPC_CALL_MAJORSEEN) {
 		if (clnt->cl_chatty) {
-			printk(KERN_NOTICE "%s: server %s OK\n",
+			pr_notice_ratelimited("%s: server %s OK\n",
 				clnt->cl_program->name,
 				task->tk_xprt->servername);
 		}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 15/28] SUNRPC: Add the 'softerr' rpc_client flag
  2019-03-29 21:59                           ` [PATCH v2 14/28] SUNRPC: Ensure to ratelimit the "server not responding" syslog messages Trond Myklebust
@ 2019-03-29 21:59                             ` Trond Myklebust
  2019-03-29 21:59                               ` [PATCH v2 16/28] NFS: Consider ETIMEDOUT to be a fatal error Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Add the 'softerr' rpc client flag that sets the RPC_TASK_TIMEOUT
flag on all new rpc tasks that are attached to that rpc client.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/sunrpc/clnt.h | 2 ++
 net/sunrpc/clnt.c           | 8 +++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/include/linux/sunrpc/clnt.h b/include/linux/sunrpc/clnt.h
index 98bc9883b230..943762acfcd4 100644
--- a/include/linux/sunrpc/clnt.h
+++ b/include/linux/sunrpc/clnt.h
@@ -50,6 +50,7 @@ struct rpc_clnt {
 	struct rpc_iostats *	cl_metrics;	/* per-client statistics */
 
 	unsigned int		cl_softrtry : 1,/* soft timeouts */
+				cl_softerr  : 1,/* Timeouts return errors */
 				cl_discrtry : 1,/* disconnect before retry */
 				cl_noretranstimeo: 1,/* No retransmit timeouts */
 				cl_autobind : 1,/* use getport() */
@@ -144,6 +145,7 @@ struct rpc_add_xprt_test {
 #define RPC_CLNT_CREATE_INFINITE_SLOTS	(1UL << 7)
 #define RPC_CLNT_CREATE_NO_IDLE_TIMEOUT	(1UL << 8)
 #define RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT	(1UL << 9)
+#define RPC_CLNT_CREATE_SOFTERR		(1UL << 10)
 
 struct rpc_clnt *rpc_create(struct rpc_create_args *args);
 struct rpc_clnt	*rpc_bind_new_program(struct rpc_clnt *,
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index c0b79d307579..fb809e1912e3 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -484,8 +484,11 @@ static struct rpc_clnt *rpc_create_xprt(struct rpc_create_args *args,
 	}
 
 	clnt->cl_softrtry = 1;
-	if (args->flags & RPC_CLNT_CREATE_HARDRTRY)
+	if (args->flags & (RPC_CLNT_CREATE_HARDRTRY|RPC_CLNT_CREATE_SOFTERR)) {
 		clnt->cl_softrtry = 0;
+		if (args->flags & RPC_CLNT_CREATE_SOFTERR)
+			clnt->cl_softerr = 1;
+	}
 
 	if (args->flags & RPC_CLNT_CREATE_AUTOBIND)
 		clnt->cl_autobind = 1;
@@ -623,6 +626,7 @@ static struct rpc_clnt *__rpc_clone_client(struct rpc_create_args *args,
 	/* Turn off autobind on clones */
 	new->cl_autobind = 0;
 	new->cl_softrtry = clnt->cl_softrtry;
+	new->cl_softerr = clnt->cl_softerr;
 	new->cl_noretranstimeo = clnt->cl_noretranstimeo;
 	new->cl_discrtry = clnt->cl_discrtry;
 	new->cl_chatty = clnt->cl_chatty;
@@ -1001,6 +1005,8 @@ void rpc_task_set_client(struct rpc_task *task, struct rpc_clnt *clnt)
 		atomic_inc(&clnt->cl_count);
 		if (clnt->cl_softrtry)
 			task->tk_flags |= RPC_TASK_SOFT;
+		if (clnt->cl_softerr)
+			task->tk_flags |= RPC_TASK_TIMEOUT;
 		if (clnt->cl_noretranstimeo)
 			task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT;
 		if (atomic_read(&clnt->cl_swapper))
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 16/28] NFS: Consider ETIMEDOUT to be a fatal error
  2019-03-29 21:59                             ` [PATCH v2 15/28] SUNRPC: Add the 'softerr' rpc_client flag Trond Myklebust
@ 2019-03-29 21:59                               ` Trond Myklebust
  2019-03-29 21:59                                 ` [PATCH v2 17/28] NFS: Move internal constants out of uapi/linux/nfs_mount.h Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

When we introduce the 'softerr' mount option, we will see the RPC
layer returning ETIMEDOUT errors if the server is unresponsive. We
want to consider those errors to be fatal on par with the EIO errors
that are returned by ordinary 'soft' timeouts..

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/internal.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index c7cf23ae6597..3cefd0ed01be 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -766,6 +766,7 @@ static inline bool nfs_error_is_fatal(int err)
 	case -ESTALE:
 	case -E2BIG:
 	case -ENOMEM:
+	case -ETIMEDOUT:
 		return true;
 	default:
 		return false;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 17/28] NFS: Move internal constants out of uapi/linux/nfs_mount.h
  2019-03-29 21:59                               ` [PATCH v2 16/28] NFS: Consider ETIMEDOUT to be a fatal error Trond Myklebust
@ 2019-03-29 21:59                                 ` Trond Myklebust
  2019-03-29 21:59                                   ` [PATCH v2 18/28] NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT errors Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

When the label says "for internal use only", then it doesn't belong
in the 'uapi' subtree.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/nfs_fs_sb.h      | 9 +++++++++
 include/uapi/linux/nfs_mount.h | 9 ---------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index c827d31298cc..013ac5b54a09 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -139,6 +139,15 @@ struct nfs_server {
 	struct nfs_iostats __percpu *io_stats;	/* I/O statistics */
 	atomic_long_t		writeback;	/* number of writeback pages */
 	int			flags;		/* various flags */
+
+/* The following are for internal use only. Also see uapi/linux/nfs_mount.h */
+#define NFS_MOUNT_LOOKUP_CACHE_NONEG	0x10000
+#define NFS_MOUNT_LOOKUP_CACHE_NONE	0x20000
+#define NFS_MOUNT_NORESVPORT		0x40000
+#define NFS_MOUNT_LEGACY_INTERFACE	0x80000
+#define NFS_MOUNT_LOCAL_FLOCK		0x100000
+#define NFS_MOUNT_LOCAL_FCNTL		0x200000
+
 	unsigned int		caps;		/* server capabilities */
 	unsigned int		rsize;		/* read size */
 	unsigned int		rpages;		/* read size (in pages) */
diff --git a/include/uapi/linux/nfs_mount.h b/include/uapi/linux/nfs_mount.h
index e44e00616ab5..e3bcfc6aa3b0 100644
--- a/include/uapi/linux/nfs_mount.h
+++ b/include/uapi/linux/nfs_mount.h
@@ -66,13 +66,4 @@ struct nfs_mount_data {
 #define NFS_MOUNT_UNSHARED	0x8000	/* 5 */
 #define NFS_MOUNT_FLAGMASK	0xFFFF
 
-/* The following are for internal use only */
-#define NFS_MOUNT_LOOKUP_CACHE_NONEG	0x10000
-#define NFS_MOUNT_LOOKUP_CACHE_NONE	0x20000
-#define NFS_MOUNT_NORESVPORT		0x40000
-#define NFS_MOUNT_LEGACY_INTERFACE	0x80000
-
-#define NFS_MOUNT_LOCAL_FLOCK	0x100000
-#define NFS_MOUNT_LOCAL_FCNTL	0x200000
-
 #endif
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 18/28] NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT errors
  2019-03-29 21:59                                 ` [PATCH v2 17/28] NFS: Move internal constants out of uapi/linux/nfs_mount.h Trond Myklebust
@ 2019-03-29 21:59                                   ` Trond Myklebust
  2019-03-29 21:59                                     ` [PATCH v2 19/28] NFS: Don't interrupt file writeout due to fatal errors Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Add a mount option that exposes the ETIMEDOUT errors that occur during
soft timeouts to the application. This allows aware applications to
distinguish between server disk IO errors and client timeout errors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/client.c           |  2 ++
 fs/nfs/super.c            | 15 ++++++++++++---
 include/linux/nfs_fs_sb.h |  1 +
 3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 90d71fda65ce..f74638c5e5b4 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -598,6 +598,8 @@ int nfs_init_server_rpcclient(struct nfs_server *server,
 			sizeof(server->client->cl_timeout_default));
 	server->client->cl_timeout = &server->client->cl_timeout_default;
 	server->client->cl_softrtry = 0;
+	if (server->flags & NFS_MOUNT_SOFTERR)
+		server->client->cl_softerr = 1;
 	if (server->flags & NFS_MOUNT_SOFT)
 		server->client->cl_softrtry = 1;
 
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index 23790c7b2289..61cae26dbf78 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -78,7 +78,7 @@
 
 enum {
 	/* Mount options that take no arguments */
-	Opt_soft, Opt_hard,
+	Opt_soft, Opt_softerr, Opt_hard,
 	Opt_posix, Opt_noposix,
 	Opt_cto, Opt_nocto,
 	Opt_ac, Opt_noac,
@@ -125,6 +125,7 @@ static const match_table_t nfs_mount_option_tokens = {
 	{ Opt_sloppy, "sloppy" },
 
 	{ Opt_soft, "soft" },
+	{ Opt_softerr, "softerr" },
 	{ Opt_hard, "hard" },
 	{ Opt_deprecated, "intr" },
 	{ Opt_deprecated, "nointr" },
@@ -628,7 +629,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 		const char *str;
 		const char *nostr;
 	} nfs_info[] = {
-		{ NFS_MOUNT_SOFT, ",soft", ",hard" },
+		{ NFS_MOUNT_SOFT, ",soft", "" },
+		{ NFS_MOUNT_SOFTERR, ",softerr", "" },
 		{ NFS_MOUNT_POSIX, ",posix", "" },
 		{ NFS_MOUNT_NOCTO, ",nocto", "" },
 		{ NFS_MOUNT_NOAC, ",noac", "" },
@@ -658,6 +660,8 @@ static void nfs_show_mount_options(struct seq_file *m, struct nfs_server *nfss,
 		seq_printf(m, ",acdirmin=%u", nfss->acdirmin/HZ);
 	if (nfss->acdirmax != NFS_DEF_ACDIRMAX*HZ || showdefaults)
 		seq_printf(m, ",acdirmax=%u", nfss->acdirmax/HZ);
+	if (!(nfss->flags & (NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR)))
+			seq_puts(m, ",hard");
 	for (nfs_infop = nfs_info; nfs_infop->flag; nfs_infop++) {
 		if (nfss->flags & nfs_infop->flag)
 			seq_puts(m, nfs_infop->str);
@@ -1239,10 +1243,15 @@ static int nfs_parse_mount_options(char *raw,
 		 */
 		case Opt_soft:
 			mnt->flags |= NFS_MOUNT_SOFT;
+			mnt->flags &= ~NFS_MOUNT_SOFTERR;
 			break;
-		case Opt_hard:
+		case Opt_softerr:
+			mnt->flags |= NFS_MOUNT_SOFTERR;
 			mnt->flags &= ~NFS_MOUNT_SOFT;
 			break;
+		case Opt_hard:
+			mnt->flags &= ~(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR);
+			break;
 		case Opt_posix:
 			mnt->flags |= NFS_MOUNT_POSIX;
 			break;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 013ac5b54a09..0fbc5d3c5e53 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -147,6 +147,7 @@ struct nfs_server {
 #define NFS_MOUNT_LEGACY_INTERFACE	0x80000
 #define NFS_MOUNT_LOCAL_FLOCK		0x100000
 #define NFS_MOUNT_LOCAL_FCNTL		0x200000
+#define NFS_MOUNT_SOFTERR		0x400000
 
 	unsigned int		caps;		/* server capabilities */
 	unsigned int		rsize;		/* read size */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 19/28] NFS: Don't interrupt file writeout due to fatal errors
  2019-03-29 21:59                                   ` [PATCH v2 18/28] NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT errors Trond Myklebust
@ 2019-03-29 21:59                                     ` Trond Myklebust
  2019-03-29 21:59                                       ` [PATCH v2 20/28] NFS: Don't call generic_error_remove_page() while holding locks Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

When flushing out dirty pages, the fact that we may hit fatal errors
is not a reason to stop writeback. Those errors are reported through
fsync(), not through the flush mechanism.

Fixes: a6598813a4c5b ("NFS: Don't write back further requests if there...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/write.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index f3ebabaa291d..9198a23bb58b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -647,7 +647,7 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
 	return ret;
 out_launder:
 	nfs_write_error_remove_page(req);
-	return ret;
+	return 0;
 }
 
 static int nfs_do_writepage(struct page *page, struct writeback_control *wbc,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 20/28] NFS: Don't call generic_error_remove_page() while holding locks
  2019-03-29 21:59                                     ` [PATCH v2 19/28] NFS: Don't interrupt file writeout due to fatal errors Trond Myklebust
@ 2019-03-29 21:59                                       ` Trond Myklebust
  2019-03-29 21:59                                         ` [PATCH v2 21/28] NFS: Don't inadvertently clear writeback errors Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

The NFS read code can trigger writeback while holding the page lock.
If an error then triggers a call to nfs_write_error_remove_page(),
we can deadlock.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/write.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 9198a23bb58b..64cf6a340ba6 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -584,9 +584,8 @@ nfs_lock_and_join_requests(struct page *page)
 
 static void nfs_write_error_remove_page(struct nfs_page *req)
 {
+	SetPageError(req->wb_page);
 	nfs_end_page_writeback(req);
-	generic_error_remove_page(page_file_mapping(req->wb_page),
-				  req->wb_page);
 	nfs_release_request(req);
 }
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 21/28] NFS: Don't inadvertently clear writeback errors
  2019-03-29 21:59                                       ` [PATCH v2 20/28] NFS: Don't call generic_error_remove_page() while holding locks Trond Myklebust
@ 2019-03-29 21:59                                         ` Trond Myklebust
  2019-03-29 21:59                                           ` [PATCH v2 22/28] NFS: Replace custom error reporting mechanism with generic one Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

vfs_fsync() has the side effect of clearing unreported writeback errors,
so we need to make sure that we do not abuse it in situations where
applications might not normally expect us to report those errors.

The solution is to replace calls to vfs_fsync() with calls to nfs_wb_all().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/file.c     | 6 +++---
 fs/nfs/nfs4file.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index 4899b85f9b3c..f807e8643ae6 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -147,7 +147,7 @@ nfs_file_flush(struct file *file, fl_owner_t id)
 		return 0;
 
 	/* Flush writes to the server and return any errors */
-	return vfs_fsync(file, 0);
+	return nfs_wb_all(inode);
 }
 
 ssize_t
@@ -655,7 +655,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
 
 	/* Return error values */
 	if (nfs_need_check_write(file, inode)) {
-		int err = vfs_fsync(file, 0);
+		int err = nfs_wb_all(inode);
 		if (err < 0)
 			result = err;
 	}
@@ -709,7 +709,7 @@ do_unlk(struct file *filp, int cmd, struct file_lock *fl, int is_local)
 	 * Flush all pending writes before doing anything
 	 * with locks..
 	 */
-	vfs_fsync(filp, 0);
+	nfs_wb_all(inode);
 
 	l_ctx = nfs_get_lock_context(nfs_file_open_context(filp));
 	if (!IS_ERR(l_ctx)) {
diff --git a/fs/nfs/nfs4file.c b/fs/nfs/nfs4file.c
index 45b2322e092d..90bf5ab6d579 100644
--- a/fs/nfs/nfs4file.c
+++ b/fs/nfs/nfs4file.c
@@ -125,7 +125,7 @@ nfs4_file_flush(struct file *file, fl_owner_t id)
 		return filemap_fdatawrite(file->f_mapping);
 
 	/* Flush writes to the server and return any errors */
-	return vfs_fsync(file, 0);
+	return nfs_wb_all(inode);
 }
 
 #ifdef CONFIG_NFS_V4_2
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 22/28] NFS: Replace custom error reporting mechanism with generic one
  2019-03-29 21:59                                         ` [PATCH v2 21/28] NFS: Don't inadvertently clear writeback errors Trond Myklebust
@ 2019-03-29 21:59                                           ` Trond Myklebust
  2019-03-29 21:59                                             ` [PATCH v2 23/28] NFS: Fix up NFS I/O subrequest creation Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Replace the NFS custom error reporting mechanism with the generic
mapping_set_error().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/file.c          | 25 ++++---------------------
 fs/nfs/internal.h      |  6 ------
 fs/nfs/write.c         | 37 ++++++++++++++++++++-----------------
 include/linux/nfs_fs.h |  1 -
 4 files changed, 24 insertions(+), 45 deletions(-)

diff --git a/fs/nfs/file.c b/fs/nfs/file.c
index f807e8643ae6..144e183250c3 100644
--- a/fs/nfs/file.c
+++ b/fs/nfs/file.c
@@ -199,13 +199,6 @@ EXPORT_SYMBOL_GPL(nfs_file_mmap);
  * Flush any dirty pages for this process, and check for write errors.
  * The return status from this call provides a reliable indication of
  * whether any write errors occurred for this process.
- *
- * Notice that it clears the NFS_CONTEXT_ERROR_WRITE before synching to
- * disk, but it retrieves and clears ctx->error after synching, despite
- * the two being set at the same time in nfs_context_set_write_error().
- * This is because the former is used to notify the _next_ call to
- * nfs_file_write() that a write error occurred, and hence cause it to
- * fall back to doing a synchronous write.
  */
 static int
 nfs_file_fsync_commit(struct file *file, int datasync)
@@ -220,11 +213,8 @@ nfs_file_fsync_commit(struct file *file, int datasync)
 	nfs_inc_stats(inode, NFSIOS_VFSFSYNC);
 	do_resend = test_and_clear_bit(NFS_CONTEXT_RESEND_WRITES, &ctx->flags);
 	status = nfs_commit_inode(inode, FLUSH_SYNC);
-	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags)) {
-		ret = xchg(&ctx->error, 0);
-		if (ret)
-			goto out;
-	}
+	if (status == 0)
+		status = file_check_and_advance_wb_err(file);
 	if (status < 0) {
 		ret = status;
 		goto out;
@@ -245,13 +235,7 @@ nfs_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)
 	trace_nfs_fsync_enter(inode);
 
 	do {
-		struct nfs_open_context *ctx = nfs_file_open_context(file);
-		ret = filemap_write_and_wait_range(inode->i_mapping, start, end);
-		if (test_and_clear_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags)) {
-			int ret2 = xchg(&ctx->error, 0);
-			if (ret2)
-				ret = ret2;
-		}
+		ret = file_write_and_wait_range(file, start, end);
 		if (ret != 0)
 			break;
 		ret = nfs_file_fsync_commit(file, datasync);
@@ -600,8 +584,7 @@ static int nfs_need_check_write(struct file *filp, struct inode *inode)
 	struct nfs_open_context *ctx;
 
 	ctx = nfs_file_open_context(filp);
-	if (test_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags) ||
-	    nfs_ctx_key_to_expire(ctx, inode))
+	if (nfs_ctx_key_to_expire(ctx, inode))
 		return 1;
 	return 0;
 }
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 3cefd0ed01be..196534634c3a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -773,9 +773,3 @@ static inline bool nfs_error_is_fatal(int err)
 	}
 }
 
-static inline void nfs_context_set_write_error(struct nfs_open_context *ctx, int error)
-{
-	ctx->error = error;
-	smp_wmb();
-	set_bit(NFS_CONTEXT_ERROR_WRITE, &ctx->flags);
-}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 64cf6a340ba6..627a2b6ed1fe 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -244,6 +244,12 @@ static void nfs_set_pageerror(struct address_space *mapping)
 	nfs_zap_mapping(mapping->host, mapping);
 }
 
+static void nfs_mapping_set_error(struct page *page, int error)
+{
+	SetPageError(page);
+	mapping_set_error(page_file_mapping(page), error);
+}
+
 /*
  * nfs_page_group_search_locked
  * @head - head request of page group
@@ -582,9 +588,9 @@ nfs_lock_and_join_requests(struct page *page)
 	return ERR_PTR(ret);
 }
 
-static void nfs_write_error_remove_page(struct nfs_page *req)
+static void nfs_write_error(struct nfs_page *req, int error)
 {
-	SetPageError(req->wb_page);
+	nfs_mapping_set_error(req->wb_page, error);
 	nfs_end_page_writeback(req);
 	nfs_release_request(req);
 }
@@ -608,6 +614,7 @@ nfs_error_is_fatal_on_server(int err)
 static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
 				struct page *page)
 {
+	struct address_space *mapping;
 	struct nfs_page *req;
 	int ret = 0;
 
@@ -621,19 +628,19 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
 	nfs_set_page_writeback(page);
 	WARN_ON_ONCE(test_bit(PG_CLEAN, &req->wb_flags));
 
-	ret = req->wb_context->error;
 	/* If there is a fatal error that covers this write, just exit */
-	if (nfs_error_is_fatal_on_server(ret))
+	ret = 0;
+	mapping = page_file_mapping(page);
+	if (test_bit(AS_ENOSPC, &mapping->flags) ||
+	    test_bit(AS_EIO, &mapping->flags))
 		goto out_launder;
 
-	ret = 0;
 	if (!nfs_pageio_add_request(pgio, req)) {
 		ret = pgio->pg_error;
 		/*
 		 * Remove the problematic req upon fatal errors on the server
 		 */
 		if (nfs_error_is_fatal(ret)) {
-			nfs_context_set_write_error(req->wb_context, ret);
 			if (nfs_error_is_fatal_on_server(ret))
 				goto out_launder;
 		} else
@@ -645,7 +652,7 @@ static int nfs_page_async_flush(struct nfs_pageio_descriptor *pgio,
 out:
 	return ret;
 out_launder:
-	nfs_write_error_remove_page(req);
+	nfs_write_error(req, ret);
 	return 0;
 }
 
@@ -998,7 +1005,7 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
 		if (test_bit(NFS_IOHDR_ERROR, &hdr->flags) &&
 		    (hdr->good_bytes < bytes)) {
 			nfs_set_pageerror(page_file_mapping(req->wb_page));
-			nfs_context_set_write_error(req->wb_context, hdr->error);
+			nfs_mapping_set_error(req->wb_page, hdr->error);
 			goto remove_req;
 		}
 		if (nfs_write_need_commit(hdr)) {
@@ -1422,14 +1429,10 @@ static void nfs_async_write_error(struct list_head *head, int error)
 	while (!list_empty(head)) {
 		req = nfs_list_entry(head->next);
 		nfs_list_remove_request(req);
-		if (nfs_error_is_fatal(error)) {
-			nfs_context_set_write_error(req->wb_context, error);
-			if (nfs_error_is_fatal_on_server(error)) {
-				nfs_write_error_remove_page(req);
-				continue;
-			}
-		}
-		nfs_redirty_request(req);
+		if (nfs_error_is_fatal(error))
+			nfs_write_error(req, error);
+		else
+			nfs_redirty_request(req);
 	}
 }
 
@@ -1843,7 +1846,7 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 			req->wb_bytes,
 			(long long)req_offset(req));
 		if (status < 0) {
-			nfs_context_set_write_error(req->wb_context, status);
+			nfs_mapping_set_error(req->wb_page, status);
 			if (req->wb_page)
 				nfs_inode_remove_request(req);
 			dprintk_cont(", error = %d\n", status);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 40e30376130b..d363d5765cdf 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -76,7 +76,6 @@ struct nfs_open_context {
 	fmode_t mode;
 
 	unsigned long flags;
-#define NFS_CONTEXT_ERROR_WRITE		(0)
 #define NFS_CONTEXT_RESEND_WRITES	(1)
 #define NFS_CONTEXT_BAD			(2)
 #define NFS_CONTEXT_UNLOCK	(3)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 23/28] NFS: Fix up NFS I/O subrequest creation
  2019-03-29 21:59                                           ` [PATCH v2 22/28] NFS: Replace custom error reporting mechanism with generic one Trond Myklebust
@ 2019-03-29 21:59                                             ` Trond Myklebust
  2019-03-29 21:59                                               ` [PATCH v2 24/28] NFS: Remove unused argument from nfs_create_request() Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

We require all NFS I/O subrequests to duplicate the lock context as well
as the open context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pagelist.c | 93 ++++++++++++++++++++++++++++-------------------
 1 file changed, 55 insertions(+), 38 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index e9f39fa5964b..66a5c5d4a777 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -295,25 +295,13 @@ nfs_page_group_destroy(struct kref *kref)
 		nfs_release_request(head);
 }
 
-/**
- * nfs_create_request - Create an NFS read/write request.
- * @ctx: open context to use
- * @page: page to write
- * @last: last nfs request created for this page group or NULL if head
- * @offset: starting offset within the page for the write
- * @count: number of bytes to read/write
- *
- * The page must be locked by the caller. This makes sure we never
- * create two different requests for the same page.
- * User should ensure it is safe to sleep in this function.
- */
-struct nfs_page *
-nfs_create_request(struct nfs_open_context *ctx, struct page *page,
-		   struct nfs_page *last, unsigned int offset,
-		   unsigned int count)
+static struct nfs_page *
+__nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
+		   struct nfs_page *last, unsigned int pgbase,
+		   unsigned int offset, unsigned int count)
 {
 	struct nfs_page		*req;
-	struct nfs_lock_context *l_ctx;
+	struct nfs_open_context *ctx = l_ctx->open_context;
 
 	if (test_bit(NFS_CONTEXT_BAD, &ctx->flags))
 		return ERR_PTR(-EBADF);
@@ -322,13 +310,8 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
 	if (req == NULL)
 		return ERR_PTR(-ENOMEM);
 
-	/* get lock context early so we can deal with alloc failures */
-	l_ctx = nfs_get_lock_context(ctx);
-	if (IS_ERR(l_ctx)) {
-		nfs_page_free(req);
-		return ERR_CAST(l_ctx);
-	}
 	req->wb_lock_context = l_ctx;
+	refcount_inc(&l_ctx->count);
 	atomic_inc(&l_ctx->io_count);
 
 	/* Initialize the request struct. Initially, we assume a
@@ -340,7 +323,7 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
 		get_page(page);
 	}
 	req->wb_offset  = offset;
-	req->wb_pgbase	= offset;
+	req->wb_pgbase	= pgbase;
 	req->wb_bytes   = count;
 	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
@@ -348,6 +331,49 @@ nfs_create_request(struct nfs_open_context *ctx, struct page *page,
 	return req;
 }
 
+/**
+ * nfs_create_request - Create an NFS read/write request.
+ * @ctx: open context to use
+ * @page: page to write
+ * @last: last nfs request created for this page group or NULL if head
+ * @offset: starting offset within the page for the write
+ * @count: number of bytes to read/write
+ *
+ * The page must be locked by the caller. This makes sure we never
+ * create two different requests for the same page.
+ * User should ensure it is safe to sleep in this function.
+ */
+struct nfs_page *
+nfs_create_request(struct nfs_open_context *ctx, struct page *page,
+		   struct nfs_page *last, unsigned int offset,
+		   unsigned int count)
+{
+	struct nfs_lock_context *l_ctx = nfs_get_lock_context(ctx);
+	struct nfs_page *ret;
+
+	if (IS_ERR(l_ctx))
+		return ERR_CAST(l_ctx);
+	ret = __nfs_create_request(l_ctx, page, last, offset, offset, count);
+	nfs_put_lock_context(l_ctx);
+	return ret;
+}
+
+static struct nfs_page *
+nfs_create_subreq(struct nfs_page *req, struct nfs_page *last,
+		  unsigned int pgbase, unsigned int offset,
+		  unsigned int count)
+{
+	struct nfs_page *ret;
+
+	ret = __nfs_create_request(req->wb_lock_context, req->wb_page, last,
+			pgbase, offset, count);
+	if (!IS_ERR(ret)) {
+		nfs_lock_request(ret);
+		ret->wb_index = req->wb_index;
+	}
+	return ret;
+}
+
 /**
  * nfs_unlock_request - Unlock request and wake up sleepers.
  * @req: pointer to request
@@ -1049,14 +1075,10 @@ static int __nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
 		pgbase += subreq->wb_bytes;
 
 		if (bytes_left) {
-			subreq = nfs_create_request(req->wb_context,
-					req->wb_page,
-					subreq, pgbase, bytes_left);
+			subreq = nfs_create_subreq(req, subreq, pgbase,
+					offset, bytes_left);
 			if (IS_ERR(subreq))
 				goto err_ptr;
-			nfs_lock_request(subreq);
-			subreq->wb_offset  = offset;
-			subreq->wb_index = req->wb_index;
 		}
 	} while (bytes_left > 0);
 
@@ -1158,19 +1180,14 @@ int nfs_pageio_add_request(struct nfs_pageio_descriptor *desc,
 			     lastreq = lastreq->wb_this_page)
 				;
 
-			dupreq = nfs_create_request(req->wb_context,
-					req->wb_page, lastreq, pgbase, bytes);
+			dupreq = nfs_create_subreq(req, lastreq,
+					pgbase, offset, bytes);
 
+			nfs_page_group_unlock(req);
 			if (IS_ERR(dupreq)) {
-				nfs_page_group_unlock(req);
 				desc->pg_error = PTR_ERR(dupreq);
 				goto out_failed;
 			}
-
-			nfs_lock_request(dupreq);
-			nfs_page_group_unlock(req);
-			dupreq->wb_offset = offset;
-			dupreq->wb_index = req->wb_index;
 		} else
 			dupreq = req;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 24/28] NFS: Remove unused argument from nfs_create_request()
  2019-03-29 21:59                                             ` [PATCH v2 23/28] NFS: Fix up NFS I/O subrequest creation Trond Myklebust
@ 2019-03-29 21:59                                               ` Trond Myklebust
  2019-03-29 21:59                                                 ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

All the callers of nfs_create_request() are now creating page group
heads, so we can remove the redundant 'last' page argument.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/direct.c          |  4 ++--
 fs/nfs/pagelist.c        | 16 ++++++++--------
 fs/nfs/read.c            |  4 ++--
 fs/nfs/write.c           |  2 +-
 include/linux/nfs_page.h |  1 -
 5 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 0fd811ac08b5..2d301a1a73e2 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -492,7 +492,7 @@ static ssize_t nfs_direct_read_schedule_iovec(struct nfs_direct_req *dreq,
 			struct nfs_page *req;
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
 			/* XXX do we need to do the eof zeroing found in async_filler? */
-			req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
+			req = nfs_create_request(dreq->ctx, pagevec[i],
 						 pgbase, req_len);
 			if (IS_ERR(req)) {
 				result = PTR_ERR(req);
@@ -899,7 +899,7 @@ static ssize_t nfs_direct_write_schedule_iovec(struct nfs_direct_req *dreq,
 			struct nfs_page *req;
 			unsigned int req_len = min_t(size_t, bytes, PAGE_SIZE - pgbase);
 
-			req = nfs_create_request(dreq->ctx, pagevec[i], NULL,
+			req = nfs_create_request(dreq->ctx, pagevec[i],
 						 pgbase, req_len);
 			if (IS_ERR(req)) {
 				result = PTR_ERR(req);
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 66a5c5d4a777..b8301c40dd78 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -297,8 +297,8 @@ nfs_page_group_destroy(struct kref *kref)
 
 static struct nfs_page *
 __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
-		   struct nfs_page *last, unsigned int pgbase,
-		   unsigned int offset, unsigned int count)
+		   unsigned int pgbase, unsigned int offset,
+		   unsigned int count)
 {
 	struct nfs_page		*req;
 	struct nfs_open_context *ctx = l_ctx->open_context;
@@ -327,7 +327,6 @@ __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
 	req->wb_bytes   = count;
 	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
-	nfs_page_group_init(req, last);
 	return req;
 }
 
@@ -335,7 +334,6 @@ __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
  * nfs_create_request - Create an NFS read/write request.
  * @ctx: open context to use
  * @page: page to write
- * @last: last nfs request created for this page group or NULL if head
  * @offset: starting offset within the page for the write
  * @count: number of bytes to read/write
  *
@@ -345,15 +343,16 @@ __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
  */
 struct nfs_page *
 nfs_create_request(struct nfs_open_context *ctx, struct page *page,
-		   struct nfs_page *last, unsigned int offset,
-		   unsigned int count)
+		   unsigned int offset, unsigned int count)
 {
 	struct nfs_lock_context *l_ctx = nfs_get_lock_context(ctx);
 	struct nfs_page *ret;
 
 	if (IS_ERR(l_ctx))
 		return ERR_CAST(l_ctx);
-	ret = __nfs_create_request(l_ctx, page, last, offset, offset, count);
+	ret = __nfs_create_request(l_ctx, page, offset, offset, count);
+	if (!IS_ERR(ret))
+		nfs_page_group_init(ret, NULL);
 	nfs_put_lock_context(l_ctx);
 	return ret;
 }
@@ -365,11 +364,12 @@ nfs_create_subreq(struct nfs_page *req, struct nfs_page *last,
 {
 	struct nfs_page *ret;
 
-	ret = __nfs_create_request(req->wb_lock_context, req->wb_page, last,
+	ret = __nfs_create_request(req->wb_lock_context, req->wb_page,
 			pgbase, offset, count);
 	if (!IS_ERR(ret)) {
 		nfs_lock_request(ret);
 		ret->wb_index = req->wb_index;
+		nfs_page_group_init(ret, last);
 	}
 	return ret;
 }
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index 1d95a60b2586..fad1333dbf71 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -118,7 +118,7 @@ int nfs_readpage_async(struct nfs_open_context *ctx, struct inode *inode,
 	len = nfs_page_length(page);
 	if (len == 0)
 		return nfs_return_empty_page(page);
-	new = nfs_create_request(ctx, page, NULL, 0, len);
+	new = nfs_create_request(ctx, page, 0, len);
 	if (IS_ERR(new)) {
 		unlock_page(page);
 		return PTR_ERR(new);
@@ -363,7 +363,7 @@ readpage_async_filler(void *data, struct page *page)
 	if (len == 0)
 		return nfs_return_empty_page(page);
 
-	new = nfs_create_request(desc->ctx, page, NULL, 0, len);
+	new = nfs_create_request(desc->ctx, page, 0, len);
 	if (IS_ERR(new))
 		goto out_error;
 
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 627a2b6ed1fe..0712d886ff08 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1171,7 +1171,7 @@ static struct nfs_page * nfs_setup_write_request(struct nfs_open_context* ctx,
 	req = nfs_try_to_update_request(inode, page, offset, bytes);
 	if (req != NULL)
 		goto out;
-	req = nfs_create_request(ctx, page, NULL, offset, bytes);
+	req = nfs_create_request(ctx, page, offset, bytes);
 	if (IS_ERR(req))
 		goto out;
 	nfs_inode_add_request(inode, req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index ad69430fd0eb..b7d0f15615c2 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -114,7 +114,6 @@ struct nfs_pageio_descriptor {
 
 extern	struct nfs_page *nfs_create_request(struct nfs_open_context *ctx,
 					    struct page *page,
-					    struct nfs_page *last,
 					    unsigned int offset,
 					    unsigned int count);
 extern	void nfs_release_request(struct nfs_page *);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries
  2019-03-29 21:59                                               ` [PATCH v2 24/28] NFS: Remove unused argument from nfs_create_request() Trond Myklebust
@ 2019-03-29 21:59                                                 ` Trond Myklebust
  2019-03-29 21:59                                                   ` [PATCH v2 26/28] NFS: Allow signal interruption of NFS4ERR_DELAYed operations Trond Myklebust
  2019-04-01 16:27                                                   ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Olga Kornievskaia
  0 siblings, 2 replies; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

When the client is reading or writing using pNFS, and hits an error
on the DS, then it typically sends a LAYOUTERROR and/or LAYOUTRETURN
to the MDS, before redirtying the failed pages, and going for a new
round of reads/writebacks. The problem is that if the server has no
way to fix the DS, then we may need a way to interrupt this loop
after a set number of attempts have been made.
This patch adds an optional module parameter that allows the admin
to specify how many times to retry the read/writeback process before
failing with a fatal error.
The default behaviour is to retry forever.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/direct.c                        |  7 +++++++
 fs/nfs/flexfilelayout/flexfilelayout.c |  8 ++++++++
 fs/nfs/pagelist.c                      | 14 +++++++++++++-
 fs/nfs/write.c                         |  5 +++++
 include/linux/nfs_page.h               |  4 +++-
 5 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
index 2d301a1a73e2..2436bd92bc00 100644
--- a/fs/nfs/direct.c
+++ b/fs/nfs/direct.c
@@ -663,6 +663,8 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
 	}
 
 	list_for_each_entry_safe(req, tmp, &reqs, wb_list) {
+		/* Bump the transmission count */
+		req->wb_nio++;
 		if (!nfs_pageio_add_request(&desc, req)) {
 			nfs_list_move_request(req, &failed);
 			spin_lock(&cinfo.inode->i_lock);
@@ -703,6 +705,11 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
 		req = nfs_list_entry(data->pages.next);
 		nfs_list_remove_request(req);
 		if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) {
+			/*
+			 * Despite the reboot, the write was successful,
+			 * so reset wb_nio.
+			 */
+			req->wb_nio = 0;
 			/* Note the rewrite will go through mds */
 			nfs_mark_request_commit(req, NULL, &cinfo, 0);
 		} else
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 6673d4ff5a2a..9fdbcfd3e39d 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -28,6 +28,8 @@
 #define FF_LAYOUT_POLL_RETRY_MAX     (15*HZ)
 #define FF_LAYOUTRETURN_MAXERR 20
 
+static unsigned short io_maxretrans;
+
 static void ff_layout_read_record_layoutstats_done(struct rpc_task *task,
 		struct nfs_pgio_header *hdr);
 static int ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
@@ -925,6 +927,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
 	pgm = &pgio->pg_mirrors[0];
 	pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
 
+	pgio->pg_maxretrans = io_maxretrans;
 	return;
 out_nolseg:
 	if (pgio->pg_error < 0)
@@ -992,6 +995,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
 		pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
 	}
 
+	pgio->pg_maxretrans = io_maxretrans;
 	return;
 
 out_mds:
@@ -2515,3 +2519,7 @@ MODULE_DESCRIPTION("The NFSv4 flexfile layout driver");
 
 module_init(nfs4flexfilelayout_init);
 module_exit(nfs4flexfilelayout_exit);
+
+module_param(io_maxretrans, ushort, 0644);
+MODULE_PARM_DESC(io_maxretrans, "The  number of times the NFSv4.1 client "
+			"retries an I/O request before returning an error. ");
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index b8301c40dd78..4a31284f411e 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -16,8 +16,8 @@
 #include <linux/nfs.h>
 #include <linux/nfs3.h>
 #include <linux/nfs4.h>
-#include <linux/nfs_page.h>
 #include <linux/nfs_fs.h>
+#include <linux/nfs_page.h>
 #include <linux/nfs_mount.h>
 #include <linux/export.h>
 
@@ -327,6 +327,7 @@ __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
 	req->wb_bytes   = count;
 	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
+	req->wb_nio = 0;
 	return req;
 }
 
@@ -370,6 +371,7 @@ nfs_create_subreq(struct nfs_page *req, struct nfs_page *last,
 		nfs_lock_request(ret);
 		ret->wb_index = req->wb_index;
 		nfs_page_group_init(ret, last);
+		ret->wb_nio = req->wb_nio;
 	}
 	return ret;
 }
@@ -724,6 +726,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
 	desc->pg_mirrors_dynamic = NULL;
 	desc->pg_mirrors = desc->pg_mirrors_static;
 	nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
+	desc->pg_maxretrans = 0;
 }
 
 /**
@@ -983,6 +986,15 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
 			return 0;
 		mirror->pg_base = req->wb_pgbase;
 	}
+
+	if (desc->pg_maxretrans && req->wb_nio > desc->pg_maxretrans) {
+		if (NFS_SERVER(desc->pg_inode)->flags & NFS_MOUNT_SOFTERR)
+			desc->pg_error = -ETIMEDOUT;
+		else
+			desc->pg_error = -EIO;
+		return 0;
+	}
+
 	if (!nfs_can_coalesce_requests(prev, req, desc))
 		return 0;
 	nfs_list_move_request(req, &mirror->pg_list);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 0712d886ff08..908b166d635d 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1009,6 +1009,8 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
 			goto remove_req;
 		}
 		if (nfs_write_need_commit(hdr)) {
+			/* Reset wb_nio, since the write was successful. */
+			req->wb_nio = 0;
 			memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
 			nfs_mark_request_commit(req, hdr->lseg, &cinfo,
 				hdr->pgio_mirror_idx);
@@ -1142,6 +1144,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
 		req->wb_bytes = end - req->wb_offset;
 	else
 		req->wb_bytes = rqend - req->wb_offset;
+	req->wb_nio = 0;
 	return req;
 out_flushme:
 	/*
@@ -1416,6 +1419,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr,
  */
 static void nfs_redirty_request(struct nfs_page *req)
 {
+	/* Bump the transmission count */
+	req->wb_nio++;
 	nfs_mark_request_dirty(req);
 	set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
 	nfs_end_page_writeback(req);
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index b7d0f15615c2..8b36800d342d 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -53,6 +53,7 @@ struct nfs_page {
 	struct nfs_write_verifier	wb_verf;	/* Commit cookie */
 	struct nfs_page		*wb_this_page;  /* list of reqs for this page */
 	struct nfs_page		*wb_head;       /* head pointer for req list */
+	unsigned short		wb_nio;		/* Number of I/O attempts */
 };
 
 struct nfs_pageio_descriptor;
@@ -87,7 +88,6 @@ struct nfs_pgio_mirror {
 };
 
 struct nfs_pageio_descriptor {
-	unsigned char		pg_moreio : 1;
 	struct inode		*pg_inode;
 	const struct nfs_pageio_ops *pg_ops;
 	const struct nfs_rw_ops *pg_rw_ops;
@@ -105,6 +105,8 @@ struct nfs_pageio_descriptor {
 	struct nfs_pgio_mirror	pg_mirrors_static[1];
 	struct nfs_pgio_mirror	*pg_mirrors_dynamic;
 	u32			pg_mirror_idx;	/* current mirror */
+	unsigned short		pg_maxretrans;
+	unsigned char		pg_moreio : 1;
 };
 
 /* arbitrarily selected limit to number of mirrors */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 26/28] NFS: Allow signal interruption of NFS4ERR_DELAYed operations
  2019-03-29 21:59                                                 ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Trond Myklebust
@ 2019-03-29 21:59                                                   ` Trond Myklebust
  2019-03-29 21:59                                                     ` [PATCH v2 27/28] NFS: Add a helper to return a pointer to the open context of a struct nfs_page Trond Myklebust
  2019-04-01 16:27                                                   ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Olga Kornievskaia
  1 sibling, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

If the server is unable to immediately execute an RPC call, and returns
an NFS4ERR_DELAY then we can assume it is safe to interrupt the operation
in order to handle ordinary signals. This allows the application to
service timer interrupts that would otherwise have to wait until the
server is again able to respond.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4_fs.h  |   1 +
 fs/nfs/nfs4proc.c | 139 ++++++++++++++++++++++++++++++++++------------
 2 files changed, 106 insertions(+), 34 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 06ac3d9ac7c6..8a38a254f516 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -206,6 +206,7 @@ struct nfs4_exception {
 	unsigned char delay : 1,
 		      recovering : 1,
 		      retry : 1;
+	bool interruptible;
 };
 
 struct nfs4_state_recovery_ops {
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 96b988689f0e..c29cbef6b53f 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -400,17 +400,32 @@ static long nfs4_update_delay(long *timeout)
 	return ret;
 }
 
-static int nfs4_delay(struct rpc_clnt *clnt, long *timeout)
+static int nfs4_delay_killable(long *timeout)
 {
-	int res = 0;
-
 	might_sleep();
 
 	freezable_schedule_timeout_killable_unsafe(
 		nfs4_update_delay(timeout));
-	if (fatal_signal_pending(current))
-		res = -ERESTARTSYS;
-	return res;
+	if (!__fatal_signal_pending(current))
+		return 0;
+	return -EINTR;
+}
+
+static int nfs4_delay_interruptible(long *timeout)
+{
+	might_sleep();
+
+	freezable_schedule_timeout_interruptible(nfs4_update_delay(timeout));
+	if (!signal_pending(current))
+		return 0;
+	return __fatal_signal_pending(current) ? -EINTR :-ERESTARTSYS;
+}
+
+static int nfs4_delay(long *timeout, bool interruptible)
+{
+	if (interruptible)
+		return nfs4_delay_interruptible(timeout);
+	return nfs4_delay_killable(timeout);
 }
 
 /* This is the error handling routine for processes that are allowed
@@ -546,7 +561,8 @@ int nfs4_handle_exception(struct nfs_server *server, int errorcode, struct nfs4_
 
 	ret = nfs4_do_handle_exception(server, errorcode, exception);
 	if (exception->delay) {
-		ret = nfs4_delay(server->client, &exception->timeout);
+		ret = nfs4_delay(&exception->timeout,
+				exception->interruptible);
 		goto out_retry;
 	}
 	if (exception->recovering) {
@@ -3066,7 +3082,9 @@ static struct nfs4_state *nfs4_do_open(struct inode *dir,
 					int *opened)
 {
 	struct nfs_server *server = NFS_SERVER(dir);
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	struct nfs4_state *res;
 	struct nfs4_open_createattrs c = {
 		.label = label,
@@ -3679,7 +3697,9 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 
 int nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = nfs4_handle_exception(server,
@@ -3721,7 +3741,9 @@ static int _nfs4_lookup_root(struct nfs_server *server, struct nfs_fh *fhandle,
 static int nfs4_lookup_root(struct nfs_server *server, struct nfs_fh *fhandle,
 		struct nfs_fsinfo *info)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_lookup_root(server, fhandle, info);
@@ -3948,7 +3970,9 @@ static int nfs4_proc_getattr(struct nfs_server *server, struct nfs_fh *fhandle,
 				struct nfs_fattr *fattr, struct nfs4_label *label,
 				struct inode *inode)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_proc_getattr(server, fhandle, fattr, label, inode);
@@ -4071,7 +4095,9 @@ static int nfs4_proc_lookup_common(struct rpc_clnt **clnt, struct inode *dir,
 				   const struct qstr *name, struct nfs_fh *fhandle,
 				   struct nfs_fattr *fattr, struct nfs4_label *label)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	struct rpc_clnt *client = *clnt;
 	int err;
 	do {
@@ -4175,7 +4201,9 @@ static int _nfs4_proc_lookupp(struct inode *inode,
 static int nfs4_proc_lookupp(struct inode *inode, struct nfs_fh *fhandle,
 			     struct nfs_fattr *fattr, struct nfs4_label *label)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_proc_lookupp(inode, fhandle, fattr, label);
@@ -4222,7 +4250,9 @@ static int _nfs4_proc_access(struct inode *inode, struct nfs_access_entry *entry
 
 static int nfs4_proc_access(struct inode *inode, struct nfs_access_entry *entry)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_proc_access(inode, entry);
@@ -4277,7 +4307,9 @@ static int _nfs4_proc_readlink(struct inode *inode, struct page *page,
 static int nfs4_proc_readlink(struct inode *inode, struct page *page,
 		unsigned int pgbase, unsigned int pglen)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_proc_readlink(inode, page, pgbase, pglen);
@@ -4353,7 +4385,9 @@ _nfs4_proc_remove(struct inode *dir, const struct qstr *name, u32 ftype)
 
 static int nfs4_proc_remove(struct inode *dir, struct dentry *dentry)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	struct inode *inode = d_inode(dentry);
 	int err;
 
@@ -4374,7 +4408,9 @@ static int nfs4_proc_remove(struct inode *dir, struct dentry *dentry)
 
 static int nfs4_proc_rmdir(struct inode *dir, const struct qstr *name)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 
 	do {
@@ -4533,7 +4569,9 @@ static int _nfs4_proc_link(struct inode *inode, struct inode *dir, const struct
 
 static int nfs4_proc_link(struct inode *inode, struct inode *dir, const struct qstr *name)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = nfs4_handle_exception(NFS_SERVER(inode),
@@ -4640,7 +4678,9 @@ static int _nfs4_proc_symlink(struct inode *dir, struct dentry *dentry,
 static int nfs4_proc_symlink(struct inode *dir, struct dentry *dentry,
 		struct page *page, unsigned int len, struct iattr *sattr)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	struct nfs4_label l, *label = NULL;
 	int err;
 
@@ -4679,7 +4719,9 @@ static int nfs4_proc_mkdir(struct inode *dir, struct dentry *dentry,
 		struct iattr *sattr)
 {
 	struct nfs_server *server = NFS_SERVER(dir);
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	struct nfs4_label l, *label = NULL;
 	int err;
 
@@ -4739,7 +4781,9 @@ static int _nfs4_proc_readdir(struct dentry *dentry, const struct cred *cred,
 static int nfs4_proc_readdir(struct dentry *dentry, const struct cred *cred,
 		u64 cookie, struct page **pages, unsigned int count, bool plus)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_proc_readdir(dentry, cred, cookie,
@@ -4790,7 +4834,9 @@ static int nfs4_proc_mknod(struct inode *dir, struct dentry *dentry,
 		struct iattr *sattr, dev_t rdev)
 {
 	struct nfs_server *server = NFS_SERVER(dir);
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	struct nfs4_label l, *label = NULL;
 	int err;
 
@@ -4832,7 +4878,9 @@ static int _nfs4_proc_statfs(struct nfs_server *server, struct nfs_fh *fhandle,
 
 static int nfs4_proc_statfs(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fsstat *fsstat)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = nfs4_handle_exception(server,
@@ -4863,7 +4911,9 @@ static int _nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle,
 
 static int nfs4_do_fsinfo(struct nfs_server *server, struct nfs_fh *fhandle, struct nfs_fsinfo *fsinfo)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	unsigned long now = jiffies;
 	int err;
 
@@ -4925,7 +4975,9 @@ static int _nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle
 static int nfs4_proc_pathconf(struct nfs_server *server, struct nfs_fh *fhandle,
 		struct nfs_pathconf *pathconf)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 
 	do {
@@ -5494,7 +5546,9 @@ static ssize_t __nfs4_get_acl_uncached(struct inode *inode, void *buf, size_t bu
 
 static ssize_t nfs4_get_acl_uncached(struct inode *inode, void *buf, size_t buflen)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	ssize_t ret;
 	do {
 		ret = __nfs4_get_acl_uncached(inode, buf, buflen);
@@ -5628,7 +5682,9 @@ static int _nfs4_get_security_label(struct inode *inode, void *buf,
 static int nfs4_get_security_label(struct inode *inode, void *buf,
 					size_t buflen)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 
 	if (!nfs_server_capable(inode, NFS_CAP_SECURITY_LABEL))
@@ -6269,7 +6325,9 @@ static int _nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock
 
 static int nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock *request)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 
 	do {
@@ -6833,6 +6891,7 @@ static int nfs4_proc_setlk(struct nfs4_state *state, int cmd, struct file_lock *
 	struct nfs4_exception exception = {
 		.state = state,
 		.inode = state->inode,
+		.interruptible = true,
 	};
 	int err;
 
@@ -7246,7 +7305,9 @@ int nfs4_proc_fs_locations(struct rpc_clnt *client, struct inode *dir,
 			   struct nfs4_fs_locations *fs_locations,
 			   struct page *page)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs4_proc_fs_locations(client, dir, name,
@@ -7389,7 +7450,9 @@ int nfs4_proc_get_locations(struct inode *inode,
 	struct nfs_client *clp = server->nfs_client;
 	const struct nfs4_mig_recovery_ops *ops =
 					clp->cl_mvops->mig_recovery_ops;
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int status;
 
 	dprintk("%s: FSID %llx:%llx on \"%s\"\n", __func__,
@@ -7513,7 +7576,9 @@ int nfs4_proc_fsid_present(struct inode *inode, const struct cred *cred)
 	struct nfs_client *clp = server->nfs_client;
 	const struct nfs4_mig_recovery_ops *ops =
 					clp->cl_mvops->mig_recovery_ops;
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int status;
 
 	dprintk("%s: FSID %llx:%llx on \"%s\"\n", __func__,
@@ -7579,7 +7644,9 @@ static int _nfs4_proc_secinfo(struct inode *dir, const struct qstr *name, struct
 int nfs4_proc_secinfo(struct inode *dir, const struct qstr *name,
 		      struct nfs4_secinfo_flavors *flavors)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = -NFS4ERR_WRONGSEC;
@@ -9269,7 +9336,9 @@ static int
 nfs41_proc_secinfo_no_name(struct nfs_server *server, struct nfs_fh *fhandle,
 			   struct nfs_fsinfo *info, struct nfs4_secinfo_flavors *flavors)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		/* first try using integrity protection */
@@ -9436,7 +9505,9 @@ static int nfs41_test_stateid(struct nfs_server *server,
 		nfs4_stateid *stateid,
 		const struct cred *cred)
 {
-	struct nfs4_exception exception = { };
+	struct nfs4_exception exception = {
+		.interruptible = true,
+	};
 	int err;
 	do {
 		err = _nfs41_test_stateid(server, stateid, cred);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 27/28] NFS: Add a helper to return a pointer to the open context of a struct nfs_page
  2019-03-29 21:59                                                   ` [PATCH v2 26/28] NFS: Allow signal interruption of NFS4ERR_DELAYed operations Trond Myklebust
@ 2019-03-29 21:59                                                     ` Trond Myklebust
  2019-03-29 21:59                                                       ` [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

Add a helper for when we remove the explicit pointer to the open
context.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/filelayout/filelayout.c         |  4 ++--
 fs/nfs/flexfilelayout/flexfilelayout.c |  6 +++---
 fs/nfs/pagelist.c                      |  8 ++++----
 fs/nfs/pnfs.c                          |  4 ++--
 fs/nfs/pnfs.h                          |  4 ++--
 fs/nfs/read.c                          |  2 +-
 fs/nfs/write.c                         | 20 +++++++++++---------
 include/linux/nfs_page.h               |  6 ++++++
 8 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 61f46facb39c..21d9f3bfbc81 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -917,7 +917,7 @@ filelayout_pg_init_read(struct nfs_pageio_descriptor *pgio,
 	pnfs_generic_pg_check_layout(pgio);
 	if (!pgio->pg_lseg) {
 		pgio->pg_lseg = fl_pnfs_update_layout(pgio->pg_inode,
-						      req->wb_context,
+						      nfs_req_openctx(req),
 						      0,
 						      NFS4_MAX_UINT64,
 						      IOMODE_READ,
@@ -944,7 +944,7 @@ filelayout_pg_init_write(struct nfs_pageio_descriptor *pgio,
 	pnfs_generic_pg_check_layout(pgio);
 	if (!pgio->pg_lseg) {
 		pgio->pg_lseg = fl_pnfs_update_layout(pgio->pg_inode,
-						      req->wb_context,
+						      nfs_req_openctx(req),
 						      0,
 						      NFS4_MAX_UINT64,
 						      IOMODE_RW,
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 9fdbcfd3e39d..9920c52bd0cd 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -873,7 +873,7 @@ ff_layout_pg_get_read(struct nfs_pageio_descriptor *pgio,
 {
 	pnfs_put_lseg(pgio->pg_lseg);
 	pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-					   req->wb_context,
+					   nfs_req_openctx(req),
 					   0,
 					   NFS4_MAX_UINT64,
 					   IOMODE_READ,
@@ -953,7 +953,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
 	pnfs_generic_pg_check_layout(pgio);
 	if (!pgio->pg_lseg) {
 		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   req->wb_context,
+						   nfs_req_openctx(req),
 						   0,
 						   NFS4_MAX_UINT64,
 						   IOMODE_RW,
@@ -1010,7 +1010,7 @@ ff_layout_pg_get_mirror_count_write(struct nfs_pageio_descriptor *pgio,
 {
 	if (!pgio->pg_lseg) {
 		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   req->wb_context,
+						   nfs_req_openctx(req),
 						   0,
 						   NFS4_MAX_UINT64,
 						   IOMODE_RW,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 4a31284f411e..ce6440b79328 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -47,7 +47,7 @@ void nfs_pgheader_init(struct nfs_pageio_descriptor *desc,
 
 	hdr->req = nfs_list_entry(mirror->pg_list.next);
 	hdr->inode = desc->pg_inode;
-	hdr->cred = hdr->req->wb_context->cred;
+	hdr->cred = nfs_req_openctx(hdr->req)->cred;
 	hdr->io_start = req_offset(hdr->req);
 	hdr->good_bytes = mirror->pg_count;
 	hdr->io_completion = desc->pg_io_completion;
@@ -578,7 +578,7 @@ static void nfs_pgio_rpcsetup(struct nfs_pgio_header *hdr,
 	hdr->args.pgbase = req->wb_pgbase;
 	hdr->args.pages  = hdr->page_array.pagevec;
 	hdr->args.count  = count;
-	hdr->args.context = get_nfs_open_context(req->wb_context);
+	hdr->args.context = get_nfs_open_context(nfs_req_openctx(req));
 	hdr->args.lock_context = req->wb_lock_context;
 	hdr->args.stable  = NFS_UNSTABLE;
 	switch (how & (FLUSH_STABLE | FLUSH_COND_STABLE)) {
@@ -935,9 +935,9 @@ static bool nfs_can_coalesce_requests(struct nfs_page *prev,
 	struct file_lock_context *flctx;
 
 	if (prev) {
-		if (!nfs_match_open_context(req->wb_context, prev->wb_context))
+		if (!nfs_match_open_context(nfs_req_openctx(req), nfs_req_openctx(prev)))
 			return false;
-		flctx = d_inode(req->wb_context->dentry)->i_flctx;
+		flctx = d_inode(nfs_req_openctx(req)->dentry)->i_flctx;
 		if (flctx != NULL &&
 		    !(list_empty_careful(&flctx->flc_posix) &&
 		      list_empty_careful(&flctx->flc_flock)) &&
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 7066cd7c7aff..83722e936b4a 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2436,7 +2436,7 @@ pnfs_generic_pg_init_read(struct nfs_pageio_descriptor *pgio, struct nfs_page *r
 			rd_size = nfs_dreq_bytes_left(pgio->pg_dreq);
 
 		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   req->wb_context,
+						   nfs_req_openctx(req),
 						   req_offset(req),
 						   rd_size,
 						   IOMODE_READ,
@@ -2463,7 +2463,7 @@ pnfs_generic_pg_init_write(struct nfs_pageio_descriptor *pgio,
 	pnfs_generic_pg_check_range(pgio, req);
 	if (pgio->pg_lseg == NULL) {
 		pgio->pg_lseg = pnfs_update_layout(pgio->pg_inode,
-						   req->wb_context,
+						   nfs_req_openctx(req),
 						   req_offset(req),
 						   wb_size,
 						   IOMODE_RW,
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index c0420b979d88..f15609c003d8 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -459,7 +459,7 @@ static inline bool
 pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
 			 struct nfs_commit_info *cinfo, u32 ds_commit_idx)
 {
-	struct inode *inode = d_inode(req->wb_context->dentry);
+	struct inode *inode = d_inode(nfs_req_openctx(req)->dentry);
 	struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
 
 	if (lseg == NULL || ld->mark_request_commit == NULL)
@@ -471,7 +471,7 @@ pnfs_mark_request_commit(struct nfs_page *req, struct pnfs_layout_segment *lseg,
 static inline bool
 pnfs_clear_request_commit(struct nfs_page *req, struct nfs_commit_info *cinfo)
 {
-	struct inode *inode = d_inode(req->wb_context->dentry);
+	struct inode *inode = d_inode(nfs_req_openctx(req)->dentry);
 	struct pnfs_layoutdriver_type *ld = NFS_SERVER(inode)->pnfs_curr_ld;
 
 	if (ld == NULL || ld->clear_request_commit == NULL)
diff --git a/fs/nfs/read.c b/fs/nfs/read.c
index fad1333dbf71..c799e540ed1e 100644
--- a/fs/nfs/read.c
+++ b/fs/nfs/read.c
@@ -92,7 +92,7 @@ EXPORT_SYMBOL_GPL(nfs_pageio_reset_read_mds);
 
 static void nfs_readpage_release(struct nfs_page *req)
 {
-	struct inode *inode = d_inode(req->wb_context->dentry);
+	struct inode *inode = d_inode(nfs_req_openctx(req)->dentry);
 
 	dprintk("NFS: read done (%s/%llu %d@%lld)\n", inode->i_sb->s_id,
 		(unsigned long long)NFS_FILEID(inode), req->wb_bytes,
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 908b166d635d..76688df8827f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -964,7 +964,8 @@ static void
 nfs_clear_request_commit(struct nfs_page *req)
 {
 	if (test_bit(PG_CLEAN, &req->wb_flags)) {
-		struct inode *inode = d_inode(req->wb_context->dentry);
+		struct nfs_open_context *ctx = nfs_req_openctx(req);
+		struct inode *inode = d_inode(ctx->dentry);
 		struct nfs_commit_info cinfo;
 
 		nfs_init_cinfo_from_inode(&cinfo, inode);
@@ -1219,7 +1220,7 @@ int nfs_flush_incompatible(struct file *file, struct page *page)
 			return 0;
 		l_ctx = req->wb_lock_context;
 		do_flush = req->wb_page != page ||
-			!nfs_match_open_context(req->wb_context, ctx);
+			!nfs_match_open_context(nfs_req_openctx(req), ctx);
 		if (l_ctx && flctx &&
 		    !(list_empty_careful(&flctx->flc_posix) &&
 		      list_empty_careful(&flctx->flc_flock))) {
@@ -1422,7 +1423,7 @@ static void nfs_redirty_request(struct nfs_page *req)
 	/* Bump the transmission count */
 	req->wb_nio++;
 	nfs_mark_request_dirty(req);
-	set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
+	set_bit(NFS_CONTEXT_RESEND_WRITES, &nfs_req_openctx(req)->flags);
 	nfs_end_page_writeback(req);
 	nfs_release_request(req);
 }
@@ -1742,7 +1743,8 @@ void nfs_init_commit(struct nfs_commit_data *data,
 		     struct nfs_commit_info *cinfo)
 {
 	struct nfs_page *first = nfs_list_entry(head->next);
-	struct inode *inode = d_inode(first->wb_context->dentry);
+	struct nfs_open_context *ctx = nfs_req_openctx(first);
+	struct inode *inode = d_inode(ctx->dentry);
 
 	/* Set up the RPC argument and reply structs
 	 * NB: take care not to mess about with data->commit et al. */
@@ -1750,7 +1752,7 @@ void nfs_init_commit(struct nfs_commit_data *data,
 	list_splice_init(head, &data->pages);
 
 	data->inode	  = inode;
-	data->cred	  = first->wb_context->cred;
+	data->cred	  = ctx->cred;
 	data->lseg	  = lseg; /* reference transferred */
 	/* only set lwb for pnfs commit */
 	if (lseg)
@@ -1763,7 +1765,7 @@ void nfs_init_commit(struct nfs_commit_data *data,
 	/* Note: we always request a commit of the entire inode */
 	data->args.offset = 0;
 	data->args.count  = 0;
-	data->context     = get_nfs_open_context(first->wb_context);
+	data->context     = get_nfs_open_context(ctx);
 	data->res.fattr   = &data->fattr;
 	data->res.verf    = &data->verf;
 	nfs_fattr_init(&data->fattr);
@@ -1846,8 +1848,8 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 			nfs_clear_page_commit(req->wb_page);
 
 		dprintk("NFS:       commit (%s/%llu %d@%lld)",
-			req->wb_context->dentry->d_sb->s_id,
-			(unsigned long long)NFS_FILEID(d_inode(req->wb_context->dentry)),
+			nfs_req_openctx(req)->dentry->d_sb->s_id,
+			(unsigned long long)NFS_FILEID(d_inode(nfs_req_openctx(req)->dentry)),
 			req->wb_bytes,
 			(long long)req_offset(req));
 		if (status < 0) {
@@ -1870,7 +1872,7 @@ static void nfs_commit_release_pages(struct nfs_commit_data *data)
 		/* We have a mismatch. Write the page again */
 		dprintk_cont(" mismatch\n");
 		nfs_mark_request_dirty(req);
-		set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
+		set_bit(NFS_CONTEXT_RESEND_WRITES, &nfs_req_openctx(req)->flags);
 	next:
 		nfs_unlock_and_release_request(req);
 		/* Latency breaker */
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 8b36800d342d..1ea13e94feb7 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -200,4 +200,10 @@ loff_t req_offset(struct nfs_page *req)
 	return (((loff_t)req->wb_index) << PAGE_SHIFT) + req->wb_offset;
 }
 
+static inline struct nfs_open_context *
+nfs_req_openctx(struct nfs_page *req)
+{
+	return req->wb_context;
+}
+
 #endif /* _LINUX_NFS_PAGE_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page
  2019-03-29 21:59                                                     ` [PATCH v2 27/28] NFS: Add a helper to return a pointer to the open context of a struct nfs_page Trond Myklebust
@ 2019-03-29 21:59                                                       ` Trond Myklebust
  2019-04-05 19:42                                                         ` Anna Schumaker
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-03-29 21:59 UTC (permalink / raw)
  To: linux-nfs

The lock context already references and tracks the open context, so
take the opportunity to save some space in struct nfs_page.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pagelist.c        | 7 +------
 include/linux/nfs_page.h | 3 +--
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index ce6440b79328..5d5ac5df93e2 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -325,7 +325,6 @@ __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
 	req->wb_offset  = offset;
 	req->wb_pgbase	= pgbase;
 	req->wb_bytes   = count;
-	req->wb_context = get_nfs_open_context(ctx);
 	kref_init(&req->wb_kref);
 	req->wb_nio = 0;
 	return req;
@@ -414,8 +413,8 @@ void nfs_unlock_and_release_request(struct nfs_page *req)
 static void nfs_clear_request(struct nfs_page *req)
 {
 	struct page *page = req->wb_page;
-	struct nfs_open_context *ctx = req->wb_context;
 	struct nfs_lock_context *l_ctx = req->wb_lock_context;
+	struct nfs_open_context *ctx = l_ctx->open_context;
 
 	if (page != NULL) {
 		put_page(page);
@@ -430,10 +429,6 @@ static void nfs_clear_request(struct nfs_page *req)
 		nfs_put_lock_context(l_ctx);
 		req->wb_lock_context = NULL;
 	}
-	if (ctx != NULL) {
-		put_nfs_open_context(ctx);
-		req->wb_context = NULL;
-	}
 }
 
 /**
diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
index 1ea13e94feb7..0bbd587fac6a 100644
--- a/include/linux/nfs_page.h
+++ b/include/linux/nfs_page.h
@@ -42,7 +42,6 @@ struct nfs_inode;
 struct nfs_page {
 	struct list_head	wb_list;	/* Defines state of page: */
 	struct page		*wb_page;	/* page to read in/write out */
-	struct nfs_open_context	*wb_context;	/* File state context info */
 	struct nfs_lock_context	*wb_lock_context;	/* lock context info */
 	pgoff_t			wb_index;	/* Offset >> PAGE_SHIFT */
 	unsigned int		wb_offset,	/* Offset & ~PAGE_MASK */
@@ -203,7 +202,7 @@ loff_t req_offset(struct nfs_page *req)
 static inline struct nfs_open_context *
 nfs_req_openctx(struct nfs_page *req)
 {
-	return req->wb_context;
+	return req->wb_lock_context->open_context;
 }
 
 #endif /* _LINUX_NFS_PAGE_H */
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries
  2019-03-29 21:59                                                 ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Trond Myklebust
  2019-03-29 21:59                                                   ` [PATCH v2 26/28] NFS: Allow signal interruption of NFS4ERR_DELAYed operations Trond Myklebust
@ 2019-04-01 16:27                                                   ` Olga Kornievskaia
  2019-04-02 18:23                                                     ` Trond Myklebust
  1 sibling, 1 reply; 42+ messages in thread
From: Olga Kornievskaia @ 2019-04-01 16:27 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Fri, Mar 29, 2019 at 6:03 PM Trond Myklebust <trondmy@gmail.com> wrote:
>
> When the client is reading or writing using pNFS, and hits an error
> on the DS,

Doesn't the client retry IO against the MDS when IO to the DS fails? I
find the commit message confusing. What re-tries are we talking about?
I recall after a while the client will try to get a layout again and
if it succeeds it will send the IO to the DS. So are you trying to
prevent these new retries to the DS that will fail (as you say if DS
is in unrecoverable state)? Then why would there be a fatal error
since writing thru the MDS should (hopefully) always succeed?

> then it typically sends a LAYOUTERROR and/or LAYOUTRETURN
> to the MDS, before redirtying the failed pages, and going for a new
> round of reads/writebacks. The problem is that if the server has no
> way to fix the DS, then we may need a way to interrupt this loop
> after a set number of attempts have been made.
> This patch adds an optional module parameter that allows the admin
> to specify how many times to retry the read/writeback process before
> failing with a fatal error.
> The default behaviour is to retry forever.
>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/direct.c                        |  7 +++++++
>  fs/nfs/flexfilelayout/flexfilelayout.c |  8 ++++++++
>  fs/nfs/pagelist.c                      | 14 +++++++++++++-
>  fs/nfs/write.c                         |  5 +++++
>  include/linux/nfs_page.h               |  4 +++-
>  5 files changed, 36 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> index 2d301a1a73e2..2436bd92bc00 100644
> --- a/fs/nfs/direct.c
> +++ b/fs/nfs/direct.c
> @@ -663,6 +663,8 @@ static void nfs_direct_write_reschedule(struct nfs_direct_req *dreq)
>         }
>
>         list_for_each_entry_safe(req, tmp, &reqs, wb_list) {
> +               /* Bump the transmission count */
> +               req->wb_nio++;
>                 if (!nfs_pageio_add_request(&desc, req)) {
>                         nfs_list_move_request(req, &failed);
>                         spin_lock(&cinfo.inode->i_lock);
> @@ -703,6 +705,11 @@ static void nfs_direct_commit_complete(struct nfs_commit_data *data)
>                 req = nfs_list_entry(data->pages.next);
>                 nfs_list_remove_request(req);
>                 if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) {
> +                       /*
> +                        * Despite the reboot, the write was successful,
> +                        * so reset wb_nio.
> +                        */
> +                       req->wb_nio = 0;
>                         /* Note the rewrite will go through mds */
>                         nfs_mark_request_commit(req, NULL, &cinfo, 0);
>                 } else
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> index 6673d4ff5a2a..9fdbcfd3e39d 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -28,6 +28,8 @@
>  #define FF_LAYOUT_POLL_RETRY_MAX     (15*HZ)
>  #define FF_LAYOUTRETURN_MAXERR 20
>
> +static unsigned short io_maxretrans;
> +
>  static void ff_layout_read_record_layoutstats_done(struct rpc_task *task,
>                 struct nfs_pgio_header *hdr);
>  static int ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr *lo,
> @@ -925,6 +927,7 @@ ff_layout_pg_init_read(struct nfs_pageio_descriptor *pgio,
>         pgm = &pgio->pg_mirrors[0];
>         pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
>
> +       pgio->pg_maxretrans = io_maxretrans;
>         return;
>  out_nolseg:
>         if (pgio->pg_error < 0)
> @@ -992,6 +995,7 @@ ff_layout_pg_init_write(struct nfs_pageio_descriptor *pgio,
>                 pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].wsize;
>         }
>
> +       pgio->pg_maxretrans = io_maxretrans;
>         return;
>
>  out_mds:
> @@ -2515,3 +2519,7 @@ MODULE_DESCRIPTION("The NFSv4 flexfile layout driver");
>
>  module_init(nfs4flexfilelayout_init);
>  module_exit(nfs4flexfilelayout_exit);
> +
> +module_param(io_maxretrans, ushort, 0644);
> +MODULE_PARM_DESC(io_maxretrans, "The  number of times the NFSv4.1 client "
> +                       "retries an I/O request before returning an error. ");
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index b8301c40dd78..4a31284f411e 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -16,8 +16,8 @@
>  #include <linux/nfs.h>
>  #include <linux/nfs3.h>
>  #include <linux/nfs4.h>
> -#include <linux/nfs_page.h>
>  #include <linux/nfs_fs.h>
> +#include <linux/nfs_page.h>
>  #include <linux/nfs_mount.h>
>  #include <linux/export.h>
>
> @@ -327,6 +327,7 @@ __nfs_create_request(struct nfs_lock_context *l_ctx, struct page *page,
>         req->wb_bytes   = count;
>         req->wb_context = get_nfs_open_context(ctx);
>         kref_init(&req->wb_kref);
> +       req->wb_nio = 0;
>         return req;
>  }
>
> @@ -370,6 +371,7 @@ nfs_create_subreq(struct nfs_page *req, struct nfs_page *last,
>                 nfs_lock_request(ret);
>                 ret->wb_index = req->wb_index;
>                 nfs_page_group_init(ret, last);
> +               ret->wb_nio = req->wb_nio;
>         }
>         return ret;
>  }
> @@ -724,6 +726,7 @@ void nfs_pageio_init(struct nfs_pageio_descriptor *desc,
>         desc->pg_mirrors_dynamic = NULL;
>         desc->pg_mirrors = desc->pg_mirrors_static;
>         nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
> +       desc->pg_maxretrans = 0;
>  }
>
>  /**
> @@ -983,6 +986,15 @@ static int nfs_pageio_do_add_request(struct nfs_pageio_descriptor *desc,
>                         return 0;
>                 mirror->pg_base = req->wb_pgbase;
>         }
> +
> +       if (desc->pg_maxretrans && req->wb_nio > desc->pg_maxretrans) {
> +               if (NFS_SERVER(desc->pg_inode)->flags & NFS_MOUNT_SOFTERR)
> +                       desc->pg_error = -ETIMEDOUT;
> +               else
> +                       desc->pg_error = -EIO;
> +               return 0;
> +       }
> +
>         if (!nfs_can_coalesce_requests(prev, req, desc))
>                 return 0;
>         nfs_list_move_request(req, &mirror->pg_list);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 0712d886ff08..908b166d635d 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1009,6 +1009,8 @@ static void nfs_write_completion(struct nfs_pgio_header *hdr)
>                         goto remove_req;
>                 }
>                 if (nfs_write_need_commit(hdr)) {
> +                       /* Reset wb_nio, since the write was successful. */
> +                       req->wb_nio = 0;
>                         memcpy(&req->wb_verf, &hdr->verf.verifier, sizeof(req->wb_verf));
>                         nfs_mark_request_commit(req, hdr->lseg, &cinfo,
>                                 hdr->pgio_mirror_idx);
> @@ -1142,6 +1144,7 @@ static struct nfs_page *nfs_try_to_update_request(struct inode *inode,
>                 req->wb_bytes = end - req->wb_offset;
>         else
>                 req->wb_bytes = rqend - req->wb_offset;
> +       req->wb_nio = 0;
>         return req;
>  out_flushme:
>         /*
> @@ -1416,6 +1419,8 @@ static void nfs_initiate_write(struct nfs_pgio_header *hdr,
>   */
>  static void nfs_redirty_request(struct nfs_page *req)
>  {
> +       /* Bump the transmission count */
> +       req->wb_nio++;
>         nfs_mark_request_dirty(req);
>         set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context->flags);
>         nfs_end_page_writeback(req);
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index b7d0f15615c2..8b36800d342d 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -53,6 +53,7 @@ struct nfs_page {
>         struct nfs_write_verifier       wb_verf;        /* Commit cookie */
>         struct nfs_page         *wb_this_page;  /* list of reqs for this page */
>         struct nfs_page         *wb_head;       /* head pointer for req list */
> +       unsigned short          wb_nio;         /* Number of I/O attempts */
>  };
>
>  struct nfs_pageio_descriptor;
> @@ -87,7 +88,6 @@ struct nfs_pgio_mirror {
>  };
>
>  struct nfs_pageio_descriptor {
> -       unsigned char           pg_moreio : 1;
>         struct inode            *pg_inode;
>         const struct nfs_pageio_ops *pg_ops;
>         const struct nfs_rw_ops *pg_rw_ops;
> @@ -105,6 +105,8 @@ struct nfs_pageio_descriptor {
>         struct nfs_pgio_mirror  pg_mirrors_static[1];
>         struct nfs_pgio_mirror  *pg_mirrors_dynamic;
>         u32                     pg_mirror_idx;  /* current mirror */
> +       unsigned short          pg_maxretrans;
> +       unsigned char           pg_moreio : 1;
>  };
>
>  /* arbitrarily selected limit to number of mirrors */
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
  2019-03-29 21:59 [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Trond Myklebust
  2019-03-29 21:59 ` [PATCH v2 01/28] SUNRPC: Fix up task signalling Trond Myklebust
@ 2019-04-01 16:54 ` Olga Kornievskaia
  2019-04-02 18:28   ` Trond Myklebust
  1 sibling, 1 reply; 42+ messages in thread
From: Olga Kornievskaia @ 2019-04-01 16:54 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <trondmy@gmail.com> wrote:
>
> This patchset aims to make soft mounts a viable option for NFSv4 clients
> by minimising the risk of false positive timeouts, while allowing for
> faster failover of reads and writes once a timeout is actually observed.
>
> The patches rely on the NFS server correctly implementing the contract
> specified in RFC7530 section 3.1.1 with respect to not dropping requests
> while the transport connection is up. When this is the case, the client
> can safely assume that if the request has not received a reply after
> transmitting a RPC request, it is not because the request was dropped,
> but rather is due to congestion, or slow processing on the server.
> IOW: as long as the connection remains up, there is no need for requests
> to time out.
>
> The patches break down roughly as follows:
> - A set of patches to clean up the RPC engine timeouts, and ensure they
>   are accurate.
> - A set of patches to change the 'soft' mount semantics for NFSv4.x.
> - A set of patches to add a new 'softerr' mount option that works like
>   soft, but explicitly signals timeouts using the ETIMEDOUT error code
>   rather than using EIO. This allows applications to tune their
>   behaviour (e.g. by failing over to a different server) if a timeout
>   occurs.

I'm just curious why would an application be aware of a different
server to connect to and an NFS layer would not be? I'm also curious
wouldn't it break application that typically expect to get an EIO
errors? Do all system calls allow to return ETIMEDOUT error?

> - A set of patches to change the NFS error reporting so that it matches
>   that of local filesystems w.r.t. guarantees that filesystem errors are
>   seen once and once only.
> - A patch to ensure the safe interruption of NFS4ERR_DELAYed operations
> - A patch to ensure that pNFS operations can be forced to break out
>   of layout error cycles after a certain number of retries.
> - A few cleanups...
>
> -------
> Changes since v1:
> - Change NFSv4 soft timeout condition to prevent all requests from
>   timing out when the connection is still up, instead of just the
>   ones that have been sent.
> - RPC queue timer cleanups
> - Ratelimit the "server not responding" messages
>
>
> *** BLURB HERE ***
>
> Trond Myklebust (28):
>   SUNRPC: Fix up task signalling
>   SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare
>   SUNRPC: Refactor xprt_request_wait_receive()
>   SUNRPC: Refactor rpc_sleep_on()
>   SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority()
>   SUNRPC: Add function rpc_sleep_on_timeout()
>   SUNRPC: Fix up tracking of timeouts
>   SUNRPC: Simplify queue timeouts using timer_reduce()
>   SUNRPC: Declare RPC timers as TIMER_DEFERRABLE
>   SUNRPC: Ensure that the transport layer respect major timeouts
>   SUNRPC: Add tracking of RPC level errors
>   SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for
>     timeouts
>   SUNRPC: Start the first major timeout calculation at task creation
>   SUNRPC: Ensure to ratelimit the "server not responding" syslog
>     messages
>   SUNRPC: Add the 'softerr' rpc_client flag
>   NFS: Consider ETIMEDOUT to be a fatal error
>   NFS: Move internal constants out of uapi/linux/nfs_mount.h
>   NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT
>     errors
>   NFS: Don't interrupt file writeout due to fatal errors
>   NFS: Don't call generic_error_remove_page() while holding locks
>   NFS: Don't inadvertently clear writeback errors
>   NFS: Replace custom error reporting mechanism with generic one
>   NFS: Fix up NFS I/O subrequest creation
>   NFS: Remove unused argument from nfs_create_request()
>   pNFS: Add tracking to limit the number of pNFS retries
>   NFS: Allow signal interruption of NFS4ERR_DELAYed operations
>   NFS: Add a helper to return a pointer to the open context of a struct
>     nfs_page
>   NFS: Remove redundant open context from nfs_page
>
>  fs/lockd/clntproc.c                        |   4 +-
>  fs/nfs/client.c                            |   2 +
>  fs/nfs/direct.c                            |  11 +-
>  fs/nfs/file.c                              |  31 +---
>  fs/nfs/filelayout/filelayout.c             |   4 +-
>  fs/nfs/flexfilelayout/flexfilelayout.c     |  14 +-
>  fs/nfs/internal.h                          |   7 +-
>  fs/nfs/nfs4_fs.h                           |   1 +
>  fs/nfs/nfs4file.c                          |   2 +-
>  fs/nfs/nfs4proc.c                          | 159 +++++++++++++++------
>  fs/nfs/pagelist.c                          | 122 +++++++++-------
>  fs/nfs/pnfs.c                              |   4 +-
>  fs/nfs/pnfs.h                              |   4 +-
>  fs/nfs/read.c                              |   6 +-
>  fs/nfs/super.c                             |  15 +-
>  fs/nfs/write.c                             |  67 +++++----
>  fs/nfsd/nfs4callback.c                     |   4 +-
>  include/linux/nfs_fs.h                     |   1 -
>  include/linux/nfs_fs_sb.h                  |  10 ++
>  include/linux/nfs_page.h                   |  12 +-
>  include/linux/sunrpc/clnt.h                |   2 +
>  include/linux/sunrpc/sched.h               |  20 ++-
>  include/linux/sunrpc/xprt.h                |   6 +-
>  include/trace/events/sunrpc.h              |   8 +-
>  include/uapi/linux/nfs_mount.h             |   9 --
>  net/sunrpc/auth_gss/auth_gss.c             |   5 +-
>  net/sunrpc/clnt.c                          | 116 +++++++++------
>  net/sunrpc/debugfs.c                       |   2 +-
>  net/sunrpc/rpcb_clnt.c                     |   3 +-
>  net/sunrpc/sched.c                         | 158 +++++++++++++++-----
>  net/sunrpc/xprt.c                          | 150 ++++++++++++-------
>  net/sunrpc/xprtrdma/svc_rdma_backchannel.c |   2 +-
>  net/sunrpc/xprtrdma/transport.c            |   2 +-
>  net/sunrpc/xprtsock.c                      |   9 +-
>  34 files changed, 631 insertions(+), 341 deletions(-)
>
> --
> 2.20.1
>

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries
  2019-04-01 16:27                                                   ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Olga Kornievskaia
@ 2019-04-02 18:23                                                     ` Trond Myklebust
  0 siblings, 0 replies; 42+ messages in thread
From: Trond Myklebust @ 2019-04-02 18:23 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Mon, 2019-04-01 at 12:27 -0400, Olga Kornievskaia wrote:
> On Fri, Mar 29, 2019 at 6:03 PM Trond Myklebust <trondmy@gmail.com>
> wrote:
> > When the client is reading or writing using pNFS, and hits an error
> > on the DS,
> 
> Doesn't the client retry IO against the MDS when IO to the DS fails?
> I
> find the commit message confusing. What re-tries are we talking
> about?
> I recall after a while the client will try to get a layout again and
> if it succeeds it will send the IO to the DS. So are you trying to
> prevent these new retries to the DS that will fail (as you say if DS
> is in unrecoverable state)? Then why would there be a fatal error
> since writing thru the MDS should (hopefully) always succeed?

You are thinking about tightly coupled pNFS systems, where the MDS has
a 'special relationship' with the DSes. On a more generic system, such
as flexfiles, there is no point in doing write through the MDS because
the MDS typically has no better chance of success than the client.

As you can see from the patch, that is the main case we're targetting
here. There is no change to the other pNFS layout behaviours.

> > then it typically sends a LAYOUTERROR and/or LAYOUTRETURN
> > to the MDS, before redirtying the failed pages, and going for a new
> > round of reads/writebacks. The problem is that if the server has no
> > way to fix the DS, then we may need a way to interrupt this loop
> > after a set number of attempts have been made.
> > This patch adds an optional module parameter that allows the admin
> > to specify how many times to retry the read/writeback process
> > before
> > failing with a fatal error.
> > The default behaviour is to retry forever.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/direct.c                        |  7 +++++++
> >  fs/nfs/flexfilelayout/flexfilelayout.c |  8 ++++++++
> >  fs/nfs/pagelist.c                      | 14 +++++++++++++-
> >  fs/nfs/write.c                         |  5 +++++
> >  include/linux/nfs_page.h               |  4 +++-
> >  5 files changed, 36 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/nfs/direct.c b/fs/nfs/direct.c
> > index 2d301a1a73e2..2436bd92bc00 100644
> > --- a/fs/nfs/direct.c
> > +++ b/fs/nfs/direct.c
> > @@ -663,6 +663,8 @@ static void nfs_direct_write_reschedule(struct
> > nfs_direct_req *dreq)
> >         }
> > 
> >         list_for_each_entry_safe(req, tmp, &reqs, wb_list) {
> > +               /* Bump the transmission count */
> > +               req->wb_nio++;
> >                 if (!nfs_pageio_add_request(&desc, req)) {
> >                         nfs_list_move_request(req, &failed);
> >                         spin_lock(&cinfo.inode->i_lock);
> > @@ -703,6 +705,11 @@ static void nfs_direct_commit_complete(struct
> > nfs_commit_data *data)
> >                 req = nfs_list_entry(data->pages.next);
> >                 nfs_list_remove_request(req);
> >                 if (dreq->flags == NFS_ODIRECT_RESCHED_WRITES) {
> > +                       /*
> > +                        * Despite the reboot, the write was
> > successful,
> > +                        * so reset wb_nio.
> > +                        */
> > +                       req->wb_nio = 0;
> >                         /* Note the rewrite will go through mds */
> >                         nfs_mark_request_commit(req, NULL, &cinfo,
> > 0);
> >                 } else
> > diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c
> > b/fs/nfs/flexfilelayout/flexfilelayout.c
> > index 6673d4ff5a2a..9fdbcfd3e39d 100644
> > --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> > +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> > @@ -28,6 +28,8 @@
> >  #define FF_LAYOUT_POLL_RETRY_MAX     (15*HZ)
> >  #define FF_LAYOUTRETURN_MAXERR 20
> > 
> > +static unsigned short io_maxretrans;
> > +
> >  static void ff_layout_read_record_layoutstats_done(struct rpc_task
> > *task,
> >                 struct nfs_pgio_header *hdr);
> >  static int ff_layout_mirror_prepare_stats(struct pnfs_layout_hdr
> > *lo,
> > @@ -925,6 +927,7 @@ ff_layout_pg_init_read(struct
> > nfs_pageio_descriptor *pgio,
> >         pgm = &pgio->pg_mirrors[0];
> >         pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
> > 
> > +       pgio->pg_maxretrans = io_maxretrans;
> >         return;
> >  out_nolseg:
> >         if (pgio->pg_error < 0)
> > @@ -992,6 +995,7 @@ ff_layout_pg_init_write(struct
> > nfs_pageio_descriptor *pgio,
> >                 pgm->pg_bsize = mirror->mirror_ds-
> > >ds_versions[0].wsize;
> >         }
> > 
> > +       pgio->pg_maxretrans = io_maxretrans;
> >         return;
> > 
> >  out_mds:
> > @@ -2515,3 +2519,7 @@ MODULE_DESCRIPTION("The NFSv4 flexfile layout
> > driver");
> > 
> >  module_init(nfs4flexfilelayout_init);
> >  module_exit(nfs4flexfilelayout_exit);
> > +
> > +module_param(io_maxretrans, ushort, 0644);
> > +MODULE_PARM_DESC(io_maxretrans, "The  number of times the NFSv4.1
> > client "
> > +                       "retries an I/O request before returning an
> > error. ");
> > diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> > index b8301c40dd78..4a31284f411e 100644
> > --- a/fs/nfs/pagelist.c
> > +++ b/fs/nfs/pagelist.c
> > @@ -16,8 +16,8 @@
> >  #include <linux/nfs.h>
> >  #include <linux/nfs3.h>
> >  #include <linux/nfs4.h>
> > -#include <linux/nfs_page.h>
> >  #include <linux/nfs_fs.h>
> > +#include <linux/nfs_page.h>
> >  #include <linux/nfs_mount.h>
> >  #include <linux/export.h>
> > 
> > @@ -327,6 +327,7 @@ __nfs_create_request(struct nfs_lock_context
> > *l_ctx, struct page *page,
> >         req->wb_bytes   = count;
> >         req->wb_context = get_nfs_open_context(ctx);
> >         kref_init(&req->wb_kref);
> > +       req->wb_nio = 0;
> >         return req;
> >  }
> > 
> > @@ -370,6 +371,7 @@ nfs_create_subreq(struct nfs_page *req, struct
> > nfs_page *last,
> >                 nfs_lock_request(ret);
> >                 ret->wb_index = req->wb_index;
> >                 nfs_page_group_init(ret, last);
> > +               ret->wb_nio = req->wb_nio;
> >         }
> >         return ret;
> >  }
> > @@ -724,6 +726,7 @@ void nfs_pageio_init(struct
> > nfs_pageio_descriptor *desc,
> >         desc->pg_mirrors_dynamic = NULL;
> >         desc->pg_mirrors = desc->pg_mirrors_static;
> >         nfs_pageio_mirror_init(&desc->pg_mirrors[0], bsize);
> > +       desc->pg_maxretrans = 0;
> >  }
> > 
> >  /**
> > @@ -983,6 +986,15 @@ static int nfs_pageio_do_add_request(struct
> > nfs_pageio_descriptor *desc,
> >                         return 0;
> >                 mirror->pg_base = req->wb_pgbase;
> >         }
> > +
> > +       if (desc->pg_maxretrans && req->wb_nio > desc-
> > >pg_maxretrans) {
> > +               if (NFS_SERVER(desc->pg_inode)->flags &
> > NFS_MOUNT_SOFTERR)
> > +                       desc->pg_error = -ETIMEDOUT;
> > +               else
> > +                       desc->pg_error = -EIO;
> > +               return 0;
> > +       }
> > +
> >         if (!nfs_can_coalesce_requests(prev, req, desc))
> >                 return 0;
> >         nfs_list_move_request(req, &mirror->pg_list);
> > diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> > index 0712d886ff08..908b166d635d 100644
> > --- a/fs/nfs/write.c
> > +++ b/fs/nfs/write.c
> > @@ -1009,6 +1009,8 @@ static void nfs_write_completion(struct
> > nfs_pgio_header *hdr)
> >                         goto remove_req;
> >                 }
> >                 if (nfs_write_need_commit(hdr)) {
> > +                       /* Reset wb_nio, since the write was
> > successful. */
> > +                       req->wb_nio = 0;
> >                         memcpy(&req->wb_verf, &hdr->verf.verifier,
> > sizeof(req->wb_verf));
> >                         nfs_mark_request_commit(req, hdr->lseg,
> > &cinfo,
> >                                 hdr->pgio_mirror_idx);
> > @@ -1142,6 +1144,7 @@ static struct nfs_page
> > *nfs_try_to_update_request(struct inode *inode,
> >                 req->wb_bytes = end - req->wb_offset;
> >         else
> >                 req->wb_bytes = rqend - req->wb_offset;
> > +       req->wb_nio = 0;
> >         return req;
> >  out_flushme:
> >         /*
> > @@ -1416,6 +1419,8 @@ static void nfs_initiate_write(struct
> > nfs_pgio_header *hdr,
> >   */
> >  static void nfs_redirty_request(struct nfs_page *req)
> >  {
> > +       /* Bump the transmission count */
> > +       req->wb_nio++;
> >         nfs_mark_request_dirty(req);
> >         set_bit(NFS_CONTEXT_RESEND_WRITES, &req->wb_context-
> > >flags);
> >         nfs_end_page_writeback(req);
> > diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> > index b7d0f15615c2..8b36800d342d 100644
> > --- a/include/linux/nfs_page.h
> > +++ b/include/linux/nfs_page.h
> > @@ -53,6 +53,7 @@ struct nfs_page {
> >         struct nfs_write_verifier       wb_verf;        /* Commit
> > cookie */
> >         struct nfs_page         *wb_this_page;  /* list of reqs for
> > this page */
> >         struct nfs_page         *wb_head;       /* head pointer for
> > req list */
> > +       unsigned short          wb_nio;         /* Number of I/O
> > attempts */
> >  };
> > 
> >  struct nfs_pageio_descriptor;
> > @@ -87,7 +88,6 @@ struct nfs_pgio_mirror {
> >  };
> > 
> >  struct nfs_pageio_descriptor {
> > -       unsigned char           pg_moreio : 1;
> >         struct inode            *pg_inode;
> >         const struct nfs_pageio_ops *pg_ops;
> >         const struct nfs_rw_ops *pg_rw_ops;
> > @@ -105,6 +105,8 @@ struct nfs_pageio_descriptor {
> >         struct nfs_pgio_mirror  pg_mirrors_static[1];
> >         struct nfs_pgio_mirror  *pg_mirrors_dynamic;
> >         u32                     pg_mirror_idx;  /* current mirror
> > */
> > +       unsigned short          pg_maxretrans;
> > +       unsigned char           pg_moreio : 1;
> >  };
> > 
> >  /* arbitrarily selected limit to number of mirrors */
> > --
> > 2.20.1
> > 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
  2019-04-01 16:54 ` [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Olga Kornievskaia
@ 2019-04-02 18:28   ` Trond Myklebust
  2019-04-03 20:51     ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-04-02 18:28 UTC (permalink / raw)
  To: Olga Kornievskaia; +Cc: linux-nfs

On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <trondmy@gmail.com>
> wrote:
> > This patchset aims to make soft mounts a viable option for NFSv4
> > clients
> > by minimising the risk of false positive timeouts, while allowing
> > for
> > faster failover of reads and writes once a timeout is actually
> > observed.
> > 
> > The patches rely on the NFS server correctly implementing the
> > contract
> > specified in RFC7530 section 3.1.1 with respect to not dropping
> > requests
> > while the transport connection is up. When this is the case, the
> > client
> > can safely assume that if the request has not received a reply
> > after
> > transmitting a RPC request, it is not because the request was
> > dropped,
> > but rather is due to congestion, or slow processing on the server.
> > IOW: as long as the connection remains up, there is no need for
> > requests
> > to time out.
> > 
> > The patches break down roughly as follows:
> > - A set of patches to clean up the RPC engine timeouts, and ensure
> > they
> >   are accurate.
> > - A set of patches to change the 'soft' mount semantics for
> > NFSv4.x.
> > - A set of patches to add a new 'softerr' mount option that works
> > like
> >   soft, but explicitly signals timeouts using the ETIMEDOUT error
> > code
> >   rather than using EIO. This allows applications to tune their
> >   behaviour (e.g. by failing over to a different server) if a
> > timeout
> >   occurs.
> 
> I'm just curious why would an application be aware of a different
> server to connect to and an NFS layer would not be? I'm also curious
> wouldn't it break application that typically expect to get an EIO
> errors? Do all system calls allow to return ETIMEDOUT error?

This is why it is a separate mount option. ...and actually most
applications blow up when they get EIO as well. However you can imagine
an application that might decide to retry if it hits an ETIMEDOUT,
while failing if it hits an EIO.

Cheers
  Trond


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
  2019-04-02 18:28   ` Trond Myklebust
@ 2019-04-03 20:51     ` Mkrtchyan, Tigran
  2019-04-03 21:13       ` Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Mkrtchyan, Tigran @ 2019-04-03 20:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Olga Kornievskaia, linux-nfs

Hi Trond,

----- Original Message -----
> From: "Trond Myklebust" <trondmy@gmail.com>
> To: "Olga Kornievskaia" <aglo@umich.edu>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Tuesday, April 2, 2019 8:28:38 PM
> Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x

> On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
>> On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <trondmy@gmail.com>
>> wrote:
>> > This patchset aims to make soft mounts a viable option for NFSv4
>> > clients
>> > by minimising the risk of false positive timeouts, while allowing
>> > for
>> > faster failover of reads and writes once a timeout is actually
>> > observed.
>> > 
>> > The patches rely on the NFS server correctly implementing the
>> > contract
>> > specified in RFC7530 section 3.1.1 with respect to not dropping
>> > requests
>> > while the transport connection is up. When this is the case, the
>> > client
>> > can safely assume that if the request has not received a reply
>> > after
>> > transmitting a RPC request, it is not because the request was
>> > dropped,
>> > but rather is due to congestion, or slow processing on the server.
>> > IOW: as long as the connection remains up, there is no need for
>> > requests
>> > to time out.
>> > 
>> > The patches break down roughly as follows:
>> > - A set of patches to clean up the RPC engine timeouts, and ensure
>> > they
>> >   are accurate.
>> > - A set of patches to change the 'soft' mount semantics for
>> > NFSv4.x.
>> > - A set of patches to add a new 'softerr' mount option that works
>> > like
>> >   soft, but explicitly signals timeouts using the ETIMEDOUT error
>> > code
>> >   rather than using EIO. This allows applications to tune their
>> >   behaviour (e.g. by failing over to a different server) if a
>> > timeout
>> >   occurs.
>> 
>> I'm just curious why would an application be aware of a different
>> server to connect to and an NFS layer would not be? I'm also curious
>> wouldn't it break application that typically expect to get an EIO
>> errors? Do all system calls allow to return ETIMEDOUT error?
> 
> This is why it is a separate mount option. ...and actually most
> applications blow up when they get EIO as well. However you can imagine
> an application that might decide to retry if it hits an ETIMEDOUT,
> while failing if it hits an EIO.

What is the reason of introducing new error code for IO operations, which
is not in the list of POSIX specified values for read(2) and write(2). Is
there expected application behavior change compared to EAGAIN?

I would like to use the opportunity to bring the topic of O_NONBLOCK open(2)
flag for offline files.

Tigran.

> 
> Cheers
>   Trond

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
  2019-04-03 20:51     ` Mkrtchyan, Tigran
@ 2019-04-03 21:13       ` Trond Myklebust
  2019-04-03 21:59         ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-04-03 21:13 UTC (permalink / raw)
  To: tigran.mkrtchyan; +Cc: linux-nfs, aglo

On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
> Hi Trond,
> 
> ----- Original Message -----
> > From: "Trond Myklebust" <trondmy@gmail.com>
> > To: "Olga Kornievskaia" <aglo@umich.edu>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> > Sent: Tuesday, April 2, 2019 8:28:38 PM
> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
> > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> > > On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <
> > > trondmy@gmail.com>
> > > wrote:
> > > > This patchset aims to make soft mounts a viable option for
> > > > NFSv4
> > > > clients
> > > > by minimising the risk of false positive timeouts, while
> > > > allowing
> > > > for
> > > > faster failover of reads and writes once a timeout is actually
> > > > observed.
> > > > 
> > > > The patches rely on the NFS server correctly implementing the
> > > > contract
> > > > specified in RFC7530 section 3.1.1 with respect to not dropping
> > > > requests
> > > > while the transport connection is up. When this is the case,
> > > > the
> > > > client
> > > > can safely assume that if the request has not received a reply
> > > > after
> > > > transmitting a RPC request, it is not because the request was
> > > > dropped,
> > > > but rather is due to congestion, or slow processing on the
> > > > server.
> > > > IOW: as long as the connection remains up, there is no need for
> > > > requests
> > > > to time out.
> > > > 
> > > > The patches break down roughly as follows:
> > > > - A set of patches to clean up the RPC engine timeouts, and
> > > > ensure
> > > > they
> > > >   are accurate.
> > > > - A set of patches to change the 'soft' mount semantics for
> > > > NFSv4.x.
> > > > - A set of patches to add a new 'softerr' mount option that
> > > > works
> > > > like
> > > >   soft, but explicitly signals timeouts using the ETIMEDOUT
> > > > error
> > > > code
> > > >   rather than using EIO. This allows applications to tune their
> > > >   behaviour (e.g. by failing over to a different server) if a
> > > > timeout
> > > >   occurs.
> > > 
> > > I'm just curious why would an application be aware of a different
> > > server to connect to and an NFS layer would not be? I'm also
> > > curious
> > > wouldn't it break application that typically expect to get an EIO
> > > errors? Do all system calls allow to return ETIMEDOUT error?
> > 
> > This is why it is a separate mount option. ...and actually most
> > applications blow up when they get EIO as well. However you can
> > imagine
> > an application that might decide to retry if it hits an ETIMEDOUT,
> > while failing if it hits an EIO.
> 
> What is the reason of introducing new error code for IO operations,
> which
> is not in the list of POSIX specified values for read(2) and
> write(2). Is
> there expected application behavior change compared to EAGAIN?

The point is to allow aware applications to better handle a situation
which is not covered by POSIX because POSIX has no concept of storage
that is temporarily unavailable.

...and it is being proposed as an opt-in feature, precisely so that
existing applications don't need to change.

> I would like to use the opportunity to bring the topic of O_NONBLOCK
> open(2)
> flag for offline files.


-- 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
  2019-04-03 21:13       ` Trond Myklebust
@ 2019-04-03 21:59         ` Mkrtchyan, Tigran
  2019-04-03 22:10           ` Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Mkrtchyan, Tigran @ 2019-04-03 21:59 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs, Olga Kornievskaia



----- Original Message -----
> From: "trondmy" <trondmy@hammerspace.com>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Olga Kornievskaia" <aglo@umich.edu>
> Sent: Wednesday, April 3, 2019 11:13:37 PM
> Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x

> On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
>> Hi Trond,
>> 
>> ----- Original Message -----
>> > From: "Trond Myklebust" <trondmy@gmail.com>
>> > To: "Olga Kornievskaia" <aglo@umich.edu>
>> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
>> > Sent: Tuesday, April 2, 2019 8:28:38 PM
>> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
>> > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
>> > > On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <
>> > > trondmy@gmail.com>
>> > > wrote:
>> > > > This patchset aims to make soft mounts a viable option for
>> > > > NFSv4
>> > > > clients
>> > > > by minimising the risk of false positive timeouts, while
>> > > > allowing
>> > > > for
>> > > > faster failover of reads and writes once a timeout is actually
>> > > > observed.
>> > > > 
>> > > > The patches rely on the NFS server correctly implementing the
>> > > > contract
>> > > > specified in RFC7530 section 3.1.1 with respect to not dropping
>> > > > requests
>> > > > while the transport connection is up. When this is the case,
>> > > > the
>> > > > client
>> > > > can safely assume that if the request has not received a reply
>> > > > after
>> > > > transmitting a RPC request, it is not because the request was
>> > > > dropped,
>> > > > but rather is due to congestion, or slow processing on the
>> > > > server.
>> > > > IOW: as long as the connection remains up, there is no need for
>> > > > requests
>> > > > to time out.
>> > > > 
>> > > > The patches break down roughly as follows:
>> > > > - A set of patches to clean up the RPC engine timeouts, and
>> > > > ensure
>> > > > they
>> > > >   are accurate.
>> > > > - A set of patches to change the 'soft' mount semantics for
>> > > > NFSv4.x.
>> > > > - A set of patches to add a new 'softerr' mount option that
>> > > > works
>> > > > like
>> > > >   soft, but explicitly signals timeouts using the ETIMEDOUT
>> > > > error
>> > > > code
>> > > >   rather than using EIO. This allows applications to tune their
>> > > >   behaviour (e.g. by failing over to a different server) if a
>> > > > timeout
>> > > >   occurs.
>> > > 
>> > > I'm just curious why would an application be aware of a different
>> > > server to connect to and an NFS layer would not be? I'm also
>> > > curious
>> > > wouldn't it break application that typically expect to get an EIO
>> > > errors? Do all system calls allow to return ETIMEDOUT error?
>> > 
>> > This is why it is a separate mount option. ...and actually most
>> > applications blow up when they get EIO as well. However you can
>> > imagine
>> > an application that might decide to retry if it hits an ETIMEDOUT,
>> > while failing if it hits an EIO.
>> 
>> What is the reason of introducing new error code for IO operations,
>> which
>> is not in the list of POSIX specified values for read(2) and
>> write(2). Is
>> there expected application behavior change compared to EAGAIN?
> 
> The point is to allow aware applications to better handle a situation
> which is not covered by POSIX because POSIX has no concept of storage
> that is temporarily unavailable.
> 
> ...and it is being proposed as an opt-in feature, precisely so that
> existing applications don't need to change.

Yes and no. As a mount option, you expose this behavior to all applications
on the client. Thus, either stupid app die and smart survive, or all
block, but smart suffer.

As you probably know, we have to handle similar issue. Currently it's a
server side configuration, which depending on uid/gid of the user returns
either NFSERR_IO or NFSERR_LAYOUTTRYLATER. This is still wrong, as not all
applications from the same users required the same handling.

Regards,
   Tigran.

> 
>> I would like to use the opportunity to bring the topic of O_NONBLOCK
>> open(2)
>> flag for offline files.
> 
> 
> --
> Trond Myklebust
> CTO, Hammerspace Inc
> 4300 El Camino Real, Suite 105
> Los Altos, CA 94022
> www.hammer.space

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
  2019-04-03 21:59         ` Mkrtchyan, Tigran
@ 2019-04-03 22:10           ` Trond Myklebust
  0 siblings, 0 replies; 42+ messages in thread
From: Trond Myklebust @ 2019-04-03 22:10 UTC (permalink / raw)
  To: tigran.mkrtchyan; +Cc: linux-nfs, aglo

On Wed, 2019-04-03 at 23:59 +0200, Mkrtchyan, Tigran wrote:
> 
> ----- Original Message -----
> > From: "trondmy" <trondmy@hammerspace.com>
> > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Olga Kornievskaia" <
> > aglo@umich.edu>
> > Sent: Wednesday, April 3, 2019 11:13:37 PM
> > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
> > On Wed, 2019-04-03 at 22:51 +0200, Mkrtchyan, Tigran wrote:
> > > Hi Trond,
> > > 
> > > ----- Original Message -----
> > > > From: "Trond Myklebust" <trondmy@gmail.com>
> > > > To: "Olga Kornievskaia" <aglo@umich.edu>
> > > > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> > > > Sent: Tuesday, April 2, 2019 8:28:38 PM
> > > > Subject: Re: [PATCH v2 00/28] Fix up soft mounts for NFSv4.x
> > > > On Mon, 2019-04-01 at 12:54 -0400, Olga Kornievskaia wrote:
> > > > > On Fri, Mar 29, 2019 at 6:02 PM Trond Myklebust <
> > > > > trondmy@gmail.com>
> > > > > wrote:
> > > > > > This patchset aims to make soft mounts a viable option for
> > > > > > NFSv4
> > > > > > clients
> > > > > > by minimising the risk of false positive timeouts, while
> > > > > > allowing
> > > > > > for
> > > > > > faster failover of reads and writes once a timeout is
> > > > > > actually
> > > > > > observed.
> > > > > > 
> > > > > > The patches rely on the NFS server correctly implementing
> > > > > > the
> > > > > > contract
> > > > > > specified in RFC7530 section 3.1.1 with respect to not
> > > > > > dropping
> > > > > > requests
> > > > > > while the transport connection is up. When this is the
> > > > > > case,
> > > > > > the
> > > > > > client
> > > > > > can safely assume that if the request has not received a
> > > > > > reply
> > > > > > after
> > > > > > transmitting a RPC request, it is not because the request
> > > > > > was
> > > > > > dropped,
> > > > > > but rather is due to congestion, or slow processing on the
> > > > > > server.
> > > > > > IOW: as long as the connection remains up, there is no need
> > > > > > for
> > > > > > requests
> > > > > > to time out.
> > > > > > 
> > > > > > The patches break down roughly as follows:
> > > > > > - A set of patches to clean up the RPC engine timeouts, and
> > > > > > ensure
> > > > > > they
> > > > > >   are accurate.
> > > > > > - A set of patches to change the 'soft' mount semantics for
> > > > > > NFSv4.x.
> > > > > > - A set of patches to add a new 'softerr' mount option that
> > > > > > works
> > > > > > like
> > > > > >   soft, but explicitly signals timeouts using the ETIMEDOUT
> > > > > > error
> > > > > > code
> > > > > >   rather than using EIO. This allows applications to tune
> > > > > > their
> > > > > >   behaviour (e.g. by failing over to a different server) if
> > > > > > a
> > > > > > timeout
> > > > > >   occurs.
> > > > > 
> > > > > I'm just curious why would an application be aware of a
> > > > > different
> > > > > server to connect to and an NFS layer would not be? I'm also
> > > > > curious
> > > > > wouldn't it break application that typically expect to get an
> > > > > EIO
> > > > > errors? Do all system calls allow to return ETIMEDOUT error?
> > > > 
> > > > This is why it is a separate mount option. ...and actually most
> > > > applications blow up when they get EIO as well. However you can
> > > > imagine
> > > > an application that might decide to retry if it hits an
> > > > ETIMEDOUT,
> > > > while failing if it hits an EIO.
> > > 
> > > What is the reason of introducing new error code for IO
> > > operations,
> > > which
> > > is not in the list of POSIX specified values for read(2) and
> > > write(2). Is
> > > there expected application behavior change compared to EAGAIN?
> > 
> > The point is to allow aware applications to better handle a
> > situation
> > which is not covered by POSIX because POSIX has no concept of
> > storage
> > that is temporarily unavailable.
> > 
> > ...and it is being proposed as an opt-in feature, precisely so that
> > existing applications don't need to change.
> 
> Yes and no. As a mount option, you expose this behavior to all
> applications
> on the client. Thus, either stupid app die and smart survive, or all
> block, but smart suffer.

I don't understand your point. This is doing the exact same thing as
'soft', but behaves differently with respect to timeouts, by returning
ETIMEDOUT instead of EIO.

IOW: if you want the same behaviour, but returning a POSIX error of
EIO, then that behaviour is already there with "soft".

> As you probably know, we have to handle similar issue. Currently it's
> a
> server side configuration, which depending on uid/gid of the user
> returns
> either NFSERR_IO or NFSERR_LAYOUTTRYLATER. This is still wrong, as
> not all
> applications from the same users required the same handling.

You have options here too.

Containers or VMs are one option for completely isolating applications
that need special behaviours, and giving them their own special mounts.

You can also isolate by path: mounting with one set of options in one
part of your namespace and with another set of options in another part
of the namespace, and then pointing the applications at the "correct"
path for the behaviour they need.


> Regards,
>    Tigran.
> 
> > > I would like to use the opportunity to bring the topic of
> > > O_NONBLOCK
> > > open(2)
> > > flag for offline files.
> > 
> > --
> > Trond Myklebust
> > CTO, Hammerspace Inc
> > 4300 El Camino Real, Suite 105
> > Los Altos, CA 94022
> > www.hammer.space
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page
  2019-03-29 21:59                                                       ` [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page Trond Myklebust
@ 2019-04-05 19:42                                                         ` Anna Schumaker
  2019-04-06 14:14                                                           ` Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Anna Schumaker @ 2019-04-05 19:42 UTC (permalink / raw)
  To: Trond Myklebust, linux-nfs

Hi Trond,

I'm having some trouble with xfstests generic/464 after applying this patch.
Here is the oops I'm getting, I'm able to reproduce it fairly easily against all
NFS versions: 

[   41.182015] BUG: unable to handle kernel NULL pointer dereference at
0000000000000048
[   41.182051] #PF error: [normal kernel read fault]
[   41.182071] PGD 0 P4D 0 
[   41.182086] Oops: 0000 [#1] PREEMPT SMP PTI
[   41.182104] CPU: 1 PID: 14515 Comm: kworker/1:6 Not tainted 5.1.0-rc3-ANNA+
#5292
[   41.182131] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[   41.182174] Workqueue: nfsiod rpc_async_release [sunrpc]
[   41.182209] RIP: 0010:__put_nfs_open_context+0xd/0x110 [nfs]
[   41.182227] Code: f8 ba a8 00 00 00 be c0 0c 00 00 e8 5d c0 58 f7 48 89 c3 48
85 c0 74 c5 e9 15 ff ff ff 0f 1f 44 00 00 41 56 41 55 41 54 55 53 <48> 8b 47 48
48 8b 68 30 4c 8b 60 68 f0 ff 0f 0f 88 14 53 01 00 74
[   41.182267] RSP: 0018:ffffae5f04b6be30 EFLAGS: 00010246
[   41.182279] RAX: 0000000000000000 RBX: ffffa22aae71a680 RCX: 0000000000000000
[   41.182295] RDX: ffffffffb9205dc8 RSI: 0000000000000000 RDI: 0000000000000000
[   41.182311] RBP: 0000000000000801 R08: ffffa22aa5ca0000 R09: ffffa22aba267800
[   41.182327] R10: 0000000000000000 R11: 0000000000000000 R12: ffffce5effd1c200
[   41.182351] R13: 0000000000000000 R14: ffffa22aad2aa6c0 R15: 0ffffce5effd1c20
[   41.182378] FS:  0000000000000000(0000) GS:ffffa22abcb00000(0000)
knlGS:0000000000000000
[   41.182411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.182435] CR2: 0000000000000048 CR3: 00000000af972005 CR4: 0000000000160ee0
[   41.182468] Call Trace:
[   41.182495]  nfs_commitdata_release+0x15/0x30 [nfs]
[   41.182536]  rpc_free_task+0x39/0x70 [sunrpc]
[   41.182571]  rpc_async_release+0x29/0x40 [sunrpc]
[   41.182594]  process_one_work+0x1eb/0x410
[   41.182611]  worker_thread+0x2d/0x3d0
[   41.182621]  ? process_one_work+0x410/0x410
[   41.182632]  kthread+0x112/0x130
[   41.182641]  ? kthread_park+0x80/0x80
[   41.182652]  ret_from_fork+0x35/0x40
[   41.182662] Modules linked in: nfsv3 nfs fscache rpcrdma ib_isert
iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod
ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_uverbs ib_umad rdma_cm cfg80211
ib_cm iw_cm rfkill 8021q mrp ib_core crct10dif_pclmul crc32_pclmul joydev
mousedev crc32c_intel ghash_clmulni_intel aesni_intel input_leds led_class
psmouse aes_x86_64 evdev crypto_simd cryptd glue_helper mac_hid intel_agp
intel_gtt i2c_piix4 pcspkr nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables
x_tables ata_generic pata_acpi serio_raw atkbd libps2 ata_piix libata scsi_mod
floppy i8042 serio xfs virtio_balloon virtio_gpu drm_kms_helper syscopyarea
sysfillrect sysimgblt fb_sys_fops ttm drm virtio_net net_failover failover
agpgart virtio_pci virtio_blk virtio_ring virtio
[   41.182827] CR2: 0000000000000048
[   41.182836] ---[ end trace 19c8b9d2801d26ce ]---

Let me know if you need any more information!

Anna

On Fri, 2019-03-29 at 17:59 -0400, Trond Myklebust wrote:
> The lock context already references and tracks the open context, so
> take the opportunity to save some space in struct nfs_page.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/pagelist.c        | 7 +------
>  include/linux/nfs_page.h | 3 +--
>  2 files changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index ce6440b79328..5d5ac5df93e2 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -325,7 +325,6 @@ __nfs_create_request(struct nfs_lock_context *l_ctx,
> struct page *page,
>  	req->wb_offset  = offset;
>  	req->wb_pgbase	= pgbase;
>  	req->wb_bytes   = count;
> -	req->wb_context = get_nfs_open_context(ctx);
>  	kref_init(&req->wb_kref);
>  	req->wb_nio = 0;
>  	return req;
> @@ -414,8 +413,8 @@ void nfs_unlock_and_release_request(struct nfs_page *req)
>  static void nfs_clear_request(struct nfs_page *req)
>  {
>  	struct page *page = req->wb_page;
> -	struct nfs_open_context *ctx = req->wb_context;
>  	struct nfs_lock_context *l_ctx = req->wb_lock_context;
> +	struct nfs_open_context *ctx = l_ctx->open_context;
>  
>  	if (page != NULL) {
>  		put_page(page);
> @@ -430,10 +429,6 @@ static void nfs_clear_request(struct nfs_page *req)
>  		nfs_put_lock_context(l_ctx);
>  		req->wb_lock_context = NULL;
>  	}
> -	if (ctx != NULL) {
> -		put_nfs_open_context(ctx);
> -		req->wb_context = NULL;
> -	}
>  }
>  
>  /**
> diff --git a/include/linux/nfs_page.h b/include/linux/nfs_page.h
> index 1ea13e94feb7..0bbd587fac6a 100644
> --- a/include/linux/nfs_page.h
> +++ b/include/linux/nfs_page.h
> @@ -42,7 +42,6 @@ struct nfs_inode;
>  struct nfs_page {
>  	struct list_head	wb_list;	/* Defines state of page: */
>  	struct page		*wb_page;	/* page to read in/write out
> */
> -	struct nfs_open_context	*wb_context;	/* File state context info */
>  	struct nfs_lock_context	*wb_lock_context;	/* lock context info
> */
>  	pgoff_t			wb_index;	/* Offset >> PAGE_SHIFT */
>  	unsigned int		wb_offset,	/* Offset & ~PAGE_MASK */
> @@ -203,7 +202,7 @@ loff_t req_offset(struct nfs_page *req)
>  static inline struct nfs_open_context *
>  nfs_req_openctx(struct nfs_page *req)
>  {
> -	return req->wb_context;
> +	return req->wb_lock_context->open_context;
>  }
>  
>  #endif /* _LINUX_NFS_PAGE_H */


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page
  2019-04-05 19:42                                                         ` Anna Schumaker
@ 2019-04-06 14:14                                                           ` Trond Myklebust
  2019-04-08 17:11                                                             ` Anna Schumaker
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-04-06 14:14 UTC (permalink / raw)
  To: linux-nfs, schumaker.anna

Hi Anna

On Fri, 2019-04-05 at 15:42 -0400, Anna Schumaker wrote:
> Hi Trond,
> 
> I'm having some trouble with xfstests generic/464 after applying this
> patch.
> Here is the oops I'm getting, I'm able to reproduce it fairly easily
> against all
> NFS versions: 
> 
> [   41.182015] BUG: unable to handle kernel NULL pointer dereference
> at
> 0000000000000048
> [   41.182051] #PF error: [normal kernel read fault]
> [   41.182071] PGD 0 P4D 0 
> [   41.182086] Oops: 0000 [#1] PREEMPT SMP PTI
> [   41.182104] CPU: 1 PID: 14515 Comm: kworker/1:6 Not tainted 5.1.0-
> rc3-ANNA+
> #5292
> [   41.182131] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [   41.182174] Workqueue: nfsiod rpc_async_release [sunrpc]
> [   41.182209] RIP: 0010:__put_nfs_open_context+0xd/0x110 [nfs]
> [   41.182227] Code: f8 ba a8 00 00 00 be c0 0c 00 00 e8 5d c0 58 f7
> 48 89 c3 48
> 85 c0 74 c5 e9 15 ff ff ff 0f 1f 44 00 00 41 56 41 55 41 54 55 53
> <48> 8b 47 48
> 48 8b 68 30 4c 8b 60 68 f0 ff 0f 0f 88 14 53 01 00 74
> [   41.182267] RSP: 0018:ffffae5f04b6be30 EFLAGS: 00010246
> [   41.182279] RAX: 0000000000000000 RBX: ffffa22aae71a680 RCX:
> 0000000000000000
> [   41.182295] RDX: ffffffffb9205dc8 RSI: 0000000000000000 RDI:
> 0000000000000000
> [   41.182311] RBP: 0000000000000801 R08: ffffa22aa5ca0000 R09:
> ffffa22aba267800
> [   41.182327] R10: 0000000000000000 R11: 0000000000000000 R12:
> ffffce5effd1c200
> [   41.182351] R13: 0000000000000000 R14: ffffa22aad2aa6c0 R15:
> 0ffffce5effd1c20
> [   41.182378] FS:  0000000000000000(0000) GS:ffffa22abcb00000(0000)
> knlGS:0000000000000000
> [   41.182411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   41.182435] CR2: 0000000000000048 CR3: 00000000af972005 CR4:
> 0000000000160ee0
> [   41.182468] Call Trace:
> [   41.182495]  nfs_commitdata_release+0x15/0x30 [nfs]
> [   41.182536]  rpc_free_task+0x39/0x70 [sunrpc]
> [   41.182571]  rpc_async_release+0x29/0x40 [sunrpc]
> [   41.182594]  process_one_work+0x1eb/0x410
> [   41.182611]  worker_thread+0x2d/0x3d0
> [   41.182621]  ? process_one_work+0x410/0x410
> [   41.182632]  kthread+0x112/0x130
> [   41.182641]  ? kthread_park+0x80/0x80
> [   41.182652]  ret_from_fork+0x35/0x40
> [   41.182662] Modules linked in: nfsv3 nfs fscache rpcrdma ib_isert
> 

Does the following patch help to fix this?

Cheers
  Trond

8<---------------------------------------
From d84c699bba78d66d3ccd3f76a6aae573bdfddc05 Mon Sep 17 00:00:00 2001
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Date: Sat, 6 Apr 2019 10:05:55 -0400
Subject: [PATCH] NFS: Ensure that all nfs lock contexts have a valid open
 context

Force the lock context to keep a reference to the parent open
context so that we can guarantee the validity of the latter.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/inode.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 414a90d48493..efc45f22c581 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -885,10 +885,14 @@ struct nfs_lock_context *nfs_get_lock_context(struct nfs_open_context *ctx)
 		spin_lock(&inode->i_lock);
 		res = __nfs_find_lock_context(ctx);
 		if (res == NULL) {
-			list_add_tail_rcu(&new->list, &ctx->lock_context.list);
-			new->open_context = ctx;
-			res = new;
-			new = NULL;
+			new->open_context = get_nfs_open_context(ctx);
+			if (new->open_context) {
+				list_add_tail_rcu(&new->list,
+						&ctx->lock_context.list);
+				res = new;
+				new = NULL;
+			} else
+				res = ERR_PTR(-EBADF);
 		}
 		spin_unlock(&inode->i_lock);
 		kfree(new);
@@ -906,6 +910,7 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
 		return;
 	list_del_rcu(&l_ctx->list);
 	spin_unlock(&inode->i_lock);
+	put_nfs_open_context(ctx);
 	kfree_rcu(l_ctx, rcu_head);
 }
 EXPORT_SYMBOL_GPL(nfs_put_lock_context);
-- 
2.20.1


-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page
  2019-04-06 14:14                                                           ` Trond Myklebust
@ 2019-04-08 17:11                                                             ` Anna Schumaker
  2019-04-08 18:13                                                               ` Trond Myklebust
  0 siblings, 1 reply; 42+ messages in thread
From: Anna Schumaker @ 2019-04-08 17:11 UTC (permalink / raw)
  To: Trond Myklebust, linux-nfs

On Sat, 2019-04-06 at 14:14 +0000, Trond Myklebust wrote:
> Hi Anna
> 
> On Fri, 2019-04-05 at 15:42 -0400, Anna Schumaker wrote:
> > Hi Trond,
> > 
> > I'm having some trouble with xfstests generic/464 after applying this
> > patch.
> > Here is the oops I'm getting, I'm able to reproduce it fairly easily
> > against all
> > NFS versions: 
> > 
> > [   41.182015] BUG: unable to handle kernel NULL pointer dereference
> > at
> > 0000000000000048
> > [   41.182051] #PF error: [normal kernel read fault]
> > [   41.182071] PGD 0 P4D 0 
> > [   41.182086] Oops: 0000 [#1] PREEMPT SMP PTI
> > [   41.182104] CPU: 1 PID: 14515 Comm: kworker/1:6 Not tainted 5.1.0-
> > rc3-ANNA+
> > #5292
> > [   41.182131] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [   41.182174] Workqueue: nfsiod rpc_async_release [sunrpc]
> > [   41.182209] RIP: 0010:__put_nfs_open_context+0xd/0x110 [nfs]
> > [   41.182227] Code: f8 ba a8 00 00 00 be c0 0c 00 00 e8 5d c0 58 f7
> > 48 89 c3 48
> > 85 c0 74 c5 e9 15 ff ff ff 0f 1f 44 00 00 41 56 41 55 41 54 55 53
> > <48> 8b 47 48
> > 48 8b 68 30 4c 8b 60 68 f0 ff 0f 0f 88 14 53 01 00 74
> > [   41.182267] RSP: 0018:ffffae5f04b6be30 EFLAGS: 00010246
> > [   41.182279] RAX: 0000000000000000 RBX: ffffa22aae71a680 RCX:
> > 0000000000000000
> > [   41.182295] RDX: ffffffffb9205dc8 RSI: 0000000000000000 RDI:
> > 0000000000000000
> > [   41.182311] RBP: 0000000000000801 R08: ffffa22aa5ca0000 R09:
> > ffffa22aba267800
> > [   41.182327] R10: 0000000000000000 R11: 0000000000000000 R12:
> > ffffce5effd1c200
> > [   41.182351] R13: 0000000000000000 R14: ffffa22aad2aa6c0 R15:
> > 0ffffce5effd1c20
> > [   41.182378] FS:  0000000000000000(0000) GS:ffffa22abcb00000(0000)
> > knlGS:0000000000000000
> > [   41.182411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [   41.182435] CR2: 0000000000000048 CR3: 00000000af972005 CR4:
> > 0000000000160ee0
> > [   41.182468] Call Trace:
> > [   41.182495]  nfs_commitdata_release+0x15/0x30 [nfs]
> > [   41.182536]  rpc_free_task+0x39/0x70 [sunrpc]
> > [   41.182571]  rpc_async_release+0x29/0x40 [sunrpc]
> > [   41.182594]  process_one_work+0x1eb/0x410
> > [   41.182611]  worker_thread+0x2d/0x3d0
> > [   41.182621]  ? process_one_work+0x410/0x410
> > [   41.182632]  kthread+0x112/0x130
> > [   41.182641]  ? kthread_park+0x80/0x80
> > [   41.182652]  ret_from_fork+0x35/0x40
> > [   41.182662] Modules linked in: nfsv3 nfs fscache rpcrdma ib_isert
> > 
> 
> Does the following patch help to fix this?

This patch does fix the problem. Thanks!

Anna

> 
> Cheers
>   Trond
> 
> 8<---------------------------------------
> From d84c699bba78d66d3ccd3f76a6aae573bdfddc05 Mon Sep 17 00:00:00 2001
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> Date: Sat, 6 Apr 2019 10:05:55 -0400
> Subject: [PATCH] NFS: Ensure that all nfs lock contexts have a valid open
>  context
> 
> Force the lock context to keep a reference to the parent open
> context so that we can guarantee the validity of the latter.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/inode.c | 13 +++++++++----
>  1 file changed, 9 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 414a90d48493..efc45f22c581 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -885,10 +885,14 @@ struct nfs_lock_context *nfs_get_lock_context(struct
> nfs_open_context *ctx)
>  		spin_lock(&inode->i_lock);
>  		res = __nfs_find_lock_context(ctx);
>  		if (res == NULL) {
> -			list_add_tail_rcu(&new->list, &ctx->lock_context.list);
> -			new->open_context = ctx;
> -			res = new;
> -			new = NULL;
> +			new->open_context = get_nfs_open_context(ctx);
> +			if (new->open_context) {
> +				list_add_tail_rcu(&new->list,
> +						&ctx->lock_context.list);
> +				res = new;
> +				new = NULL;
> +			} else
> +				res = ERR_PTR(-EBADF);
>  		}
>  		spin_unlock(&inode->i_lock);
>  		kfree(new);
> @@ -906,6 +910,7 @@ void nfs_put_lock_context(struct nfs_lock_context *l_ctx)
>  		return;
>  	list_del_rcu(&l_ctx->list);
>  	spin_unlock(&inode->i_lock);
> +	put_nfs_open_context(ctx);
>  	kfree_rcu(l_ctx, rcu_head);
>  }
>  EXPORT_SYMBOL_GPL(nfs_put_lock_context);
> -- 
> 2.20.1
> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page
  2019-04-08 17:11                                                             ` Anna Schumaker
@ 2019-04-08 18:13                                                               ` Trond Myklebust
  2019-04-08 18:15                                                                 ` Anna Schumaker
  0 siblings, 1 reply; 42+ messages in thread
From: Trond Myklebust @ 2019-04-08 18:13 UTC (permalink / raw)
  To: linux-nfs, schumaker.anna

On Mon, 2019-04-08 at 13:11 -0400, Anna Schumaker wrote:
> On Sat, 2019-04-06 at 14:14 +0000, Trond Myklebust wrote:
> > Hi Anna
> > 
> > On Fri, 2019-04-05 at 15:42 -0400, Anna Schumaker wrote:
> > > Hi Trond,
> > > 
> > > I'm having some trouble with xfstests generic/464 after applying
> > > this
> > > patch.
> > > Here is the oops I'm getting, I'm able to reproduce it fairly
> > > easily
> > > against all
> > > NFS versions: 
> > > 
> > > [   41.182015] BUG: unable to handle kernel NULL pointer
> > > dereference
> > > at
> > > 0000000000000048
> > > [   41.182051] #PF error: [normal kernel read fault]
> > > [   41.182071] PGD 0 P4D 0 
> > > [   41.182086] Oops: 0000 [#1] PREEMPT SMP PTI
> > > [   41.182104] CPU: 1 PID: 14515 Comm: kworker/1:6 Not tainted
> > > 5.1.0-
> > > rc3-ANNA+
> > > #5292
> > > [   41.182131] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > [   41.182174] Workqueue: nfsiod rpc_async_release [sunrpc]
> > > [   41.182209] RIP: 0010:__put_nfs_open_context+0xd/0x110 [nfs]
> > > [   41.182227] Code: f8 ba a8 00 00 00 be c0 0c 00 00 e8 5d c0 58
> > > f7
> > > 48 89 c3 48
> > > 85 c0 74 c5 e9 15 ff ff ff 0f 1f 44 00 00 41 56 41 55 41 54 55 53
> > > <48> 8b 47 48
> > > 48 8b 68 30 4c 8b 60 68 f0 ff 0f 0f 88 14 53 01 00 74
> > > [   41.182267] RSP: 0018:ffffae5f04b6be30 EFLAGS: 00010246
> > > [   41.182279] RAX: 0000000000000000 RBX: ffffa22aae71a680 RCX:
> > > 0000000000000000
> > > [   41.182295] RDX: ffffffffb9205dc8 RSI: 0000000000000000 RDI:
> > > 0000000000000000
> > > [   41.182311] RBP: 0000000000000801 R08: ffffa22aa5ca0000 R09:
> > > ffffa22aba267800
> > > [   41.182327] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > ffffce5effd1c200
> > > [   41.182351] R13: 0000000000000000 R14: ffffa22aad2aa6c0 R15:
> > > 0ffffce5effd1c20
> > > [   41.182378] FS:  0000000000000000(0000)
> > > GS:ffffa22abcb00000(0000)
> > > knlGS:0000000000000000
> > > [   41.182411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   41.182435] CR2: 0000000000000048 CR3: 00000000af972005 CR4:
> > > 0000000000160ee0
> > > [   41.182468] Call Trace:
> > > [   41.182495]  nfs_commitdata_release+0x15/0x30 [nfs]
> > > [   41.182536]  rpc_free_task+0x39/0x70 [sunrpc]
> > > [   41.182571]  rpc_async_release+0x29/0x40 [sunrpc]
> > > [   41.182594]  process_one_work+0x1eb/0x410
> > > [   41.182611]  worker_thread+0x2d/0x3d0
> > > [   41.182621]  ? process_one_work+0x410/0x410
> > > [   41.182632]  kthread+0x112/0x130
> > > [   41.182641]  ? kthread_park+0x80/0x80
> > > [   41.182652]  ret_from_fork+0x35/0x40
> > > [   41.182662] Modules linked in: nfsv3 nfs fscache rpcrdma
> > > ib_isert
> > > 
> > 
> > Does the following patch help to fix this?
> 
> This patch does fix the problem. Thanks!
> 

Cool! I sent a v3 patchset update last night with this patch correctly
positioned so it won't cause bisection issues, and with 2 other
bugfixes. Hopefully, that is the final iteration of the patchset.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page
  2019-04-08 18:13                                                               ` Trond Myklebust
@ 2019-04-08 18:15                                                                 ` Anna Schumaker
  0 siblings, 0 replies; 42+ messages in thread
From: Anna Schumaker @ 2019-04-08 18:15 UTC (permalink / raw)
  To: Trond Myklebust, linux-nfs

On Mon, 2019-04-08 at 18:13 +0000, Trond Myklebust wrote:
> On Mon, 2019-04-08 at 13:11 -0400, Anna Schumaker wrote:
> > On Sat, 2019-04-06 at 14:14 +0000, Trond Myklebust wrote:
> > > Hi Anna
> > > 
> > > On Fri, 2019-04-05 at 15:42 -0400, Anna Schumaker wrote:
> > > > Hi Trond,
> > > > 
> > > > I'm having some trouble with xfstests generic/464 after applying
> > > > this
> > > > patch.
> > > > Here is the oops I'm getting, I'm able to reproduce it fairly
> > > > easily
> > > > against all
> > > > NFS versions: 
> > > > 
> > > > [   41.182015] BUG: unable to handle kernel NULL pointer
> > > > dereference
> > > > at
> > > > 0000000000000048
> > > > [   41.182051] #PF error: [normal kernel read fault]
> > > > [   41.182071] PGD 0 P4D 0 
> > > > [   41.182086] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > [   41.182104] CPU: 1 PID: 14515 Comm: kworker/1:6 Not tainted
> > > > 5.1.0-
> > > > rc3-ANNA+
> > > > #5292
> > > > [   41.182131] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > > > [   41.182174] Workqueue: nfsiod rpc_async_release [sunrpc]
> > > > [   41.182209] RIP: 0010:__put_nfs_open_context+0xd/0x110 [nfs]
> > > > [   41.182227] Code: f8 ba a8 00 00 00 be c0 0c 00 00 e8 5d c0 58
> > > > f7
> > > > 48 89 c3 48
> > > > 85 c0 74 c5 e9 15 ff ff ff 0f 1f 44 00 00 41 56 41 55 41 54 55 53
> > > > <48> 8b 47 48
> > > > 48 8b 68 30 4c 8b 60 68 f0 ff 0f 0f 88 14 53 01 00 74
> > > > [   41.182267] RSP: 0018:ffffae5f04b6be30 EFLAGS: 00010246
> > > > [   41.182279] RAX: 0000000000000000 RBX: ffffa22aae71a680 RCX:
> > > > 0000000000000000
> > > > [   41.182295] RDX: ffffffffb9205dc8 RSI: 0000000000000000 RDI:
> > > > 0000000000000000
> > > > [   41.182311] RBP: 0000000000000801 R08: ffffa22aa5ca0000 R09:
> > > > ffffa22aba267800
> > > > [   41.182327] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > ffffce5effd1c200
> > > > [   41.182351] R13: 0000000000000000 R14: ffffa22aad2aa6c0 R15:
> > > > 0ffffce5effd1c20
> > > > [   41.182378] FS:  0000000000000000(0000)
> > > > GS:ffffa22abcb00000(0000)
> > > > knlGS:0000000000000000
> > > > [   41.182411] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [   41.182435] CR2: 0000000000000048 CR3: 00000000af972005 CR4:
> > > > 0000000000160ee0
> > > > [   41.182468] Call Trace:
> > > > [   41.182495]  nfs_commitdata_release+0x15/0x30 [nfs]
> > > > [   41.182536]  rpc_free_task+0x39/0x70 [sunrpc]
> > > > [   41.182571]  rpc_async_release+0x29/0x40 [sunrpc]
> > > > [   41.182594]  process_one_work+0x1eb/0x410
> > > > [   41.182611]  worker_thread+0x2d/0x3d0
> > > > [   41.182621]  ? process_one_work+0x410/0x410
> > > > [   41.182632]  kthread+0x112/0x130
> > > > [   41.182641]  ? kthread_park+0x80/0x80
> > > > [   41.182652]  ret_from_fork+0x35/0x40
> > > > [   41.182662] Modules linked in: nfsv3 nfs fscache rpcrdma
> > > > ib_isert
> > > > 
> > > 
> > > Does the following patch help to fix this?
> > 
> > This patch does fix the problem. Thanks!
> > 
> 
> Cool! I sent a v3 patchset update last night with this patch correctly
> positioned so it won't cause bisection issues, and with 2 other
> bugfixes. Hopefully, that is the final iteration of the patchset.

Sounds good. I'll look through them again and then add them to my linux-next.

Anna

> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 


^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2019-04-08 18:15 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-29 21:59 [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Trond Myklebust
2019-03-29 21:59 ` [PATCH v2 01/28] SUNRPC: Fix up task signalling Trond Myklebust
2019-03-29 21:59   ` [PATCH v2 02/28] SUNRPC: Refactor rpc_restart_call/rpc_restart_call_prepare Trond Myklebust
2019-03-29 21:59     ` [PATCH v2 03/28] SUNRPC: Refactor xprt_request_wait_receive() Trond Myklebust
2019-03-29 21:59       ` [PATCH v2 04/28] SUNRPC: Refactor rpc_sleep_on() Trond Myklebust
2019-03-29 21:59         ` [PATCH v2 05/28] SUNRPC: Remove unused argument 'action' from rpc_sleep_on_priority() Trond Myklebust
2019-03-29 21:59           ` [PATCH v2 06/28] SUNRPC: Add function rpc_sleep_on_timeout() Trond Myklebust
2019-03-29 21:59             ` [PATCH v2 07/28] SUNRPC: Fix up tracking of timeouts Trond Myklebust
2019-03-29 21:59               ` [PATCH v2 08/28] SUNRPC: Simplify queue timeouts using timer_reduce() Trond Myklebust
2019-03-29 21:59                 ` [PATCH v2 09/28] SUNRPC: Declare RPC timers as TIMER_DEFERRABLE Trond Myklebust
2019-03-29 21:59                   ` [PATCH v2 10/28] SUNRPC: Ensure that the transport layer respect major timeouts Trond Myklebust
2019-03-29 21:59                     ` [PATCH v2 11/28] SUNRPC: Add tracking of RPC level errors Trond Myklebust
2019-03-29 21:59                       ` [PATCH v2 12/28] SUNRPC: Make "no retrans timeout" soft tasks behave like softconn for timeouts Trond Myklebust
2019-03-29 21:59                         ` [PATCH v2 13/28] SUNRPC: Start the first major timeout calculation at task creation Trond Myklebust
2019-03-29 21:59                           ` [PATCH v2 14/28] SUNRPC: Ensure to ratelimit the "server not responding" syslog messages Trond Myklebust
2019-03-29 21:59                             ` [PATCH v2 15/28] SUNRPC: Add the 'softerr' rpc_client flag Trond Myklebust
2019-03-29 21:59                               ` [PATCH v2 16/28] NFS: Consider ETIMEDOUT to be a fatal error Trond Myklebust
2019-03-29 21:59                                 ` [PATCH v2 17/28] NFS: Move internal constants out of uapi/linux/nfs_mount.h Trond Myklebust
2019-03-29 21:59                                   ` [PATCH v2 18/28] NFS: Add a mount option "softerr" to allow clients to see ETIMEDOUT errors Trond Myklebust
2019-03-29 21:59                                     ` [PATCH v2 19/28] NFS: Don't interrupt file writeout due to fatal errors Trond Myklebust
2019-03-29 21:59                                       ` [PATCH v2 20/28] NFS: Don't call generic_error_remove_page() while holding locks Trond Myklebust
2019-03-29 21:59                                         ` [PATCH v2 21/28] NFS: Don't inadvertently clear writeback errors Trond Myklebust
2019-03-29 21:59                                           ` [PATCH v2 22/28] NFS: Replace custom error reporting mechanism with generic one Trond Myklebust
2019-03-29 21:59                                             ` [PATCH v2 23/28] NFS: Fix up NFS I/O subrequest creation Trond Myklebust
2019-03-29 21:59                                               ` [PATCH v2 24/28] NFS: Remove unused argument from nfs_create_request() Trond Myklebust
2019-03-29 21:59                                                 ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Trond Myklebust
2019-03-29 21:59                                                   ` [PATCH v2 26/28] NFS: Allow signal interruption of NFS4ERR_DELAYed operations Trond Myklebust
2019-03-29 21:59                                                     ` [PATCH v2 27/28] NFS: Add a helper to return a pointer to the open context of a struct nfs_page Trond Myklebust
2019-03-29 21:59                                                       ` [PATCH v2 28/28] NFS: Remove redundant open context from nfs_page Trond Myklebust
2019-04-05 19:42                                                         ` Anna Schumaker
2019-04-06 14:14                                                           ` Trond Myklebust
2019-04-08 17:11                                                             ` Anna Schumaker
2019-04-08 18:13                                                               ` Trond Myklebust
2019-04-08 18:15                                                                 ` Anna Schumaker
2019-04-01 16:27                                                   ` [PATCH v2 25/28] pNFS: Add tracking to limit the number of pNFS retries Olga Kornievskaia
2019-04-02 18:23                                                     ` Trond Myklebust
2019-04-01 16:54 ` [PATCH v2 00/28] Fix up soft mounts for NFSv4.x Olga Kornievskaia
2019-04-02 18:28   ` Trond Myklebust
2019-04-03 20:51     ` Mkrtchyan, Tigran
2019-04-03 21:13       ` Trond Myklebust
2019-04-03 21:59         ` Mkrtchyan, Tigran
2019-04-03 22:10           ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).