linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/9] Various NFSv4 state error handling fixes
@ 2019-09-16 20:44 Trond Myklebust
  2019-09-16 20:44 ` [PATCH v2 1/9] pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors Trond Myklebust
  2019-09-18 19:38 ` [PATCH v2 0/9] Various NFSv4 state error handling fixes Olga Kornievskaia
  0 siblings, 2 replies; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

Various NFSv4 fixes to ensure we handle state errors correctly. In
particular, we need to ensure that for COMPOUNDs like CLOSE and
DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle the
layout state errors so that a retry of either the LAYOUTRETURN, or
the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
reply.

Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do our
best to still try to destroy the state on the server, in order to
avoid causing state leakage.

v2: Fix bug reports from Olga
 - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE when
   doing fully serialised NFSv4.0.
 - Ensure LOCKU initialises the stateid correctly.

Trond Myklebust (9):
  pNFS: Ensure we do clear the return-on-close layout stateid on fatal
    errors
  NFSv4: Clean up pNFS return-on-close error handling
  NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  NFSv4: Handle RPC level errors in LAYOUTRETURN
  NFSv4: Add a helper to increment stateid seqids
  pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state
    seqid
  NFSv4: Fix OPEN_DOWNGRADE error handling
  NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU

 fs/nfs/nfs4_fs.h   |  11 ++-
 fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++---------------
 fs/nfs/nfs4state.c |  16 ----
 fs/nfs/pnfs.c      |  71 ++++++++++++++--
 fs/nfs/pnfs.h      |  17 +++-
 5 files changed, 229 insertions(+), 90 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/9] pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors
  2019-09-16 20:44 [PATCH v2 0/9] Various NFSv4 state error handling fixes Trond Myklebust
@ 2019-09-16 20:44 ` Trond Myklebust
  2019-09-16 20:44   ` [PATCH v2 2/9] NFSv4: Clean up pNFS return-on-close error handling Trond Myklebust
  2019-09-18 19:38 ` [PATCH v2 0/9] Various NFSv4 state error handling fixes Olga Kornievskaia
  1 sibling, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

IF the server rejected our layout return with a state error such as
NFS4ERR_BAD_STATEID, or even a stale inode error, then we do want
to clear out all the remaining layout segments and mark that stateid
as invalid.

Fixes: 1c5bd76d17cca ("pNFS: Enable layoutreturn operation for...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 4525d5acae38..0418b198edd3 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1449,10 +1449,15 @@ void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 	const nfs4_stateid *res_stateid = NULL;
 	struct nfs4_xdr_opaque_data *ld_private = args->ld_private;
 
-	if (ret == 0) {
-		arg_stateid = &args->stateid;
+	switch (ret) {
+	case -NFS4ERR_NOMATCHING_LAYOUT:
+		break;
+	case 0:
 		if (res->lrs_present)
 			res_stateid = &res->stateid;
+		/* Fallthrough */
+	default:
+		arg_stateid = &args->stateid;
 	}
 	pnfs_layoutreturn_free_lsegs(lo, arg_stateid, &args->range,
 			res_stateid);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/9] NFSv4: Clean up pNFS return-on-close error handling
  2019-09-16 20:44 ` [PATCH v2 1/9] pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors Trond Myklebust
@ 2019-09-16 20:44   ` Trond Myklebust
  2019-09-16 20:44     ` [PATCH v2 3/9] NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

Both close and delegreturn have identical code to handle pNFS
return-on-close. This patch refactors that code and places it
in pnfs.c

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c | 66 +++++++----------------------------------------
 fs/nfs/pnfs.c     | 27 +++++++++++++++++++
 fs/nfs/pnfs.h     | 13 ++++++++++
 3 files changed, 50 insertions(+), 56 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 1406858bae6c..fcdfddfd3ab4 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3358,32 +3358,11 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 	trace_nfs4_close(state, &calldata->arg, &calldata->res, task->tk_status);
 
 	/* Handle Layoutreturn errors */
-	if (calldata->arg.lr_args && task->tk_status != 0) {
-		switch (calldata->res.lr_ret) {
-		default:
-			calldata->res.lr_ret = -NFS4ERR_NOMATCHING_LAYOUT;
-			break;
-		case 0:
-			calldata->arg.lr_args = NULL;
-			calldata->res.lr_res = NULL;
-			break;
-		case -NFS4ERR_OLD_STATEID:
-			if (nfs4_layoutreturn_refresh_stateid(&calldata->arg.lr_args->stateid,
-						&calldata->arg.lr_args->range,
-						calldata->inode))
-				goto lr_restart;
-			/* Fallthrough */
-		case -NFS4ERR_ADMIN_REVOKED:
-		case -NFS4ERR_DELEG_REVOKED:
-		case -NFS4ERR_EXPIRED:
-		case -NFS4ERR_BAD_STATEID:
-		case -NFS4ERR_UNKNOWN_LAYOUTTYPE:
-		case -NFS4ERR_WRONG_CRED:
-			calldata->arg.lr_args = NULL;
-			calldata->res.lr_res = NULL;
-			goto lr_restart;
-		}
-	}
+	if (pnfs_roc_done(task, calldata->inode,
+				&calldata->arg.lr_args,
+				&calldata->res.lr_res,
+				&calldata->res.lr_ret) == -EAGAIN)
+		goto out_restart;
 
 	/* hmm. we are done with the inode, and in the process of freeing
 	 * the state_owner. we keep this around to process errors
@@ -3430,8 +3409,6 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 	nfs_refresh_inode(calldata->inode, &calldata->fattr);
 	dprintk("%s: done, ret = %d!\n", __func__, task->tk_status);
 	return;
-lr_restart:
-	calldata->res.lr_ret = 0;
 out_restart:
 	task->tk_status = 0;
 	rpc_restart_call_prepare(task);
@@ -6129,32 +6106,11 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 	trace_nfs4_delegreturn_exit(&data->args, &data->res, task->tk_status);
 
 	/* Handle Layoutreturn errors */
-	if (data->args.lr_args && task->tk_status != 0) {
-		switch(data->res.lr_ret) {
-		default:
-			data->res.lr_ret = -NFS4ERR_NOMATCHING_LAYOUT;
-			break;
-		case 0:
-			data->args.lr_args = NULL;
-			data->res.lr_res = NULL;
-			break;
-		case -NFS4ERR_OLD_STATEID:
-			if (nfs4_layoutreturn_refresh_stateid(&data->args.lr_args->stateid,
-						&data->args.lr_args->range,
-						data->inode))
-				goto lr_restart;
-			/* Fallthrough */
-		case -NFS4ERR_ADMIN_REVOKED:
-		case -NFS4ERR_DELEG_REVOKED:
-		case -NFS4ERR_EXPIRED:
-		case -NFS4ERR_BAD_STATEID:
-		case -NFS4ERR_UNKNOWN_LAYOUTTYPE:
-		case -NFS4ERR_WRONG_CRED:
-			data->args.lr_args = NULL;
-			data->res.lr_res = NULL;
-			goto lr_restart;
-		}
-	}
+	if (pnfs_roc_done(task, data->inode,
+				&data->args.lr_args,
+				&data->res.lr_res,
+				&data->res.lr_ret) == -EAGAIN)
+		goto out_restart;
 
 	switch (task->tk_status) {
 	case 0:
@@ -6192,8 +6148,6 @@ static void nfs4_delegreturn_done(struct rpc_task *task, void *calldata)
 	}
 	data->rpc_status = task->tk_status;
 	return;
-lr_restart:
-	data->res.lr_ret = 0;
 out_restart:
 	task->tk_status = 0;
 	rpc_restart_call_prepare(task);
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0418b198edd3..8769422a12f5 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1440,6 +1440,33 @@ bool pnfs_roc(struct inode *ino,
 	return false;
 }
 
+int pnfs_roc_done(struct rpc_task *task, struct inode *inode,
+		struct nfs4_layoutreturn_args **argpp,
+		struct nfs4_layoutreturn_res **respp,
+		int *ret)
+{
+	struct nfs4_layoutreturn_args *arg = *argpp;
+	int retval = -EAGAIN;
+
+	if (!arg)
+		return 0;
+	/* Handle Layoutreturn errors */
+	switch (*ret) {
+	case 0:
+		retval = 0;
+		break;
+	case -NFS4ERR_OLD_STATEID:
+		if (!nfs4_layoutreturn_refresh_stateid(&arg->stateid,
+					&arg->range, inode))
+			break;
+		*ret = -NFS4ERR_NOMATCHING_LAYOUT;
+		return -EAGAIN;
+	}
+	*argpp = NULL;
+	*respp = NULL;
+	return retval;
+}
+
 void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 		struct nfs4_layoutreturn_res *res,
 		int ret)
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index f15609c003d8..3ef3756d437c 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -282,6 +282,10 @@ bool pnfs_roc(struct inode *ino,
 		struct nfs4_layoutreturn_args *args,
 		struct nfs4_layoutreturn_res *res,
 		const struct cred *cred);
+int pnfs_roc_done(struct rpc_task *task, struct inode *inode,
+		struct nfs4_layoutreturn_args **argpp,
+		struct nfs4_layoutreturn_res **respp,
+		int *ret);
 void pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 		struct nfs4_layoutreturn_res *res,
 		int ret);
@@ -701,6 +705,15 @@ pnfs_roc(struct inode *ino,
 	return false;
 }
 
+static inline int
+pnfs_roc_done(struct rpc_task *task, struct inode *inode,
+		struct nfs4_layoutreturn_args **argpp,
+		struct nfs4_layoutreturn_res **respp,
+		int *ret)
+{
+	return 0;
+}
+
 static inline void
 pnfs_roc_release(struct nfs4_layoutreturn_args *args,
 		struct nfs4_layoutreturn_res *res,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/9] NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
  2019-09-16 20:44   ` [PATCH v2 2/9] NFSv4: Clean up pNFS return-on-close error handling Trond Myklebust
@ 2019-09-16 20:44     ` Trond Myklebust
  2019-09-16 20:44       ` [PATCH v2 4/9] NFSv4: Handle RPC level errors in LAYOUTRETURN Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

If the server sends a NFS4ERR_DELAY, then allow the caller to retry.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/pnfs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 8769422a12f5..6436047dc999 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1455,6 +1455,10 @@ int pnfs_roc_done(struct rpc_task *task, struct inode *inode,
 	case 0:
 		retval = 0;
 		break;
+	case -NFS4ERR_DELAY:
+		/* Let the caller handle the retry */
+		*ret = -NFS4ERR_NOMATCHING_LAYOUT;
+		return 0;
 	case -NFS4ERR_OLD_STATEID:
 		if (!nfs4_layoutreturn_refresh_stateid(&arg->stateid,
 					&arg->range, inode))
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 4/9] NFSv4: Handle RPC level errors in LAYOUTRETURN
  2019-09-16 20:44     ` [PATCH v2 3/9] NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close Trond Myklebust
@ 2019-09-16 20:44       ` Trond Myklebust
  2019-09-16 20:44         ` [PATCH v2 5/9] NFSv4: Add a helper to increment stateid seqids Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

Handle RPC level errors by assuming that the RPC call was successful.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c |  9 +++++++++
 fs/nfs/pnfs.c     | 15 +++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index fcdfddfd3ab4..a5deb00b5ad1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9057,6 +9057,15 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
 	if (!nfs41_sequence_process(task, &lrp->res.seq_res))
 		return;
 
+	/*
+	 * Was there an RPC level error? Assume the call succeeded,
+	 * and that we need to release the layout
+	 */
+	if (task->tk_rpc_status != 0 && RPC_WAS_SENT(task)) {
+		lrp->res.lrs_present = 0;
+		return;
+	}
+
 	server = NFS_SERVER(lrp->args.inode);
 	switch (task->tk_status) {
 	case -NFS4ERR_OLD_STATEID:
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 6436047dc999..abc7188f1853 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -1455,6 +1455,21 @@ int pnfs_roc_done(struct rpc_task *task, struct inode *inode,
 	case 0:
 		retval = 0;
 		break;
+	case -NFS4ERR_NOMATCHING_LAYOUT:
+		/* Was there an RPC level error? If not, retry */
+		if (task->tk_rpc_status == 0)
+			break;
+		/* If the call was not sent, let caller handle it */
+		if (!RPC_WAS_SENT(task))
+			return 0;
+		/*
+		 * Otherwise, assume the call succeeded and
+		 * that we need to release the layout
+		 */
+		*ret = 0;
+		(*respp)->lrs_present = 0;
+		retval = 0;
+		break;
 	case -NFS4ERR_DELAY:
 		/* Let the caller handle the retry */
 		*ret = -NFS4ERR_NOMATCHING_LAYOUT;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 5/9] NFSv4: Add a helper to increment stateid seqids
  2019-09-16 20:44       ` [PATCH v2 4/9] NFSv4: Handle RPC level errors in LAYOUTRETURN Trond Myklebust
@ 2019-09-16 20:44         ` Trond Myklebust
  2019-09-16 20:44           ` [PATCH v2 6/9] pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

Add a helper function to increment stateid seqids according to the
rules specified in RFC5661 Section 8.2.2.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4_fs.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 3564da1ba8a1..e8f74ed98e42 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -574,6 +574,15 @@ static inline bool nfs4_stateid_is_newer(const nfs4_stateid *s1, const nfs4_stat
 	return (s32)(be32_to_cpu(s1->seqid) - be32_to_cpu(s2->seqid)) > 0;
 }
 
+static inline void nfs4_stateid_seqid_inc(nfs4_stateid *s1)
+{
+	u32 seqid = be32_to_cpu(s1->seqid);
+
+	if (++seqid == 0)
+		++seqid;
+	s1->seqid = cpu_to_be32(seqid);
+}
+
 static inline bool nfs4_valid_open_stateid(const struct nfs4_state *state)
 {
 	return test_bit(NFS_STATE_RECOVERY_FAILED, &state->flags) == 0;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 6/9] pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid
  2019-09-16 20:44         ` [PATCH v2 5/9] NFSv4: Add a helper to increment stateid seqids Trond Myklebust
@ 2019-09-16 20:44           ` Trond Myklebust
  2019-09-16 20:44             ` [PATCH v2 7/9] NFSv4: Fix OPEN_DOWNGRADE error handling Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

If a LAYOUTRETURN receives a reply of NFS4ERR_OLD_STATEID then assume we've
missed an update, and just bump the stateid.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c |  2 +-
 fs/nfs/pnfs.c     | 18 ++++++++++++++----
 fs/nfs/pnfs.h     |  4 ++--
 3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index a5deb00b5ad1..cbaf6b7ac128 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -9069,7 +9069,7 @@ static void nfs4_layoutreturn_done(struct rpc_task *task, void *calldata)
 	server = NFS_SERVER(lrp->args.inode);
 	switch (task->tk_status) {
 	case -NFS4ERR_OLD_STATEID:
-		if (nfs4_layoutreturn_refresh_stateid(&lrp->args.stateid,
+		if (nfs4_layout_refresh_old_stateid(&lrp->args.stateid,
 					&lrp->args.range,
 					lrp->args.inode))
 			goto out_restart;
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index abc7188f1853..bb80034a7661 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -359,9 +359,10 @@ pnfs_clear_lseg_state(struct pnfs_layout_segment *lseg,
 }
 
 /*
- * Update the seqid of a layout stateid
+ * Update the seqid of a layout stateid after receiving
+ * NFS4ERR_OLD_STATEID
  */
-bool nfs4_layoutreturn_refresh_stateid(nfs4_stateid *dst,
+bool nfs4_layout_refresh_old_stateid(nfs4_stateid *dst,
 		struct pnfs_layout_range *dst_range,
 		struct inode *inode)
 {
@@ -377,7 +378,15 @@ bool nfs4_layoutreturn_refresh_stateid(nfs4_stateid *dst,
 
 	spin_lock(&inode->i_lock);
 	lo = NFS_I(inode)->layout;
-	if (lo && nfs4_stateid_match_other(dst, &lo->plh_stateid)) {
+	if (lo &&  pnfs_layout_is_valid(lo) &&
+	    nfs4_stateid_match_other(dst, &lo->plh_stateid)) {
+		/* Is our call using the most recent seqid? If so, bump it */
+		if (!nfs4_stateid_is_newer(&lo->plh_stateid, dst)) {
+			nfs4_stateid_seqid_inc(dst);
+			ret = true;
+			goto out;
+		}
+		/* Try to update the seqid to the most recent */
 		err = pnfs_mark_matching_lsegs_return(lo, &head, &range, 0);
 		if (err != -EBUSY) {
 			dst->seqid = lo->plh_stateid.seqid;
@@ -385,6 +394,7 @@ bool nfs4_layoutreturn_refresh_stateid(nfs4_stateid *dst,
 			ret = true;
 		}
 	}
+out:
 	spin_unlock(&inode->i_lock);
 	pnfs_free_lseg_list(&head);
 	return ret;
@@ -1475,7 +1485,7 @@ int pnfs_roc_done(struct rpc_task *task, struct inode *inode,
 		*ret = -NFS4ERR_NOMATCHING_LAYOUT;
 		return 0;
 	case -NFS4ERR_OLD_STATEID:
-		if (!nfs4_layoutreturn_refresh_stateid(&arg->stateid,
+		if (!nfs4_layout_refresh_old_stateid(&arg->stateid,
 					&arg->range, inode))
 			break;
 		*ret = -NFS4ERR_NOMATCHING_LAYOUT;
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index 3ef3756d437c..f8a38065c7e4 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -261,7 +261,7 @@ int pnfs_destroy_layouts_byfsid(struct nfs_client *clp,
 		bool is_recall);
 int pnfs_destroy_layouts_byclid(struct nfs_client *clp,
 		bool is_recall);
-bool nfs4_layoutreturn_refresh_stateid(nfs4_stateid *dst,
+bool nfs4_layout_refresh_old_stateid(nfs4_stateid *dst,
 		struct pnfs_layout_range *dst_range,
 		struct inode *inode);
 void pnfs_put_layout_hdr(struct pnfs_layout_hdr *lo);
@@ -798,7 +798,7 @@ static inline void nfs4_pnfs_v3_ds_connect_unload(void)
 {
 }
 
-static inline bool nfs4_layoutreturn_refresh_stateid(nfs4_stateid *dst,
+static inline bool nfs4_layout_refresh_old_stateid(nfs4_stateid *dst,
 		struct pnfs_layout_range *dst_range,
 		struct inode *inode)
 {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 7/9] NFSv4: Fix OPEN_DOWNGRADE error handling
  2019-09-16 20:44           ` [PATCH v2 6/9] pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid Trond Myklebust
@ 2019-09-16 20:44             ` Trond Myklebust
  2019-09-16 20:44               ` [PATCH v2 8/9] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

If OPEN_DOWNGRADE returns a state error, then we want to initiate
state recovery in addition to marking the stateid as closed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index cbaf6b7ac128..025dd5efbf34 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3394,7 +3394,9 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 					task->tk_msg.rpc_cred);
 			/* Fallthrough */
 		case -NFS4ERR_BAD_STATEID:
-			break;
+			if (calldata->arg.fmode == 0)
+				break;
+			/* Fallthrough */
 		default:
 			task->tk_status = nfs4_async_handle_exception(task,
 					server, task->tk_status, &exception);
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 8/9] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
  2019-09-16 20:44             ` [PATCH v2 7/9] NFSv4: Fix OPEN_DOWNGRADE error handling Trond Myklebust
@ 2019-09-16 20:44               ` Trond Myklebust
  2019-09-16 20:44                 ` [PATCH v2 9/9] NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

If a CLOSE or OPEN_DOWNGRADE operation receives a NFS4ERR_OLD_STATEID
then bump the seqid before resending. Ensure we only bump the seqid
by 1.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4_fs.h   |  2 --
 fs/nfs/nfs4proc.c  | 75 ++++++++++++++++++++++++++++++++++++++++++++--
 fs/nfs/nfs4state.c | 16 ----------
 3 files changed, 72 insertions(+), 21 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index e8f74ed98e42..16b2e5cc3e94 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -491,8 +491,6 @@ extern int nfs4_set_lock_state(struct nfs4_state *state, struct file_lock *fl);
 extern int nfs4_select_rw_stateid(struct nfs4_state *, fmode_t,
 		const struct nfs_lock_context *, nfs4_stateid *,
 		const struct cred **);
-extern bool nfs4_refresh_open_stateid(nfs4_stateid *dst,
-		struct nfs4_state *state);
 extern bool nfs4_copy_open_stateid(nfs4_stateid *dst,
 		struct nfs4_state *state);
 
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 025dd5efbf34..c14af2c1c6b6 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3308,6 +3308,75 @@ nfs4_wait_on_layoutreturn(struct inode *inode, struct rpc_task *task)
 	return pnfs_wait_on_layoutreturn(inode, task);
 }
 
+/*
+ * Update the seqid of an open stateid
+ */
+static void nfs4_sync_open_stateid(nfs4_stateid *dst,
+		struct nfs4_state *state)
+{
+	__be32 seqid_open;
+	u32 dst_seqid;
+	int seq;
+
+	for (;;) {
+		if (!nfs4_valid_open_stateid(state))
+			break;
+		seq = read_seqbegin(&state->seqlock);
+		if (!nfs4_state_match_open_stateid_other(state, dst)) {
+			nfs4_stateid_copy(dst, &state->open_stateid);
+			if (read_seqretry(&state->seqlock, seq))
+				continue;
+			break;
+		}
+		seqid_open = state->open_stateid.seqid;
+		if (read_seqretry(&state->seqlock, seq))
+			continue;
+
+		dst_seqid = be32_to_cpu(dst->seqid);
+		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) < 0)
+			dst->seqid = seqid_open;
+		break;
+	}
+}
+
+/*
+ * Update the seqid of an open stateid after receiving
+ * NFS4ERR_OLD_STATEID
+ */
+static bool nfs4_refresh_open_old_stateid(nfs4_stateid *dst,
+		struct nfs4_state *state)
+{
+	__be32 seqid_open;
+	u32 dst_seqid;
+	bool ret;
+	int seq;
+
+	for (;;) {
+		ret = false;
+		if (!nfs4_valid_open_stateid(state))
+			break;
+		seq = read_seqbegin(&state->seqlock);
+		if (!nfs4_state_match_open_stateid_other(state, dst)) {
+			if (read_seqretry(&state->seqlock, seq))
+				continue;
+			break;
+		}
+		seqid_open = state->open_stateid.seqid;
+		if (read_seqretry(&state->seqlock, seq))
+			continue;
+
+		dst_seqid = be32_to_cpu(dst->seqid);
+		if ((s32)(dst_seqid - be32_to_cpu(seqid_open)) >= 0)
+			dst->seqid = cpu_to_be32(dst_seqid + 1);
+		else
+			dst->seqid = seqid_open;
+		ret = true;
+		break;
+	}
+
+	return ret;
+}
+
 struct nfs4_closedata {
 	struct inode *inode;
 	struct nfs4_state *state;
@@ -3382,7 +3451,7 @@ static void nfs4_close_done(struct rpc_task *task, void *data)
 			break;
 		case -NFS4ERR_OLD_STATEID:
 			/* Did we race with OPEN? */
-			if (nfs4_refresh_open_stateid(&calldata->arg.stateid,
+			if (nfs4_refresh_open_old_stateid(&calldata->arg.stateid,
 						state))
 				goto out_restart;
 			goto out_release;
@@ -3451,8 +3520,8 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
 	} else if (is_rdwr)
 		calldata->arg.fmode |= FMODE_READ|FMODE_WRITE;
 
-	if (!nfs4_valid_open_stateid(state) ||
-	    !nfs4_refresh_open_stateid(&calldata->arg.stateid, state))
+	nfs4_sync_open_stateid(&calldata->arg.stateid, state);
+	if (!nfs4_valid_open_stateid(state))
 		call_close = 0;
 	spin_unlock(&state->owner->so_lock);
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index cad4e064b328..e23945174da4 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -1015,22 +1015,6 @@ static int nfs4_copy_lock_stateid(nfs4_stateid *dst,
 	return ret;
 }
 
-bool nfs4_refresh_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
-{
-	bool ret;
-	int seq;
-
-	do {
-		ret = false;
-		seq = read_seqbegin(&state->seqlock);
-		if (nfs4_state_match_open_stateid_other(state, dst)) {
-			dst->seqid = state->open_stateid.seqid;
-			ret = true;
-		}
-	} while (read_seqretry(&state->seqlock, seq));
-	return ret;
-}
-
 bool nfs4_copy_open_stateid(nfs4_stateid *dst, struct nfs4_state *state)
 {
 	bool ret;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 9/9] NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
  2019-09-16 20:44               ` [PATCH v2 8/9] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE Trond Myklebust
@ 2019-09-16 20:44                 ` Trond Myklebust
  0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2019-09-16 20:44 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: Olga Kornievskaia, linux-nfs

If a LOCKU request receives a NFS4ERR_OLD_STATEID, then bump the
seqid before resending. Ensure we only bump the seqid by 1.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs4proc.c | 48 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c14af2c1c6b6..2fe6f8737023 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -6411,6 +6411,42 @@ static int nfs4_proc_getlk(struct nfs4_state *state, int cmd, struct file_lock *
 	return err;
 }
 
+/*
+ * Update the seqid of a lock stateid after receiving
+ * NFS4ERR_OLD_STATEID
+ */
+static bool nfs4_refresh_lock_old_stateid(nfs4_stateid *dst,
+		struct nfs4_lock_state *lsp)
+{
+	struct nfs4_state *state = lsp->ls_state;
+	bool ret = false;
+
+	spin_lock(&state->state_lock);
+	if (!nfs4_stateid_match_other(dst, &lsp->ls_stateid))
+		goto out;
+	if (!nfs4_stateid_is_newer(&lsp->ls_stateid, dst))
+		nfs4_stateid_seqid_inc(dst);
+	else
+		dst->seqid = lsp->ls_stateid.seqid;
+	ret = true;
+out:
+	spin_unlock(&state->state_lock);
+	return ret;
+}
+
+static bool nfs4_sync_lock_stateid(nfs4_stateid *dst,
+		struct nfs4_lock_state *lsp)
+{
+	struct nfs4_state *state = lsp->ls_state;
+	bool ret;
+
+	spin_lock(&state->state_lock);
+	ret = !nfs4_stateid_match_other(dst, &lsp->ls_stateid);
+	nfs4_stateid_copy(dst, &lsp->ls_stateid);
+	spin_unlock(&state->state_lock);
+	return ret;
+}
+
 struct nfs4_unlockdata {
 	struct nfs_locku_args arg;
 	struct nfs_locku_res res;
@@ -6444,6 +6480,7 @@ static struct nfs4_unlockdata *nfs4_alloc_unlockdata(struct file_lock *fl,
 	locks_init_lock(&p->fl);
 	locks_copy_lock(&p->fl, fl);
 	p->server = NFS_SERVER(inode);
+	nfs4_stateid_copy(&p->arg.stateid, &lsp->ls_stateid);
 	return p;
 }
 
@@ -6482,10 +6519,14 @@ static void nfs4_locku_done(struct rpc_task *task, void *data)
 					task->tk_msg.rpc_cred);
 			/* Fall through */
 		case -NFS4ERR_BAD_STATEID:
-		case -NFS4ERR_OLD_STATEID:
 		case -NFS4ERR_STALE_STATEID:
-			if (!nfs4_stateid_match(&calldata->arg.stateid,
-						&calldata->lsp->ls_stateid))
+			if (nfs4_sync_lock_stateid(&calldata->arg.stateid,
+						calldata->lsp))
+				rpc_restart_call_prepare(task);
+			break;
+		case -NFS4ERR_OLD_STATEID:
+			if (nfs4_refresh_lock_old_stateid(&calldata->arg.stateid,
+						calldata->lsp))
 				rpc_restart_call_prepare(task);
 			break;
 		default:
@@ -6508,7 +6549,6 @@ static void nfs4_locku_prepare(struct rpc_task *task, void *data)
 
 	if (nfs_wait_on_sequence(calldata->arg.seqid, task) != 0)
 		goto out_wait;
-	nfs4_stateid_copy(&calldata->arg.stateid, &calldata->lsp->ls_stateid);
 	if (test_bit(NFS_LOCK_INITIALIZED, &calldata->lsp->ls_flags) == 0) {
 		/* Note: exit _without_ running nfs4_locku_done */
 		goto out_no_action;
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes
  2019-09-16 20:44 [PATCH v2 0/9] Various NFSv4 state error handling fixes Trond Myklebust
  2019-09-16 20:44 ` [PATCH v2 1/9] pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors Trond Myklebust
@ 2019-09-18 19:38 ` Olga Kornievskaia
  2019-09-19  1:49   ` Trond Myklebust
  1 sibling, 1 reply; 15+ messages in thread
From: Olga Kornievskaia @ 2019-09-18 19:38 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Anna Schumaker, linux-nfs

Hi Trond,

These set of patches do not address the locking problem. It's actually
not the locking patch (which I thought it was as I reverted it and
still had the issue). Without the whole patch series the unlock works
fine so something in these new patches. Something is up with the 2
patches:
NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU

If I remove either one separately, unlock fails but if I remove both
unlock works.

On Mon, Sep 16, 2019 at 4:46 PM Trond Myklebust <trondmy@gmail.com> wrote:
>
> Various NFSv4 fixes to ensure we handle state errors correctly. In
> particular, we need to ensure that for COMPOUNDs like CLOSE and
> DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle the
> layout state errors so that a retry of either the LAYOUTRETURN, or
> the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
> reply.
>
> Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do our
> best to still try to destroy the state on the server, in order to
> avoid causing state leakage.
>
> v2: Fix bug reports from Olga
>  - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE when
>    doing fully serialised NFSv4.0.
>  - Ensure LOCKU initialises the stateid correctly.
>
> Trond Myklebust (9):
>   pNFS: Ensure we do clear the return-on-close layout stateid on fatal
>     errors
>   NFSv4: Clean up pNFS return-on-close error handling
>   NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
>   NFSv4: Handle RPC level errors in LAYOUTRETURN
>   NFSv4: Add a helper to increment stateid seqids
>   pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state
>     seqid
>   NFSv4: Fix OPEN_DOWNGRADE error handling
>   NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
>   NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
>
>  fs/nfs/nfs4_fs.h   |  11 ++-
>  fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++---------------
>  fs/nfs/nfs4state.c |  16 ----
>  fs/nfs/pnfs.c      |  71 ++++++++++++++--
>  fs/nfs/pnfs.h      |  17 +++-
>  5 files changed, 229 insertions(+), 90 deletions(-)
>
> --
> 2.21.0
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes
  2019-09-18 19:38 ` [PATCH v2 0/9] Various NFSv4 state error handling fixes Olga Kornievskaia
@ 2019-09-19  1:49   ` Trond Myklebust
       [not found]     ` <CAN-5tyGX_Mb-wGTREtSWRFFSNK0qjgqLbm8SFPG=DPM7M2OWoQ@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-19  1:49 UTC (permalink / raw)
  To: aglo; +Cc: linux-nfs, Anna.Schumaker

Hi Olga

On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> These set of patches do not address the locking problem. It's
> actually
> not the locking patch (which I thought it was as I reverted it and
> still had the issue). Without the whole patch series the unlock works
> fine so something in these new patches. Something is up with the 2
> patches:
> NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> 
> If I remove either one separately, unlock fails but if I remove both
> unlock works.


Can you describe how you are testing this, and perhaps provide
wireshark traces that show how we're triggering these problems?

Thanks!
  Trond

> On Mon, Sep 16, 2019 at 4:46 PM Trond Myklebust <trondmy@gmail.com>
> wrote:
> > Various NFSv4 fixes to ensure we handle state errors correctly. In
> > particular, we need to ensure that for COMPOUNDs like CLOSE and
> > DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle the
> > layout state errors so that a retry of either the LAYOUTRETURN, or
> > the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
> > reply.
> > 
> > Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do our
> > best to still try to destroy the state on the server, in order to
> > avoid causing state leakage.
> > 
> > v2: Fix bug reports from Olga
> >  - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE when
> >    doing fully serialised NFSv4.0.
> >  - Ensure LOCKU initialises the stateid correctly.
> > 
> > Trond Myklebust (9):
> >   pNFS: Ensure we do clear the return-on-close layout stateid on
> > fatal
> >     errors
> >   NFSv4: Clean up pNFS return-on-close error handling
> >   NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
> >   NFSv4: Handle RPC level errors in LAYOUTRETURN
> >   NFSv4: Add a helper to increment stateid seqids
> >   pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the
> > state
> >     seqid
> >   NFSv4: Fix OPEN_DOWNGRADE error handling
> >   NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> >   NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > 
> >  fs/nfs/nfs4_fs.h   |  11 ++-
> >  fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++-----------
> > ----
> >  fs/nfs/nfs4state.c |  16 ----
> >  fs/nfs/pnfs.c      |  71 ++++++++++++++--
> >  fs/nfs/pnfs.h      |  17 +++-
> >  5 files changed, 229 insertions(+), 90 deletions(-)
> > 
> > --
> > 2.21.0
> > 
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes
       [not found]     ` <CAN-5tyGX_Mb-wGTREtSWRFFSNK0qjgqLbm8SFPG=DPM7M2OWoQ@mail.gmail.com>
@ 2019-09-19 23:42       ` Trond Myklebust
  2019-09-20 14:25         ` Olga Kornievskaia
  0 siblings, 1 reply; 15+ messages in thread
From: Trond Myklebust @ 2019-09-19 23:42 UTC (permalink / raw)
  To: aglo; +Cc: linux-nfs, Anna.Schumaker

Hi Olga

On Thu, 2019-09-19 at 09:14 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> On Wed, Sep 18, 2019 at 9:49 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > Hi Olga
> > 
> > On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> > > Hi Trond,
> > > 
> > > These set of patches do not address the locking problem. It's
> > > actually
> > > not the locking patch (which I thought it was as I reverted it
> > > and
> > > still had the issue). Without the whole patch series the unlock
> > > works
> > > fine so something in these new patches. Something is up with the
> > > 2
> > > patches:
> > > NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > 
> > > If I remove either one separately, unlock fails but if I remove
> > > both
> > > unlock works.
> > 
> > Can you describe how you are testing this, and perhaps provide
> > wireshark traces that show how we're triggering these problems?
> 
> I'm triggering by running "nfstest_lock --nfsversion 4.1 --runtest
> btest01" against either linux or ontap servers (while the test
> doesn't
> fail but on the network trace you can see unlock failing with
> bad_stateid). Network trace attached.
> 
> But actually a simple test open, lock, unlock does the trick (network
> trace attached).
> fd1 = open(RDWR)
> fctl(fd1) (lock /unlock)


These traces really do not mesh with what I'm seeing using a simple
Connectathon lock test run. When I look at the wireshark output from
that, I see exadtly two cases where the stateid arguments are both
zero, and those are both SETATTR, so expected.

All the LOCKU are showing up as non-zero stateids, and so I'm seeing no
BAD_STATEID or OLD_STATEID errors at all.

Is there something special about how your test is running?

Cheers
  Trond

PS: Note: I do think I need a v3 of the LOCKU patch in order to add a
spinlock around the new copy of the lock stateid in
nfs4_alloc_unlockdata(). However I don't see how the missing spinlocks
could cause you to consistently be seeing an all-zero stateid argument.


> 
> > Thanks!
> >   Trond
> > 
> > > On Mon, Sep 16, 2019 at 4:46 PM Trond Myklebust <
> > > trondmy@gmail.com>
> > > wrote:
> > > > Various NFSv4 fixes to ensure we handle state errors correctly.
> > > > In
> > > > particular, we need to ensure that for COMPOUNDs like CLOSE and
> > > > DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle
> > > > the
> > > > layout state errors so that a retry of either the LAYOUTRETURN,
> > > > or
> > > > the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
> > > > reply.
> > > > 
> > > > Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do
> > > > our
> > > > best to still try to destroy the state on the server, in order
> > > > to
> > > > avoid causing state leakage.
> > > > 
> > > > v2: Fix bug reports from Olga
> > > >  - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE
> > > > when
> > > >    doing fully serialised NFSv4.0.
> > > >  - Ensure LOCKU initialises the stateid correctly.
> > > > 
> > > > Trond Myklebust (9):
> > > >   pNFS: Ensure we do clear the return-on-close layout stateid
> > > > on
> > > > fatal
> > > >     errors
> > > >   NFSv4: Clean up pNFS return-on-close error handling
> > > >   NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
> > > >   NFSv4: Handle RPC level errors in LAYOUTRETURN
> > > >   NFSv4: Add a helper to increment stateid seqids
> > > >   pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping
> > > > the
> > > > state
> > > >     seqid
> > > >   NFSv4: Fix OPEN_DOWNGRADE error handling
> > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > > 
> > > >  fs/nfs/nfs4_fs.h   |  11 ++-
> > > >  fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++-------
> > > > ----
> > > > ----
> > > >  fs/nfs/nfs4state.c |  16 ----
> > > >  fs/nfs/pnfs.c      |  71 ++++++++++++++--
> > > >  fs/nfs/pnfs.h      |  17 +++-
> > > >  5 files changed, 229 insertions(+), 90 deletions(-)
> > > > 
> > > > --
> > > > 2.21.0
> > > > 
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
> > 
> > 
Trond Myklebust
CTO, Hammerspace Inc
4300 El Camino Real, Suite 105
Los Altos, CA 94022
www.hammer.space

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes
  2019-09-19 23:42       ` Trond Myklebust
@ 2019-09-20 14:25         ` Olga Kornievskaia
  2019-09-20 14:54           ` Trond Myklebust
  0 siblings, 1 reply; 15+ messages in thread
From: Olga Kornievskaia @ 2019-09-20 14:25 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, Anna.Schumaker

Hi Trond,

On Thu, Sep 19, 2019 at 7:42 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> Hi Olga
>
> On Thu, 2019-09-19 at 09:14 -0400, Olga Kornievskaia wrote:
> > Hi Trond,
> >
> > On Wed, Sep 18, 2019 at 9:49 PM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > > Hi Olga
> > >
> > > On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> > > > Hi Trond,
> > > >
> > > > These set of patches do not address the locking problem. It's
> > > > actually
> > > > not the locking patch (which I thought it was as I reverted it
> > > > and
> > > > still had the issue). Without the whole patch series the unlock
> > > > works
> > > > fine so something in these new patches. Something is up with the
> > > > 2
> > > > patches:
> > > > NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > > NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > >
> > > > If I remove either one separately, unlock fails but if I remove
> > > > both
> > > > unlock works.
> > >
> > > Can you describe how you are testing this, and perhaps provide
> > > wireshark traces that show how we're triggering these problems?
> >
> > I'm triggering by running "nfstest_lock --nfsversion 4.1 --runtest
> > btest01" against either linux or ontap servers (while the test
> > doesn't
> > fail but on the network trace you can see unlock failing with
> > bad_stateid). Network trace attached.
> >
> > But actually a simple test open, lock, unlock does the trick (network
> > trace attached).
> > fd1 = open(RDWR)
> > fctl(fd1) (lock /unlock)
>
>
> These traces really do not mesh with what I'm seeing using a simple
> Connectathon lock test run. When I look at the wireshark output from
> that, I see exadtly two cases where the stateid arguments are both
> zero, and those are both SETATTR, so expected.
>
> All the LOCKU are showing up as non-zero stateids, and so I'm seeing no
> BAD_STATEID or OLD_STATEID errors at all.
>
> Is there something special about how your test is running?

There is nothing special that I can think of about my setup or how
test run. I pull from your testing branch, build it (no extra
patches). Run tests over 4.1 (default mount opts) against a linux
server (typically same kernel).

Is this patch series somewhere in your git branches? I've been testing
your testing branch (as I could see v2 changes were in the testing
branch). It's not obvious to me what was changed in v3 to see if the
testing branch has the right code.

>
> Cheers
>   Trond
>
> PS: Note: I do think I need a v3 of the LOCKU patch in order to add a
> spinlock around the new copy of the lock stateid in
> nfs4_alloc_unlockdata(). However I don't see how the missing spinlocks
> could cause you to consistently be seeing an all-zero stateid argument.

I'm not testing with v3 so I don't think that's it.

>
>
> >
> > > Thanks!
> > >   Trond
> > >
> > > > On Mon, Sep 16, 2019 at 4:46 PM Trond Myklebust <
> > > > trondmy@gmail.com>
> > > > wrote:
> > > > > Various NFSv4 fixes to ensure we handle state errors correctly.
> > > > > In
> > > > > particular, we need to ensure that for COMPOUNDs like CLOSE and
> > > > > DELEGRETURN, that may have an embedded LAYOUTRETURN, we handle
> > > > > the
> > > > > layout state errors so that a retry of either the LAYOUTRETURN,
> > > > > or
> > > > > the later CLOSE/DELEGRETURN does not corrupt the LAYOUTRETURN
> > > > > reply.
> > > > >
> > > > > Also ensure that if we get a NFS4ERR_OLD_STATEID, then we do
> > > > > our
> > > > > best to still try to destroy the state on the server, in order
> > > > > to
> > > > > avoid causing state leakage.
> > > > >
> > > > > v2: Fix bug reports from Olga
> > > > >  - Try to avoid sending old stateids on CLOSE/OPEN_DOWNGRADE
> > > > > when
> > > > >    doing fully serialised NFSv4.0.
> > > > >  - Ensure LOCKU initialises the stateid correctly.
> > > > >
> > > > > Trond Myklebust (9):
> > > > >   pNFS: Ensure we do clear the return-on-close layout stateid
> > > > > on
> > > > > fatal
> > > > >     errors
> > > > >   NFSv4: Clean up pNFS return-on-close error handling
> > > > >   NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close
> > > > >   NFSv4: Handle RPC level errors in LAYOUTRETURN
> > > > >   NFSv4: Add a helper to increment stateid seqids
> > > > >   pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping
> > > > > the
> > > > > state
> > > > >     seqid
> > > > >   NFSv4: Fix OPEN_DOWNGRADE error handling
> > > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > > >   NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > > >
> > > > >  fs/nfs/nfs4_fs.h   |  11 ++-
> > > > >  fs/nfs/nfs4proc.c  | 204 ++++++++++++++++++++++++++++++-------
> > > > > ----
> > > > > ----
> > > > >  fs/nfs/nfs4state.c |  16 ----
> > > > >  fs/nfs/pnfs.c      |  71 ++++++++++++++--
> > > > >  fs/nfs/pnfs.h      |  17 +++-
> > > > >  5 files changed, 229 insertions(+), 90 deletions(-)
> > > > >
> > > > > --
> > > > > 2.21.0
> > > > >
> > > --
> > > Trond Myklebust
> > > Linux NFS client maintainer, Hammerspace
> > > trond.myklebust@hammerspace.com
> > >
> > >
> Trond Myklebust
> CTO, Hammerspace Inc
> 4300 El Camino Real, Suite 105
> Los Altos, CA 94022
> www.hammer.space
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 0/9] Various NFSv4 state error handling fixes
  2019-09-20 14:25         ` Olga Kornievskaia
@ 2019-09-20 14:54           ` Trond Myklebust
  0 siblings, 0 replies; 15+ messages in thread
From: Trond Myklebust @ 2019-09-20 14:54 UTC (permalink / raw)
  To: aglo; +Cc: linux-nfs, Anna.Schumaker

On Fri, 2019-09-20 at 10:25 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> On Thu, Sep 19, 2019 at 7:42 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > Hi Olga
> > 
> > On Thu, 2019-09-19 at 09:14 -0400, Olga Kornievskaia wrote:
> > > Hi Trond,
> > > 
> > > On Wed, Sep 18, 2019 at 9:49 PM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > > Hi Olga
> > > > 
> > > > On Wed, 2019-09-18 at 15:38 -0400, Olga Kornievskaia wrote:
> > > > > Hi Trond,
> > > > > 
> > > > > These set of patches do not address the locking problem. It's
> > > > > actually
> > > > > not the locking patch (which I thought it was as I reverted
> > > > > it
> > > > > and
> > > > > still had the issue). Without the whole patch series the
> > > > > unlock
> > > > > works
> > > > > fine so something in these new patches. Something is up with
> > > > > the
> > > > > 2
> > > > > patches:
> > > > > NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE
> > > > > NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU
> > > > > 
> > > > > If I remove either one separately, unlock fails but if I
> > > > > remove
> > > > > both
> > > > > unlock works.
> > > > 
> > > > Can you describe how you are testing this, and perhaps provide
> > > > wireshark traces that show how we're triggering these problems?
> > > 
> > > I'm triggering by running "nfstest_lock --nfsversion 4.1 --
> > > runtest
> > > btest01" against either linux or ontap servers (while the test
> > > doesn't
> > > fail but on the network trace you can see unlock failing with
> > > bad_stateid). Network trace attached.
> > > 
> > > But actually a simple test open, lock, unlock does the trick
> > > (network
> > > trace attached).
> > > fd1 = open(RDWR)
> > > fctl(fd1) (lock /unlock)
> > 
> > These traces really do not mesh with what I'm seeing using a simple
> > Connectathon lock test run. When I look at the wireshark output
> > from
> > that, I see exadtly two cases where the stateid arguments are both
> > zero, and those are both SETATTR, so expected.
> > 
> > All the LOCKU are showing up as non-zero stateids, and so I'm
> > seeing no
> > BAD_STATEID or OLD_STATEID errors at all.
> > 
> > Is there something special about how your test is running?
> 
> There is nothing special that I can think of about my setup or how
> test run. I pull from your testing branch, build it (no extra
> patches). Run tests over 4.1 (default mount opts) against a linux
> server (typically same kernel).
> 
> Is this patch series somewhere in your git branches? I've been
> testing
> your testing branch (as I could see v2 changes were in the testing
> branch). It's not obvious to me what was changed in v3 to see if the
> testing branch has the right code.

I hadn't yet updated the testing branch with the v3 code. Pushed out
now as a forced-update.

Cheers
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2019-09-20 14:54 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-16 20:44 [PATCH v2 0/9] Various NFSv4 state error handling fixes Trond Myklebust
2019-09-16 20:44 ` [PATCH v2 1/9] pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors Trond Myklebust
2019-09-16 20:44   ` [PATCH v2 2/9] NFSv4: Clean up pNFS return-on-close error handling Trond Myklebust
2019-09-16 20:44     ` [PATCH v2 3/9] NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close Trond Myklebust
2019-09-16 20:44       ` [PATCH v2 4/9] NFSv4: Handle RPC level errors in LAYOUTRETURN Trond Myklebust
2019-09-16 20:44         ` [PATCH v2 5/9] NFSv4: Add a helper to increment stateid seqids Trond Myklebust
2019-09-16 20:44           ` [PATCH v2 6/9] pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid Trond Myklebust
2019-09-16 20:44             ` [PATCH v2 7/9] NFSv4: Fix OPEN_DOWNGRADE error handling Trond Myklebust
2019-09-16 20:44               ` [PATCH v2 8/9] NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE Trond Myklebust
2019-09-16 20:44                 ` [PATCH v2 9/9] NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU Trond Myklebust
2019-09-18 19:38 ` [PATCH v2 0/9] Various NFSv4 state error handling fixes Olga Kornievskaia
2019-09-19  1:49   ` Trond Myklebust
     [not found]     ` <CAN-5tyGX_Mb-wGTREtSWRFFSNK0qjgqLbm8SFPG=DPM7M2OWoQ@mail.gmail.com>
2019-09-19 23:42       ` Trond Myklebust
2019-09-20 14:25         ` Olga Kornievskaia
2019-09-20 14:54           ` Trond Myklebust

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).