All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 00/10] Assorted patches for knfsd
@ 2021-12-19  1:37 trondmy
  2021-12-19  1:37 ` [PATCH v2 01/10] nfsd: map EBADF trondmy
  2021-12-21 18:10 ` [PATCH v2 00/10] Assorted patches for knfsd Chuck Lever III
  0 siblings, 2 replies; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The following patchset is mainly for improving support for re-exporting
NFSv4 as NFSv3. However it also includes one generic bugfix for NFSv3 to
allow zero length writes. It also improves the writeback performance by
replacing the rwsem with a lock-free errseq_t-based method.


- v2: Split patch adding WCC support
  v2: Rebase onto v5.16-rc5

Jeff Layton (3):
  nfsd: Add errno mapping for EREMOTEIO
  nfsd: Retry once in nfsd_open on an -EOPENSTALE return
  nfsd: allow lockd to be forcibly disabled

Peng Tao (1):
  nfsd: map EBADF

Trond Myklebust (6):
  nfsd: Distinguish between required and optional NFSv3 post-op
    attributes
  nfs: Add export support for weak cache consistency attributes
  nfsd: NFSv3 should allow zero length writes
  nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
  nfsd: Replace use of rwsem with errseq_t
  nfsd: Ignore rpcbind errors on nfsd startup

 fs/nfs/export.c                | 24 ++++++++++
 fs/nfsd/filecache.c            |  1 -
 fs/nfsd/filecache.h            |  1 -
 fs/nfsd/nfs3xdr.c              | 83 ++++++++++++++++++++++-----------
 fs/nfsd/nfs4proc.c             | 18 +++----
 fs/nfsd/nfs4xdr.c              |  6 +--
 fs/nfsd/nfsctl.c               |  7 ++-
 fs/nfsd/nfsd.h                 |  1 +
 fs/nfsd/nfsproc.c              |  3 ++
 fs/nfsd/nfssvc.c               | 29 +++++++++++-
 fs/nfsd/trace.h                | 50 ++++++++++++++++++++
 fs/nfsd/vfs.c                  | 85 +++++++++++++++++++++++-----------
 fs/nfsd/vfs.h                  |  8 ++--
 include/linux/exportfs.h       |  3 ++
 include/linux/sunrpc/svcsock.h |  5 +-
 net/sunrpc/svc.c               |  2 +-
 net/sunrpc/svcsock.c           | 14 +++---
 17 files changed, 257 insertions(+), 83 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH v2 01/10] nfsd: map EBADF
  2021-12-19  1:37 [PATCH v2 00/10] Assorted patches for knfsd trondmy
@ 2021-12-19  1:37 ` trondmy
  2021-12-19  1:37   ` [PATCH v2 02/10] nfsd: Add errno mapping for EREMOTEIO trondmy
  2021-12-21 18:10 ` [PATCH v2 00/10] Assorted patches for knfsd Chuck Lever III
  1 sibling, 1 reply; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Peng Tao <tao.peng@primarydata.com>

Now that we have open file cache, it is possible that another client
deletes the file and DP will not know about it. Then IO to MDS would
fail with BADSTATEID and knfsd would start state recovery, which
should fail as well and then nfs read/write will fail with EBADF.
And it triggers a WARN() in nfserrno().

-----------[ cut here ]------------
WARNING: CPU: 0 PID: 13529 at fs/nfsd/nfsproc.c:758 nfserrno+0x58/0x70 [nfsd]()
nfsd: non-standard errno: -9
modules linked in: nfsv3 nfs_layout_flexfiles rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_connt
pata_acpi floppy
CPU: 0 PID: 13529 Comm: nfsd Tainted: G        W       4.1.5-00307-g6e6579b #7
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
 0000000000000000 00000000464e6c9c ffff88079085fba8 ffffffff81789936
 0000000000000000 ffff88079085fc00 ffff88079085fbe8 ffffffff810a08ea
 ffff88079085fbe8 ffff88080f45c900 ffff88080f627d50 ffff880790c46a48
 all Trace:
 [<ffffffff81789936>] dump_stack+0x45/0x57
 [<ffffffff810a08ea>] warn_slowpath_common+0x8a/0xc0
 [<ffffffff810a0975>] warn_slowpath_fmt+0x55/0x70
 [<ffffffff81252908>] ? splice_direct_to_actor+0x148/0x230
 [<ffffffffa02fb8c0>] ? fsid_source+0x60/0x60 [nfsd]
 [<ffffffffa02f9918>] nfserrno+0x58/0x70 [nfsd]
 [<ffffffffa02fba57>] nfsd_finish_read+0x97/0xb0 [nfsd]
 [<ffffffffa02fc7a6>] nfsd_splice_read+0x76/0xa0 [nfsd]
 [<ffffffffa02fcca1>] nfsd_read+0xc1/0xd0 [nfsd]
 [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
 [<ffffffffa03073da>] nfsd3_proc_read+0xba/0x150 [nfsd]
 [<ffffffffa02f7a03>] nfsd_dispatch+0xc3/0x210 [nfsd]
 [<ffffffffa0233af2>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
 [<ffffffffa0232913>] svc_process_common+0x453/0x6f0 [sunrpc]
 [<ffffffffa0232cc3>] svc_process+0x113/0x1b0 [sunrpc]
 [<ffffffffa02f740f>] nfsd+0xff/0x170 [nfsd]
 [<ffffffffa02f7310>] ? nfsd_destroy+0x80/0x80 [nfsd]
 [<ffffffff810bf3a8>] kthread+0xd8/0xf0
 [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0
 [<ffffffff817912a2>] ret_from_fork+0x42/0x70
 [<ffffffff810bf2d0>] ? kthread_create_on_node+0x1b0/0x1b0

Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfsproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index eea5b59b6a6c..0a2b2795585f 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -850,6 +850,7 @@ nfserrno (int errno)
 		{ nfserr_io, -EIO },
 		{ nfserr_nxio, -ENXIO },
 		{ nfserr_fbig, -E2BIG },
+		{ nfserr_stale, -EBADF },
 		{ nfserr_acces, -EACCES },
 		{ nfserr_exist, -EEXIST },
 		{ nfserr_xdev, -EXDEV },
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 02/10] nfsd: Add errno mapping for EREMOTEIO
  2021-12-19  1:37 ` [PATCH v2 01/10] nfsd: map EBADF trondmy
@ 2021-12-19  1:37   ` trondmy
  2021-12-19  1:37     ` [PATCH v2 03/10] nfsd: Retry once in nfsd_open on an -EOPENSTALE return trondmy
  0 siblings, 1 reply; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Jeff Layton <jeff.layton@primarydata.com>

The NFS client can occasionally return EREMOTEIO when signalling issues
with the server.  ...map to NFSERR_IO.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfsproc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 0a2b2795585f..83bd11be8406 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -879,6 +879,7 @@ nfserrno (int errno)
 		{ nfserr_toosmall, -ETOOSMALL },
 		{ nfserr_serverfault, -ESERVERFAULT },
 		{ nfserr_serverfault, -ENFILE },
+		{ nfserr_io, -EREMOTEIO },
 		{ nfserr_io, -EUCLEAN },
 		{ nfserr_perm, -ENOKEY },
 		{ nfserr_no_grace, -ENOGRACE},
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 03/10] nfsd: Retry once in nfsd_open on an -EOPENSTALE return
  2021-12-19  1:37   ` [PATCH v2 02/10] nfsd: Add errno mapping for EREMOTEIO trondmy
@ 2021-12-19  1:37     ` trondmy
  2021-12-19  1:37       ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes trondmy
  0 siblings, 1 reply; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Jeff Layton <jeff.layton@primarydata.com>

If we get back -EOPENSTALE from an NFSv4 open, then we either got some
unhandled error or the inode we got back was not the same as the one
associated with the dentry.

We really have no recourse in that situation other than to retry the
open, and if it fails to just return nfserr_stale back to the client.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfsproc.c |  1 +
 fs/nfsd/vfs.c     | 10 +++++++++-
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfsproc.c b/fs/nfsd/nfsproc.c
index 83bd11be8406..01c1e8f7b766 100644
--- a/fs/nfsd/nfsproc.c
+++ b/fs/nfsd/nfsproc.c
@@ -880,6 +880,7 @@ nfserrno (int errno)
 		{ nfserr_serverfault, -ESERVERFAULT },
 		{ nfserr_serverfault, -ENFILE },
 		{ nfserr_io, -EREMOTEIO },
+		{ nfserr_stale, -EOPENSTALE },
 		{ nfserr_io, -EUCLEAN },
 		{ nfserr_perm, -ENOKEY },
 		{ nfserr_no_grace, -ENOGRACE},
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index c99857689e2c..0faa3839ea6c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -777,6 +777,7 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 		int may_flags, struct file **filp)
 {
 	__be32 err;
+	bool retried = false;
 
 	validate_process_creds();
 	/*
@@ -792,9 +793,16 @@ nfsd_open(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type,
 	 */
 	if (type == S_IFREG)
 		may_flags |= NFSD_MAY_OWNER_OVERRIDE;
+retry:
 	err = fh_verify(rqstp, fhp, type, may_flags);
-	if (!err)
+	if (!err) {
 		err = __nfsd_open(rqstp, fhp, type, may_flags, filp);
+		if (err == nfserr_stale && !retried) {
+			retried = true;
+			fh_put(fhp);
+			goto retry;
+		}
+	}
 	validate_process_creds();
 	return err;
 }
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes
  2021-12-19  1:37     ` [PATCH v2 03/10] nfsd: Retry once in nfsd_open on an -EOPENSTALE return trondmy
@ 2021-12-19  1:37       ` trondmy
  2021-12-19  1:37         ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes trondmy
  2021-12-19 20:10         ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes Chuck Lever III
  0 siblings, 2 replies; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@primarydata.com>

The fhp->fh_no_wcc flag is automatically set in nfsd_set_fh_dentry()
when the EXPORT_OP_NOWCC flag is set. In svcxdr_encode_post_op_attr(),
this then causes nfsd to skip returning the post-op attributes.

The problem is that some of these post-op attributes are not really
optional. In particular, we do want LOOKUP to always return post-op
attributes for the file that is being looked up.

The solution is therefore to explicitly label the attributes that we can
safely opt out from, and only apply the 'fhp->fh_no_wcc' test in that
case.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfs3xdr.c | 77 +++++++++++++++++++++++++++++++----------------
 1 file changed, 51 insertions(+), 26 deletions(-)

diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index c3ac1b6aa3aa..6adfc40722fa 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -415,19 +415,9 @@ svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp)
 	return svcxdr_encode_wcc_attr(xdr, fhp);
 }
 
-/**
- * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
- * @rqstp: Context of a completed RPC transaction
- * @xdr: XDR stream
- * @fhp: File handle to encode
- *
- * Return values:
- *   %false: Send buffer space was exhausted
- *   %true: Success
- */
-bool
-svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
-			   const struct svc_fh *fhp)
+static bool
+__svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
+			     const struct svc_fh *fhp, bool force)
 {
 	struct dentry *dentry = fhp->fh_dentry;
 	struct kstat stat;
@@ -437,7 +427,7 @@ svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 	 * stale file handle. In this case, no attributes are
 	 * returned.
 	 */
-	if (fhp->fh_no_wcc || !dentry || !d_really_is_positive(dentry))
+	if (!force || !dentry || !d_really_is_positive(dentry))
 		goto no_post_op_attrs;
 	if (fh_getattr(fhp, &stat) != nfs_ok)
 		goto no_post_op_attrs;
@@ -454,6 +444,31 @@ svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 	return xdr_stream_encode_item_absent(xdr) > 0;
 }
 
+/**
+ * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
+ * @rqstp: Context of a completed RPC transaction
+ * @xdr: XDR stream
+ * @fhp: File handle to encode
+ *
+ * Return values:
+ *   %false: Send buffer space was exhausted
+ *   %true: Success
+ */
+bool
+svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
+			   const struct svc_fh *fhp)
+{
+	return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, true);
+}
+
+static bool
+svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst *rqstp,
+					 struct xdr_stream *xdr,
+					 const struct svc_fh *fhp)
+{
+	return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, !fhp->fh_no_wcc);
+}
+
 /*
  * Encode weak cache consistency data
  */
@@ -481,7 +496,7 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 neither:
 	if (xdr_stream_encode_item_absent(xdr) < 0)
 		return false;
-	if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp))
+	if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr, fhp))
 		return false;
 
 	return true;
@@ -854,11 +869,13 @@ nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 			return false;
 		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
 			return false;
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->dirfh))
 			return false;
 		break;
 	default:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->dirfh))
 			return false;
 	}
 
@@ -875,13 +892,15 @@ nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 		return false;
 	switch (resp->status) {
 	case nfs_ok:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 		if (xdr_stream_encode_u32(xdr, resp->access) < 0)
 			return false;
 		break;
 	default:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 	}
 
@@ -899,7 +918,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 		return false;
 	switch (resp->status) {
 	case nfs_ok:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 		if (xdr_stream_encode_u32(xdr, resp->len) < 0)
 			return false;
@@ -908,7 +928,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 			return false;
 		break;
 	default:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 	}
 
@@ -926,7 +947,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 		return false;
 	switch (resp->status) {
 	case nfs_ok:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 		if (xdr_stream_encode_u32(xdr, resp->count) < 0)
 			return false;
@@ -940,7 +962,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 			return false;
 		break;
 	default:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 	}
 
@@ -1032,7 +1055,8 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 		return false;
 	switch (resp->status) {
 	case nfs_ok:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 		if (!svcxdr_encode_cookieverf3(xdr, resp->verf))
 			return false;
@@ -1044,7 +1068,8 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 			return false;
 		break;
 	default:
-		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
+		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
+							      &resp->fh))
 			return false;
 	}
 
@@ -1188,7 +1213,7 @@ svcxdr_encode_entry3_plus(struct nfsd3_readdirres *resp, const char *name,
 	if (compose_entry_fh(resp, fhp, name, namlen, ino) != nfs_ok)
 		goto out_noattrs;
 
-	if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp))
+	if (!svcxdr_encode_post_op_attr_opportunistic(resp->rqstp, xdr, fhp))
 		goto out;
 	if (!svcxdr_encode_post_op_fh3(xdr, fhp))
 		goto out;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes
  2021-12-19  1:37       ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes trondmy
@ 2021-12-19  1:37         ` trondmy
  2021-12-19  1:37           ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes trondmy
  2022-01-05 16:10           ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes Daire Byrne
  2021-12-19 20:10         ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes Chuck Lever III
  1 sibling, 2 replies; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@primarydata.com>

Allow knfsd to request weak cache consistency attributes on files that
have delegations and/or have up to date attribute caches by propagating
the information to NFS that the attributes being requested are optional.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/export.c          | 24 ++++++++++++++++++++++++
 fs/nfsd/nfs3xdr.c        |  8 ++++++--
 fs/nfsd/nfs4xdr.c        |  6 +++---
 fs/nfsd/vfs.c            | 14 ++++++++++++++
 fs/nfsd/vfs.h            |  5 +++--
 include/linux/exportfs.h |  3 +++
 6 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/export.c b/fs/nfs/export.c
index 171c424cb6d5..967f8902c49b 100644
--- a/fs/nfs/export.c
+++ b/fs/nfs/export.c
@@ -151,10 +151,34 @@ static u64 nfs_fetch_iversion(struct inode *inode)
 	return inode_peek_iversion_raw(inode);
 }
 
+static int nfs_exp_getattr(struct path *path, struct kstat *stat, bool force)
+{
+	const unsigned long check_valid =
+		NFS_INO_INVALID_CHANGE | NFS_INO_INVALID_ATIME |
+		NFS_INO_INVALID_CTIME | NFS_INO_INVALID_MTIME |
+		NFS_INO_INVALID_SIZE | /* NFS_INO_INVALID_BLOCKS | */
+		NFS_INO_INVALID_OTHER | NFS_INO_INVALID_NLINK;
+	struct inode *inode = d_inode(path->dentry);
+	int flags = force ? AT_STATX_SYNC_AS_STAT : AT_STATX_DONT_SYNC;
+	int ret, ret2 = 0;
+
+	if (!force && nfs_check_cache_invalid(inode, check_valid))
+		ret2 = -EAGAIN;
+	ret = vfs_getattr(path, stat, STATX_BASIC_STATS & ~STATX_BLOCKS, flags);
+	if (ret < 0)
+		return ret;
+	stat->blocks = nfs_calc_block_size(stat->size);
+	if (S_ISDIR(inode->i_mode))
+		stat->blksize = NFS_SERVER(inode)->dtsize;
+	stat->result_mask |= STATX_BLOCKS;
+	return ret2;
+}
+
 const struct export_operations nfs_export_ops = {
 	.encode_fh = nfs_encode_fh,
 	.fh_to_dentry = nfs_fh_to_dentry,
 	.get_parent = nfs_get_parent,
+	.getattr = nfs_exp_getattr,
 	.fetch_iversion = nfs_fetch_iversion,
 	.flags = EXPORT_OP_NOWCC|EXPORT_OP_NOSUBTREECHK|
 		EXPORT_OP_CLOSE_BEFORE_UNLINK|EXPORT_OP_REMOTE_FS|
diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
index 6adfc40722fa..df6e29796494 100644
--- a/fs/nfsd/nfs3xdr.c
+++ b/fs/nfsd/nfs3xdr.c
@@ -420,6 +420,9 @@ __svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 			     const struct svc_fh *fhp, bool force)
 {
 	struct dentry *dentry = fhp->fh_dentry;
+	struct path path = {
+		.dentry = dentry,
+	};
 	struct kstat stat;
 
 	/*
@@ -427,9 +430,10 @@ __svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 	 * stale file handle. In this case, no attributes are
 	 * returned.
 	 */
-	if (!force || !dentry || !d_really_is_positive(dentry))
+	if (!dentry || !d_really_is_positive(dentry))
 		goto no_post_op_attrs;
-	if (fh_getattr(fhp, &stat) != nfs_ok)
+	path.mnt = fhp->fh_export->ex_path.mnt;
+	if (nfsd_getattr(&path, &stat, force) != nfs_ok)
 		goto no_post_op_attrs;
 
 	if (xdr_stream_encode_item_present(xdr) < 0)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 5a93a5db4fb0..8026925c121f 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2862,9 +2862,9 @@ nfsd4_encode_fattr(struct xdr_stream *xdr, struct svc_fh *fhp,
 			goto out;
 	}
 
-	err = vfs_getattr(&path, &stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
-	if (err)
-		goto out_nfserr;
+	status = nfsd_getattr(&path, &stat, true);
+	if (status)
+		goto out;
 	if ((bmval0 & (FATTR4_WORD0_FILES_AVAIL | FATTR4_WORD0_FILES_FREE |
 			FATTR4_WORD0_FILES_TOTAL | FATTR4_WORD0_MAXNAME)) ||
 	    (bmval1 & (FATTR4_WORD1_SPACE_AVAIL | FATTR4_WORD1_SPACE_FREE |
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 0faa3839ea6c..eb9818432149 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -2411,3 +2411,17 @@ nfsd_permission(struct svc_rqst *rqstp, struct svc_export *exp,
 
 	return err? nfserrno(err) : 0;
 }
+
+
+__be32
+nfsd_getattr(struct path *p, struct kstat *stat, bool force)
+{
+	const struct export_operations *ops = p->dentry->d_sb->s_export_op;
+	int err;
+
+	if (ops->getattr)
+		err = ops->getattr(p, stat, force);
+	else
+		err = vfs_getattr(p, stat, STATX_BASIC_STATS, AT_STATX_SYNC_AS_STAT);
+	return err ? nfserrno(err) : 0;
+}
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index b21b76e6b9a8..6edae1b9a96e 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -132,6 +132,8 @@ __be32		nfsd_statfs(struct svc_rqst *, struct svc_fh *,
 __be32		nfsd_permission(struct svc_rqst *, struct svc_export *,
 				struct dentry *, int);
 
+__be32		nfsd_getattr(struct path *p, struct kstat *, bool);
+
 static inline int fh_want_write(struct svc_fh *fh)
 {
 	int ret;
@@ -156,8 +158,7 @@ static inline __be32 fh_getattr(const struct svc_fh *fh, struct kstat *stat)
 {
 	struct path p = {.mnt = fh->fh_export->ex_path.mnt,
 			 .dentry = fh->fh_dentry};
-	return nfserrno(vfs_getattr(&p, stat, STATX_BASIC_STATS,
-				    AT_STATX_SYNC_AS_STAT));
+	return nfsd_getattr(&p, stat, true);
 }
 
 static inline int nfsd_create_is_exclusive(int createmode)
diff --git a/include/linux/exportfs.h b/include/linux/exportfs.h
index 3260fe714846..58f36022787e 100644
--- a/include/linux/exportfs.h
+++ b/include/linux/exportfs.h
@@ -10,6 +10,8 @@ struct inode;
 struct iomap;
 struct super_block;
 struct vfsmount;
+struct path;
+struct kstat;
 
 /* limit the handle size to NFSv4 handle size now */
 #define MAX_HANDLE_SZ 128
@@ -224,6 +226,7 @@ struct export_operations {
 #define EXPORT_OP_SYNC_LOCKS		(0x20) /* Filesystem can't do
 						  asychronous blocking locks */
 	unsigned long	flags;
+	int (*getattr)(struct path *, struct kstat *, bool);
 };
 
 extern int exportfs_encode_inode_fh(struct inode *inode, struct fid *fid,
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes
  2021-12-19  1:37         ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes trondmy
@ 2021-12-19  1:37           ` trondmy
  2021-12-19  1:38             ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() trondmy
  2021-12-19 20:11             ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes Chuck Lever III
  2022-01-05 16:10           ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes Daire Byrne
  1 sibling, 2 replies; 28+ messages in thread
From: trondmy @ 2021-12-19  1:37 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

According to RFC1813: "If count is 0, the WRITE will succeed
and return a count of 0, barring errors due to permissions checking."

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/vfs.c    | 3 +++
 net/sunrpc/svc.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index eb9818432149..85e579aa6944 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -967,6 +967,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 
 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 
+	if (!*cnt)
+		return nfs_ok;
+
 	if (sb->s_export_op)
 		exp_op_flags = sb->s_export_op->flags;
 
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 4292278a9552..72a7822fd257 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -1638,7 +1638,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
 	 * entirely in rq_arg.pages. In this case, @first is empty.
 	 */
 	i = 0;
-	if (first->iov_len) {
+	if (first->iov_len || !total) {
 		vec[i].iov_base = first->iov_base;
 		vec[i].iov_len = min_t(size_t, total, first->iov_len);
 		total -= vec[i].iov_len;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
  2021-12-19  1:37           ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes trondmy
@ 2021-12-19  1:38             ` trondmy
  2021-12-19  1:38               ` [PATCH v2 08/10] nfsd: Replace use of rwsem with errseq_t trondmy
  2021-12-21 18:14               ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() Chuck Lever III
  2021-12-19 20:11             ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes Chuck Lever III
  1 sibling, 2 replies; 28+ messages in thread
From: trondmy @ 2021-12-19  1:38 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Since a clone error commit can cause the boot verifier to change,
we should trace those errors.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfs4proc.c |  2 +-
 fs/nfsd/trace.h    | 50 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/vfs.c      | 18 +++++++++++++++--
 fs/nfsd/vfs.h      |  3 ++-
 4 files changed, 69 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index a36261f89bdf..35f33959a083 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1101,7 +1101,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
 	if (status)
 		goto out;
 
-	status = nfsd4_clone_file_range(src, clone->cl_src_pos,
+	status = nfsd4_clone_file_range(rqstp, src, clone->cl_src_pos,
 			dst, clone->cl_dst_pos, clone->cl_count,
 			EX_ISSYNC(cstate->current_fh.fh_export));
 
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index f1e0d3c51bc2..001444d9829d 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -413,6 +413,56 @@ TRACE_EVENT(nfsd_dirent,
 	)
 )
 
+DECLARE_EVENT_CLASS(nfsd_copy_err_class,
+	TP_PROTO(struct svc_rqst *rqstp,
+		 struct svc_fh	*src_fhp,
+		 loff_t		src_offset,
+		 struct svc_fh	*dst_fhp,
+		 loff_t		dst_offset,
+		 u64		count,
+		 int		status),
+	TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, count, status),
+	TP_STRUCT__entry(
+		__field(u32, xid)
+		__field(u32, src_fh_hash)
+		__field(loff_t, src_offset)
+		__field(u32, dst_fh_hash)
+		__field(loff_t, dst_offset)
+		__field(u64, count)
+		__field(int, status)
+	),
+	TP_fast_assign(
+		__entry->xid = be32_to_cpu(rqstp->rq_xid);
+		__entry->src_fh_hash = knfsd_fh_hash(&src_fhp->fh_handle);
+		__entry->src_offset = src_offset;
+		__entry->dst_fh_hash = knfsd_fh_hash(&dst_fhp->fh_handle);
+		__entry->dst_offset = dst_offset;
+		__entry->count = count;
+		__entry->status = status;
+	),
+	TP_printk("xid=0x%08x src_fh_hash=0x%08x src_offset=%lld "
+			"dst_fh_hash=0x%08x dst_offset=%lld "
+			"count=%llu status=%d",
+		  __entry->xid, __entry->src_fh_hash, __entry->src_offset,
+		  __entry->dst_fh_hash, __entry->dst_offset,
+		  (unsigned long long)__entry->count,
+		  __entry->status)
+)
+
+#define DEFINE_NFSD_COPY_ERR_EVENT(name)		\
+DEFINE_EVENT(nfsd_copy_err_class, nfsd_##name,		\
+	TP_PROTO(struct svc_rqst	*rqstp,		\
+		 struct svc_fh		*src_fhp,	\
+		 loff_t			src_offset,	\
+		 struct svc_fh		*dst_fhp,	\
+		 loff_t			dst_offset,	\
+		 u64			count,		\
+		 int			status),	\
+	TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, \
+		count, status))
+
+DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err);
+
 #include "state.h"
 #include "filecache.h"
 #include "vfs.h"
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 85e579aa6944..4d2964d08097 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -40,6 +40,7 @@
 #include "../internal.h"
 #include "acl.h"
 #include "idmap.h"
+#include "xdr4.h"
 #endif /* CONFIG_NFSD_V4 */
 
 #include "nfsd.h"
@@ -517,8 +518,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
 }
 #endif
 
-__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
-		struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync)
+static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp)
+{
+	return &((struct nfsd4_compoundres *)rqstp->rq_resp)->cstate;
+}
+
+__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
+		struct nfsd_file *nf_src, u64 src_pos,
+		struct nfsd_file *nf_dst, u64 dst_pos,
+		u64 count, bool sync)
 {
 	struct file *src = nf_src->nf_file;
 	struct file *dst = nf_dst->nf_file;
@@ -542,6 +550,12 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
 		if (!status)
 			status = commit_inode_metadata(file_inode(src));
 		if (status < 0) {
+			trace_nfsd_clone_file_range_err(rqstp,
+					&nfsd4_get_cstate(rqstp)->save_fh,
+					src_pos,
+					&nfsd4_get_cstate(rqstp)->current_fh,
+					dst_pos,
+					count, status);
 			nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net,
 						 nfsd_net_id));
 			ret = nfserrno(status);
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 6edae1b9a96e..3dba6397d452 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -57,7 +57,8 @@ __be32          nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
 		    struct xdr_netobj *);
 __be32		nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *,
 				    struct file *, loff_t, loff_t, int);
-__be32		nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
+__be32		nfsd4_clone_file_range(struct svc_rqst *,
+				       struct nfsd_file *nf_src, u64 src_pos,
 				       struct nfsd_file *nf_dst, u64 dst_pos,
 				       u64 count, bool sync);
 #endif /* CONFIG_NFSD_V4 */
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 08/10] nfsd: Replace use of rwsem with errseq_t
  2021-12-19  1:38             ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() trondmy
@ 2021-12-19  1:38               ` trondmy
  2021-12-19  1:38                 ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled trondmy
  2021-12-21 18:14               ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() Chuck Lever III
  1 sibling, 1 reply; 28+ messages in thread
From: trondmy @ 2021-12-19  1:38 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The nfsd_file nf_rwsem is currently being used to separate file write
and commit instances to ensure that we catch errors and apply them to
the correct write/commit.
We can improve scalability at the expense of a little accuracy (some
extra false positives) by replacing the nf_rwsem with more careful
use of the errseq_t mechanism to track errors across the different
operations.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/filecache.c |  1 -
 fs/nfsd/filecache.h |  1 -
 fs/nfsd/nfs4proc.c  | 16 +++++++++-------
 fs/nfsd/vfs.c       | 40 +++++++++++++++-------------------------
 4 files changed, 24 insertions(+), 34 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index fdf89fcf1a0c..36f5a06d1067 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -194,7 +194,6 @@ nfsd_file_alloc(struct inode *inode, unsigned int may, unsigned int hashval,
 				__set_bit(NFSD_FILE_BREAK_READ, &nf->nf_flags);
 		}
 		nf->nf_mark = NULL;
-		init_rwsem(&nf->nf_rwsem);
 		trace_nfsd_file_alloc(nf);
 	}
 	return nf;
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 7872df5a0fe3..435ceab27897 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -46,7 +46,6 @@ struct nfsd_file {
 	refcount_t		nf_ref;
 	unsigned char		nf_may;
 	struct nfsd_file_mark	*nf_mark;
-	struct rw_semaphore	nf_rwsem;
 };
 
 int nfsd_file_cache_init(void);
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 35f33959a083..8d2b9ad3b84a 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1510,6 +1510,9 @@ static void nfsd4_init_copy_res(struct nfsd4_copy *copy, bool sync)
 
 static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy)
 {
+	struct file *dst = copy->nf_dst->nf_file;
+	struct file *src = copy->nf_src->nf_file;
+	errseq_t since;
 	ssize_t bytes_copied = 0;
 	u64 bytes_total = copy->cp_count;
 	u64 src_pos = copy->cp_src_pos;
@@ -1522,9 +1525,8 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy)
 	do {
 		if (kthread_should_stop())
 			break;
-		bytes_copied = nfsd_copy_file_range(copy->nf_src->nf_file,
-				src_pos, copy->nf_dst->nf_file, dst_pos,
-				bytes_total);
+		bytes_copied = nfsd_copy_file_range(src, src_pos, dst, dst_pos,
+						    bytes_total);
 		if (bytes_copied <= 0)
 			break;
 		bytes_total -= bytes_copied;
@@ -1534,11 +1536,11 @@ static ssize_t _nfsd_copy_file_range(struct nfsd4_copy *copy)
 	} while (bytes_total > 0 && !copy->cp_synchronous);
 	/* for a non-zero asynchronous copy do a commit of data */
 	if (!copy->cp_synchronous && copy->cp_res.wr_bytes_written > 0) {
-		down_write(&copy->nf_dst->nf_rwsem);
-		status = vfs_fsync_range(copy->nf_dst->nf_file,
-					 copy->cp_dst_pos,
+		since = READ_ONCE(dst->f_wb_err);
+		status = vfs_fsync_range(dst, copy->cp_dst_pos,
 					 copy->cp_res.wr_bytes_written, 0);
-		up_write(&copy->nf_dst->nf_rwsem);
+		if (!status)
+			status = filemap_check_wb_err(dst->f_mapping, since);
 		if (!status)
 			copy->committed = true;
 	}
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 4d2964d08097..8a48bf18028c 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -530,10 +530,11 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
 {
 	struct file *src = nf_src->nf_file;
 	struct file *dst = nf_dst->nf_file;
+	errseq_t since;
 	loff_t cloned;
 	__be32 ret = 0;
 
-	down_write(&nf_dst->nf_rwsem);
+	since = READ_ONCE(dst->f_wb_err);
 	cloned = vfs_clone_file_range(src, src_pos, dst, dst_pos, count, 0);
 	if (cloned < 0) {
 		ret = nfserrno(cloned);
@@ -547,6 +548,8 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
 		loff_t dst_end = count ? dst_pos + count - 1 : LLONG_MAX;
 		int status = vfs_fsync_range(dst, dst_pos, dst_end, 0);
 
+		if (!status)
+			status = filemap_check_wb_err(dst->f_mapping, since);
 		if (!status)
 			status = commit_inode_metadata(file_inode(src));
 		if (status < 0) {
@@ -562,7 +565,6 @@ __be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
 		}
 	}
 out_err:
-	up_write(&nf_dst->nf_rwsem);
 	return ret;
 }
 
@@ -970,6 +972,7 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 	struct super_block	*sb = file_inode(file)->i_sb;
 	struct svc_export	*exp;
 	struct iov_iter		iter;
+	errseq_t		since;
 	__be32			nfserr;
 	int			host_err;
 	int			use_wgather;
@@ -1010,21 +1013,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 		flags |= RWF_SYNC;
 
 	iov_iter_kvec(&iter, WRITE, vec, vlen, *cnt);
+	since = READ_ONCE(file->f_wb_err);
 	if (flags & RWF_SYNC) {
-		down_write(&nf->nf_rwsem);
 		host_err = vfs_iter_write(file, &iter, &pos, flags);
 		if (host_err < 0)
 			nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp),
 						 nfsd_net_id));
-		up_write(&nf->nf_rwsem);
 	} else {
-		down_read(&nf->nf_rwsem);
 		if (verf)
 			nfsd_copy_boot_verifier(verf,
 					net_generic(SVC_NET(rqstp),
 					nfsd_net_id));
 		host_err = vfs_iter_write(file, &iter, &pos, flags);
-		up_read(&nf->nf_rwsem);
 	}
 	if (host_err < 0) {
 		nfsd_reset_boot_verifier(net_generic(SVC_NET(rqstp),
@@ -1034,6 +1034,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
 	*cnt = host_err;
 	nfsd_stats_io_write_add(exp, *cnt);
 	fsnotify_modify(file);
+	host_err = filemap_check_wb_err(file->f_mapping, since);
+	if (host_err < 0)
+		goto out_nfserr;
 
 	if (stable && use_wgather) {
 		host_err = wait_for_concurrent_writes(file);
@@ -1114,19 +1117,6 @@ nfsd_write(struct svc_rqst *rqstp, struct svc_fh *fhp, loff_t offset,
 }
 
 #ifdef CONFIG_NFSD_V3
-static int
-nfsd_filemap_write_and_wait_range(struct nfsd_file *nf, loff_t offset,
-				  loff_t end)
-{
-	struct address_space *mapping = nf->nf_file->f_mapping;
-	int ret = filemap_fdatawrite_range(mapping, offset, end);
-
-	if (ret)
-		return ret;
-	filemap_fdatawait_range_keep_errors(mapping, offset, end);
-	return 0;
-}
-
 /*
  * Commit all pending writes to stable storage.
  *
@@ -1157,25 +1147,25 @@ nfsd_commit(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	if (err)
 		goto out;
 	if (EX_ISSYNC(fhp->fh_export)) {
-		int err2 = nfsd_filemap_write_and_wait_range(nf, offset, end);
+		errseq_t since = READ_ONCE(nf->nf_file->f_wb_err);
+		int err2;
 
-		down_write(&nf->nf_rwsem);
-		if (!err2)
-			err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
+		err2 = vfs_fsync_range(nf->nf_file, offset, end, 0);
 		switch (err2) {
 		case 0:
 			nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net,
 						nfsd_net_id));
+			err2 = filemap_check_wb_err(nf->nf_file->f_mapping,
+						    since);
 			break;
 		case -EINVAL:
 			err = nfserr_notsupp;
 			break;
 		default:
-			err = nfserrno(err2);
 			nfsd_reset_boot_verifier(net_generic(nf->nf_net,
 						 nfsd_net_id));
 		}
-		up_write(&nf->nf_rwsem);
+		err = nfserrno(err2);
 	} else
 		nfsd_copy_boot_verifier(verf, net_generic(nf->nf_net,
 					nfsd_net_id));
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled
  2021-12-19  1:38               ` [PATCH v2 08/10] nfsd: Replace use of rwsem with errseq_t trondmy
@ 2021-12-19  1:38                 ` trondmy
  2021-12-19  1:38                   ` [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup trondmy
  2021-12-19 18:34                   ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled Chuck Lever III
  0 siblings, 2 replies; 28+ messages in thread
From: trondmy @ 2021-12-19  1:38 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Jeff Layton <jeff.layton@primarydata.com>

In some cases, we may want to use a userland NLM server which will
require that we turn off lockd.

Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfssvc.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 80431921e5d7..6815c70b06af 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -340,8 +340,19 @@ static void nfsd_shutdown_generic(void)
 	nfsd_file_cache_shutdown();
 }
 
+/*
+ * Allow admin to disable lockd. This would typically be used to allow (e.g.)
+ * a userspace NLM server of some sort to be used.
+ */
+static bool nfsd_disable_lockd = false;
+module_param(nfsd_disable_lockd, bool, 0644);
+MODULE_PARM_DESC(nfsd_disable_lockd, "Allow lockd to be manually disabled.");
+
 static bool nfsd_needs_lockd(struct nfsd_net *nn)
 {
+	if (nfsd_disable_lockd)
+		return false;
+
 	return nfsd_vers(nn, 2, NFSD_TEST) || nfsd_vers(nn, 3, NFSD_TEST);
 }
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-19  1:38                 ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled trondmy
@ 2021-12-19  1:38                   ` trondmy
  2021-12-19 18:15                     ` Chuck Lever III
  2021-12-19 18:34                   ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled Chuck Lever III
  1 sibling, 1 reply; 28+ messages in thread
From: trondmy @ 2021-12-19  1:38 UTC (permalink / raw)
  To: Chuck Lever, J. Bruce Fields; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

NFSv4 doesn't need rpcbind, so let's not refuse to start up just because
the rpcbind registration failed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfsd/nfsctl.c               |  7 ++++++-
 fs/nfsd/nfsd.h                 |  1 +
 fs/nfsd/nfssvc.c               | 18 ++++++++++++++++--
 include/linux/sunrpc/svcsock.h |  5 +++--
 net/sunrpc/svcsock.c           | 14 ++++++++------
 5 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 51a49e0cfe37..da9760479acd 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -727,6 +727,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
 	char *mesg = buf;
 	int fd, err;
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	int flags = SVC_SOCK_DEFAULTS;
 
 	err = get_int(&mesg, &fd);
 	if (err != 0 || fd < 0)
@@ -741,7 +742,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
 	if (err != 0)
 		return err;
 
-	err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
+	if (!nfsd_rpcbind_error_is_fatal())
+		flags |= SVC_SOCK_RPCBIND_NOERR;
+
+	err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT,
+			  flags, cred);
 	if (err < 0) {
 		nfsd_destroy(net);
 		return err;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 498e5a489826..e0356d3ecf65 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -134,6 +134,7 @@ int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change);
 int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change);
 void nfsd_reset_versions(struct nfsd_net *nn);
 int nfsd_create_serv(struct net *net);
+extern bool nfsd_rpcbind_error_is_fatal(void);
 
 extern int nfsd_max_blksize;
 
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 6815c70b06af..6f22c72f340d 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -289,17 +289,21 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred)
 {
 	int error;
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+	int flags = SVC_SOCK_DEFAULTS;
 
 	if (!list_empty(&nn->nfsd_serv->sv_permsocks))
 		return 0;
 
+	if (!nfsd_rpcbind_error_is_fatal())
+		flags |= SVC_SOCK_RPCBIND_NOERR;
+
 	error = svc_create_xprt(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT,
-					SVC_SOCK_DEFAULTS, cred);
+				flags, cred);
 	if (error < 0)
 		return error;
 
 	error = svc_create_xprt(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT,
-					SVC_SOCK_DEFAULTS, cred);
+				flags, cred);
 	if (error < 0)
 		return error;
 
@@ -340,6 +344,16 @@ static void nfsd_shutdown_generic(void)
 	nfsd_file_cache_shutdown();
 }
 
+static bool nfsd_rpcbind_error_fatal = false;
+module_param(nfsd_rpcbind_error_fatal, bool, 0644);
+MODULE_PARM_DESC(nfsd_rpcbind_error_fatal,
+		 "rpcbind errors are fatal when starting nfsd.");
+
+bool nfsd_rpcbind_error_is_fatal(void)
+{
+	return nfsd_rpcbind_error_fatal;
+}
+
 /*
  * Allow admin to disable lockd. This would typically be used to allow (e.g.)
  * a userspace NLM server of some sort to be used.
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index bcc555c7ae9c..f34c222cee9d 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -61,8 +61,8 @@ void		svc_drop(struct svc_rqst *);
 void		svc_sock_update_bufs(struct svc_serv *serv);
 bool		svc_alien_sock(struct net *net, int fd);
 int		svc_addsock(struct svc_serv *serv, const int fd,
-					char *name_return, const size_t len,
-					const struct cred *cred);
+			    char *name_return, const size_t len, int flags,
+			    const struct cred *cred);
 void		svc_init_xprt_sock(void);
 void		svc_cleanup_xprt_sock(void);
 struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot);
@@ -74,5 +74,6 @@ void		svc_sock_destroy(struct svc_xprt *);
 #define SVC_SOCK_DEFAULTS	(0U)
 #define SVC_SOCK_ANONYMOUS	(1U << 0)	/* don't register with pmap */
 #define SVC_SOCK_TEMPORARY	(1U << 1)	/* flag socket as temporary */
+#define SVC_SOCK_RPCBIND_NOERR	(1U << 2)	/* Ignore pmap errors */
 
 #endif /* SUNRPC_SVCSOCK_H */
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
index 478f857cdaed..7f5b12a50bf9 100644
--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -1309,14 +1309,15 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
 	inet = sock->sk;
 
 	/* Register socket with portmapper */
-	if (pmap_register)
+	if (pmap_register) {
 		err = svc_register(serv, sock_net(sock->sk), inet->sk_family,
 				     inet->sk_protocol,
 				     ntohs(inet_sk(inet)->inet_sport));
 
-	if (err < 0) {
-		kfree(svsk);
-		return ERR_PTR(err);
+		if (err < 0 && !(flags & SVC_SOCK_RPCBIND_NOERR)) {
+			kfree(svsk);
+			return ERR_PTR(err);
+		}
 	}
 
 	svsk->sk_sock = sock;
@@ -1364,6 +1365,7 @@ EXPORT_SYMBOL_GPL(svc_alien_sock);
  * @fd: file descriptor of the new listener
  * @name_return: pointer to buffer to fill in with name of listener
  * @len: size of the buffer
+ * @flags: flags argument for svc_setup_socket()
  * @cred: credential
  *
  * Fills in socket name and returns positive length of name if successful.
@@ -1371,7 +1373,7 @@ EXPORT_SYMBOL_GPL(svc_alien_sock);
  * value.
  */
 int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
-		const size_t len, const struct cred *cred)
+		const size_t len, int flags, const struct cred *cred)
 {
 	int err = 0;
 	struct socket *so = sockfd_lookup(fd, &err);
@@ -1395,7 +1397,7 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
 	err = -ENOENT;
 	if (!try_module_get(THIS_MODULE))
 		goto out;
-	svsk = svc_setup_socket(serv, so, SVC_SOCK_DEFAULTS);
+	svsk = svc_setup_socket(serv, so, flags);
 	if (IS_ERR(svsk)) {
 		module_put(THIS_MODULE);
 		err = PTR_ERR(svsk);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-19  1:38                   ` [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup trondmy
@ 2021-12-19 18:15                     ` Chuck Lever III
  2021-12-19 20:49                       ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever III @ 2021-12-19 18:15 UTC (permalink / raw)
  To: trondmy; +Cc: Bruce Fields, Linux NFS Mailing List


> On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
> 
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> NFSv4 doesn't need rpcbind, so let's not refuse to start up just because
> the rpcbind registration failed.

Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for ignoring
the result of svc_register") added vs_rpcb_optnl, which is already
set for nfsd4_version4. Is that not adequate?


> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfsd/nfsctl.c               |  7 ++++++-
> fs/nfsd/nfsd.h                 |  1 +
> fs/nfsd/nfssvc.c               | 18 ++++++++++++++++--
> include/linux/sunrpc/svcsock.h |  5 +++--
> net/sunrpc/svcsock.c           | 14 ++++++++------
> 5 files changed, 34 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 51a49e0cfe37..da9760479acd 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -727,6 +727,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
> 	char *mesg = buf;
> 	int fd, err;
> 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +	int flags = SVC_SOCK_DEFAULTS;
> 
> 	err = get_int(&mesg, &fd);
> 	if (err != 0 || fd < 0)
> @@ -741,7 +742,11 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
> 	if (err != 0)
> 		return err;
> 
> -	err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
> +	if (!nfsd_rpcbind_error_is_fatal())
> +		flags |= SVC_SOCK_RPCBIND_NOERR;
> +
> +	err = svc_addsock(nn->nfsd_serv, fd, buf, SIMPLE_TRANSACTION_LIMIT,
> +			  flags, cred);
> 	if (err < 0) {
> 		nfsd_destroy(net);
> 		return err;
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 498e5a489826..e0356d3ecf65 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -134,6 +134,7 @@ int nfsd_vers(struct nfsd_net *nn, int vers, enum vers_op change);
> int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change);
> void nfsd_reset_versions(struct nfsd_net *nn);
> int nfsd_create_serv(struct net *net);
> +extern bool nfsd_rpcbind_error_is_fatal(void);
> 
> extern int nfsd_max_blksize;
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 6815c70b06af..6f22c72f340d 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -289,17 +289,21 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred)
> {
> 	int error;
> 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> +	int flags = SVC_SOCK_DEFAULTS;
> 
> 	if (!list_empty(&nn->nfsd_serv->sv_permsocks))
> 		return 0;
> 
> +	if (!nfsd_rpcbind_error_is_fatal())
> +		flags |= SVC_SOCK_RPCBIND_NOERR;
> +
> 	error = svc_create_xprt(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT,
> -					SVC_SOCK_DEFAULTS, cred);
> +				flags, cred);
> 	if (error < 0)
> 		return error;
> 
> 	error = svc_create_xprt(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT,
> -					SVC_SOCK_DEFAULTS, cred);
> +				flags, cred);
> 	if (error < 0)
> 		return error;
> 
> @@ -340,6 +344,16 @@ static void nfsd_shutdown_generic(void)
> 	nfsd_file_cache_shutdown();
> }
> 
> +static bool nfsd_rpcbind_error_fatal = false;
> +module_param(nfsd_rpcbind_error_fatal, bool, 0644);
> +MODULE_PARM_DESC(nfsd_rpcbind_error_fatal,
> +		 "rpcbind errors are fatal when starting nfsd.");
> +
> +bool nfsd_rpcbind_error_is_fatal(void)
> +{
> +	return nfsd_rpcbind_error_fatal;
> +}
> +
> /*
>  * Allow admin to disable lockd. This would typically be used to allow (e.g.)
>  * a userspace NLM server of some sort to be used.
> diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
> index bcc555c7ae9c..f34c222cee9d 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -61,8 +61,8 @@ void		svc_drop(struct svc_rqst *);
> void		svc_sock_update_bufs(struct svc_serv *serv);
> bool		svc_alien_sock(struct net *net, int fd);
> int		svc_addsock(struct svc_serv *serv, const int fd,
> -					char *name_return, const size_t len,
> -					const struct cred *cred);
> +			    char *name_return, const size_t len, int flags,
> +			    const struct cred *cred);
> void		svc_init_xprt_sock(void);
> void		svc_cleanup_xprt_sock(void);
> struct svc_xprt *svc_sock_create(struct svc_serv *serv, int prot);
> @@ -74,5 +74,6 @@ void		svc_sock_destroy(struct svc_xprt *);
> #define SVC_SOCK_DEFAULTS	(0U)
> #define SVC_SOCK_ANONYMOUS	(1U << 0)	/* don't register with pmap */
> #define SVC_SOCK_TEMPORARY	(1U << 1)	/* flag socket as temporary */
> +#define SVC_SOCK_RPCBIND_NOERR	(1U << 2)	/* Ignore pmap errors */
> 
> #endif /* SUNRPC_SVCSOCK_H */
> diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c
> index 478f857cdaed..7f5b12a50bf9 100644
> --- a/net/sunrpc/svcsock.c
> +++ b/net/sunrpc/svcsock.c
> @@ -1309,14 +1309,15 @@ static struct svc_sock *svc_setup_socket(struct svc_serv *serv,
> 	inet = sock->sk;
> 
> 	/* Register socket with portmapper */
> -	if (pmap_register)
> +	if (pmap_register) {
> 		err = svc_register(serv, sock_net(sock->sk), inet->sk_family,
> 				     inet->sk_protocol,
> 				     ntohs(inet_sk(inet)->inet_sport));
> 
> -	if (err < 0) {
> -		kfree(svsk);
> -		return ERR_PTR(err);
> +		if (err < 0 && !(flags & SVC_SOCK_RPCBIND_NOERR)) {
> +			kfree(svsk);
> +			return ERR_PTR(err);
> +		}
> 	}
> 
> 	svsk->sk_sock = sock;
> @@ -1364,6 +1365,7 @@ EXPORT_SYMBOL_GPL(svc_alien_sock);
>  * @fd: file descriptor of the new listener
>  * @name_return: pointer to buffer to fill in with name of listener
>  * @len: size of the buffer
> + * @flags: flags argument for svc_setup_socket()
>  * @cred: credential
>  *
>  * Fills in socket name and returns positive length of name if successful.
> @@ -1371,7 +1373,7 @@ EXPORT_SYMBOL_GPL(svc_alien_sock);
>  * value.
>  */
> int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
> -		const size_t len, const struct cred *cred)
> +		const size_t len, int flags, const struct cred *cred)
> {
> 	int err = 0;
> 	struct socket *so = sockfd_lookup(fd, &err);
> @@ -1395,7 +1397,7 @@ int svc_addsock(struct svc_serv *serv, const int fd, char *name_return,
> 	err = -ENOENT;
> 	if (!try_module_get(THIS_MODULE))
> 		goto out;
> -	svsk = svc_setup_socket(serv, so, SVC_SOCK_DEFAULTS);
> +	svsk = svc_setup_socket(serv, so, flags);
> 	if (IS_ERR(svsk)) {
> 		module_put(THIS_MODULE);
> 		err = PTR_ERR(svsk);
> -- 
> 2.33.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled
  2021-12-19  1:38                 ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled trondmy
  2021-12-19  1:38                   ` [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup trondmy
@ 2021-12-19 18:34                   ` Chuck Lever III
  1 sibling, 0 replies; 28+ messages in thread
From: Chuck Lever III @ 2021-12-19 18:34 UTC (permalink / raw)
  To: trondmy; +Cc: Bruce Fields, Linux NFS Mailing List



> On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
> 
> From: Jeff Layton <jeff.layton@primarydata.com>
> 
> In some cases, we may want to use a userland NLM server which will
> require that we turn off lockd.
> 
> Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfsd/nfssvc.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
> 
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 80431921e5d7..6815c70b06af 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -340,8 +340,19 @@ static void nfsd_shutdown_generic(void)
> 	nfsd_file_cache_shutdown();
> }
> 
> +/*
> + * Allow admin to disable lockd. This would typically be used to allow (e.g.)
> + * a userspace NLM server of some sort to be used.
> + */
> +static bool nfsd_disable_lockd = false;
> +module_param(nfsd_disable_lockd, bool, 0644);
> +MODULE_PARM_DESC(nfsd_disable_lockd, "Allow lockd to be manually disabled.");
> +
> static bool nfsd_needs_lockd(struct nfsd_net *nn)

                              ^^^^^^^^^^^^^^^^^^^^

An nfsd_net * is passed to nfsd_needs_lockd(), therefore the
availability of the lockd service needs to be aware of net namespaces,
right?

NAK for now, but I'm open to more dialog about how to support the "no
lockd" use case. That's intriguing.


> {
> +	if (nfsd_disable_lockd)
> +		return false;
> +
> 	return nfsd_vers(nn, 2, NFSD_TEST) || nfsd_vers(nn, 3, NFSD_TEST);
> }
> 
> -- 
> 2.33.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes
  2021-12-19  1:37       ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes trondmy
  2021-12-19  1:37         ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes trondmy
@ 2021-12-19 20:10         ` Chuck Lever III
  2021-12-19 21:09           ` Trond Myklebust
  1 sibling, 1 reply; 28+ messages in thread
From: Chuck Lever III @ 2021-12-19 20:10 UTC (permalink / raw)
  To: trondmy; +Cc: Bruce Fields, Linux NFS Mailing List



> On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
> 
> From: Trond Myklebust <trond.myklebust@primarydata.com>
> 
> The fhp->fh_no_wcc flag is automatically set in nfsd_set_fh_dentry()
> when the EXPORT_OP_NOWCC flag is set. In svcxdr_encode_post_op_attr(),
> this then causes nfsd to skip returning the post-op attributes.
> 
> The problem is that some of these post-op attributes are not really
> optional. In particular, we do want LOOKUP to always return post-op
> attributes for the file that is being looked up.
> 
> The solution is therefore to explicitly label the attributes that we can
> safely opt out from, and only apply the 'fhp->fh_no_wcc' test in that
> case.
> 
> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfsd/nfs3xdr.c | 77 +++++++++++++++++++++++++++++++----------------
> 1 file changed, 51 insertions(+), 26 deletions(-)
> 
> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> index c3ac1b6aa3aa..6adfc40722fa 100644
> --- a/fs/nfsd/nfs3xdr.c
> +++ b/fs/nfsd/nfs3xdr.c
> @@ -415,19 +415,9 @@ svcxdr_encode_pre_op_attr(struct xdr_stream *xdr, const struct svc_fh *fhp)
> 	return svcxdr_encode_wcc_attr(xdr, fhp);
> }
> 
> -/**
> - * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
> - * @rqstp: Context of a completed RPC transaction
> - * @xdr: XDR stream
> - * @fhp: File handle to encode
> - *
> - * Return values:
> - *   %false: Send buffer space was exhausted
> - *   %true: Success
> - */
> -bool
> -svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
> -			   const struct svc_fh *fhp)
> +static bool
> +__svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
> +			     const struct svc_fh *fhp, bool force)
> {
> 	struct dentry *dentry = fhp->fh_dentry;
> 	struct kstat stat;
> @@ -437,7 +427,7 @@ svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
> 	 * stale file handle. In this case, no attributes are
> 	 * returned.
> 	 */
> -	if (fhp->fh_no_wcc || !dentry || !d_really_is_positive(dentry))
> +	if (!force || !dentry || !d_really_is_positive(dentry))
> 		goto no_post_op_attrs;
> 	if (fh_getattr(fhp, &stat) != nfs_ok)
> 		goto no_post_op_attrs;
> @@ -454,6 +444,31 @@ svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
> 	return xdr_stream_encode_item_absent(xdr) > 0;
> }
> 
> +/**
> + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
> + * @rqstp: Context of a completed RPC transaction
> + * @xdr: XDR stream
> + * @fhp: File handle to encode
> + *
> + * Return values:
> + *   %false: Send buffer space was exhausted
> + *   %true: Success
> + */
> +bool
> +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
> +			   const struct svc_fh *fhp)
> +{
> +	return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, true);
> +}
> +
> +static bool
> +svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst *rqstp,
> +					 struct xdr_stream *xdr,
> +					 const struct svc_fh *fhp)
> +{
> +	return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, !fhp->fh_no_wcc);
> +}
> +

Thanks for splitting this out: the "why" is much clearer.

Wouldn't it be simpler to explicitly set fh_no_wcc to false
in each of the cases where you want to ensure the server
emits WCC? And perhaps that should be done in nfs3proc.c
instead of in nfs3xdr.c.


> /*
>  * Encode weak cache consistency data
>  */
> @@ -481,7 +496,7 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp, struct xdr_stream *xdr,
> neither:
> 	if (xdr_stream_encode_item_absent(xdr) < 0)
> 		return false;
> -	if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp))
> +	if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr, fhp))
> 		return false;
> 
> 	return true;
> @@ -854,11 +869,13 @@ nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 			return false;
> 		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> 			return false;
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->dirfh))
> 			return false;
> 		break;
> 	default:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->dirfh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->dirfh))
> 			return false;
> 	}
> 
> @@ -875,13 +892,15 @@ nfs3svc_encode_accessres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 		return false;
> 	switch (resp->status) {
> 	case nfs_ok:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 		if (xdr_stream_encode_u32(xdr, resp->access) < 0)
> 			return false;
> 		break;
> 	default:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 	}
> 
> @@ -899,7 +918,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 		return false;
> 	switch (resp->status) {
> 	case nfs_ok:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 		if (xdr_stream_encode_u32(xdr, resp->len) < 0)
> 			return false;
> @@ -908,7 +928,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 			return false;
> 		break;
> 	default:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 	}
> 
> @@ -926,7 +947,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 		return false;
> 	switch (resp->status) {
> 	case nfs_ok:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 		if (xdr_stream_encode_u32(xdr, resp->count) < 0)
> 			return false;
> @@ -940,7 +962,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 			return false;
> 		break;
> 	default:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 	}
> 
> @@ -1032,7 +1055,8 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 		return false;
> 	switch (resp->status) {
> 	case nfs_ok:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 		if (!svcxdr_encode_cookieverf3(xdr, resp->verf))
> 			return false;
> @@ -1044,7 +1068,8 @@ nfs3svc_encode_readdirres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
> 			return false;
> 		break;
> 	default:
> -		if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
> +		if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> +							      &resp->fh))
> 			return false;
> 	}
> 
> @@ -1188,7 +1213,7 @@ svcxdr_encode_entry3_plus(struct nfsd3_readdirres *resp, const char *name,
> 	if (compose_entry_fh(resp, fhp, name, namlen, ino) != nfs_ok)
> 		goto out_noattrs;
> 
> -	if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp))
> +	if (!svcxdr_encode_post_op_attr_opportunistic(resp->rqstp, xdr, fhp))
> 		goto out;
> 	if (!svcxdr_encode_post_op_fh3(xdr, fhp))
> 		goto out;
> -- 
> 2.33.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes
  2021-12-19  1:37           ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes trondmy
  2021-12-19  1:38             ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() trondmy
@ 2021-12-19 20:11             ` Chuck Lever III
  1 sibling, 0 replies; 28+ messages in thread
From: Chuck Lever III @ 2021-12-19 20:11 UTC (permalink / raw)
  To: trondmy; +Cc: Bruce Fields, Linux NFS Mailing List



> On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
> 
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> According to RFC1813: "If count is 0, the WRITE will succeed
> and return a count of 0, barring errors due to permissions checking."

If you resend this series, please move this patch to the front.


> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfsd/vfs.c    | 3 +++
> net/sunrpc/svc.c | 2 +-
> 2 files changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index eb9818432149..85e579aa6944 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -967,6 +967,9 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
> 
> 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
> 
> +	if (!*cnt)
> +		return nfs_ok;
> +
> 	if (sb->s_export_op)
> 		exp_op_flags = sb->s_export_op->flags;
> 
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 4292278a9552..72a7822fd257 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -1638,7 +1638,7 @@ unsigned int svc_fill_write_vector(struct svc_rqst *rqstp,
> 	 * entirely in rq_arg.pages. In this case, @first is empty.
> 	 */
> 	i = 0;
> -	if (first->iov_len) {
> +	if (first->iov_len || !total) {
> 		vec[i].iov_base = first->iov_base;
> 		vec[i].iov_len = min_t(size_t, total, first->iov_len);
> 		total -= vec[i].iov_len;
> -- 
> 2.33.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-19 18:15                     ` Chuck Lever III
@ 2021-12-19 20:49                       ` Trond Myklebust
  2021-12-20 15:51                         ` Chuck Lever III
  0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2021-12-19 20:49 UTC (permalink / raw)
  To: chuck.lever, trondmy; +Cc: linux-nfs, bfields

On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
> 
> > On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
> > 
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > NFSv4 doesn't need rpcbind, so let's not refuse to start up just
> > because
> > the rpcbind registration failed.
> 
> Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for ignoring
> the result of svc_register") added vs_rpcb_optnl, which is already
> set for nfsd4_version4. Is that not adequate?
> 

The other issue is that under certain circumstances, you may also want
to run NFSv3 without rpcbind support. For instance, when you have a
knfsd server instance running as a data server, there is typically no
need to run rpcbind.



-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes
  2021-12-19 20:10         ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes Chuck Lever III
@ 2021-12-19 21:09           ` Trond Myklebust
  2021-12-20 16:02             ` Chuck Lever III
  0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2021-12-19 21:09 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, bfields

On Sun, 2021-12-19 at 20:10 +0000, Chuck Lever III wrote:
> 
> 
> > On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
> > 
> > From: Trond Myklebust <trond.myklebust@primarydata.com>
> > 
> > The fhp->fh_no_wcc flag is automatically set in
> > nfsd_set_fh_dentry()
> > when the EXPORT_OP_NOWCC flag is set. In
> > svcxdr_encode_post_op_attr(),
> > this then causes nfsd to skip returning the post-op attributes.
> > 
> > The problem is that some of these post-op attributes are not really
> > optional. In particular, we do want LOOKUP to always return post-op
> > attributes for the file that is being looked up.
> > 
> > The solution is therefore to explicitly label the attributes that
> > we can
> > safely opt out from, and only apply the 'fhp->fh_no_wcc' test in
> > that
> > case.
> > 
> > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
> > Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> > fs/nfsd/nfs3xdr.c | 77 +++++++++++++++++++++++++++++++-------------
> > ---
> > 1 file changed, 51 insertions(+), 26 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > index c3ac1b6aa3aa..6adfc40722fa 100644
> > --- a/fs/nfsd/nfs3xdr.c
> > +++ b/fs/nfsd/nfs3xdr.c
> > @@ -415,19 +415,9 @@ svcxdr_encode_pre_op_attr(struct xdr_stream
> > *xdr, const struct svc_fh *fhp)
> >         return svcxdr_encode_wcc_attr(xdr, fhp);
> > }
> > 
> > -/**
> > - * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
> > - * @rqstp: Context of a completed RPC transaction
> > - * @xdr: XDR stream
> > - * @fhp: File handle to encode
> > - *
> > - * Return values:
> > - *   %false: Send buffer space was exhausted
> > - *   %true: Success
> > - */
> > -bool
> > -svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
> > xdr_stream *xdr,
> > -                          const struct svc_fh *fhp)
> > +static bool
> > +__svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
> > xdr_stream *xdr,
> > +                            const struct svc_fh *fhp, bool force)
> > {
> >         struct dentry *dentry = fhp->fh_dentry;
> >         struct kstat stat;
> > @@ -437,7 +427,7 @@ svcxdr_encode_post_op_attr(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr,
> >          * stale file handle. In this case, no attributes are
> >          * returned.
> >          */
> > -       if (fhp->fh_no_wcc || !dentry ||
> > !d_really_is_positive(dentry))
> > +       if (!force || !dentry || !d_really_is_positive(dentry))
> >                 goto no_post_op_attrs;
> >         if (fh_getattr(fhp, &stat) != nfs_ok)
> >                 goto no_post_op_attrs;
> > @@ -454,6 +444,31 @@ svcxdr_encode_post_op_attr(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr,
> >         return xdr_stream_encode_item_absent(xdr) > 0;
> > }
> > 
> > +/**
> > + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
> > + * @rqstp: Context of a completed RPC transaction
> > + * @xdr: XDR stream
> > + * @fhp: File handle to encode
> > + *
> > + * Return values:
> > + *   %false: Send buffer space was exhausted
> > + *   %true: Success
> > + */
> > +bool
> > +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
> > xdr_stream *xdr,
> > +                          const struct svc_fh *fhp)
> > +{
> > +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, true);
> > +}
> > +
> > +static bool
> > +svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst *rqstp,
> > +                                        struct xdr_stream *xdr,
> > +                                        const struct svc_fh *fhp)
> > +{
> > +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, !fhp-
> > >fh_no_wcc);
> > +}
> > +
> 
> Thanks for splitting this out: the "why" is much clearer.
> 
> Wouldn't it be simpler to explicitly set fh_no_wcc to false
> in each of the cases where you want to ensure the server
> emits WCC? And perhaps that should be done in nfs3proc.c
> instead of in nfs3xdr.c.
> 

It can't be done in nfs3proc.c, and toggling the value of fh_no_wcc is
a lot more cumbersome than this approach.

The current code is broken for NFSv3 exports because it is unable to
distinguish between what is and isn't mandatory to return for in the
same NFS operation. That's the problem this patch fixes.

LOOKUP has to return the attributes for the object being looked up in
order to be useful. If the attributes are not up to date then we should
ask the NFS client that is being re-exported to go to the server to
revalidate its attributes.
The same is not true of the directory post-op attributes. LOOKUP does
not even change the contents of the directory, and so while it is
beneficial to have the NFS client return those attributes if they are
up to date, forcing it to go to the server to retrieve them is less
than optimal for system performance.



> 
> > /*
> >  * Encode weak cache consistency data
> >  */
> > @@ -481,7 +496,7 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp,
> > struct xdr_stream *xdr,
> > neither:
> >         if (xdr_stream_encode_item_absent(xdr) < 0)
> >                 return false;
> > -       if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp))
> > +       if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > fhp))
> >                 return false;
> > 
> >         return true;
> > @@ -854,11 +869,13 @@ nfs3svc_encode_lookupres(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr)
> >                         return false;
> >                 if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> >                         return false;
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >dirfh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->dirfh))
> >                         return false;
> >                 break;
> >         default:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >dirfh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->dirfh))
> >                         return false;
> >         }
> > 
> > @@ -875,13 +892,15 @@ nfs3svc_encode_accessres(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr)
> >                 return false;
> >         switch (resp->status) {
> >         case nfs_ok:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >                 if (xdr_stream_encode_u32(xdr, resp->access) < 0)
> >                         return false;
> >                 break;
> >         default:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >         }
> > 
> > @@ -899,7 +918,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr)
> >                 return false;
> >         switch (resp->status) {
> >         case nfs_ok:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >                 if (xdr_stream_encode_u32(xdr, resp->len) < 0)
> >                         return false;
> > @@ -908,7 +928,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr)
> >                         return false;
> >                 break;
> >         default:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >         }
> > 
> > @@ -926,7 +947,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp,
> > struct xdr_stream *xdr)
> >                 return false;
> >         switch (resp->status) {
> >         case nfs_ok:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >                 if (xdr_stream_encode_u32(xdr, resp->count) < 0)
> >                         return false;
> > @@ -940,7 +962,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp,
> > struct xdr_stream *xdr)
> >                         return false;
> >                 break;
> >         default:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >         }
> > 
> > @@ -1032,7 +1055,8 @@ nfs3svc_encode_readdirres(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr)
> >                 return false;
> >         switch (resp->status) {
> >         case nfs_ok:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >                 if (!svcxdr_encode_cookieverf3(xdr, resp->verf))
> >                         return false;
> > @@ -1044,7 +1068,8 @@ nfs3svc_encode_readdirres(struct svc_rqst
> > *rqstp, struct xdr_stream *xdr)
> >                         return false;
> >                 break;
> >         default:
> > -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
> > >fh))
> > +               if
> > (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
> > +                                                            
> > &resp->fh))
> >                         return false;
> >         }
> > 
> > @@ -1188,7 +1213,7 @@ svcxdr_encode_entry3_plus(struct
> > nfsd3_readdirres *resp, const char *name,
> >         if (compose_entry_fh(resp, fhp, name, namlen, ino) !=
> > nfs_ok)
> >                 goto out_noattrs;
> > 
> > -       if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp))
> > +       if (!svcxdr_encode_post_op_attr_opportunistic(resp->rqstp,
> > xdr, fhp))
> >                 goto out;
> >         if (!svcxdr_encode_post_op_fh3(xdr, fhp))
> >                 goto out;
> > -- 
> > 2.33.1
> > 
> 
> --
> Chuck Lever
> 
> 
> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-19 20:49                       ` Trond Myklebust
@ 2021-12-20 15:51                         ` Chuck Lever III
  2021-12-20 18:35                           ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever III @ 2021-12-20 15:51 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: trondmy, Linux NFS Mailing List, Bruce Fields



> On Dec 19, 2021, at 3:49 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
>> 
>>> On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
>>> 
>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>> 
>>> NFSv4 doesn't need rpcbind, so let's not refuse to start up just
>>> because
>>> the rpcbind registration failed.
>> 
>> Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for ignoring
>> the result of svc_register") added vs_rpcb_optnl, which is already
>> set for nfsd4_version4. Is that not adequate?
>> 
> 
> The other issue is that under certain circumstances, you may also want
> to run NFSv3 without rpcbind support. For instance, when you have a
> knfsd server instance running as a data server, there is typically no
> need to run rpcbind.

So what you are saying is that you'd like this to be a run-time setting
instead of a compile-time setting. Got it.

Note that this patch adds a non-container-aware administrative API. For
the same reasons I NAK'd 9/10, I'm going to NAK this one and ask that
you provide a version that is container-aware (ideally: not a module
parameter).

The new implementation should remove vs_rpcb_optnl, as a clean up.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes
  2021-12-19 21:09           ` Trond Myklebust
@ 2021-12-20 16:02             ` Chuck Lever III
  2021-12-20 18:38               ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever III @ 2021-12-20 16:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List, Bruce Fields



> On Dec 19, 2021, at 4:09 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Sun, 2021-12-19 at 20:10 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
>>> 
>>> From: Trond Myklebust <trond.myklebust@primarydata.com>
>>> 
>>> The fhp->fh_no_wcc flag is automatically set in
>>> nfsd_set_fh_dentry()
>>> when the EXPORT_OP_NOWCC flag is set. In
>>> svcxdr_encode_post_op_attr(),
>>> this then causes nfsd to skip returning the post-op attributes.
>>> 
>>> The problem is that some of these post-op attributes are not really
>>> optional. In particular, we do want LOOKUP to always return post-op
>>> attributes for the file that is being looked up.
>>> 
>>> The solution is therefore to explicitly label the attributes that
>>> we can
>>> safely opt out from, and only apply the 'fhp->fh_no_wcc' test in
>>> that
>>> case.
>>> 
>>> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
>>> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
>>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>>> ---
>>> fs/nfsd/nfs3xdr.c | 77 +++++++++++++++++++++++++++++++-------------
>>> ---
>>> 1 file changed, 51 insertions(+), 26 deletions(-)
>>> 
>>> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
>>> index c3ac1b6aa3aa..6adfc40722fa 100644
>>> --- a/fs/nfsd/nfs3xdr.c
>>> +++ b/fs/nfsd/nfs3xdr.c
>>> @@ -415,19 +415,9 @@ svcxdr_encode_pre_op_attr(struct xdr_stream
>>> *xdr, const struct svc_fh *fhp)
>>>         return svcxdr_encode_wcc_attr(xdr, fhp);
>>> }
>>> 
>>> -/**
>>> - * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
>>> - * @rqstp: Context of a completed RPC transaction
>>> - * @xdr: XDR stream
>>> - * @fhp: File handle to encode
>>> - *
>>> - * Return values:
>>> - *   %false: Send buffer space was exhausted
>>> - *   %true: Success
>>> - */
>>> -bool
>>> -svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
>>> xdr_stream *xdr,
>>> -                          const struct svc_fh *fhp)
>>> +static bool
>>> +__svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
>>> xdr_stream *xdr,
>>> +                            const struct svc_fh *fhp, bool force)
>>> {
>>>         struct dentry *dentry = fhp->fh_dentry;
>>>         struct kstat stat;
>>> @@ -437,7 +427,7 @@ svcxdr_encode_post_op_attr(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr,
>>>          * stale file handle. In this case, no attributes are
>>>          * returned.
>>>          */
>>> -       if (fhp->fh_no_wcc || !dentry ||
>>> !d_really_is_positive(dentry))
>>> +       if (!force || !dentry || !d_really_is_positive(dentry))
>>>                 goto no_post_op_attrs;
>>>         if (fh_getattr(fhp, &stat) != nfs_ok)
>>>                 goto no_post_op_attrs;
>>> @@ -454,6 +444,31 @@ svcxdr_encode_post_op_attr(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr,
>>>         return xdr_stream_encode_item_absent(xdr) > 0;
>>> }
>>> 
>>> +/**
>>> + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op attributes
>>> + * @rqstp: Context of a completed RPC transaction
>>> + * @xdr: XDR stream
>>> + * @fhp: File handle to encode
>>> + *
>>> + * Return values:
>>> + *   %false: Send buffer space was exhausted
>>> + *   %true: Success
>>> + */
>>> +bool
>>> +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
>>> xdr_stream *xdr,
>>> +                          const struct svc_fh *fhp)
>>> +{
>>> +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, true);
>>> +}
>>> +
>>> +static bool
>>> +svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst *rqstp,
>>> +                                        struct xdr_stream *xdr,
>>> +                                        const struct svc_fh *fhp)
>>> +{
>>> +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, !fhp-
>>>> fh_no_wcc);
>>> +}
>>> +
>> 
>> Thanks for splitting this out: the "why" is much clearer.
>> 
>> Wouldn't it be simpler to explicitly set fh_no_wcc to false
>> in each of the cases where you want to ensure the server
>> emits WCC? And perhaps that should be done in nfs3proc.c
>> instead of in nfs3xdr.c.
>> 
> 
> It can't be done in nfs3proc.c, and toggling the value of fh_no_wcc is
> a lot more cumbersome than this approach.
> 
> The current code is broken for NFSv3 exports because it is unable to
> distinguish between what is and isn't mandatory to return for in the
> same NFS operation. That's the problem this patch fixes.

That suggests that a Fixes: tag is appropriate. Can you recommend one?


> LOOKUP has to return the attributes for the object being looked up in
> order to be useful. If the attributes are not up to date then we should
> ask the NFS client that is being re-exported to go to the server to
> revalidate its attributes.
> The same is not true of the directory post-op attributes. LOOKUP does
> not even change the contents of the directory, and so while it is
> beneficial to have the NFS client return those attributes if they are
> up to date, forcing it to go to the server to retrieve them is less
> than optimal for system performance.

I get all that, but I don't see how this is cumbersome at all:

 82 static __be32
 83 nfsd3_proc_lookup(struct svc_rqst *rqstp)
 84 {
...
 96         resp->status = nfsd_lookup(rqstp, &resp->dirfh,
 97                                    argp->name, argp->len,
 98                                    &resp->fh);
    +       resp->fh.fh_no_wcc = false;
 99         return rpc_success;
100 }

Then in 5/10, pass !fhp->fh_no_wcc to nfsd_getattr().


>>> /*
>>>  * Encode weak cache consistency data
>>>  */
>>> @@ -481,7 +496,7 @@ svcxdr_encode_wcc_data(struct svc_rqst *rqstp,
>>> struct xdr_stream *xdr,
>>> neither:
>>>         if (xdr_stream_encode_item_absent(xdr) < 0)
>>>                 return false;
>>> -       if (!svcxdr_encode_post_op_attr(rqstp, xdr, fhp))
>>> +       if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> fhp))
>>>                 return false;
>>> 
>>>         return true;
>>> @@ -854,11 +869,13 @@ nfs3svc_encode_lookupres(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr)
>>>                         return false;
>>>                 if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>>                         return false;
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> dirfh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->dirfh))
>>>                         return false;
>>>                 break;
>>>         default:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> dirfh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->dirfh))
>>>                         return false;
>>>         }
>>> 
>>> @@ -875,13 +892,15 @@ nfs3svc_encode_accessres(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr)
>>>                 return false;
>>>         switch (resp->status) {
>>>         case nfs_ok:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>                 if (xdr_stream_encode_u32(xdr, resp->access) < 0)
>>>                         return false;
>>>                 break;
>>>         default:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>         }
>>> 
>>> @@ -899,7 +918,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr)
>>>                 return false;
>>>         switch (resp->status) {
>>>         case nfs_ok:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>                 if (xdr_stream_encode_u32(xdr, resp->len) < 0)
>>>                         return false;
>>> @@ -908,7 +928,8 @@ nfs3svc_encode_readlinkres(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr)
>>>                         return false;
>>>                 break;
>>>         default:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>         }
>>> 
>>> @@ -926,7 +947,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp,
>>> struct xdr_stream *xdr)
>>>                 return false;
>>>         switch (resp->status) {
>>>         case nfs_ok:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>                 if (xdr_stream_encode_u32(xdr, resp->count) < 0)
>>>                         return false;
>>> @@ -940,7 +962,8 @@ nfs3svc_encode_readres(struct svc_rqst *rqstp,
>>> struct xdr_stream *xdr)
>>>                         return false;
>>>                 break;
>>>         default:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>         }
>>> 
>>> @@ -1032,7 +1055,8 @@ nfs3svc_encode_readdirres(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr)
>>>                 return false;
>>>         switch (resp->status) {
>>>         case nfs_ok:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>                 if (!svcxdr_encode_cookieverf3(xdr, resp->verf))
>>>                         return false;
>>> @@ -1044,7 +1068,8 @@ nfs3svc_encode_readdirres(struct svc_rqst
>>> *rqstp, struct xdr_stream *xdr)
>>>                         return false;
>>>                 break;
>>>         default:
>>> -               if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp-
>>>> fh))
>>> +               if
>>> (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
>>> +                                                            
>>> &resp->fh))
>>>                         return false;
>>>         }
>>> 
>>> @@ -1188,7 +1213,7 @@ svcxdr_encode_entry3_plus(struct
>>> nfsd3_readdirres *resp, const char *name,
>>>         if (compose_entry_fh(resp, fhp, name, namlen, ino) !=
>>> nfs_ok)
>>>                 goto out_noattrs;
>>> 
>>> -       if (!svcxdr_encode_post_op_attr(resp->rqstp, xdr, fhp))
>>> +       if (!svcxdr_encode_post_op_attr_opportunistic(resp->rqstp,
>>> xdr, fhp))
>>>                 goto out;
>>>         if (!svcxdr_encode_post_op_fh3(xdr, fhp))
>>>                 goto out;
>>> -- 
>>> 2.33.1
>>> 
>> 
>> --
>> Chuck Lever
>> 
>> 
>> 
> 
> -- 
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
> 
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-20 15:51                         ` Chuck Lever III
@ 2021-12-20 18:35                           ` Trond Myklebust
  2021-12-20 19:02                             ` Chuck Lever III
  0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2021-12-20 18:35 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, bfields, trondmy

On Mon, 2021-12-20 at 15:51 +0000, Chuck Lever III wrote:
> 
> 
> > On Dec 19, 2021, at 3:49 PM, Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > 
> > On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
> > > 
> > > > On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
> > > > 
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > 
> > > > NFSv4 doesn't need rpcbind, so let's not refuse to start up
> > > > just
> > > > because
> > > > the rpcbind registration failed.
> > > 
> > > Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for
> > > ignoring
> > > the result of svc_register") added vs_rpcb_optnl, which is
> > > already
> > > set for nfsd4_version4. Is that not adequate?
> > > 
> > 
> > The other issue is that under certain circumstances, you may also
> > want
> > to run NFSv3 without rpcbind support. For instance, when you have a
> > knfsd server instance running as a data server, there is typically
> > no
> > need to run rpcbind.
> 
> So what you are saying is that you'd like this to be a run-time
> setting
> instead of a compile-time setting. Got it.
> 
> Note that this patch adds a non-container-aware administrative API.
> For
> the same reasons I NAK'd 9/10, I'm going to NAK this one and ask that
> you provide a version that is container-aware (ideally: not a module
> parameter).
> 
> The new implementation should remove vs_rpcb_optnl, as a clean up.
> 
> 

This is not something that turns off the registration with rpcbind. It
is something that turns off the decision to abort knfsd if that
registration fails. That's not something that needs to be
containerised: it's just common sense and really wants to be the
default behaviour everywhere.

The only reason for the module parameter is to enable legacy behaviour.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes
  2021-12-20 16:02             ` Chuck Lever III
@ 2021-12-20 18:38               ` Trond Myklebust
  2021-12-20 19:22                 ` Chuck Lever III
  0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2021-12-20 18:38 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, bfields

On Mon, 2021-12-20 at 16:02 +0000, Chuck Lever III wrote:
> 
> 
> > On Dec 19, 2021, at 4:09 PM, Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > 
> > On Sun, 2021-12-19 at 20:10 +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
> > > > 
> > > > From: Trond Myklebust <trond.myklebust@primarydata.com>
> > > > 
> > > > The fhp->fh_no_wcc flag is automatically set in
> > > > nfsd_set_fh_dentry()
> > > > when the EXPORT_OP_NOWCC flag is set. In
> > > > svcxdr_encode_post_op_attr(),
> > > > this then causes nfsd to skip returning the post-op attributes.
> > > > 
> > > > The problem is that some of these post-op attributes are not
> > > > really
> > > > optional. In particular, we do want LOOKUP to always return
> > > > post-op
> > > > attributes for the file that is being looked up.
> > > > 
> > > > The solution is therefore to explicitly label the attributes
> > > > that
> > > > we can
> > > > safely opt out from, and only apply the 'fhp->fh_no_wcc' test
> > > > in
> > > > that
> > > > case.
> > > > 
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@primarydata.com>
> > > > Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> > > > Signed-off-by: Trond Myklebust
> > > > <trond.myklebust@hammerspace.com>
> > > > ---
> > > > fs/nfsd/nfs3xdr.c | 77 +++++++++++++++++++++++++++++++---------
> > > > ----
> > > > ---
> > > > 1 file changed, 51 insertions(+), 26 deletions(-)
> > > > 
> > > > diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
> > > > index c3ac1b6aa3aa..6adfc40722fa 100644
> > > > --- a/fs/nfsd/nfs3xdr.c
> > > > +++ b/fs/nfsd/nfs3xdr.c
> > > > @@ -415,19 +415,9 @@ svcxdr_encode_pre_op_attr(struct
> > > > xdr_stream
> > > > *xdr, const struct svc_fh *fhp)
> > > >         return svcxdr_encode_wcc_attr(xdr, fhp);
> > > > }
> > > > 
> > > > -/**
> > > > - * svcxdr_encode_post_op_attr - Encode NFSv3 post-op
> > > > attributes
> > > > - * @rqstp: Context of a completed RPC transaction
> > > > - * @xdr: XDR stream
> > > > - * @fhp: File handle to encode
> > > > - *
> > > > - * Return values:
> > > > - *   %false: Send buffer space was exhausted
> > > > - *   %true: Success
> > > > - */
> > > > -bool
> > > > -svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
> > > > xdr_stream *xdr,
> > > > -                          const struct svc_fh *fhp)
> > > > +static bool
> > > > +__svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
> > > > xdr_stream *xdr,
> > > > +                            const struct svc_fh *fhp, bool
> > > > force)
> > > > {
> > > >         struct dentry *dentry = fhp->fh_dentry;
> > > >         struct kstat stat;
> > > > @@ -437,7 +427,7 @@ svcxdr_encode_post_op_attr(struct svc_rqst
> > > > *rqstp, struct xdr_stream *xdr,
> > > >          * stale file handle. In this case, no attributes are
> > > >          * returned.
> > > >          */
> > > > -       if (fhp->fh_no_wcc || !dentry ||
> > > > !d_really_is_positive(dentry))
> > > > +       if (!force || !dentry || !d_really_is_positive(dentry))
> > > >                 goto no_post_op_attrs;
> > > >         if (fh_getattr(fhp, &stat) != nfs_ok)
> > > >                 goto no_post_op_attrs;
> > > > @@ -454,6 +444,31 @@ svcxdr_encode_post_op_attr(struct svc_rqst
> > > > *rqstp, struct xdr_stream *xdr,
> > > >         return xdr_stream_encode_item_absent(xdr) > 0;
> > > > }
> > > > 
> > > > +/**
> > > > + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op
> > > > attributes
> > > > + * @rqstp: Context of a completed RPC transaction
> > > > + * @xdr: XDR stream
> > > > + * @fhp: File handle to encode
> > > > + *
> > > > + * Return values:
> > > > + *   %false: Send buffer space was exhausted
> > > > + *   %true: Success
> > > > + */
> > > > +bool
> > > > +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
> > > > xdr_stream *xdr,
> > > > +                          const struct svc_fh *fhp)
> > > > +{
> > > > +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp,
> > > > true);
> > > > +}
> > > > +
> > > > +static bool
> > > > +svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst
> > > > *rqstp,
> > > > +                                        struct xdr_stream
> > > > *xdr,
> > > > +                                        const struct svc_fh
> > > > *fhp)
> > > > +{
> > > > +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp,
> > > > !fhp-
> > > > > fh_no_wcc);
> > > > +}
> > > > +
> > > 
> > > Thanks for splitting this out: the "why" is much clearer.
> > > 
> > > Wouldn't it be simpler to explicitly set fh_no_wcc to false
> > > in each of the cases where you want to ensure the server
> > > emits WCC? And perhaps that should be done in nfs3proc.c
> > > instead of in nfs3xdr.c.
> > > 
> > 
> > It can't be done in nfs3proc.c, and toggling the value of fh_no_wcc
> > is
> > a lot more cumbersome than this approach.
> > 
> > The current code is broken for NFSv3 exports because it is unable
> > to
> > distinguish between what is and isn't mandatory to return for in
> > the
> > same NFS operation. That's the problem this patch fixes.
> 
> That suggests that a Fixes: tag is appropriate. Can you recommend
> one?
> 
> 
> > LOOKUP has to return the attributes for the object being looked up
> > in
> > order to be useful. If the attributes are not up to date then we
> > should
> > ask the NFS client that is being re-exported to go to the server to
> > revalidate its attributes.
> > The same is not true of the directory post-op attributes. LOOKUP
> > does
> > not even change the contents of the directory, and so while it is
> > beneficial to have the NFS client return those attributes if they
> > are
> > up to date, forcing it to go to the server to retrieve them is less
> > than optimal for system performance.
> 
> I get all that, but I don't see how this is cumbersome at all:
> 
>  82 static __be32
>  83 nfsd3_proc_lookup(struct svc_rqst *rqstp)
>  84 {
> ...
>  96         resp->status = nfsd_lookup(rqstp, &resp->dirfh,
>  97                                    argp->name, argp->len,
>  98                                    &resp->fh);
>     +       resp->fh.fh_no_wcc = false;
>  99         return rpc_success;
> 100 }
> 
> Then in 5/10, pass !fhp->fh_no_wcc to nfsd_getattr().
> 
> 

That's not equivalent. That will force the NFS client to retrieve the
lookup object attributes AND the directory attributes.

As I said above, the latter is optional. The former is not.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-20 18:35                           ` Trond Myklebust
@ 2021-12-20 19:02                             ` Chuck Lever III
  2021-12-20 19:52                               ` Trond Myklebust
  0 siblings, 1 reply; 28+ messages in thread
From: Chuck Lever III @ 2021-12-20 19:02 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List, Bruce Fields, trondmy



> On Dec 20, 2021, at 1:35 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2021-12-20 at 15:51 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Dec 19, 2021, at 3:49 PM, Trond Myklebust
>>> <trondmy@hammerspace.com> wrote:
>>> 
>>> On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
>>>> 
>>>>> On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
>>>>> 
>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>> 
>>>>> NFSv4 doesn't need rpcbind, so let's not refuse to start up
>>>>> just
>>>>> because
>>>>> the rpcbind registration failed.
>>>> 
>>>> Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for
>>>> ignoring
>>>> the result of svc_register") added vs_rpcb_optnl, which is
>>>> already
>>>> set for nfsd4_version4. Is that not adequate?
>>>> 
>>> 
>>> The other issue is that under certain circumstances, you may also
>>> want
>>> to run NFSv3 without rpcbind support. For instance, when you have a
>>> knfsd server instance running as a data server, there is typically
>>> no
>>> need to run rpcbind.
>> 
>> So what you are saying is that you'd like this to be a run-time
>> setting
>> instead of a compile-time setting. Got it.
>> 
>> Note that this patch adds a non-container-aware administrative API.
>> For
>> the same reasons I NAK'd 9/10, I'm going to NAK this one and ask that
>> you provide a version that is container-aware (ideally: not a module
>> parameter).
>> 
>> The new implementation should remove vs_rpcb_optnl, as a clean up.
>> 
>> 
> 
> This is not something that turns off the registration with rpcbind. It
> is something that turns off the decision to abort knfsd if that
> registration fails.

See below, that's just what vs_rpcb_optnl does. It it's true,
then the result of the rpcbind registration for that service
is ignored.

1039 int svc_generic_rpcbind_set(struct net *net,
1040                             const struct svc_program *progp,
1041                             u32 version, int family,
1042                             unsigned short proto,
1043                             unsigned short port)
1044 {
1045         const struct svc_version *vers = progp->pg_vers[version];
1046         int error;
1047 
1048         if (vers == NULL)
1049                 return 0;
1050 
1051         if (vers->vs_hidden) {
1052                 trace_svc_noregister(progp->pg_name, version, proto,
1053                                      port, family, 0);
1054                 return 0;
1055         }
1056 
1057         /*
1058          * Don't register a UDP port if we need congestion
1059          * control.
1060          */
1061         if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP)
1062                 return 0;
1063 
1064         error = svc_rpcbind_set_version(net, progp, version,
1065                                         family, proto, port);
1066 
1067         return (vers->vs_rpcb_optnl) ? 0 : error;
1068 }
1069 EXPORT_SYMBOL_GPL(svc_generic_rpcbind_set);

And, it's already the case that vs_rpcb_optnl is true for our
NFSv4 server. So the issue is for NFSv3 only. It still looks
to me like the patch description for this patch, at the very
least, is not correct.


> That's not something that needs to be
> containerised: it's just common sense and really wants to be the
> default behaviour everywhere.

I'm not following this. I can imagine deployment cases where one
container might want to ensure rpcbind is running for NFSv3 while
another does not care. What am I missing?


> The only reason for the module parameter is to enable legacy behaviour.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes
  2021-12-20 18:38               ` Trond Myklebust
@ 2021-12-20 19:22                 ` Chuck Lever III
  0 siblings, 0 replies; 28+ messages in thread
From: Chuck Lever III @ 2021-12-20 19:22 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List, Bruce Fields



> On Dec 20, 2021, at 1:38 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2021-12-20 at 16:02 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Dec 19, 2021, at 4:09 PM, Trond Myklebust
>>> <trondmy@hammerspace.com> wrote:
>>> 
>>> On Sun, 2021-12-19 at 20:10 +0000, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
>>>>> 
>>>>> From: Trond Myklebust <trond.myklebust@primarydata.com>
>>>>> 
>>>>> The fhp->fh_no_wcc flag is automatically set in
>>>>> nfsd_set_fh_dentry()
>>>>> when the EXPORT_OP_NOWCC flag is set. In
>>>>> svcxdr_encode_post_op_attr(),
>>>>> this then causes nfsd to skip returning the post-op attributes.
>>>>> 
>>>>> The problem is that some of these post-op attributes are not
>>>>> really
>>>>> optional. In particular, we do want LOOKUP to always return
>>>>> post-op
>>>>> attributes for the file that is being looked up.
>>>>> 
>>>>> The solution is therefore to explicitly label the attributes
>>>>> that
>>>>> we can
>>>>> safely opt out from, and only apply the 'fhp->fh_no_wcc' test
>>>>> in
>>>>> that
>>>>> case.
>>>>> 
>>>>> Signed-off-by: Trond Myklebust
>>>>> <trond.myklebust@primarydata.com>
>>>>> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
>>>>> Signed-off-by: Trond Myklebust
>>>>> <trond.myklebust@hammerspace.com>
>>>>> ---
>>>>> fs/nfsd/nfs3xdr.c | 77 +++++++++++++++++++++++++++++++---------
>>>>> ----
>>>>> ---
>>>>> 1 file changed, 51 insertions(+), 26 deletions(-)
>>>>> 
>>>>> diff --git a/fs/nfsd/nfs3xdr.c b/fs/nfsd/nfs3xdr.c
>>>>> index c3ac1b6aa3aa..6adfc40722fa 100644
>>>>> --- a/fs/nfsd/nfs3xdr.c
>>>>> +++ b/fs/nfsd/nfs3xdr.c
>>>>> @@ -415,19 +415,9 @@ svcxdr_encode_pre_op_attr(struct
>>>>> xdr_stream
>>>>> *xdr, const struct svc_fh *fhp)
>>>>>         return svcxdr_encode_wcc_attr(xdr, fhp);
>>>>> }
>>>>> 
>>>>> -/**
>>>>> - * svcxdr_encode_post_op_attr - Encode NFSv3 post-op
>>>>> attributes
>>>>> - * @rqstp: Context of a completed RPC transaction
>>>>> - * @xdr: XDR stream
>>>>> - * @fhp: File handle to encode
>>>>> - *
>>>>> - * Return values:
>>>>> - *   %false: Send buffer space was exhausted
>>>>> - *   %true: Success
>>>>> - */
>>>>> -bool
>>>>> -svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
>>>>> xdr_stream *xdr,
>>>>> -                          const struct svc_fh *fhp)
>>>>> +static bool
>>>>> +__svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
>>>>> xdr_stream *xdr,
>>>>> +                            const struct svc_fh *fhp, bool
>>>>> force)
>>>>> {
>>>>>         struct dentry *dentry = fhp->fh_dentry;
>>>>>         struct kstat stat;
>>>>> @@ -437,7 +427,7 @@ svcxdr_encode_post_op_attr(struct svc_rqst
>>>>> *rqstp, struct xdr_stream *xdr,
>>>>>          * stale file handle. In this case, no attributes are
>>>>>          * returned.
>>>>>          */
>>>>> -       if (fhp->fh_no_wcc || !dentry ||
>>>>> !d_really_is_positive(dentry))
>>>>> +       if (!force || !dentry || !d_really_is_positive(dentry))
>>>>>                 goto no_post_op_attrs;
>>>>>         if (fh_getattr(fhp, &stat) != nfs_ok)
>>>>>                 goto no_post_op_attrs;
>>>>> @@ -454,6 +444,31 @@ svcxdr_encode_post_op_attr(struct svc_rqst
>>>>> *rqstp, struct xdr_stream *xdr,
>>>>>         return xdr_stream_encode_item_absent(xdr) > 0;
>>>>> }
>>>>> 
>>>>> +/**
>>>>> + * svcxdr_encode_post_op_attr - Encode NFSv3 post-op
>>>>> attributes
>>>>> + * @rqstp: Context of a completed RPC transaction
>>>>> + * @xdr: XDR stream
>>>>> + * @fhp: File handle to encode
>>>>> + *
>>>>> + * Return values:
>>>>> + *   %false: Send buffer space was exhausted
>>>>> + *   %true: Success
>>>>> + */
>>>>> +bool
>>>>> +svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct
>>>>> xdr_stream *xdr,
>>>>> +                          const struct svc_fh *fhp)
>>>>> +{
>>>>> +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp,
>>>>> true);
>>>>> +}
>>>>> +
>>>>> +static bool
>>>>> +svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst
>>>>> *rqstp,
>>>>> +                                        struct xdr_stream
>>>>> *xdr,
>>>>> +                                        const struct svc_fh
>>>>> *fhp)
>>>>> +{
>>>>> +       return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp,
>>>>> !fhp-
>>>>>> fh_no_wcc);
>>>>> +}
>>>>> +
>>>> 
>>>> Thanks for splitting this out: the "why" is much clearer.
>>>> 
>>>> Wouldn't it be simpler to explicitly set fh_no_wcc to false
>>>> in each of the cases where you want to ensure the server
>>>> emits WCC? And perhaps that should be done in nfs3proc.c
>>>> instead of in nfs3xdr.c.
>>>> 
>>> 
>>> It can't be done in nfs3proc.c, and toggling the value of fh_no_wcc
>>> is
>>> a lot more cumbersome than this approach.
>>> 
>>> The current code is broken for NFSv3 exports because it is unable
>>> to
>>> distinguish between what is and isn't mandatory to return for in
>>> the
>>> same NFS operation. That's the problem this patch fixes.
>> 
>> That suggests that a Fixes: tag is appropriate. Can you recommend
>> one?
>> 
>> 
>>> LOOKUP has to return the attributes for the object being looked up
>>> in
>>> order to be useful. If the attributes are not up to date then we
>>> should
>>> ask the NFS client that is being re-exported to go to the server to
>>> revalidate its attributes.
>>> The same is not true of the directory post-op attributes. LOOKUP
>>> does
>>> not even change the contents of the directory, and so while it is
>>> beneficial to have the NFS client return those attributes if they
>>> are
>>> up to date, forcing it to go to the server to retrieve them is less
>>> than optimal for system performance.
>> 
>> I get all that, but I don't see how this is cumbersome at all:
>> 
>>  82 static __be32
>>  83 nfsd3_proc_lookup(struct svc_rqst *rqstp)
>>  84 {
>> ...
>>  96         resp->status = nfsd_lookup(rqstp, &resp->dirfh,
>>  97                                    argp->name, argp->len,
>>  98                                    &resp->fh);
>>     +       resp->fh.fh_no_wcc = false;
>>  99         return rpc_success;
>> 100 }
>> 
>> Then in 5/10, pass !fhp->fh_no_wcc to nfsd_getattr().
> 
> That's not equivalent. That will force the NFS client to retrieve the
> lookup object attributes AND the directory attributes.

Does it? Your patch does this:

 418 static bool
 419 __svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 420                              const struct svc_fh *fhp, bool force)
 421 {
...
 436         if (nfsd_getattr(&path, &stat, force) != nfs_ok)
 437                 goto no_post_op_attrs;

...

 461 bool
 462 svcxdr_encode_post_op_attr(struct svc_rqst *rqstp, struct xdr_stream *xdr,
 463                            const struct svc_fh *fhp)
 464 {
 465         return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, true);
 466 }
 467 
 468 static bool
 469 svcxdr_encode_post_op_attr_opportunistic(struct svc_rqst *rqstp,
 470                                          struct xdr_stream *xdr,
 471                                          const struct svc_fh *fhp)
 472 {
 473         return __svcxdr_encode_post_op_attr(rqstp, xdr, fhp, !fhp->fh_no_wcc);
 474 }

...

 863 bool
 864 nfs3svc_encode_lookupres(struct svc_rqst *rqstp, struct xdr_stream *xdr)
 865 {
...
 871         case nfs_ok:
 872                 if (!svcxdr_encode_nfs_fh3(xdr, &resp->fh))
 873                         return false;
 874                 if (!svcxdr_encode_post_op_attr(rqstp, xdr, &resp->fh))
 875                         return false;
 876                 if (!svcxdr_encode_post_op_attr_opportunistic(rqstp, xdr,
 877                                                               &resp->dirfh))
 878                         return false;
 879                 break;

So for resp->fh, this is equivalent to resp->fh.fh_no_wcc = false
and for resp->dirfh, this is equivalent to passing in
resp->dirfh.fh_no_wcc unchanged.

I don't see how that's different from what my suggestion does --
it forces WCC for the looked-up object, and leaves WCC for the
parent directory optional.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-20 19:02                             ` Chuck Lever III
@ 2021-12-20 19:52                               ` Trond Myklebust
  2021-12-20 20:12                                 ` Chuck Lever III
  0 siblings, 1 reply; 28+ messages in thread
From: Trond Myklebust @ 2021-12-20 19:52 UTC (permalink / raw)
  To: chuck.lever; +Cc: linux-nfs, bfields

On Mon, 2021-12-20 at 19:02 +0000, Chuck Lever III wrote:
> 
> 
> > On Dec 20, 2021, at 1:35 PM, Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > 
> > On Mon, 2021-12-20 at 15:51 +0000, Chuck Lever III wrote:
> > > 
> > > 
> > > > On Dec 19, 2021, at 3:49 PM, Trond Myklebust
> > > > <trondmy@hammerspace.com> wrote:
> > > > 
> > > > On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
> > > > > 
> > > > > > On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
> > > > > > 
> > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > 
> > > > > > NFSv4 doesn't need rpcbind, so let's not refuse to start up
> > > > > > just
> > > > > > because
> > > > > > the rpcbind registration failed.
> > > > > 
> > > > > Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for
> > > > > ignoring
> > > > > the result of svc_register") added vs_rpcb_optnl, which is
> > > > > already
> > > > > set for nfsd4_version4. Is that not adequate?
> > > > > 
> > > > 
> > > > The other issue is that under certain circumstances, you may
> > > > also
> > > > want
> > > > to run NFSv3 without rpcbind support. For instance, when you
> > > > have a
> > > > knfsd server instance running as a data server, there is
> > > > typically
> > > > no
> > > > need to run rpcbind.
> > > 
> > > So what you are saying is that you'd like this to be a run-time
> > > setting
> > > instead of a compile-time setting. Got it.
> > > 
> > > Note that this patch adds a non-container-aware administrative
> > > API.
> > > For
> > > the same reasons I NAK'd 9/10, I'm going to NAK this one and ask
> > > that
> > > you provide a version that is container-aware (ideally: not a
> > > module
> > > parameter).
> > > 
> > > The new implementation should remove vs_rpcb_optnl, as a clean
> > > up.
> > > 
> > > 
> > 
> > This is not something that turns off the registration with rpcbind.
> > It
> > is something that turns off the decision to abort knfsd if that
> > registration fails.
> 
> See below, that's just what vs_rpcb_optnl does. It it's true,
> then the result of the rpcbind registration for that service
> is ignored.
> 
> 1039 int svc_generic_rpcbind_set(struct net *net,
> 1040                             const struct svc_program *progp,
> 1041                             u32 version, int family,
> 1042                             unsigned short proto,
> 1043                             unsigned short port)
> 1044 {
> 1045         const struct svc_version *vers = progp-
> >pg_vers[version];
> 1046         int error;
> 1047 
> 1048         if (vers == NULL)
> 1049                 return 0;
> 1050 
> 1051         if (vers->vs_hidden) {
> 1052                 trace_svc_noregister(progp->pg_name, version,
> proto,
> 1053                                      port, family, 0);
> 1054                 return 0;
> 1055         }
> 1056 
> 1057         /*
> 1058          * Don't register a UDP port if we need congestion
> 1059          * control.
> 1060          */
> 1061         if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP)
> 1062                 return 0;
> 1063 
> 1064         error = svc_rpcbind_set_version(net, progp, version,
> 1065                                         family, proto, port);
> 1066 
> 1067         return (vers->vs_rpcb_optnl) ? 0 : error;
> 1068 }
> 1069 EXPORT_SYMBOL_GPL(svc_generic_rpcbind_set);
> 
> And, it's already the case that vs_rpcb_optnl is true for our
> NFSv4 server. So the issue is for NFSv3 only. It still looks
> to me like the patch description for this patch, at the very
> least, is not correct.
> 
> 
> > That's not something that needs to be
> > containerised: it's just common sense and really wants to be the
> > default behaviour everywhere.
> 
> I'm not following this. I can imagine deployment cases where one
> container might want to ensure rpcbind is running for NFSv3 while
> another does not care. What am I missing?
> 

Monitoring that rpcbind is working is not a kernel task. It's something
that can and should be done in userspace. The kernel task should only
be to try to register the ports that it is using to that rpcbind server
if it is up and running.

In a containerised environment, then even the job of registering the
ports is not necessarily desirable behaviour, since there will almost
always be some additional form of NAT or address + port mapping going
on that knfsd has no visibility into.

> 
> > The only reason for the module parameter is to enable legacy
> > behaviour.
> 
> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup
  2021-12-20 19:52                               ` Trond Myklebust
@ 2021-12-20 20:12                                 ` Chuck Lever III
  0 siblings, 0 replies; 28+ messages in thread
From: Chuck Lever III @ 2021-12-20 20:12 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Linux NFS Mailing List, Bruce Fields



> On Dec 20, 2021, at 2:52 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> 
> On Mon, 2021-12-20 at 19:02 +0000, Chuck Lever III wrote:
>> 
>> 
>>> On Dec 20, 2021, at 1:35 PM, Trond Myklebust
>>> <trondmy@hammerspace.com> wrote:
>>> 
>>> On Mon, 2021-12-20 at 15:51 +0000, Chuck Lever III wrote:
>>>> 
>>>> 
>>>>> On Dec 19, 2021, at 3:49 PM, Trond Myklebust
>>>>> <trondmy@hammerspace.com> wrote:
>>>>> 
>>>>> On Sun, 2021-12-19 at 18:15 +0000, Chuck Lever III wrote:
>>>>>> 
>>>>>>> On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
>>>>>>> 
>>>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>>> 
>>>>>>> NFSv4 doesn't need rpcbind, so let's not refuse to start up
>>>>>>> just
>>>>>>> because
>>>>>>> the rpcbind registration failed.
>>>>>> 
>>>>>> Commit 7e55b59b2f32 ("SUNRPC/NFSD: Support a new option for
>>>>>> ignoring
>>>>>> the result of svc_register") added vs_rpcb_optnl, which is
>>>>>> already
>>>>>> set for nfsd4_version4. Is that not adequate?
>>>>>> 
>>>>> 
>>>>> The other issue is that under certain circumstances, you may
>>>>> also
>>>>> want
>>>>> to run NFSv3 without rpcbind support. For instance, when you
>>>>> have a
>>>>> knfsd server instance running as a data server, there is
>>>>> typically
>>>>> no
>>>>> need to run rpcbind.
>>>> 
>>>> So what you are saying is that you'd like this to be a run-time
>>>> setting
>>>> instead of a compile-time setting. Got it.
>>>> 
>>>> Note that this patch adds a non-container-aware administrative
>>>> API.
>>>> For
>>>> the same reasons I NAK'd 9/10, I'm going to NAK this one and ask
>>>> that
>>>> you provide a version that is container-aware (ideally: not a
>>>> module
>>>> parameter).
>>>> 
>>>> The new implementation should remove vs_rpcb_optnl, as a clean
>>>> up.
>>>> 
>>>> 
>>> 
>>> This is not something that turns off the registration with rpcbind.
>>> It
>>> is something that turns off the decision to abort knfsd if that
>>> registration fails.
>> 
>> See below, that's just what vs_rpcb_optnl does. It it's true,
>> then the result of the rpcbind registration for that service
>> is ignored.
>> 
>> 1039 int svc_generic_rpcbind_set(struct net *net,
>> 1040                             const struct svc_program *progp,
>> 1041                             u32 version, int family,
>> 1042                             unsigned short proto,
>> 1043                             unsigned short port)
>> 1044 {
>> 1045         const struct svc_version *vers = progp-
>>> pg_vers[version];
>> 1046         int error;
>> 1047 
>> 1048         if (vers == NULL)
>> 1049                 return 0;
>> 1050 
>> 1051         if (vers->vs_hidden) {
>> 1052                 trace_svc_noregister(progp->pg_name, version,
>> proto,
>> 1053                                      port, family, 0);
>> 1054                 return 0;
>> 1055         }
>> 1056 
>> 1057         /*
>> 1058          * Don't register a UDP port if we need congestion
>> 1059          * control.
>> 1060          */
>> 1061         if (vers->vs_need_cong_ctrl && proto == IPPROTO_UDP)
>> 1062                 return 0;
>> 1063 
>> 1064         error = svc_rpcbind_set_version(net, progp, version,
>> 1065                                         family, proto, port);
>> 1066 
>> 1067         return (vers->vs_rpcb_optnl) ? 0 : error;
>> 1068 }
>> 1069 EXPORT_SYMBOL_GPL(svc_generic_rpcbind_set);
>> 
>> And, it's already the case that vs_rpcb_optnl is true for our
>> NFSv4 server. So the issue is for NFSv3 only. It still looks
>> to me like the patch description for this patch, at the very
>> least, is not correct.
>> 
>> 
>>> That's not something that needs to be
>>> containerised: it's just common sense and really wants to be the
>>> default behaviour everywhere.
>> 
>> I'm not following this. I can imagine deployment cases where one
>> container might want to ensure rpcbind is running for NFSv3 while
>> another does not care. What am I missing?
>> 
> 
> Monitoring that rpcbind is working is not a kernel task. It's something
> that can and should be done in userspace. The kernel task should only
> be to try to register the ports that it is using to that rpcbind server
> if it is up and running.
> 
> In a containerised environment, then even the job of registering the
> ports is not necessarily desirable behaviour, since there will almost
> always be some additional form of NAT or address + port mapping going
> on that knfsd has no visibility into.

Today, the kernel fails nfsd start-up if it can't complete rpcbind
registration. If we want to change that, then the typical process for
this has been that the user space components to support it need to be
in place before we think of changing kernel behavior. Are they?

And, if knfsd is really not the place for checking the registration,
then the right answer is to remove the in-kernel check for everyone,
but only after a proper user space mechanism is in place.

I really don't want to have to carry a module parameter for this
transition in perpetuity. A better choice for managing the transition
would be a CONFIG option, which is relatively harmless to remove once
it is no longer needed.

I don't object to eventually removing the kernel registration check,
but I want to see the transition away from the current arrangement
done properly, please. At the very least we deserve to have some
dialog about the long-term direction.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 00/10] Assorted patches for knfsd
  2021-12-19  1:37 [PATCH v2 00/10] Assorted patches for knfsd trondmy
  2021-12-19  1:37 ` [PATCH v2 01/10] nfsd: map EBADF trondmy
@ 2021-12-21 18:10 ` Chuck Lever III
  1 sibling, 0 replies; 28+ messages in thread
From: Chuck Lever III @ 2021-12-21 18:10 UTC (permalink / raw)
  To: trondmy; +Cc: Bruce Fields, Linux NFS Mailing List

Hi-

> On Dec 18, 2021, at 8:37 PM, trondmy@kernel.org wrote:
> 
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> The following patchset is mainly for improving support for re-exporting
> NFSv4 as NFSv3. However it also includes one generic bugfix for NFSv3 to
> allow zero length writes. It also improves the writeback performance by
> replacing the rwsem with a lock-free errseq_t-based method.
> 
> 
> - v2: Split patch adding WCC support
>  v2: Rebase onto v5.16-rc5


I've replaced the following patch. Thank you for the bug report.

>  nfsd: NFSv3 should allow zero length writes

I've provisionally applied the following five to for-next while
tests are in progress:

>  nfsd: map EBADF
>  nfsd: Add errno mapping for EREMOTEIO
>  nfsd: Retry once in nfsd_open on an -EOPENSTALE return
>  nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
>  nfsd: Replace use of rwsem with errseq_t


The following two are still open for discussion. I get where you
want to go, just a quibble about exactly how to get there.

>  nfsd: Distinguish between required and optional NFSv3 post-op
>    attributes
>  nfs: Add export support for weak cache consistency attributes


The following two are deferred. IMHO they are not ready.

>  nfsd: allow lockd to be forcibly disabled
>  nfsd: Ignore rpcbind errors on nfsd startup

When reposting, you can leave out the patches that have already
been applied.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
  2021-12-19  1:38             ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() trondmy
  2021-12-19  1:38               ` [PATCH v2 08/10] nfsd: Replace use of rwsem with errseq_t trondmy
@ 2021-12-21 18:14               ` Chuck Lever III
  1 sibling, 0 replies; 28+ messages in thread
From: Chuck Lever III @ 2021-12-21 18:14 UTC (permalink / raw)
  To: trondmy; +Cc: Bruce Fields, Linux NFS Mailing List

Hi-

> On Dec 18, 2021, at 8:38 PM, trondmy@kernel.org wrote:
> 
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> Since a clone error commit can cause the boot verifier to change,
> we should trace those errors.

I've applied this one, but wanted to follow up.

I like the idea of tracing boot verifier reset events. I might
just code something up.


One more comment below.

> 
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
> fs/nfsd/nfs4proc.c |  2 +-
> fs/nfsd/trace.h    | 50 ++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/vfs.c      | 18 +++++++++++++++--
> fs/nfsd/vfs.h      |  3 ++-
> 4 files changed, 69 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
> index a36261f89bdf..35f33959a083 100644
> --- a/fs/nfsd/nfs4proc.c
> +++ b/fs/nfsd/nfs4proc.c
> @@ -1101,7 +1101,7 @@ nfsd4_clone(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
> 	if (status)
> 		goto out;
> 
> -	status = nfsd4_clone_file_range(src, clone->cl_src_pos,
> +	status = nfsd4_clone_file_range(rqstp, src, clone->cl_src_pos,
> 			dst, clone->cl_dst_pos, clone->cl_count,
> 			EX_ISSYNC(cstate->current_fh.fh_export));
> 
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index f1e0d3c51bc2..001444d9829d 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -413,6 +413,56 @@ TRACE_EVENT(nfsd_dirent,
> 	)
> )
> 
> +DECLARE_EVENT_CLASS(nfsd_copy_err_class,
> +	TP_PROTO(struct svc_rqst *rqstp,
> +		 struct svc_fh	*src_fhp,
> +		 loff_t		src_offset,
> +		 struct svc_fh	*dst_fhp,
> +		 loff_t		dst_offset,
> +		 u64		count,
> +		 int		status),
> +	TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, count, status),
> +	TP_STRUCT__entry(
> +		__field(u32, xid)
> +		__field(u32, src_fh_hash)
> +		__field(loff_t, src_offset)
> +		__field(u32, dst_fh_hash)
> +		__field(loff_t, dst_offset)
> +		__field(u64, count)
> +		__field(int, status)
> +	),
> +	TP_fast_assign(
> +		__entry->xid = be32_to_cpu(rqstp->rq_xid);
> +		__entry->src_fh_hash = knfsd_fh_hash(&src_fhp->fh_handle);
> +		__entry->src_offset = src_offset;
> +		__entry->dst_fh_hash = knfsd_fh_hash(&dst_fhp->fh_handle);
> +		__entry->dst_offset = dst_offset;
> +		__entry->count = count;
> +		__entry->status = status;
> +	),
> +	TP_printk("xid=0x%08x src_fh_hash=0x%08x src_offset=%lld "
> +			"dst_fh_hash=0x%08x dst_offset=%lld "
> +			"count=%llu status=%d",
> +		  __entry->xid, __entry->src_fh_hash, __entry->src_offset,
> +		  __entry->dst_fh_hash, __entry->dst_offset,
> +		  (unsigned long long)__entry->count,
> +		  __entry->status)
> +)
> +
> +#define DEFINE_NFSD_COPY_ERR_EVENT(name)		\
> +DEFINE_EVENT(nfsd_copy_err_class, nfsd_##name,		\
> +	TP_PROTO(struct svc_rqst	*rqstp,		\
> +		 struct svc_fh		*src_fhp,	\
> +		 loff_t			src_offset,	\
> +		 struct svc_fh		*dst_fhp,	\
> +		 loff_t			dst_offset,	\
> +		 u64			count,		\
> +		 int			status),	\
> +	TP_ARGS(rqstp, src_fhp, src_offset, dst_fhp, dst_offset, \
> +		count, status))
> +
> +DEFINE_NFSD_COPY_ERR_EVENT(clone_file_range_err);
> +
> #include "state.h"
> #include "filecache.h"
> #include "vfs.h"
> diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
> index 85e579aa6944..4d2964d08097 100644
> --- a/fs/nfsd/vfs.c
> +++ b/fs/nfsd/vfs.c
> @@ -40,6 +40,7 @@
> #include "../internal.h"
> #include "acl.h"
> #include "idmap.h"
> +#include "xdr4.h"
> #endif /* CONFIG_NFSD_V4 */
> 
> #include "nfsd.h"
> @@ -517,8 +518,15 @@ __be32 nfsd4_set_nfs4_label(struct svc_rqst *rqstp, struct svc_fh *fhp,
> }
> #endif
> 
> -__be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
> -		struct nfsd_file *nf_dst, u64 dst_pos, u64 count, bool sync)
> +static struct nfsd4_compound_state *nfsd4_get_cstate(struct svc_rqst *rqstp)
> +{
> +	return &((struct nfsd4_compoundres *)rqstp->rq_resp)->cstate;
> +}
> +
> +__be32 nfsd4_clone_file_range(struct svc_rqst *rqstp,
> +		struct nfsd_file *nf_src, u64 src_pos,
> +		struct nfsd_file *nf_dst, u64 dst_pos,
> +		u64 count, bool sync)
> {
> 	struct file *src = nf_src->nf_file;
> 	struct file *dst = nf_dst->nf_file;
> @@ -542,6 +550,12 @@ __be32 nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
> 		if (!status)
> 			status = commit_inode_metadata(file_inode(src));
> 		if (status < 0) {
> +			trace_nfsd_clone_file_range_err(rqstp,
> +					&nfsd4_get_cstate(rqstp)->save_fh,
> +					src_pos,
> +					&nfsd4_get_cstate(rqstp)->current_fh,
> +					dst_pos,
> +					count, status);
> 			nfsd_reset_boot_verifier(net_generic(nf_dst->nf_net,
> 						 nfsd_net_id));
> 			ret = nfserrno(status);
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 6edae1b9a96e..3dba6397d452 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -57,7 +57,8 @@ __be32          nfsd4_set_nfs4_label(struct svc_rqst *, struct svc_fh *,
> 		    struct xdr_netobj *);
> __be32		nfsd4_vfs_fallocate(struct svc_rqst *, struct svc_fh *,
> 				    struct file *, loff_t, loff_t, int);
> -__be32		nfsd4_clone_file_range(struct nfsd_file *nf_src, u64 src_pos,
> +__be32		nfsd4_clone_file_range(struct svc_rqst *,
> +				       struct nfsd_file *nf_src, u64 src_pos,

checkpatch.pl complained about the missing "rqstp". I added it before
applying.


> 				       struct nfsd_file *nf_dst, u64 dst_pos,
> 				       u64 count, bool sync);
> #endif /* CONFIG_NFSD_V4 */
> -- 
> 2.33.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes
  2021-12-19  1:37         ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes trondmy
  2021-12-19  1:37           ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes trondmy
@ 2022-01-05 16:10           ` Daire Byrne
  1 sibling, 0 replies; 28+ messages in thread
From: Daire Byrne @ 2022-01-05 16:10 UTC (permalink / raw)
  To: trondmy; +Cc: Chuck Lever, J. Bruce Fields, linux-nfs

Hi,

I was interested in testing these patches with our re-export
workloads. However, it looks like this specific patch and the previous
one (nfsd: Distinguish between required and optional NFSv3 post-op
attributes) are breaking re-exports in such a way as to cause
applications to randomly crash out with sigbus errors.

At a guess, loading applications and memory mapping data files are
particularly good at triggering this. I can see no errors logged on
either the re-export server or the eventual client (e.g. stale
filehandles). We mostly re-export NFSv4.2 servers to NFSv3 clients but
we do also have a few NFSv3 servers re-exported as NFSv3  too
(Netapps).

This only happens with patches 4 + 5 and vanilla 5.16-rc6 does not
have any issues. I haven't had a chance to dig much deeper yet, but
thought I'd flag it anyway.

Daire

On Sun, 19 Dec 2021 at 01:44, <trondmy@kernel.org> wrote:
>
> From: Trond Myklebust <trond.myklebust@primarydata.com>
>
> Allow knfsd to request weak cache consistency attributes on files that
> have delegations and/or have up to date attribute caches by propagating
> the information to NFS that the attributes being requested are optional.

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-01-05 16:11 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-12-19  1:37 [PATCH v2 00/10] Assorted patches for knfsd trondmy
2021-12-19  1:37 ` [PATCH v2 01/10] nfsd: map EBADF trondmy
2021-12-19  1:37   ` [PATCH v2 02/10] nfsd: Add errno mapping for EREMOTEIO trondmy
2021-12-19  1:37     ` [PATCH v2 03/10] nfsd: Retry once in nfsd_open on an -EOPENSTALE return trondmy
2021-12-19  1:37       ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes trondmy
2021-12-19  1:37         ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes trondmy
2021-12-19  1:37           ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes trondmy
2021-12-19  1:38             ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() trondmy
2021-12-19  1:38               ` [PATCH v2 08/10] nfsd: Replace use of rwsem with errseq_t trondmy
2021-12-19  1:38                 ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled trondmy
2021-12-19  1:38                   ` [PATCH v2 10/10] nfsd: Ignore rpcbind errors on nfsd startup trondmy
2021-12-19 18:15                     ` Chuck Lever III
2021-12-19 20:49                       ` Trond Myklebust
2021-12-20 15:51                         ` Chuck Lever III
2021-12-20 18:35                           ` Trond Myklebust
2021-12-20 19:02                             ` Chuck Lever III
2021-12-20 19:52                               ` Trond Myklebust
2021-12-20 20:12                                 ` Chuck Lever III
2021-12-19 18:34                   ` [PATCH v2 09/10] nfsd: allow lockd to be forcibly disabled Chuck Lever III
2021-12-21 18:14               ` [PATCH v2 07/10] nfsd: Add a tracepoint for errors in nfsd4_clone_file_range() Chuck Lever III
2021-12-19 20:11             ` [PATCH v2 06/10] nfsd: NFSv3 should allow zero length writes Chuck Lever III
2022-01-05 16:10           ` [PATCH v2 05/10] nfs: Add export support for weak cache consistency attributes Daire Byrne
2021-12-19 20:10         ` [PATCH v2 04/10] nfsd: Distinguish between required and optional NFSv3 post-op attributes Chuck Lever III
2021-12-19 21:09           ` Trond Myklebust
2021-12-20 16:02             ` Chuck Lever III
2021-12-20 18:38               ` Trond Myklebust
2021-12-20 19:22                 ` Chuck Lever III
2021-12-21 18:10 ` [PATCH v2 00/10] Assorted patches for knfsd Chuck Lever III

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.