linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/13] client side user xattr (RFC8276) support
@ 2020-03-11 19:56 Frank van der Linden
  2020-03-11 19:56 ` [PATCH 01/13] nfs,nfsd: NFSv4.2 extended attribute protocol definitions Frank van der Linden
                   ` (14 more replies)
  0 siblings, 15 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

This patchset implements the client side for NFS user extended attributes,
as defined in RFC8726.

This was originally posted as an RFC in:

https://patchwork.kernel.org/cover/11143565/

Patch 1 is shared with the server side patch, posted
separately.

Most comments in there still apply, except that:

1. Client side caching is now included in this patch set.
2. As per the discussion, user extended attributes are enabled if
   the client and server support them (e.g. they support 4.2 and
   advertise the user extended attribute FATTR). There are no longer
   options to switch them off on either the client or the server.
3. The code is no longer conditioned on a config option.
4. The number of patches has been reduced somewhat by merging
   smaller, related ones.

The client side caching is implemented through a per-inode hash table,
which is allocated on demand. See fs/nfs/nfs42xattr.c for details.

This has been tested as follows:

* Linux client and server:
	* Test all corner cases (XATTR_SIZE_*)
	* Test all failure cases (no xattr, setxattr with different or
	  invalid flags, etc).
	* Verify the content of xattrs across several operations.
	* Use KASAN and KMEMLEAK for a longer mix of testruns to verify
	  that there were no leaks (after unmounting the filesystem).
	* Stress tested caching, trying to run the client out of memory.

* Tested against the FreeBSD-current implementation as well, which works
  (after I fixed 2 bugs in that implementation, which I'm sending out to
  them too).

* Not tested: RDMA (I couldn't get a setup going).

Frank van der Linden (13):
  nfs,nfsd:  NFSv4.2 extended attribute protocol definitions
  nfs: add client side only definitions for user xattrs
  NFSv4.2: query the server for extended attribute support
  NFSv4.2: define limits and sizes for user xattr handling
  NFSv4.2: add client side XDR handling for extended attributes
  nfs: define nfs_access_get_cached function
  NFSv4.2: query the extended attribute access bits
  nfs: modify update_changeattr to deal with regular files
  nfs: define and use the NFS_INO_INVALID_XATTR flag
  nfs: make the buf_to_pages_noslab function available to the nfs code
  NFSv4.2: add the extended attribute proc functions.
  NFSv4.2: hook in the user extended attribute handlers
  NFSv4.2: add client side xattr caching.

 fs/nfs/Makefile             |    1 +
 fs/nfs/client.c             |   19 +-
 fs/nfs/dir.c                |   24 +-
 fs/nfs/inode.c              |   16 +-
 fs/nfs/internal.h           |   28 ++
 fs/nfs/nfs42.h              |   24 +
 fs/nfs/nfs42proc.c          |  248 ++++++++++
 fs/nfs/nfs42xattr.c         | 1083 +++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs42xdr.c           |  442 ++++++++++++++++++
 fs/nfs/nfs4_fs.h            |    5 +
 fs/nfs/nfs4client.c         |   31 ++
 fs/nfs/nfs4proc.c           |  248 ++++++++--
 fs/nfs/nfs4super.c          |   10 +
 fs/nfs/nfs4xdr.c            |   29 ++
 fs/nfs/nfstrace.h           |    3 +-
 include/linux/nfs4.h        |   25 +
 include/linux/nfs_fs.h      |   12 +
 include/linux/nfs_fs_sb.h   |    6 +
 include/linux/nfs_xdr.h     |   60 ++-
 include/uapi/linux/nfs4.h   |    3 +
 include/uapi/linux/nfs_fs.h |    1 +
 21 files changed, 2276 insertions(+), 42 deletions(-)
 create mode 100644 fs/nfs/nfs42xattr.c

-- 
2.16.6


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/13] nfs,nfsd:  NFSv4.2 extended attribute protocol definitions
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 02/13] nfs: add client side only definitions for user xattrs Frank van der Linden
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Add definitions for the new operations, errors and flags as defined
in RFC 8276 (File System Extended Attributes in NFSv4).

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 include/linux/nfs4.h      | 20 ++++++++++++++++++++
 include/uapi/linux/nfs4.h |  3 +++
 2 files changed, 23 insertions(+)

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 82d8fb422092..350aeda0c48c 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -150,6 +150,12 @@ enum nfs_opnum4 {
 	OP_WRITE_SAME = 70,
 	OP_CLONE = 71,
 
+	/* xattr support (RFC8726) */
+	OP_GETXATTR                = 72,
+	OP_SETXATTR                = 73,
+	OP_LISTXATTRS              = 74,
+	OP_REMOVEXATTR             = 75,
+
 	OP_ILLEGAL = 10044,
 };
 
@@ -280,6 +286,10 @@ enum nfsstat4 {
 	NFS4ERR_WRONG_LFS = 10092,
 	NFS4ERR_BADLABEL = 10093,
 	NFS4ERR_OFFLOAD_NO_REQS = 10094,
+
+	/* xattr (RFC8276) */
+	NFS4ERR_NOXATTR        = 10095,
+	NFS4ERR_XATTR2BIG      = 10096,
 };
 
 static inline bool seqid_mutating_err(u32 err)
@@ -452,6 +462,7 @@ enum change_attr_type4 {
 #define FATTR4_WORD2_CHANGE_ATTR_TYPE	(1UL << 15)
 #define FATTR4_WORD2_SECURITY_LABEL     (1UL << 16)
 #define FATTR4_WORD2_MODE_UMASK		(1UL << 17)
+#define FATTR4_WORD2_XATTR_SUPPORT	(1UL << 18)
 
 /* MDS threshold bitmap bits */
 #define THRESHOLD_RD                    (1UL << 0)
@@ -700,4 +711,13 @@ struct nl4_server {
 		struct nfs42_netaddr	nl4_addr; /* NL4_NETADDR */
 	} u;
 };
+
+/*
+ * Options for setxattr. These match the flags for setxattr(2).
+ */
+enum nfs4_setxattr_options {
+	SETXATTR4_EITHER	= 0,
+	SETXATTR4_CREATE	= 1,
+	SETXATTR4_REPLACE	= 2,
+};
 #endif
diff --git a/include/uapi/linux/nfs4.h b/include/uapi/linux/nfs4.h
index 8572930cf5b0..bf197e99b98f 100644
--- a/include/uapi/linux/nfs4.h
+++ b/include/uapi/linux/nfs4.h
@@ -33,6 +33,9 @@
 #define NFS4_ACCESS_EXTEND      0x0008
 #define NFS4_ACCESS_DELETE      0x0010
 #define NFS4_ACCESS_EXECUTE     0x0020
+#define NFS4_ACCESS_XAREAD      0x0040
+#define NFS4_ACCESS_XAWRITE     0x0080
+#define NFS4_ACCESS_XALIST      0x0100
 
 #define NFS4_FH_PERSISTENT		0x0000
 #define NFS4_FH_NOEXPIRE_WITH_OPEN	0x0001
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/13] nfs: add client side only definitions for user xattrs
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
  2020-03-11 19:56 ` [PATCH 01/13] nfs,nfsd: NFSv4.2 extended attribute protocol definitions Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 03/13] NFSv4.2: query the server for extended attribute support Frank van der Linden
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Add client-side only definitions for user extended
attributes (RFC8276). These are the access bits
as used by the client code, and the CLNT procedure
number definition.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 include/linux/nfs4.h   | 5 +++++
 include/linux/nfs_fs.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/include/linux/nfs4.h b/include/linux/nfs4.h
index 350aeda0c48c..dd3871b05ae5 100644
--- a/include/linux/nfs4.h
+++ b/include/linux/nfs4.h
@@ -553,6 +553,11 @@ enum {
 	NFSPROC4_CLNT_LAYOUTERROR,
 
 	NFSPROC4_CLNT_COPY_NOTIFY,
+
+	NFSPROC4_CLNT_GETXATTR,
+	NFSPROC4_CLNT_SETXATTR,
+	NFSPROC4_CLNT_LISTXATTRS,
+	NFSPROC4_CLNT_REMOVEXATTR,
 };
 
 /* nfs41 types */
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 5d5b91e54f73..442458e94ab5 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -212,6 +212,9 @@ struct nfs4_copy_state {
 #define NFS_ACCESS_EXTEND      0x0008
 #define NFS_ACCESS_DELETE      0x0010
 #define NFS_ACCESS_EXECUTE     0x0020
+#define NFS_ACCESS_XAREAD      0x0040
+#define NFS_ACCESS_XAWRITE     0x0080
+#define NFS_ACCESS_XALIST      0x0100
 
 /*
  * Cache validity bit flags
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
  2020-03-11 19:56 ` [PATCH 01/13] nfs,nfsd: NFSv4.2 extended attribute protocol definitions Frank van der Linden
  2020-03-11 19:56 ` [PATCH 02/13] nfs: add client side only definitions for user xattrs Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-12 16:15   ` Mkrtchyan, Tigran
  2020-03-11 19:56 ` [PATCH 04/13] NFSv4.2: define limits and sizes for user xattr handling Frank van der Linden
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Query the server for extended attribute support, and record it
as the NFS_CAP_XATTR flag in the server capabilities.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/nfs4proc.c         | 14 ++++++++++++--
 fs/nfs/nfs4xdr.c          | 23 +++++++++++++++++++++++
 include/linux/nfs_fs_sb.h |  1 +
 include/linux/nfs_xdr.h   |  1 +
 4 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 69b7ab7a5815..47bbd7db9d18 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -3742,6 +3742,7 @@ static void nfs4_close_context(struct nfs_open_context *ctx, int is_sync)
 static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *fhandle)
 {
 	u32 bitmask[3] = {}, minorversion = server->nfs_client->cl_minorversion;
+	u32 fattr4_word2_nfs42_mask;
 	struct nfs4_server_caps_arg args = {
 		.fhandle = fhandle,
 		.bitmask = bitmask,
@@ -3763,6 +3764,13 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 	if (minorversion)
 		bitmask[2] = FATTR4_WORD2_SUPPATTR_EXCLCREAT;
 
+	fattr4_word2_nfs42_mask = FATTR4_WORD2_NFS42_MASK;
+
+	if (minorversion >= 2) {
+		bitmask[2] |= FATTR4_WORD2_XATTR_SUPPORT;
+		fattr4_word2_nfs42_mask |= FATTR4_WORD2_XATTR_SUPPORT;
+	}
+
 	status = nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 0);
 	if (status == 0) {
 		/* Sanity check the server answers */
@@ -3775,7 +3783,7 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 			res.attr_bitmask[2] &= FATTR4_WORD2_NFS41_MASK;
 			break;
 		case 2:
-			res.attr_bitmask[2] &= FATTR4_WORD2_NFS42_MASK;
+			res.attr_bitmask[2] &= fattr4_word2_nfs42_mask;
 		}
 		memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask));
 		server->caps &= ~(NFS_CAP_ACLS|NFS_CAP_HARDLINKS|
@@ -3783,7 +3791,7 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 				NFS_CAP_MODE|NFS_CAP_NLINK|NFS_CAP_OWNER|
 				NFS_CAP_OWNER_GROUP|NFS_CAP_ATIME|
 				NFS_CAP_CTIME|NFS_CAP_MTIME|
-				NFS_CAP_SECURITY_LABEL);
+				NFS_CAP_SECURITY_LABEL|NFS_CAP_XATTR);
 		if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
 				res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
 			server->caps |= NFS_CAP_ACLS;
@@ -3811,6 +3819,8 @@ static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh *f
 		if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
 			server->caps |= NFS_CAP_SECURITY_LABEL;
 #endif
+		if (res.has_xattr)
+			server->caps |= NFS_CAP_XATTR;
 		memcpy(server->attr_bitmask_nl, res.attr_bitmask,
 				sizeof(server->attr_bitmask));
 		server->attr_bitmask_nl[2] &= ~FATTR4_WORD2_SECURITY_LABEL;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 47817ef0aadb..bebc087a1433 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -4201,6 +4201,26 @@ static int decode_attr_time_modify(struct xdr_stream *xdr, uint32_t *bitmap, str
 	return status;
 }
 
+static int decode_attr_xattrsupport(struct xdr_stream *xdr, uint32_t *bitmap,
+				    uint32_t *res)
+{
+	__be32 *p;
+
+	*res = 0;
+	if (unlikely(bitmap[2] & (FATTR4_WORD2_XATTR_SUPPORT - 1U)))
+		return -EIO;
+	if (likely(bitmap[2] & FATTR4_WORD2_XATTR_SUPPORT)) {
+		p = xdr_inline_decode(xdr, 4);
+		if (unlikely(!p))
+			return -EIO;
+		*res = be32_to_cpup(p);
+		bitmap[2] &= ~FATTR4_WORD2_XATTR_SUPPORT;
+	}
+	dprintk("%s: XATTR support=%s\n", __func__,
+		*res == 0 ? "false" : "true");
+	return 0;
+}
+
 static int verify_attr_len(struct xdr_stream *xdr, unsigned int savep, uint32_t attrlen)
 {
 	unsigned int attrwords = XDR_QUADLEN(attrlen);
@@ -4371,6 +4391,9 @@ static int decode_server_caps(struct xdr_stream *xdr, struct nfs4_server_caps_re
 	if ((status = decode_attr_exclcreat_supported(xdr, bitmap,
 				res->exclcreat_bitmask)) != 0)
 		goto xdr_error;
+	status = decode_attr_xattrsupport(xdr, bitmap, &res->has_xattr);
+	if (status != 0)
+		goto xdr_error;
 	status = verify_attr_len(xdr, savep, attrlen);
 xdr_error:
 	dprintk("%s: xdr returned %d!\n", __func__, -status);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 465fa98258a3..d881f7a38bc9 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -281,5 +281,6 @@ struct nfs_server {
 #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
 #define NFS_CAP_LAYOUTERROR	(1U << 26)
 #define NFS_CAP_COPY_NOTIFY	(1U << 27)
+#define NFS_CAP_XATTR		(1U << 28)
 
 #endif
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 94c77ed55ce1..5076fe42c693 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1178,6 +1178,7 @@ struct nfs4_server_caps_res {
 	u32				has_links;
 	u32				has_symlinks;
 	u32				fh_expire_type;
+	u32				has_xattr;
 };
 
 #define NFS4_PATHNAME_MAXCOMPONENTS 512
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/13] NFSv4.2: define limits and sizes for user xattr handling
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (2 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 03/13] NFSv4.2: query the server for extended attribute support Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-12 20:35   ` Schumaker, Anna
  2020-03-11 19:56 ` [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes Frank van der Linden
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Set limits for extended attributes (attribute value size and listxattr
buffer size), based on the fs-independent limits (XATTR_*_MAX).

Define the maximum XDR sizes for the RFC 8276 XATTR operations.
In the case of operations that carry a larger payload (SETXATTR,
GETXATTR, LISTXATTR), these exclude that payload, which is added
as separate pages, like other operations do.

Define, much like for read and write operations, the maximum overhead
sizes for get/set/listxattr, and use them to limit the maximum payload
size for those operations, in combination with the channel attributes.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/client.c           | 19 ++++++++++--
 fs/nfs/internal.h         |  5 ++++
 fs/nfs/nfs42.h            | 16 ++++++++++
 fs/nfs/nfs42xdr.c         | 74 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4client.c       | 31 ++++++++++++++++++++
 include/linux/nfs_fs_sb.h |  5 ++++
 6 files changed, 148 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 989c30c98511..eef39a4ec114 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -50,6 +50,7 @@
 #include "nfs.h"
 #include "netns.h"
 #include "sysfs.h"
+#include "nfs42.h"
 
 #define NFSDBG_FACILITY		NFSDBG_CLIENT
 
@@ -748,7 +749,7 @@ static int nfs_init_server(struct nfs_server *server,
 static void nfs_server_set_fsinfo(struct nfs_server *server,
 				  struct nfs_fsinfo *fsinfo)
 {
-	unsigned long max_rpc_payload;
+	unsigned long max_rpc_payload, raw_max_rpc_payload;
 
 	/* Work out a lot of parameters */
 	if (server->rsize == 0)
@@ -761,7 +762,9 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
 	if (fsinfo->wtmax >= 512 && server->wsize > fsinfo->wtmax)
 		server->wsize = nfs_block_size(fsinfo->wtmax, NULL);
 
-	max_rpc_payload = nfs_block_size(rpc_max_payload(server->client), NULL);
+	raw_max_rpc_payload = rpc_max_payload(server->client);
+	max_rpc_payload = nfs_block_size(raw_max_rpc_payload, NULL);
+
 	if (server->rsize > max_rpc_payload)
 		server->rsize = max_rpc_payload;
 	if (server->rsize > NFS_MAX_FILE_IO_SIZE)
@@ -794,6 +797,18 @@ static void nfs_server_set_fsinfo(struct nfs_server *server,
 	server->clone_blksize = fsinfo->clone_blksize;
 	/* We're airborne Set socket buffersize */
 	rpc_setbufsize(server->client, server->wsize + 100, server->rsize + 100);
+
+#ifdef CONFIG_NFS_V4_2
+	/*
+	 * Defaults until limited by the session parameters.
+	 */
+	server->gxasize = min_t(unsigned int, raw_max_rpc_payload,
+				XATTR_SIZE_MAX);
+	server->sxasize = min_t(unsigned int, raw_max_rpc_payload,
+				XATTR_SIZE_MAX);
+	server->lxasize = min_t(unsigned int, raw_max_rpc_payload,
+				nfs42_listxattr_xdrsize(XATTR_LIST_MAX));
+#endif
 }
 
 /*
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f80c47d5ff27..68f235a571e1 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,6 +306,11 @@ extern const u32 nfs41_maxread_overhead;
 extern const u32 nfs41_maxwrite_overhead;
 extern const u32 nfs41_maxgetdevinfo_overhead;
 #endif
+#ifdef CONFIG_NFS_V4_2
+extern const u32 nfs42_maxsetxattr_overhead;
+extern const u32 nfs42_maxgetxattr_overhead;
+extern const u32 nfs42_maxlistxattrs_overhead;
+#endif
 
 /* nfs4proc.c */
 #if IS_ENABLED(CONFIG_NFS_V4)
diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index c891af949886..51de8ddc7d88 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -6,6 +6,8 @@
 #ifndef __LINUX_FS_NFS_NFS4_2_H
 #define __LINUX_FS_NFS_NFS4_2_H
 
+#include <linux/xattr.h>
+
 /*
  * FIXME:  four LAYOUTSTATS calls per compound at most! Do we need to support
  * more? Need to consider not to pre-alloc too much for a compound.
@@ -36,5 +38,19 @@ static inline bool nfs42_files_from_same_server(struct file *in,
 	return nfs4_check_serverowner_major_id(c_in->cl_serverowner,
 					       c_out->cl_serverowner);
 }
+
+/*
+ * Maximum XDR buffer size needed for a listxattr buffer of buflen size.
+ *
+ * The upper boundary is a buffer with all 1-byte sized attribute names.
+ * They would be 7 bytes long in the eventual buffer ("user.x\0"), and
+ * 8 bytes long XDR-encoded.
+ *
+ * Include the trailing eof word as well.
+ */
+static inline u32 nfs42_listxattr_xdrsize(u32 buflen)
+{
+	return ((buflen / (XATTR_USER_PREFIX_LEN + 2)) * 8) + 4;
+}
 #endif /* CONFIG_NFS_V4_2 */
 #endif /* __LINUX_FS_NFS_NFS4_2_H */
diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index c03f3246d6c5..6712daa9d85b 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -169,6 +169,80 @@
 					 decode_clone_maxsz + \
 					 decode_getattr_maxsz)
 
+#ifdef CONFIG_NFS_V4_2
+/* Not limited by NFS itself, limited by the generic xattr code */
+#define nfs4_xattr_name_maxsz   XDR_QUADLEN(XATTR_NAME_MAX)
+
+#define encode_getxattr_maxsz   (op_encode_hdr_maxsz + 1 + \
+				 nfs4_xattr_name_maxsz)
+#define decode_getxattr_maxsz   (op_decode_hdr_maxsz + 1 + 1)
+#define encode_setxattr_maxsz   (op_encode_hdr_maxsz + \
+				 1 + nfs4_xattr_name_maxsz + 1)
+#define decode_setxattr_maxsz   (op_decode_hdr_maxsz + decode_change_info_maxsz)
+#define encode_listxattrs_maxsz  (op_encode_hdr_maxsz + 2 + 1)
+#define decode_listxattrs_maxsz  (op_decode_hdr_maxsz + 2 + 1 + 1)
+#define encode_removexattr_maxsz (op_encode_hdr_maxsz + 1 + \
+				  nfs4_xattr_name_maxsz)
+#define decode_removexattr_maxsz (op_decode_hdr_maxsz + \
+				  decode_change_info_maxsz)
+
+#define NFS4_enc_getxattr_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_getxattr_maxsz)
+#define NFS4_dec_getxattr_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_getxattr_maxsz)
+#define NFS4_enc_setxattr_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_setxattr_maxsz)
+#define NFS4_dec_setxattr_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_setxattr_maxsz)
+#define NFS4_enc_listxattrs_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_listxattrs_maxsz)
+#define NFS4_dec_listxattrs_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_listxattrs_maxsz)
+#define NFS4_enc_removexattr_sz	(compound_encode_hdr_maxsz + \
+				encode_sequence_maxsz + \
+				encode_putfh_maxsz + \
+				encode_removexattr_maxsz)
+#define NFS4_dec_removexattr_sz	(compound_decode_hdr_maxsz + \
+				decode_sequence_maxsz + \
+				decode_putfh_maxsz + \
+				decode_removexattr_maxsz)
+
+/*
+ * These values specify the maximum amount of data that is not
+ * associated with the extended attribute name or extended
+ * attribute list in the SETXATTR, GETXATTR and LISTXATTR
+ * respectively.
+ */
+const u32 nfs42_maxsetxattr_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
+					compound_encode_hdr_maxsz +
+					encode_sequence_maxsz +
+					encode_putfh_maxsz + 1 +
+					nfs4_xattr_name_maxsz)
+					* XDR_UNIT);
+
+const u32 nfs42_maxgetxattr_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
+					compound_decode_hdr_maxsz +
+					decode_sequence_maxsz +
+					decode_putfh_maxsz + 1) * XDR_UNIT);
+
+const u32 nfs42_maxlistxattrs_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
+					compound_decode_hdr_maxsz +
+					decode_sequence_maxsz +
+					decode_putfh_maxsz + 3) * XDR_UNIT);
+#endif
+
 static void encode_fallocate(struct xdr_stream *xdr,
 			     const struct nfs42_falloc_args *args)
 {
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 0cd767e5c977..8b320ef0e8a3 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -993,6 +993,36 @@ static void nfs4_session_limit_rwsize(struct nfs_server *server)
 #endif /* CONFIG_NFS_V4_1 */
 }
 
+/*
+ * Limit xattr sizes using the channel attributes.
+ */
+static void nfs4_session_limit_xasize(struct nfs_server *server)
+{
+#ifdef CONFIG_NFS_V4_2
+	struct nfs4_session *sess;
+	u32 server_gxa_sz;
+	u32 server_sxa_sz;
+	u32 server_lxa_sz;
+
+	if (!nfs4_has_session(server->nfs_client))
+		return;
+
+	sess = server->nfs_client->cl_session;
+
+	server_gxa_sz = sess->fc_attrs.max_resp_sz - nfs42_maxgetxattr_overhead;
+	server_sxa_sz = sess->fc_attrs.max_rqst_sz - nfs42_maxsetxattr_overhead;
+	server_lxa_sz = sess->fc_attrs.max_resp_sz -
+	    nfs42_maxlistxattrs_overhead;
+
+	if (server->gxasize > server_gxa_sz)
+		server->gxasize = server_gxa_sz;
+	if (server->sxasize > server_sxa_sz)
+		server->sxasize = server_sxa_sz;
+	if (server->lxasize > server_lxa_sz)
+		server->lxasize = server_lxa_sz;
+#endif
+}
+
 static int nfs4_server_common_setup(struct nfs_server *server,
 		struct nfs_fh *mntfh, bool auth_probe)
 {
@@ -1040,6 +1070,7 @@ static int nfs4_server_common_setup(struct nfs_server *server,
 		goto out;
 
 	nfs4_session_limit_rwsize(server);
+	nfs4_session_limit_xasize(server);
 
 	if (server->namelen == 0 || server->namelen > NFS4_MAXNAMLEN)
 		server->namelen = NFS4_MAXNAMLEN;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index d881f7a38bc9..7eae72a8762e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -163,6 +163,11 @@ struct nfs_server {
 	unsigned int		dtsize;		/* readdir size */
 	unsigned short		port;		/* "port=" setting */
 	unsigned int		bsize;		/* server block size */
+#ifdef CONFIG_NFS_V4_2
+	unsigned int		gxasize;	/* getxattr size */
+	unsigned int		sxasize;	/* setxattr size */
+	unsigned int		lxasize;	/* listxattr size */
+#endif
 	unsigned int		acregmin;	/* attr cache timeouts */
 	unsigned int		acregmax;
 	unsigned int		acdirmin;
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (3 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 04/13] NFSv4.2: define limits and sizes for user xattr handling Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-12 20:49   ` Schumaker, Anna
  2020-03-11 19:56 ` [PATCH 06/13] nfs: define nfs_access_get_cached function Frank van der Linden
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Define the argument and response structures that will be used for
RFC 8276 extended attribute RPC calls, and implement the necessary
functions to encode/decode the extended attribute operations.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/nfs42xdr.c       | 368 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4xdr.c        |   6 +
 include/linux/nfs_xdr.h |  59 +++++++-
 3 files changed, 432 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
index 6712daa9d85b..ac45a1027523 100644
--- a/fs/nfs/nfs42xdr.c
+++ b/fs/nfs/nfs42xdr.c
@@ -407,6 +407,212 @@ static void encode_layouterror(struct xdr_stream *xdr,
 	encode_device_error(xdr, &args->errors[0]);
 }
 
+#ifdef CONFIG_NFS_V4_2
+static void encode_setxattr(struct xdr_stream *xdr,
+			    const struct nfs42_setxattrargs *arg,
+			    struct compound_hdr *hdr)
+{
+	__be32 *p;
+
+	BUILD_BUG_ON(XATTR_CREATE != SETXATTR4_CREATE);
+	BUILD_BUG_ON(XATTR_REPLACE != SETXATTR4_REPLACE);
+
+	encode_op_hdr(xdr, OP_SETXATTR, decode_setxattr_maxsz, hdr);
+	p = reserve_space(xdr, 4);
+	*p = cpu_to_be32(arg->xattr_flags);
+	encode_string(xdr, strlen(arg->xattr_name), arg->xattr_name);
+	p = reserve_space(xdr, 4);
+	*p = cpu_to_be32(arg->xattr_len);
+	if (arg->xattr_len)
+		xdr_write_pages(xdr, arg->xattr_pages, 0, arg->xattr_len);
+}
+
+static int decode_setxattr(struct xdr_stream *xdr,
+			   struct nfs4_change_info *cinfo)
+{
+	int status;
+
+	status = decode_op_hdr(xdr, OP_SETXATTR);
+	if (status)
+		goto out;
+	status = decode_change_info(xdr, cinfo);
+out:
+	return status;
+}
+
+
+static void encode_getxattr(struct xdr_stream *xdr, const char *name,
+			    struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_GETXATTR, decode_getxattr_maxsz, hdr);
+	encode_string(xdr, strlen(name), name);
+}
+
+static int decode_getxattr(struct xdr_stream *xdr,
+			   struct nfs42_getxattrres *res,
+			   struct rpc_rqst *req)
+{
+	int status;
+	__be32 *p;
+	u32 len, rdlen;
+
+	status = decode_op_hdr(xdr, OP_GETXATTR);
+	if (status)
+		return status;
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -EIO;
+
+	len = be32_to_cpup(p);
+	if (len > req->rq_rcv_buf.page_len)
+		return -ERANGE;
+
+	res->xattr_len = len;
+
+	if (len > 0) {
+		rdlen = xdr_read_pages(xdr, len);
+		if (rdlen < len)
+			return -EIO;
+	}
+
+	return 0;
+}
+
+static void encode_removexattr(struct xdr_stream *xdr, const char *name,
+			       struct compound_hdr *hdr)
+{
+	encode_op_hdr(xdr, OP_REMOVEXATTR, decode_removexattr_maxsz, hdr);
+	encode_string(xdr, strlen(name), name);
+}
+
+
+static int decode_removexattr(struct xdr_stream *xdr,
+			   struct nfs4_change_info *cinfo)
+{
+	int status;
+
+	status = decode_op_hdr(xdr, OP_REMOVEXATTR);
+	if (status)
+		goto out;
+
+	status = decode_change_info(xdr, cinfo);
+out:
+	return status;
+}
+
+static void encode_listxattrs(struct xdr_stream *xdr,
+			     const struct nfs42_listxattrsargs *arg,
+			     struct compound_hdr *hdr)
+{
+	__be32 *p;
+
+	encode_op_hdr(xdr, OP_LISTXATTRS, decode_listxattrs_maxsz + 1, hdr);
+
+	p = reserve_space(xdr, 12);
+	if (unlikely(!p))
+		return;
+
+	p = xdr_encode_hyper(p, arg->cookie);
+	/*
+	 * RFC 8276 says to specify the full max length of the LISTXATTRS
+	 * XDR reply. Count is set to the XDR length of the names array
+	 * plus the EOF marker. So, add the cookie and the names count.
+	 */
+	*p = cpu_to_be32(arg->count + 8 + 4);
+}
+
+static int decode_listxattrs(struct xdr_stream *xdr,
+			    struct nfs42_listxattrsres *res)
+{
+	int status;
+	__be32 *p;
+	u32 count, len, ulen;
+	size_t left, copied;
+	char *buf;
+
+	status = decode_op_hdr(xdr, OP_LISTXATTRS);
+	if (status) {
+		/*
+		 * Special case: for LISTXATTRS, NFS4ERR_TOOSMALL
+		 * should be translated to ERANGE.
+		 */
+		if (status == -ETOOSMALL)
+			status = -ERANGE;
+		goto out;
+	}
+
+	p = xdr_inline_decode(xdr, 8);
+	if (unlikely(!p))
+		return -EIO;
+
+	xdr_decode_hyper(p, &res->cookie);
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -EIO;
+
+	left = res->xattr_len;
+	buf = res->xattr_buf;
+
+	count = be32_to_cpup(p);
+	copied = 0;
+
+	/*
+	 * We have asked for enough room to encode the maximum number
+	 * of possible attribute names, so everything should fit.
+	 *
+	 * But, don't rely on that assumption. Just decode entries
+	 * until they don't fit anymore, just in case the server did
+	 * something odd.
+	 */
+	while (count--) {
+		p = xdr_inline_decode(xdr, 4);
+		if (unlikely(!p))
+			return -EIO;
+
+		len = be32_to_cpup(p);
+		if (len > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) {
+			status = -ERANGE;
+			goto out;
+		}
+
+		p = xdr_inline_decode(xdr, len);
+		if (unlikely(!p))
+			return -EIO;
+
+		ulen = len + XATTR_USER_PREFIX_LEN + 1;
+		if (buf) {
+			if (ulen > left) {
+				status = -ERANGE;
+				goto out;
+			}
+
+			memcpy(buf, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN);
+			memcpy(buf + XATTR_USER_PREFIX_LEN, p, len);
+
+			buf[ulen - 1] = 0;
+			buf += ulen;
+			left -= ulen;
+		}
+		copied += ulen;
+	}
+
+	p = xdr_inline_decode(xdr, 4);
+	if (unlikely(!p))
+		return -EIO;
+
+	res->eof = be32_to_cpup(p);
+	res->copied = copied;
+
+out:
+	if (status == -ERANGE && res->xattr_len == XATTR_LIST_MAX)
+		status = -E2BIG;
+
+	return status;
+}
+#endif
+
 /*
  * Encode ALLOCATE request
  */
@@ -1062,4 +1268,166 @@ static int nfs4_xdr_dec_layouterror(struct rpc_rqst *rqstp,
 	return status;
 }
 
+#ifdef CONFIG_NFS_V4_2
+static void nfs4_xdr_enc_setxattr(struct rpc_rqst *req, struct xdr_stream *xdr,
+				  const void *data)
+{
+	const struct nfs42_setxattrargs *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_setxattr(xdr, args, &hdr);
+	encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_setxattr(struct rpc_rqst *req, struct xdr_stream *xdr,
+				 void *data)
+{
+	struct nfs42_setxattrres *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, req);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+
+	status = decode_setxattr(xdr, &res->cinfo);
+out:
+	return status;
+}
+
+static void nfs4_xdr_enc_getxattr(struct rpc_rqst *req, struct xdr_stream *xdr,
+				  const void *data)
+{
+	const struct nfs42_getxattrargs *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+	size_t plen;
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_getxattr(xdr, args->xattr_name, &hdr);
+
+	plen = args->xattr_len ? args->xattr_len : XATTR_SIZE_MAX;
+
+	rpc_prepare_reply_pages(req, args->xattr_pages, 0, plen,
+	    hdr.replen);
+	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
+
+	encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_getxattr(struct rpc_rqst *rqstp,
+				 struct xdr_stream *xdr, void *data)
+{
+	struct nfs42_getxattrres *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_getxattr(xdr, res, rqstp);
+out:
+	return status;
+}
+
+static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
+				    struct xdr_stream *xdr, const void *data)
+{
+	const struct nfs42_listxattrsargs *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_listxattrs(xdr, args, &hdr);
+
+	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
+	    hdr.replen);
+	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
+
+	encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp,
+				   struct xdr_stream *xdr, void *data)
+{
+	struct nfs42_listxattrsres *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE);
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, rqstp);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+	status = decode_listxattrs(xdr, res);
+out:
+	return status;
+}
+
+static void nfs4_xdr_enc_removexattr(struct rpc_rqst *req,
+				     struct xdr_stream *xdr, const void *data)
+{
+	const struct nfs42_removexattrargs *args = data;
+	struct compound_hdr hdr = {
+		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
+	};
+
+	encode_compound_hdr(xdr, req, &hdr);
+	encode_sequence(xdr, &args->seq_args, &hdr);
+	encode_putfh(xdr, args->fh, &hdr);
+	encode_removexattr(xdr, args->xattr_name, &hdr);
+	encode_nops(&hdr);
+}
+
+static int nfs4_xdr_dec_removexattr(struct rpc_rqst *req,
+				    struct xdr_stream *xdr, void *data)
+{
+	struct nfs42_removexattrres *res = data;
+	struct compound_hdr hdr;
+	int status;
+
+	status = decode_compound_hdr(xdr, &hdr);
+	if (status)
+		goto out;
+	status = decode_sequence(xdr, &res->seq_res, req);
+	if (status)
+		goto out;
+	status = decode_putfh(xdr);
+	if (status)
+		goto out;
+
+	status = decode_removexattr(xdr, &res->cinfo);
+out:
+	return status;
+}
+#endif
 #endif /* __LINUX_FS_NFS_NFS4_2XDR_H */
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index bebc087a1433..9e1ced791789 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7479,6 +7479,8 @@ static struct {
 	{ NFS4ERR_SYMLINK,	-ELOOP		},
 	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
 	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
+	{ NFS4ERR_NOXATTR,	-ENODATA	},
+	{ NFS4ERR_XATTR2BIG,	-E2BIG		},
 	{ -1,			-EIO		}
 };
 
@@ -7607,6 +7609,10 @@ const struct rpc_procinfo nfs4_procedures[] = {
 	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
 	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
 	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
+	PROC42(GETXATTR,	enc_getxattr,		dec_getxattr),
+	PROC42(SETXATTR,	enc_setxattr,		dec_setxattr),
+	PROC42(LISTXATTRS,	enc_listxattrs,		dec_listxattrs),
+	PROC42(REMOVEXATTR,	enc_removexattr,	dec_removexattr),
 };
 
 static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 5076fe42c693..685deed805e8 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1497,7 +1497,64 @@ struct nfs42_seek_res {
 	u32	sr_eof;
 	u64	sr_offset;
 };
-#endif
+
+struct nfs42_setxattrargs {
+	struct nfs4_sequence_args	seq_args;
+	struct nfs_fh			*fh;
+	const char			*xattr_name;
+	u32				xattr_flags;
+	size_t				xattr_len;
+	struct page			**xattr_pages;
+};
+
+struct nfs42_setxattrres {
+	struct nfs4_sequence_res	seq_res;
+	struct nfs4_change_info		cinfo;
+};
+
+struct nfs42_getxattrargs {
+	struct nfs4_sequence_args	seq_args;
+	struct nfs_fh			*fh;
+	const char			*xattr_name;
+	size_t				xattr_len;
+	struct page			**xattr_pages;
+};
+
+struct nfs42_getxattrres {
+	struct nfs4_sequence_res	seq_res;
+	size_t				xattr_len;
+};
+
+struct nfs42_listxattrsargs {
+	struct nfs4_sequence_args	seq_args;
+	struct nfs_fh			*fh;
+	u32				count;
+	u64				cookie;
+	struct page			**xattr_pages;
+};
+
+struct nfs42_listxattrsres {
+	struct nfs4_sequence_res	seq_res;
+	struct page			*scratch;
+	void				*xattr_buf;
+	size_t				xattr_len;
+	u64				cookie;
+	bool				eof;
+	size_t				copied;
+};
+
+struct nfs42_removexattrargs {
+	struct nfs4_sequence_args	seq_args;
+	struct nfs_fh			*fh;
+	const char			*xattr_name;
+};
+
+struct nfs42_removexattrres {
+	struct nfs4_sequence_res	seq_res;
+	struct nfs4_change_info		cinfo;
+};
+
+#endif /* CONFIG_NFS_V4_2 */
 
 struct nfs_page;
 
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/13] nfs: define nfs_access_get_cached function
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (4 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 07/13] NFSv4.2: query the extended attribute access bits Frank van der Linden
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

The only consumer of nfs_access_get_cached_rcu and nfs_access_cached
calls these static functions in order to first try RCU access, and
then locked access.

Combine them in to a single function, and call that. Make this function
available to the rest of the NFS code.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/dir.c           | 20 ++++++++++++++++----
 include/linux/nfs_fs.h |  2 ++
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 193d6fb363b7..a0b564781e3e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2433,7 +2433,7 @@ static struct nfs_access_entry *nfs_access_search_rbtree(struct inode *inode, co
 	return NULL;
 }
 
-static int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res, bool may_block)
+static int nfs_access_get_cached_locked(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res, bool may_block)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
 	struct nfs_access_entry *cache;
@@ -2506,6 +2506,20 @@ static int nfs_access_get_cached_rcu(struct inode *inode, const struct cred *cre
 	return err;
 }
 
+int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct
+nfs_access_entry *res, bool may_block)
+{
+	int status;
+
+	status = nfs_access_get_cached_rcu(inode, cred, res);
+	if (status != 0)
+		status = nfs_access_get_cached_locked(inode, cred, res,
+		    may_block);
+
+	return status;
+}
+EXPORT_SYMBOL_GPL(nfs_access_get_cached);
+
 static void nfs_access_add_rbtree(struct inode *inode, struct nfs_access_entry *set)
 {
 	struct nfs_inode *nfsi = NFS_I(inode);
@@ -2620,9 +2634,7 @@ static int nfs_do_access(struct inode *inode, const struct cred *cred, int mask)
 
 	trace_nfs_access_enter(inode);
 
-	status = nfs_access_get_cached_rcu(inode, cred, &cache);
-	if (status != 0)
-		status = nfs_access_get_cached(inode, cred, &cache, may_block);
+	status = nfs_access_get_cached(inode, cred, &cache, may_block);
 	if (status == 0)
 		goto out_cached;
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 442458e94ab5..e86e7a747092 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -491,6 +491,8 @@ extern int nfs_instantiate(struct dentry *dentry, struct nfs_fh *fh,
 			struct nfs_fattr *fattr, struct nfs4_label *label);
 extern int nfs_may_open(struct inode *inode, const struct cred *cred, int openflags);
 extern void nfs_access_zap_cache(struct inode *inode);
+extern int nfs_access_get_cached(struct inode *inode, const struct cred *cred, struct nfs_access_entry *res,
+				 bool may_block);
 
 /*
  * linux/fs/nfs/symlink.c
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/13] NFSv4.2: query the extended attribute access bits
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (5 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 06/13] nfs: define nfs_access_get_cached function Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 08/13] nfs: modify update_changeattr to deal with regular files Frank van der Linden
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

RFC 8276 defines separate ACCESS bits for extended attribute checking.
Query them in nfs_do_access and opendata.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/dir.c      | 4 ++++
 fs/nfs/nfs4proc.c | 6 ++++++
 2 files changed, 10 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index a0b564781e3e..bd2606539299 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2646,6 +2646,10 @@ static int nfs_do_access(struct inode *inode, const struct cred *cred, int mask)
 	 * Determine which access bits we want to ask for...
 	 */
 	cache.mask = NFS_ACCESS_READ | NFS_ACCESS_MODIFY | NFS_ACCESS_EXTEND;
+	if (nfs_server_capable(inode, NFS_CAP_XATTR)) {
+		cache.mask |= NFS_ACCESS_XAREAD | NFS_ACCESS_XAWRITE |
+		    NFS_ACCESS_XALIST;
+	}
 	if (S_ISDIR(inode->i_mode))
 		cache.mask |= NFS_ACCESS_DELETE | NFS_ACCESS_LOOKUP;
 	else
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 47bbd7db9d18..da72237c6933 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1340,6 +1340,12 @@ static struct nfs4_opendata *nfs4_opendata_alloc(struct dentry *dentry,
 				NFS4_ACCESS_MODIFY |
 				NFS4_ACCESS_EXTEND |
 				NFS4_ACCESS_EXECUTE;
+#ifdef CONFIG_NFS_V4_2
+			if (server->caps & NFS_CAP_XATTR)
+				p->o_arg.access |= NFS4_ACCESS_XAREAD |
+				    NFS4_ACCESS_XAWRITE |
+				    NFS4_ACCESS_XALIST;
+#endif
 		}
 	}
 	p->o_arg.clientid = server->nfs_client->cl_clientid;
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/13] nfs: modify update_changeattr to deal with regular files
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (6 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 07/13] NFSv4.2: query the extended attribute access bits Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag Frank van der Linden
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Until now, change attributes in change_info form were only returned by
directory operations. However, they are also used for the RFC 8276
extended attribute operations, which work on both directories
and regular files.  Modify update_changeattr to deal:

* Rename it to nfs4_update_changeattr and make it non-static.
* Don't always use INO_INVALID_DATA, this isn't needed for a
  directory that only had its extended attributes changed by us.
* Existing callers now always pass in INO_INVALID_DATA.

For the current callers of this function, behavior is unchanged.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/nfs4_fs.h  |  5 ++++
 fs/nfs/nfs4proc.c | 70 ++++++++++++++++++++++++++++++++++---------------------
 2 files changed, 49 insertions(+), 26 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 8be1ba7c62bb..42c9f237dc61 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -322,6 +322,11 @@ extern int update_open_stateid(struct nfs4_state *state,
 
 extern int nfs4_proc_get_lease_time(struct nfs_client *clp,
 		struct nfs_fsinfo *fsinfo);
+extern void nfs4_update_changeattr(struct inode *dir,
+				   struct nfs4_change_info *cinfo,
+				   unsigned long timestamp,
+				   unsigned long cache_validity);
+
 #if defined(CONFIG_NFS_V4_1)
 extern int nfs41_sequence_done(struct rpc_task *, struct nfs4_sequence_res *);
 extern int nfs4_proc_create_session(struct nfs_client *, const struct cred *);
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index da72237c6933..31d65bbf4a23 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1157,37 +1157,48 @@ nfs4_dec_nlink_locked(struct inode *inode)
 }
 
 static void
-update_changeattr_locked(struct inode *dir, struct nfs4_change_info *cinfo,
+nfs4_update_changeattr_locked(struct inode *inode,
+		struct nfs4_change_info *cinfo,
 		unsigned long timestamp, unsigned long cache_validity)
 {
-	struct nfs_inode *nfsi = NFS_I(dir);
+	struct nfs_inode *nfsi = NFS_I(inode);
 
 	nfsi->cache_validity |= NFS_INO_INVALID_CTIME
 		| NFS_INO_INVALID_MTIME
-		| NFS_INO_INVALID_DATA
 		| cache_validity;
-	if (cinfo->atomic && cinfo->before == inode_peek_iversion_raw(dir)) {
+
+	if (cinfo->atomic && cinfo->before == inode_peek_iversion_raw(inode)) {
 		nfsi->cache_validity &= ~NFS_INO_REVAL_PAGECACHE;
 		nfsi->attrtimeo_timestamp = jiffies;
 	} else {
-		nfs_force_lookup_revalidate(dir);
-		if (cinfo->before != inode_peek_iversion_raw(dir))
+		if (S_ISDIR(inode->i_mode)) {
+			nfsi->cache_validity |= NFS_INO_INVALID_DATA;
+			nfs_force_lookup_revalidate(inode);
+		} else {
+			if (!NFS_PROTO(inode)->have_delegation(inode,
+							       FMODE_READ))
+				nfsi->cache_validity |= NFS_INO_REVAL_PAGECACHE;
+		}
+
+		if (cinfo->before != inode_peek_iversion_raw(inode))
 			nfsi->cache_validity |= NFS_INO_INVALID_ACCESS |
-				NFS_INO_INVALID_ACL;
+						NFS_INO_INVALID_ACL;
 	}
-	inode_set_iversion_raw(dir, cinfo->after);
+	inode_set_iversion_raw(inode, cinfo->after);
 	nfsi->read_cache_jiffies = timestamp;
 	nfsi->attr_gencount = nfs_inc_attr_generation_counter();
 	nfsi->cache_validity &= ~NFS_INO_INVALID_CHANGE;
-	nfs_fscache_invalidate(dir);
+
+	if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+		nfs_fscache_invalidate(inode);
 }
 
-static void
-update_changeattr(struct inode *dir, struct nfs4_change_info *cinfo,
+void
+nfs4_update_changeattr(struct inode *dir, struct nfs4_change_info *cinfo,
 		unsigned long timestamp, unsigned long cache_validity)
 {
 	spin_lock(&dir->i_lock);
-	update_changeattr_locked(dir, cinfo, timestamp, cache_validity);
+	nfs4_update_changeattr_locked(dir, cinfo, timestamp, cache_validity);
 	spin_unlock(&dir->i_lock);
 }
 
@@ -2643,8 +2654,9 @@ static int _nfs4_proc_open(struct nfs4_opendata *data,
 			data->file_created = true;
 		if (data->file_created ||
 		    inode_peek_iversion_raw(dir) != o_res->cinfo.after)
-			update_changeattr(dir, &o_res->cinfo,
-					o_res->f_attr->time_start, 0);
+			nfs4_update_changeattr(dir, &o_res->cinfo,
+					o_res->f_attr->time_start,
+					NFS_INO_INVALID_DATA);
 	}
 	if ((o_res->rflags & NFS4_OPEN_RESULT_LOCKTYPE_POSIX) == 0)
 		server->caps &= ~NFS_CAP_POSIX_LOCK;
@@ -4543,7 +4555,8 @@ _nfs4_proc_remove(struct inode *dir, const struct qstr *name, u32 ftype)
 	status = nfs4_call_sync(server->client, server, &msg, &args.seq_args, &res.seq_res, 1);
 	if (status == 0) {
 		spin_lock(&dir->i_lock);
-		update_changeattr_locked(dir, &res.cinfo, timestamp, 0);
+		nfs4_update_changeattr_locked(dir, &res.cinfo, timestamp,
+					      NFS_INO_INVALID_DATA);
 		/* Removing a directory decrements nlink in the parent */
 		if (ftype == NF4DIR && dir->i_nlink > 2)
 			nfs4_dec_nlink_locked(dir);
@@ -4627,8 +4640,9 @@ static int nfs4_proc_unlink_done(struct rpc_task *task, struct inode *dir)
 				    &data->timeout) == -EAGAIN)
 		return 0;
 	if (task->tk_status == 0)
-		update_changeattr(dir, &res->cinfo,
-				res->dir_attr->time_start, 0);
+		nfs4_update_changeattr(dir, &res->cinfo,
+				res->dir_attr->time_start,
+				NFS_INO_INVALID_DATA);
 	return 1;
 }
 
@@ -4672,16 +4686,18 @@ static int nfs4_proc_rename_done(struct rpc_task *task, struct inode *old_dir,
 	if (task->tk_status == 0) {
 		if (new_dir != old_dir) {
 			/* Note: If we moved a directory, nlink will change */
-			update_changeattr(old_dir, &res->old_cinfo,
+			nfs4_update_changeattr(old_dir, &res->old_cinfo,
 					res->old_fattr->time_start,
-					NFS_INO_INVALID_OTHER);
-			update_changeattr(new_dir, &res->new_cinfo,
+					NFS_INO_INVALID_OTHER |
+					    NFS_INO_INVALID_DATA);
+			nfs4_update_changeattr(new_dir, &res->new_cinfo,
 					res->new_fattr->time_start,
-					NFS_INO_INVALID_OTHER);
+					NFS_INO_INVALID_OTHER |
+					    NFS_INO_INVALID_DATA);
 		} else
-			update_changeattr(old_dir, &res->old_cinfo,
+			nfs4_update_changeattr(old_dir, &res->old_cinfo,
 					res->old_fattr->time_start,
-					0);
+					NFS_INO_INVALID_DATA);
 	}
 	return 1;
 }
@@ -4722,7 +4738,8 @@ static int _nfs4_proc_link(struct inode *inode, struct inode *dir, const struct
 
 	status = nfs4_call_sync(server->client, server, &msg, &arg.seq_args, &res.seq_res, 1);
 	if (!status) {
-		update_changeattr(dir, &res.cinfo, res.fattr->time_start, 0);
+		nfs4_update_changeattr(dir, &res.cinfo, res.fattr->time_start,
+				       NFS_INO_INVALID_DATA);
 		status = nfs_post_op_update_inode(inode, res.fattr);
 		if (!status)
 			nfs_setsecurity(inode, res.fattr, res.label);
@@ -4800,8 +4817,9 @@ static int nfs4_do_create(struct inode *dir, struct dentry *dentry, struct nfs4_
 				    &data->arg.seq_args, &data->res.seq_res, 1);
 	if (status == 0) {
 		spin_lock(&dir->i_lock);
-		update_changeattr_locked(dir, &data->res.dir_cinfo,
-				data->res.fattr->time_start, 0);
+		nfs4_update_changeattr_locked(dir, &data->res.dir_cinfo,
+				data->res.fattr->time_start,
+				NFS_INO_INVALID_DATA);
 		/* Creating a directory bumps nlink in the parent */
 		if (data->arg.ftype == NF4DIR)
 			nfs4_inc_nlink_locked(dir);
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (7 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 08/13] nfs: modify update_changeattr to deal with regular files Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
       [not found]   ` <20200324060215.GD11705@shao2-debian>
  2020-03-11 19:56 ` [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code Frank van der Linden
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Define the NFS_INO_INVALID_XATTR flag, to be used for the NFSv4.2 xattr
cache, and use it where appropriate.

No functional change as yet.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/inode.c         | 7 ++++++-
 fs/nfs/nfs4proc.c      | 3 ++-
 fs/nfs/nfstrace.h      | 3 ++-
 include/linux/nfs_fs.h | 1 +
 4 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 11bf15800ac9..d2be152796ef 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -205,7 +205,8 @@ static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
 			flags &= ~NFS_INO_INVALID_OTHER;
 		flags &= ~(NFS_INO_INVALID_CHANGE
 				| NFS_INO_INVALID_SIZE
-				| NFS_INO_REVAL_PAGECACHE);
+				| NFS_INO_REVAL_PAGECACHE
+				| NFS_INO_INVALID_XATTR);
 	}
 
 	if (inode->i_mapping->nrpages == 0)
@@ -535,6 +536,8 @@ nfs_fhget(struct super_block *sb, struct nfs_fh *fh, struct nfs_fattr *fattr, st
 			inode->i_gid = fattr->gid;
 		else if (nfs_server_capable(inode, NFS_CAP_OWNER_GROUP))
 			nfs_set_cache_invalid(inode, NFS_INO_INVALID_OTHER);
+		if (nfs_server_capable(inode, NFS_CAP_XATTR))
+			nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
 		if (fattr->valid & NFS_ATTR_FATTR_BLOCKS_USED)
 			inode->i_blocks = fattr->du.nfs2.blocks;
 		if (fattr->valid & NFS_ATTR_FATTR_SPACE_USED) {
@@ -1365,6 +1368,8 @@ static void nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 		inode_set_iversion_raw(inode, fattr->change_attr);
 		if (S_ISDIR(inode->i_mode))
 			nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
+		else if (nfs_server_capable(inode, NFS_CAP_XATTR))
+			nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
 	}
 	/* If we have atomic WCC data, we may update some attributes */
 	ts = inode->i_ctime;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 31d65bbf4a23..725ae64f62f7 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1182,7 +1182,8 @@ nfs4_update_changeattr_locked(struct inode *inode,
 
 		if (cinfo->before != inode_peek_iversion_raw(inode))
 			nfsi->cache_validity |= NFS_INO_INVALID_ACCESS |
-						NFS_INO_INVALID_ACL;
+						NFS_INO_INVALID_ACL |
+						NFS_INO_INVALID_XATTR;
 	}
 	inode_set_iversion_raw(inode, cinfo->after);
 	nfsi->read_cache_jiffies = timestamp;
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index a9588d19a5ae..b941ad1199e5 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -59,7 +59,8 @@ TRACE_DEFINE_ENUM(NFS_INO_INVALID_OTHER);
 			{ NFS_INO_INVALID_CTIME, "INVALID_CTIME" }, \
 			{ NFS_INO_INVALID_MTIME, "INVALID_MTIME" }, \
 			{ NFS_INO_INVALID_SIZE, "INVALID_SIZE" }, \
-			{ NFS_INO_INVALID_OTHER, "INVALID_OTHER" })
+			{ NFS_INO_INVALID_OTHER, "INVALID_OTHER" }, \
+			{ NFS_INO_INVALID_XATTR, "INVALID_XATTR" })
 
 TRACE_DEFINE_ENUM(NFS_INO_ADVISE_RDPLUS);
 TRACE_DEFINE_ENUM(NFS_INO_STALE);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index e86e7a747092..1fcfef670a4a 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -233,6 +233,7 @@ struct nfs4_copy_state {
 #define NFS_INO_INVALID_OTHER	BIT(12)		/* other attrs are invalid */
 #define NFS_INO_DATA_INVAL_DEFER	\
 				BIT(13)		/* Deferred cache invalidation */
+#define NFS_INO_INVALID_XATTR	BIT(14)		/* xattrs are invalid */
 
 #define NFS_INO_INVALID_ATTR	(NFS_INO_INVALID_CHANGE \
 		| NFS_INO_INVALID_CTIME \
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (8 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-12 20:36   ` Schumaker, Anna
  2020-03-11 19:56 ` [PATCH 11/13] NFSv4.2: add the extended attribute proc functions Frank van der Linden
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Make the buf_to_pages_noslab function available to the rest of the NFS
code. Rename it to nfs4_buf_to_pages_noslab to be consistent.

This will be used later in the NFSv4.2 xattr code.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/internal.h | 3 +++
 fs/nfs/nfs4proc.c | 4 ++--
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 68f235a571e1..1e3a7e119c93 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -317,6 +317,9 @@ extern const u32 nfs42_maxlistxattrs_overhead;
 extern const struct rpc_procinfo nfs4_procedures[];
 #endif
 
+extern int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
+				    struct page **pages);
+
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
 extern struct nfs4_label *nfs4_label_alloc(struct nfs_server *server, gfp_t flags);
 static inline struct nfs4_label *
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 725ae64f62f7..aee3a1c97def 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5553,7 +5553,7 @@ static inline int nfs4_server_supports_acls(struct nfs_server *server)
  */
 #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
 
-static int buf_to_pages_noslab(const void *buf, size_t buflen,
+int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
 		struct page **pages)
 {
 	struct page *newpage, **spages;
@@ -5795,7 +5795,7 @@ static int __nfs4_proc_set_acl(struct inode *inode, const void *buf, size_t bufl
 		return -EOPNOTSUPP;
 	if (npages > ARRAY_SIZE(pages))
 		return -ERANGE;
-	i = buf_to_pages_noslab(buf, buflen, arg.acl_pages);
+	i = nfs4_buf_to_pages_noslab(buf, buflen, arg.acl_pages);
 	if (i < 0)
 		return i;
 	nfs4_inode_make_writeable(inode);
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 11/13] NFSv4.2: add the extended attribute proc functions.
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (9 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 12/13] NFSv4.2: hook in the user extended attribute handlers Frank van der Linden
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Implement the extended attribute procedures for NFSv4.2 extended
attribute support (RFC 8276).

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/nfs42.h     |   8 ++
 fs/nfs/nfs42proc.c | 236 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 244 insertions(+)

diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
index 51de8ddc7d88..0fe5aacbcfdf 100644
--- a/fs/nfs/nfs42.h
+++ b/fs/nfs/nfs42.h
@@ -39,6 +39,14 @@ static inline bool nfs42_files_from_same_server(struct file *in,
 					       c_out->cl_serverowner);
 }
 
+ssize_t nfs42_proc_getxattr(struct inode *inode, const char *name,
+			    void *buf, size_t buflen);
+int nfs42_proc_setxattr(struct inode *inode, const char *name,
+			const void *buf, size_t buflen, int flags);
+ssize_t nfs42_proc_listxattrs(struct inode *inode, void *buf,
+			       size_t buflen, u64 *cookiep, bool *eofp);
+int nfs42_proc_removexattr(struct inode *inode, const char *name);
+
 /*
  * Maximum XDR buffer size needed for a listxattr buffer of buflen size.
  *
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index e2ae54b35dfe..8c2e52bc986a 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1088,3 +1088,239 @@ int nfs42_proc_clone(struct file *src_f, struct file *dst_f,
 	nfs_put_lock_context(src_lock);
 	return err;
 }
+
+#define NFS4XATTR_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
+
+static int _nfs42_proc_removexattr(struct inode *inode, const char *name)
+{
+	struct nfs_server *server = NFS_SERVER(inode);
+	struct nfs42_removexattrargs args = {
+		.fh = NFS_FH(inode),
+		.xattr_name = name,
+	};
+	struct nfs42_removexattrres res;
+	struct rpc_message msg = {
+		.rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_REMOVEXATTR],
+		.rpc_argp = &args,
+		.rpc_resp = &res,
+	};
+	int ret;
+	unsigned long timestamp = jiffies;
+
+	ret = nfs4_call_sync(server->client, server, &msg, &args.seq_args,
+	    &res.seq_res, 1);
+	if (!ret)
+		nfs4_update_changeattr(inode, &res.cinfo, timestamp, 0);
+
+	return ret;
+}
+
+static int _nfs42_proc_setxattr(struct inode *inode, const char *name,
+				const void *buf, size_t buflen, int flags)
+{
+	struct nfs_server *server = NFS_SERVER(inode);
+	struct page *pages[NFS4XATTR_MAXPAGES];
+	struct nfs42_setxattrargs arg = {
+		.fh		= NFS_FH(inode),
+		.xattr_pages	= pages,
+		.xattr_len	= buflen,
+		.xattr_name	= name,
+		.xattr_flags	= flags,
+	};
+	struct nfs42_setxattrres res;
+	struct rpc_message msg = {
+		.rpc_proc	= &nfs4_procedures[NFSPROC4_CLNT_SETXATTR],
+		.rpc_argp	= &arg,
+		.rpc_resp	= &res,
+	};
+	int ret, np;
+	unsigned long timestamp = jiffies;
+
+	if (buflen > server->sxasize)
+		return -ERANGE;
+
+	if (buflen > 0) {
+		np = nfs4_buf_to_pages_noslab(buf, buflen, arg.xattr_pages);
+		if (np < 0)
+			return np;
+	} else
+		np = 0;
+
+	ret = nfs4_call_sync(server->client, server, &msg, &arg.seq_args,
+	    &res.seq_res, 1);
+
+	for (; np > 0; np--)
+		put_page(pages[np - 1]);
+
+	if (!ret)
+		nfs4_update_changeattr(inode, &res.cinfo, timestamp, 0);
+
+	return ret;
+}
+
+static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
+				void *buf, size_t buflen)
+{
+	struct nfs_server *server = NFS_SERVER(inode);
+	struct page *pages[NFS4XATTR_MAXPAGES] = {};
+	struct nfs42_getxattrargs arg = {
+		.fh		= NFS_FH(inode),
+		.xattr_pages	= pages,
+		.xattr_len	= buflen,
+		.xattr_name	= name,
+	};
+	struct nfs42_getxattrres res;
+	struct rpc_message msg = {
+		.rpc_proc	= &nfs4_procedures[NFSPROC4_CLNT_GETXATTR],
+		.rpc_argp	= &arg,
+		.rpc_resp	= &res,
+	};
+	int ret, np;
+
+	ret = nfs4_call_sync(server->client, server, &msg, &arg.seq_args,
+	    &res.seq_res, 0);
+	if (ret < 0)
+		return ret;
+
+	if (buflen) {
+		if (res.xattr_len > buflen)
+			return -ERANGE;
+		_copy_from_pages(buf, pages, 0, res.xattr_len);
+	}
+
+	np = DIV_ROUND_UP(res.xattr_len, PAGE_SIZE);
+	while (--np >= 0)
+		__free_page(pages[np]);
+
+	return res.xattr_len;
+}
+
+static ssize_t _nfs42_proc_listxattrs(struct inode *inode, void *buf,
+				 size_t buflen, u64 *cookiep, bool *eofp)
+{
+	struct nfs_server *server = NFS_SERVER(inode);
+	struct page **pages;
+	struct nfs42_listxattrsargs arg = {
+		.fh		= NFS_FH(inode),
+		.cookie		= *cookiep,
+	};
+	struct nfs42_listxattrsres res = {
+		.eof = false,
+		.xattr_buf = buf,
+		.xattr_len = buflen,
+	};
+	struct rpc_message msg = {
+		.rpc_proc	= &nfs4_procedures[NFSPROC4_CLNT_LISTXATTRS],
+		.rpc_argp	= &arg,
+		.rpc_resp	= &res,
+	};
+	u32 xdrlen;
+	int ret, np;
+
+
+	res.scratch = alloc_page(GFP_KERNEL);
+	if (!res.scratch)
+		return -ENOMEM;
+
+	xdrlen = nfs42_listxattr_xdrsize(buflen);
+	if (xdrlen > server->lxasize)
+		xdrlen = server->lxasize;
+	np = xdrlen / PAGE_SIZE + 1;
+
+	pages = kcalloc(np, sizeof(struct page *), GFP_KERNEL);
+	if (pages == NULL) {
+		__free_page(res.scratch);
+		return -ENOMEM;
+	}
+
+	arg.xattr_pages = pages;
+	arg.count = xdrlen;
+
+	ret = nfs4_call_sync(server->client, server, &msg, &arg.seq_args,
+	    &res.seq_res, 0);
+
+	if (ret >= 0) {
+		ret = res.copied;
+		*cookiep = res.cookie;
+		*eofp = res.eof;
+	}
+
+	while (--np >= 0) {
+		if (pages[np])
+			__free_page(pages[np]);
+	}
+
+	__free_page(res.scratch);
+	kfree(pages);
+
+	return ret;
+
+}
+
+ssize_t nfs42_proc_getxattr(struct inode *inode, const char *name,
+			      void *buf, size_t buflen)
+{
+	struct nfs4_exception exception = { };
+	ssize_t err;
+
+	do {
+		err = _nfs42_proc_getxattr(inode, name, buf, buflen);
+		if (err >= 0)
+			break;
+		err = nfs4_handle_exception(NFS_SERVER(inode), err,
+				&exception);
+	} while (exception.retry);
+
+	return err;
+}
+
+int nfs42_proc_setxattr(struct inode *inode, const char *name,
+			      const void *buf, size_t buflen, int flags)
+{
+	struct nfs4_exception exception = { };
+	int err;
+
+	do {
+		err = _nfs42_proc_setxattr(inode, name, buf, buflen, flags);
+		if (!err)
+			break;
+		err = nfs4_handle_exception(NFS_SERVER(inode), err,
+				&exception);
+	} while (exception.retry);
+
+	return err;
+}
+
+ssize_t nfs42_proc_listxattrs(struct inode *inode, void *buf,
+			      size_t buflen, u64 *cookiep, bool *eofp)
+{
+	struct nfs4_exception exception = { };
+	ssize_t err;
+
+	do {
+		err = _nfs42_proc_listxattrs(inode, buf, buflen,
+		    cookiep, eofp);
+		if (err >= 0)
+			break;
+		err = nfs4_handle_exception(NFS_SERVER(inode), err,
+				&exception);
+	} while (exception.retry);
+
+	return err;
+}
+
+int nfs42_proc_removexattr(struct inode *inode, const char *name)
+{
+	struct nfs4_exception exception = { };
+	int err;
+
+	do {
+		err = _nfs42_proc_removexattr(inode, name);
+		if (!err)
+			break;
+		err = nfs4_handle_exception(NFS_SERVER(inode), err,
+				&exception);
+	} while (exception.retry);
+
+	return err;
+}
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 12/13] NFSv4.2: hook in the user extended attribute handlers
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (10 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 11/13] NFSv4.2: add the extended attribute proc functions Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-11 19:56 ` [PATCH 13/13] NFSv4.2: add client side xattr caching Frank van der Linden
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Now that all the lower level code is there to make the RPC calls, hook
it in to the xattr handlers and the listxattr entry point, to make them
available.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/nfs4proc.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 121 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index aee3a1c97def..6df94857f5bb 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -66,6 +66,7 @@
 #include "nfs4idmap.h"
 #include "nfs4session.h"
 #include "fscache.h"
+#include "nfs42.h"
 
 #include "nfs4trace.h"
 
@@ -7451,6 +7452,103 @@ nfs4_listxattr_nfs4_label(struct inode *inode, char *list, size_t list_len)
 
 #endif
 
+#ifdef CONFIG_NFS_V4_2
+static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
+				    struct dentry *unused, struct inode *inode,
+				    const char *key, const void *buf,
+				    size_t buflen, int flags)
+{
+	struct nfs_access_entry cache;
+
+	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
+		return -EOPNOTSUPP;
+
+	/*
+	 * There is no mapping from the MAY_* flags to the NFS_ACCESS_XA*
+	 * flags right now. Handling of xattr operations use the normal
+	 * file read/write permissions.
+	 *
+	 * Just in case the server has other ideas (which RFC 8276 allows),
+	 * do a cached access check for the XA* flags to possibly avoid
+	 * doing an RPC and getting EACCES back.
+	 */
+	if (!nfs_access_get_cached(inode, current_cred(), &cache, true)) {
+		if (!(cache.mask & NFS_ACCESS_XAWRITE))
+			return -EACCES;
+	}
+
+	if (buf == NULL)
+		return nfs42_proc_removexattr(inode, key);
+	else
+		return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+}
+
+static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
+				    struct dentry *unused, struct inode *inode,
+				    const char *key, void *buf, size_t buflen)
+{
+	struct nfs_access_entry cache;
+
+	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
+		return -EOPNOTSUPP;
+
+	if (!nfs_access_get_cached(inode, current_cred(), &cache, true)) {
+		if (!(cache.mask & NFS_ACCESS_XAREAD))
+			return -EACCES;
+	}
+
+	return nfs42_proc_getxattr(inode, key, buf, buflen);
+}
+
+static ssize_t
+nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
+{
+	u64 cookie;
+	bool eof;
+	int ret, size;
+	char *buf;
+	size_t buflen;
+	struct nfs_access_entry cache;
+
+	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
+		return 0;
+
+	if (!nfs_access_get_cached(inode, current_cred(), &cache, true)) {
+		if (!(cache.mask & NFS_ACCESS_XALIST))
+			return 0;
+	}
+
+	cookie = 0;
+	eof = false;
+	buflen = list_len ? list_len : XATTR_LIST_MAX;
+	buf = list_len ? list : NULL;
+	size = 0;
+
+	while (!eof) {
+		ret = nfs42_proc_listxattrs(inode, buf, buflen,
+		    &cookie, &eof);
+		if (ret < 0)
+			return ret;
+
+		if (list_len) {
+			buf += ret;
+			buflen -= ret;
+		}
+		size += ret;
+	}
+
+	return size;
+}
+
+#else
+
+static ssize_t
+nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
+{
+	return 0;
+}
+#endif /* CONFIG_NFS_V4_2 */
+
 /*
  * nfs_fhget will use either the mounted_on_fileid or the fileid
  */
@@ -10048,7 +10146,7 @@ const struct nfs4_minor_version_ops *nfs_v4_minor_ops[] = {
 
 static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size)
 {
-	ssize_t error, error2;
+	ssize_t error, error2, error3;
 
 	error = generic_listxattr(dentry, list, size);
 	if (error < 0)
@@ -10061,7 +10159,17 @@ static ssize_t nfs4_listxattr(struct dentry *dentry, char *list, size_t size)
 	error2 = nfs4_listxattr_nfs4_label(d_inode(dentry), list, size);
 	if (error2 < 0)
 		return error2;
-	return error + error2;
+
+	if (list) {
+		list += error2;
+		size -= error2;
+	}
+
+	error3 = nfs4_listxattr_nfs4_user(d_inode(dentry), list, size);
+	if (error3 < 0)
+		return error3;
+
+	return error + error2 + error3;
 }
 
 static const struct inode_operations nfs4_dir_inode_operations = {
@@ -10149,10 +10257,21 @@ static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = {
 	.set	= nfs4_xattr_set_nfs4_acl,
 };
 
+#ifdef CONFIG_NFS_V4_2
+static const struct xattr_handler nfs4_xattr_nfs4_user_handler = {
+	.prefix	= XATTR_USER_PREFIX,
+	.get	= nfs4_xattr_get_nfs4_user,
+	.set	= nfs4_xattr_set_nfs4_user,
+};
+#endif
+
 const struct xattr_handler *nfs4_xattr_handlers[] = {
 	&nfs4_xattr_nfs4_acl_handler,
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
 	&nfs4_xattr_nfs4_label_handler,
+#endif
+#ifdef CONFIG_NFS_V4_2
+	&nfs4_xattr_nfs4_user_handler,
 #endif
 	NULL
 };
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 13/13] NFSv4.2: add client side xattr caching.
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (11 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 12/13] NFSv4.2: hook in the user extended attribute handlers Frank van der Linden
@ 2020-03-11 19:56 ` Frank van der Linden
  2020-03-12 20:39   ` Schumaker, Anna
  2020-03-12 20:48   ` Schumaker, Anna
  2020-03-12 19:06 ` [PATCH 00/13] client side user xattr (RFC8276) support Mkrtchyan, Tigran
  2020-03-12 20:09 ` Anna Schumaker
  14 siblings, 2 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-11 19:56 UTC (permalink / raw)
  To: trond.myklebust, anna.schumaker, linux-nfs; +Cc: Frank van der Linden

Implement client side caching for NFSv4.2 extended attributes. The cache
is a per-inode hashtable, with name/value entries. There is one special
entry for the listxattr cache.

NFS inodes have a pointer to a cache structure. The cache structure is
allocated on demand, freed when the cache is invalidated.

Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
are collected by a separate shrinker, and freed more aggressively
than others.

Signed-off-by: Frank van der Linden <fllinden@amazon.com>
---
 fs/nfs/Makefile             |    1 +
 fs/nfs/inode.c              |    9 +-
 fs/nfs/internal.h           |   20 +
 fs/nfs/nfs42proc.c          |   12 +
 fs/nfs/nfs42xattr.c         | 1083 +++++++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfs4proc.c           |   42 +-
 fs/nfs/nfs4super.c          |   10 +
 include/linux/nfs_fs.h      |    6 +
 include/uapi/linux/nfs_fs.h |    1 +
 9 files changed, 1177 insertions(+), 7 deletions(-)
 create mode 100644 fs/nfs/nfs42xattr.c

diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 2433c3e03cfa..191b3e9aa232 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -31,6 +31,7 @@ nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
 nfsv4-$(CONFIG_SYSCTL)	+= nfs4sysctl.o
 nfsv4-$(CONFIG_NFS_V4_1)	+= pnfs.o pnfs_dev.o pnfs_nfs.o
 nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42proc.o
+nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42xattr.o
 
 obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
 obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index d2be152796ef..9d4952d2306b 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned long flags)
 
 	return nfs_check_cache_invalid_not_delegated(inode, flags);
 }
+EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);
 
 static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
 {
@@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
 					| NFS_INO_INVALID_DATA
 					| NFS_INO_INVALID_ACCESS
 					| NFS_INO_INVALID_ACL
+					| NFS_INO_INVALID_XATTR
 					| NFS_INO_REVAL_PAGECACHE);
 	} else
 		nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
 					| NFS_INO_INVALID_ACCESS
 					| NFS_INO_INVALID_ACL
+					| NFS_INO_INVALID_XATTR
 					| NFS_INO_REVAL_PAGECACHE);
 	nfs_zap_label_cache_locked(nfsi);
 }
@@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 			if (!(have_writers || have_delegation)) {
 				invalid |= NFS_INO_INVALID_DATA
 					| NFS_INO_INVALID_ACCESS
-					| NFS_INO_INVALID_ACL;
+					| NFS_INO_INVALID_ACL
+					| NFS_INO_INVALID_XATTR;
 				/* Force revalidate of all attributes */
 				save_cache_validity |= NFS_INO_INVALID_CTIME
 					| NFS_INO_INVALID_MTIME
@@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
 #if IS_ENABLED(CONFIG_NFS_V4)
 	nfsi->nfs4_acl = NULL;
 #endif /* CONFIG_NFS_V4 */
+#ifdef CONFIG_NFS_V4_2
+	nfsi->xattr_cache = NULL;
+#endif
 	return &nfsi->vfs_inode;
 }
 EXPORT_SYMBOL_GPL(nfs_alloc_inode);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 1e3a7e119c93..67b8e4f7c554 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -575,6 +575,26 @@ extern void nfs4_test_session_trunk(struct rpc_clnt *clnt,
 				struct rpc_xprt *xprt,
 				void *data);
 
+#ifdef CONFIG_NFS_V4_2
+extern int __init nfs4_xattr_cache_init(void);
+extern void nfs4_xattr_cache_exit(void);
+extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
+				 const char *buf, struct page **pages,
+				 ssize_t buflen);
+extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
+extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
+				char *buf, ssize_t buflen);
+extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
+				      ssize_t buflen);
+extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
+				     ssize_t buflen);
+extern void nfs4_xattr_cache_zap(struct inode *inode);
+#else
+static inline void nfs4_xattr_cache_zap(struct inode *inode)
+{
+}
+#endif
+
 static inline struct inode *nfs_igrab_and_active(struct inode *inode)
 {
 	inode = igrab(inode);
diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
index 8c2e52bc986a..e200522469af 100644
--- a/fs/nfs/nfs42proc.c
+++ b/fs/nfs/nfs42proc.c
@@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode *inode, const char *name,
 	if (ret < 0)
 		return ret;
 
+	/*
+	 * Normally, the caching is done one layer up, but for successful
+	 * RPCS, always cache the result here, even if the caller was
+	 * just querying the length, or if the reply was too big for
+	 * the caller. This avoids a second RPC in the case of the
+	 * common query-alloc-retrieve cycle for xattrs.
+	 *
+	 * Note that xattr_len is always capped to XATTR_SIZE_MAX.
+	 */
+
+	nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
+
 	if (buflen) {
 		if (res.xattr_len > buflen)
 			return -ERANGE;
diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
new file mode 100644
index 000000000000..23fdab977a2a
--- /dev/null
+++ b/fs/nfs/nfs42xattr.c
@@ -0,0 +1,1083 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights reserved.
+ *
+ * User extended attribute client side cache functions.
+ *
+ * Author: Frank van der Linden <fllinden@amazon.com>
+ */
+#include <linux/errno.h>
+#include <linux/nfs_fs.h>
+#include <linux/hashtable.h>
+#include <linux/refcount.h>
+#include <uapi/linux/xattr.h>
+
+#include "nfs4_fs.h"
+#include "internal.h"
+
+/*
+ * User extended attributes client side caching is implemented by having
+ * a cache structure attached to NFS inodes. This structure is allocated
+ * when needed, and freed when the cache is zapped.
+ *
+ * The cache structure contains as hash table of entries, and a pointer
+ * to a special-cased entry for the listxattr cache.
+ *
+ * Accessing and allocating / freeing the caches is done via reference
+ * counting. The cache entries use a similar refcounting scheme.
+ *
+ * This makes freeing a cache, both from the shrinker and from the
+ * zap cache path, easy. It also means that, in current use cases,
+ * the large majority of inodes will not waste any memory, as they
+ * will never have any user extended attributes assigned to them.
+ *
+ * Attribute entries are hashed in to a simple hash table. They are
+ * also part of an LRU.
+ *
+ * There are three shrinkers.
+ *
+ * Two shrinkers deal with the cache entries themselves: one for
+ * large entries (> PAGE_SIZE), and one for smaller entries. The
+ * shrinker for the larger entries works more aggressively than
+ * those for the smaller entries.
+ *
+ * The other shrinker frees the cache structures themselves.
+ */
+
+/*
+ * 64 buckets is a good default. There is likely no reasonable
+ * workload that uses more than even 64 user extended attributes.
+ * You can certainly add a lot more - but you get what you ask for
+ * in those circumstances.
+ */
+#define NFS4_XATTR_HASH_SIZE	64
+
+#define NFSDBG_FACILITY	NFSDBG_XATTRCACHE
+
+struct nfs4_xattr_cache;
+struct nfs4_xattr_entry;
+
+struct nfs4_xattr_bucket {
+	spinlock_t lock;
+	struct hlist_head hlist;
+	struct nfs4_xattr_cache *cache;
+	bool draining;
+};
+
+struct nfs4_xattr_cache {
+	struct kref ref;
+	spinlock_t hash_lock;	/* protects hashtable and lru */
+	struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
+	struct list_head lru;
+	struct list_head dispose;
+	atomic_long_t nent;
+	spinlock_t listxattr_lock;
+	struct inode *inode;
+	struct nfs4_xattr_entry *listxattr;
+	struct work_struct work;
+};
+
+struct nfs4_xattr_entry {
+	struct kref ref;
+	struct hlist_node hnode;
+	struct list_head lru;
+	struct list_head dispose;
+	char *xattr_name;
+	void *xattr_value;
+	size_t xattr_size;
+	struct nfs4_xattr_bucket *bucket;
+	uint32_t flags;
+};
+
+#define	NFS4_XATTR_ENTRY_EXTVAL	0x0001
+
+/*
+ * LRU list of NFS inodes that have xattr caches.
+ */
+static struct list_lru nfs4_xattr_cache_lru;
+static struct list_lru nfs4_xattr_entry_lru;
+static struct list_lru nfs4_xattr_large_entry_lru;
+
+static struct kmem_cache *nfs4_xattr_cache_cachep;
+
+static struct workqueue_struct *nfs4_xattr_cache_wq;
+
+/*
+ * Hashing helper functions.
+ */
+static void
+nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
+{
+	unsigned int i;
+
+	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+		INIT_HLIST_HEAD(&cache->buckets[i].hlist);
+		spin_lock_init(&cache->buckets[i].lock);
+		cache->buckets[i].cache = cache;
+		cache->buckets[i].draining = false;
+	}
+}
+
+/*
+ * Locking order:
+ * 1. inode i_lock or bucket lock
+ * 2. list_lru lock (taken by list_lru_* functions)
+ */
+
+/*
+ * Wrapper functions to add a cache entry to the right LRU.
+ */
+static bool
+nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
+{
+	struct list_lru *lru;
+
+	lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
+	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+	return list_lru_add(lru, &entry->lru);
+}
+
+static bool
+nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
+{
+	struct list_lru *lru;
+
+	lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
+	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+	return list_lru_del(lru, &entry->lru);
+}
+
+/*
+ * This function allocates cache entries. They are the normal
+ * extended attribute name/value pairs, but may also be a listxattr
+ * cache. Those allocations use the same entry so that they can be
+ * treated as one by the memory shrinker.
+ *
+ * xattr cache entries are allocated together with names. If the
+ * value fits in to one page with the entry structure and the name,
+ * it will also be part of the same allocation (kmalloc). This is
+ * expected to be the vast majority of cases. Larger allocations
+ * have a value pointer that is allocated separately by kvmalloc.
+ *
+ * Parameters:
+ *
+ * @name:  Name of the extended attribute. NULL for listxattr cache
+ *         entry.
+ * @value: Value of attribute, or listxattr cache. NULL if the
+ *         value is to be copied from pages instead.
+ * @pages: Pages to copy the value from, if not NULL. Passed in to
+ *	   make it easier to copy the value after an RPC, even if
+ *	   the value will not be passed up to application (e.g.
+ *	   for a 'query' getxattr with NULL buffer).
+ * @len:   Length of the value. Can be 0 for zero-length attribues.
+ *         @value and @pages will be NULL if @len is 0.
+ */
+static struct nfs4_xattr_entry *
+nfs4_xattr_alloc_entry(const char *name, const void *value,
+		       struct page **pages, size_t len)
+{
+	struct nfs4_xattr_entry *entry;
+	void *valp;
+	char *namep;
+	size_t alloclen, slen;
+	char *buf;
+	uint32_t flags;
+
+	BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
+	    XATTR_NAME_MAX + 1 > PAGE_SIZE);
+
+	alloclen = sizeof(struct nfs4_xattr_entry);
+	if (name != NULL) {
+		slen = strlen(name) + 1;
+		alloclen += slen;
+	} else
+		slen = 0;
+
+	if (alloclen + len <= PAGE_SIZE) {
+		alloclen += len;
+		flags = 0;
+	} else {
+		flags = NFS4_XATTR_ENTRY_EXTVAL;
+	}
+
+	buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
+	if (buf == NULL)
+		return NULL;
+	entry = (struct nfs4_xattr_entry *)buf;
+
+	if (name != NULL) {
+		namep = buf + sizeof(struct nfs4_xattr_entry);
+		memcpy(namep, name, slen);
+	} else {
+		namep = NULL;
+	}
+
+
+	if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
+		valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
+		if (valp == NULL) {
+			kfree(buf);
+			return NULL;
+		}
+	} else if (len != 0) {
+		valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
+	} else
+		valp = NULL;
+
+	if (valp != NULL) {
+		if (value != NULL)
+			memcpy(valp, value, len);
+		else
+			_copy_from_pages(valp, pages, 0, len);
+	}
+
+	entry->flags = flags;
+	entry->xattr_value = valp;
+	kref_init(&entry->ref);
+	entry->xattr_name = namep;
+	entry->xattr_size = len;
+	entry->bucket = NULL;
+	INIT_LIST_HEAD(&entry->lru);
+	INIT_LIST_HEAD(&entry->dispose);
+	INIT_HLIST_NODE(&entry->hnode);
+
+	return entry;
+}
+
+static void
+nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
+{
+	if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
+		kvfree(entry->xattr_value);
+	kfree(entry);
+}
+
+static void
+nfs4_xattr_free_entry_cb(struct kref *kref)
+{
+	struct nfs4_xattr_entry *entry;
+
+	entry = container_of(kref, struct nfs4_xattr_entry, ref);
+
+	if (WARN_ON(!list_empty(&entry->lru)))
+		return;
+
+	nfs4_xattr_free_entry(entry);
+}
+
+static void
+nfs4_xattr_free_cache_cb(struct kref *kref)
+{
+	struct nfs4_xattr_cache *cache;
+	int i;
+
+	cache = container_of(kref, struct nfs4_xattr_cache, ref);
+
+	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+		if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
+			return;
+		cache->buckets[i].draining = false;
+	}
+
+	cache->listxattr = NULL;
+
+	kmem_cache_free(nfs4_xattr_cache_cachep, cache);
+
+}
+
+static struct nfs4_xattr_cache *
+nfs4_xattr_alloc_cache(void)
+{
+	struct nfs4_xattr_cache *cache;
+
+	cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
+	    GFP_KERNEL_ACCOUNT | GFP_NOFS);
+	if (cache == NULL)
+		return NULL;
+
+	kref_init(&cache->ref);
+	atomic_long_set(&cache->nent, 0);
+
+	return cache;
+}
+
+/*
+ * Set the listxattr cache, which is a special-cased cache entry.
+ * The special value ERR_PTR(-ESTALE) is used to indicate that
+ * the cache is being drained - this prevents a new listxattr
+ * cache from being added to what is now a stale cache.
+ */
+static int
+nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
+			 struct nfs4_xattr_entry *new)
+{
+	struct nfs4_xattr_entry *old;
+	int ret = 1;
+
+	spin_lock(&cache->listxattr_lock);
+
+	old = cache->listxattr;
+
+	if (old == ERR_PTR(-ESTALE)) {
+		ret = 0;
+		goto out;
+	}
+
+	cache->listxattr = new;
+	if (new != NULL && new != ERR_PTR(-ESTALE))
+		nfs4_xattr_entry_lru_add(new);
+
+	if (old != NULL) {
+		nfs4_xattr_entry_lru_del(old);
+		kref_put(&old->ref, nfs4_xattr_free_entry_cb);
+	}
+out:
+	spin_unlock(&cache->listxattr_lock);
+
+	return ret;
+}
+
+/*
+ * Unlink a cache from its parent inode, clearing out an invalid
+ * cache. Must be called with i_lock held.
+ */
+static struct nfs4_xattr_cache *
+nfs4_xattr_cache_unlink(struct inode *inode)
+{
+	struct nfs_inode *nfsi;
+	struct nfs4_xattr_cache *oldcache;
+
+	nfsi = NFS_I(inode);
+
+	oldcache = nfsi->xattr_cache;
+	if (oldcache != NULL) {
+		list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
+		oldcache->inode = NULL;
+	}
+	nfsi->xattr_cache = NULL;
+	nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
+
+	return oldcache;
+
+}
+
+/*
+ * Discard a cache. Usually called by a worker, since walking all
+ * the entries can take up some cycles that we don't want to waste
+ * in the I/O path. Can also be called from the shrinker callback.
+ *
+ * The cache is dead, it has already been unlinked from its inode,
+ * and no longer appears on the cache LRU list.
+ *
+ * Mark all buckets as draining, so that no new entries are added. This
+ * could still happen in the unlikely, but possible case that another
+ * thread had grabbed a reference before it was unlinked from the inode,
+ * and is still holding it for an add operation.
+ *
+ * Remove all entries from the LRU lists, so that there is no longer
+ * any way to 'find' this cache. Then, remove the entries from the hash
+ * table.
+ *
+ * At that point, the cache will remain empty and can be freed when the final
+ * reference drops, which is very likely the kref_put at the end of
+ * this function, or the one called immediately afterwards in the
+ * shrinker callback.
+ */
+static void
+nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
+{
+	unsigned int i;
+	struct nfs4_xattr_entry *entry;
+	struct nfs4_xattr_bucket *bucket;
+	struct hlist_node *n;
+
+	nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
+
+	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
+		bucket = &cache->buckets[i];
+
+		spin_lock(&bucket->lock);
+		bucket->draining = true;
+		hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
+			nfs4_xattr_entry_lru_del(entry);
+			hlist_del_init(&entry->hnode);
+			kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+		}
+		spin_unlock(&bucket->lock);
+	}
+
+	atomic_long_set(&cache->nent, 0);
+
+	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+static void
+nfs4_xattr_discard_cache_worker(struct work_struct *work)
+{
+	struct nfs4_xattr_cache *cache = container_of(work,
+	    struct nfs4_xattr_cache, work);
+
+	nfs4_xattr_discard_cache(cache);
+}
+
+static void
+nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
+{
+	queue_work(nfs4_xattr_cache_wq, &cache->work);
+}
+
+/*
+ * Get a referenced copy of the cache structure. Avoid doing allocs
+ * while holding i_lock. Which means that we do some optimistic allocation,
+ * and might have to free the result in rare cases.
+ *
+ * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
+ * and acts accordingly, replacing the cache when needed. For the read case
+ * (!add), this means that the caller must make sure that the cache
+ * is valid before caling this function. getxattr and listxattr call
+ * revalidate_inode to do this. The attribute cache timeout (for the
+ * non-delegated case) is expected to be dealt with in the revalidate
+ * call.
+ */
+
+static struct nfs4_xattr_cache *
+nfs4_xattr_get_cache(struct inode *inode, int add)
+{
+	struct nfs_inode *nfsi;
+	struct nfs4_xattr_cache *cache, *oldcache, *newcache;
+
+	nfsi = NFS_I(inode);
+
+	cache = oldcache = NULL;
+
+	spin_lock(&inode->i_lock);
+
+	if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
+		oldcache = nfs4_xattr_cache_unlink(inode);
+	else
+		cache = nfsi->xattr_cache;
+
+	if (cache != NULL)
+		kref_get(&cache->ref);
+
+	spin_unlock(&inode->i_lock);
+
+	if (add && cache == NULL) {
+		newcache = NULL;
+
+		cache = nfs4_xattr_alloc_cache();
+		if (cache == NULL)
+			goto out;
+
+		spin_lock(&inode->i_lock);
+		if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
+			/*
+			 * The cache was invalidated again. Give up,
+			 * since what we want to enter is now likely
+			 * outdated anyway.
+			 */
+			spin_unlock(&inode->i_lock);
+			kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+			cache = NULL;
+			goto out;
+		}
+
+		/*
+		 * Check if someone beat us to it.
+		 */
+		if (nfsi->xattr_cache != NULL) {
+			newcache = nfsi->xattr_cache;
+			kref_get(&newcache->ref);
+		} else {
+			kref_get(&cache->ref);
+			nfsi->xattr_cache = cache;
+			cache->inode = inode;
+			list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
+		}
+
+		spin_unlock(&inode->i_lock);
+
+		/*
+		 * If there was a race, throw away the cache we just
+		 * allocated, and use the new one allocated by someone
+		 * else.
+		 */
+		if (newcache != NULL) {
+			kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+			cache = newcache;
+		}
+	}
+
+out:
+	/*
+	 * Discarding an old cache is done via a workqueue.
+	 */
+	if (oldcache != NULL)
+		nfs4_xattr_reap_cache(oldcache);
+
+	return cache;
+}
+
+static inline struct nfs4_xattr_bucket *
+nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
+{
+	return &cache->buckets[jhash(name, strlen(name), 0) &
+	    (ARRAY_SIZE(cache->buckets) - 1)];
+}
+
+static struct nfs4_xattr_entry *
+nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
+{
+	struct nfs4_xattr_entry *entry;
+
+	entry = NULL;
+
+	hlist_for_each_entry(entry, &bucket->hlist, hnode) {
+		if (!strcmp(entry->xattr_name, name))
+			break;
+	}
+
+	return entry;
+}
+
+static int
+nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
+		    struct nfs4_xattr_entry *entry)
+{
+	struct nfs4_xattr_bucket *bucket;
+	struct nfs4_xattr_entry *oldentry = NULL;
+	int ret = 1;
+
+	bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
+	entry->bucket = bucket;
+
+	spin_lock(&bucket->lock);
+
+	if (bucket->draining) {
+		ret = 0;
+		goto out;
+	}
+
+	oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
+	if (oldentry != NULL) {
+		hlist_del_init(&oldentry->hnode);
+		nfs4_xattr_entry_lru_del(oldentry);
+	} else {
+		atomic_long_inc(&cache->nent);
+	}
+
+	hlist_add_head(&entry->hnode, &bucket->hlist);
+	nfs4_xattr_entry_lru_add(entry);
+
+out:
+	spin_unlock(&bucket->lock);
+
+	if (oldentry != NULL)
+		kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
+
+	return ret;
+}
+
+static void
+nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
+{
+	struct nfs4_xattr_bucket *bucket;
+	struct nfs4_xattr_entry *entry;
+
+	bucket = nfs4_xattr_hash_bucket(cache, name);
+
+	spin_lock(&bucket->lock);
+
+	entry = nfs4_xattr_get_entry(bucket, name);
+	if (entry != NULL) {
+		hlist_del_init(&entry->hnode);
+		nfs4_xattr_entry_lru_del(entry);
+		atomic_long_dec(&cache->nent);
+	}
+
+	spin_unlock(&bucket->lock);
+
+	if (entry != NULL)
+		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+}
+
+static struct nfs4_xattr_entry *
+nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
+{
+	struct nfs4_xattr_bucket *bucket;
+	struct nfs4_xattr_entry *entry;
+
+	bucket = nfs4_xattr_hash_bucket(cache, name);
+
+	spin_lock(&bucket->lock);
+
+	entry = nfs4_xattr_get_entry(bucket, name);
+	if (entry != NULL)
+		kref_get(&entry->ref);
+
+	spin_unlock(&bucket->lock);
+
+	return entry;
+}
+
+/*
+ * Entry point to retrieve an entry from the cache.
+ */
+ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char *buf,
+			 ssize_t buflen)
+{
+	struct nfs4_xattr_cache *cache;
+	struct nfs4_xattr_entry *entry;
+	ssize_t ret;
+
+	cache = nfs4_xattr_get_cache(inode, 0);
+	if (cache == NULL)
+		return -ENOENT;
+
+	ret = 0;
+	entry = nfs4_xattr_hash_find(cache, name);
+
+	if (entry != NULL) {
+		dprintk("%s: cache hit '%s', len %lu\n", __func__,
+		    entry->xattr_name, (unsigned long)entry->xattr_size);
+		if (buflen == 0) {
+			/* Length probe only */
+			ret = entry->xattr_size;
+		} else if (buflen < entry->xattr_size)
+			ret = -ERANGE;
+		else {
+			memcpy(buf, entry->xattr_value, entry->xattr_size);
+			ret = entry->xattr_size;
+		}
+		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+	} else {
+		dprintk("%s: cache miss '%s'\n", __func__, name);
+		ret = -ENOENT;
+	}
+
+	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+
+	return ret;
+}
+
+/*
+ * Retrieve a cached list of xattrs from the cache.
+ */
+ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
+{
+	struct nfs4_xattr_cache *cache;
+	struct nfs4_xattr_entry *entry;
+	ssize_t ret;
+
+	cache = nfs4_xattr_get_cache(inode, 0);
+	if (cache == NULL)
+		return -ENOENT;
+
+	spin_lock(&cache->listxattr_lock);
+
+	entry = cache->listxattr;
+
+	if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
+		if (buflen == 0) {
+			/* Length probe only */
+			ret = entry->xattr_size;
+		} else if (entry->xattr_size > buflen)
+			ret = -ERANGE;
+		else {
+			memcpy(buf, entry->xattr_value, entry->xattr_size);
+			ret = entry->xattr_size;
+		}
+	} else {
+		ret = -ENOENT;
+	}
+
+	spin_unlock(&cache->listxattr_lock);
+
+	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+
+	return ret;
+}
+
+/*
+ * Add an xattr to the cache.
+ *
+ * This also invalidates the xattr list cache.
+ */
+void nfs4_xattr_cache_add(struct inode *inode, const char *name,
+			  const char *buf, struct page **pages, ssize_t buflen)
+{
+	struct nfs4_xattr_cache *cache;
+	struct nfs4_xattr_entry *entry;
+
+	dprintk("%s: add '%s' len %lu\n", __func__,
+	    name, (unsigned long)buflen);
+
+	cache = nfs4_xattr_get_cache(inode, 1);
+	if (cache == NULL)
+		return;
+
+	entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
+	if (entry == NULL)
+		goto out;
+
+	(void)nfs4_xattr_set_listcache(cache, NULL);
+
+	if (!nfs4_xattr_hash_add(cache, entry))
+		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+
+out:
+	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+
+/*
+ * Remove an xattr from the cache.
+ *
+ * This also invalidates the xattr list cache.
+ */
+void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
+{
+	struct nfs4_xattr_cache *cache;
+
+	dprintk("%s: remove '%s'\n", __func__, name);
+
+	cache = nfs4_xattr_get_cache(inode, 0);
+	if (cache == NULL)
+		return;
+
+	(void)nfs4_xattr_set_listcache(cache, NULL);
+	nfs4_xattr_hash_remove(cache, name);
+
+	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+/*
+ * Cache listxattr output, replacing any possible old one.
+ */
+void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
+			       ssize_t buflen)
+{
+	struct nfs4_xattr_cache *cache;
+	struct nfs4_xattr_entry *entry;
+
+	cache = nfs4_xattr_get_cache(inode, 1);
+	if (cache == NULL)
+		return;
+
+	entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
+	if (entry == NULL)
+		goto out;
+
+	/*
+	 * This is just there to be able to get to bucket->cache,
+	 * which is obviously the same for all buckets, so just
+	 * use bucket 0.
+	 */
+	entry->bucket = &cache->buckets[0];
+
+	if (!nfs4_xattr_set_listcache(cache, entry))
+		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+
+out:
+	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+}
+
+/*
+ * Zap the entire cache. Called when an inode is evicted.
+ */
+void nfs4_xattr_cache_zap(struct inode *inode)
+{
+	struct nfs4_xattr_cache *oldcache;
+
+	spin_lock(&inode->i_lock);
+	oldcache = nfs4_xattr_cache_unlink(inode);
+	spin_unlock(&inode->i_lock);
+
+	if (oldcache)
+		nfs4_xattr_discard_cache(oldcache);
+}
+
+/*
+ * The entry LRU is shrunk more aggressively than the cache LRU,
+ * by settings @seeks to 1.
+ *
+ * Cache structures are freed only when they've become empty, after
+ * pruning all but one entry.
+ */
+
+static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
+					    struct shrink_control *sc);
+static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
+					    struct shrink_control *sc);
+static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
+					   struct shrink_control *sc);
+static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
+					   struct shrink_control *sc);
+
+static struct shrinker nfs4_xattr_cache_shrinker = {
+	.count_objects	= nfs4_xattr_cache_count,
+	.scan_objects	= nfs4_xattr_cache_scan,
+	.seeks		= DEFAULT_SEEKS,
+	.flags		= SHRINKER_MEMCG_AWARE,
+};
+
+static struct shrinker nfs4_xattr_entry_shrinker = {
+	.count_objects	= nfs4_xattr_entry_count,
+	.scan_objects	= nfs4_xattr_entry_scan,
+	.seeks		= DEFAULT_SEEKS,
+	.batch		= 512,
+	.flags		= SHRINKER_MEMCG_AWARE,
+};
+
+static struct shrinker nfs4_xattr_large_entry_shrinker = {
+	.count_objects	= nfs4_xattr_entry_count,
+	.scan_objects	= nfs4_xattr_entry_scan,
+	.seeks		= 1,
+	.batch		= 512,
+	.flags		= SHRINKER_MEMCG_AWARE,
+};
+
+static enum lru_status
+cache_lru_isolate(struct list_head *item,
+	struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *dispose = arg;
+	struct inode *inode;
+	struct nfs4_xattr_cache *cache = container_of(item,
+	    struct nfs4_xattr_cache, lru);
+
+	if (atomic_long_read(&cache->nent) > 1)
+		return LRU_SKIP;
+
+	/*
+	 * If a cache structure is on the LRU list, we know that
+	 * its inode is valid. Try to lock it to break the link.
+	 * Since we're inverting the lock order here, only try.
+	 */
+	inode = cache->inode;
+
+	if (!spin_trylock(&inode->i_lock))
+		return LRU_SKIP;
+
+	kref_get(&cache->ref);
+
+	cache->inode = NULL;
+	NFS_I(inode)->xattr_cache = NULL;
+	NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
+	list_lru_isolate(lru, &cache->lru);
+
+	spin_unlock(&inode->i_lock);
+
+	list_add_tail(&cache->dispose, dispose);
+	return LRU_REMOVED;
+}
+
+static unsigned long
+nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	LIST_HEAD(dispose);
+	unsigned long freed;
+	struct nfs4_xattr_cache *cache;
+
+	freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
+	    cache_lru_isolate, &dispose);
+	while (!list_empty(&dispose)) {
+		cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
+		    dispose);
+		list_del_init(&cache->dispose);
+		nfs4_xattr_discard_cache(cache);
+		kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
+	}
+
+	return freed;
+}
+
+
+static unsigned long
+nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	unsigned long count;
+
+	count = list_lru_count(&nfs4_xattr_cache_lru);
+	return vfs_pressure_ratio(count);
+}
+
+static enum lru_status
+entry_lru_isolate(struct list_head *item,
+	struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
+{
+	struct list_head *dispose = arg;
+	struct nfs4_xattr_bucket *bucket;
+	struct nfs4_xattr_cache *cache;
+	struct nfs4_xattr_entry *entry = container_of(item,
+	    struct nfs4_xattr_entry, lru);
+
+	bucket = entry->bucket;
+	cache = bucket->cache;
+
+	/*
+	 * Unhook the entry from its parent (either a cache bucket
+	 * or a cache structure if it's a listxattr buf), so that
+	 * it's no longer found. Then add it to the isolate list,
+	 * to be freed later.
+	 *
+	 * In both cases, we're reverting lock order, so use
+	 * trylock and skip the entry if we can't get the lock.
+	 */
+	if (entry->xattr_name != NULL) {
+		/* Regular cache entry */
+		if (!spin_trylock(&bucket->lock))
+			return LRU_SKIP;
+
+		kref_get(&entry->ref);
+
+		hlist_del_init(&entry->hnode);
+		atomic_long_dec(&cache->nent);
+		list_lru_isolate(lru, &entry->lru);
+
+		spin_unlock(&bucket->lock);
+	} else {
+		/* Listxattr cache entry */
+		if (!spin_trylock(&cache->listxattr_lock))
+			return LRU_SKIP;
+
+		kref_get(&entry->ref);
+
+		cache->listxattr = NULL;
+		list_lru_isolate(lru, &entry->lru);
+
+		spin_unlock(&cache->listxattr_lock);
+	}
+
+	list_add_tail(&entry->dispose, dispose);
+	return LRU_REMOVED;
+}
+
+static unsigned long
+nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
+{
+	LIST_HEAD(dispose);
+	unsigned long freed;
+	struct nfs4_xattr_entry *entry;
+	struct list_lru *lru;
+
+	lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+	freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
+
+	while (!list_empty(&dispose)) {
+		entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
+		    dispose);
+		list_del_init(&entry->dispose);
+
+		/*
+		 * Drop two references: the one that we just grabbed
+		 * in entry_lru_isolate, and the one that was set
+		 * when the entry was first allocated.
+		 */
+		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
+	}
+
+	return freed;
+}
+
+static unsigned long
+nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
+{
+	unsigned long count;
+	struct list_lru *lru;
+
+	lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
+	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
+
+	count = list_lru_count(lru);
+	return vfs_pressure_ratio(count);
+}
+
+
+static void nfs4_xattr_cache_init_once(void *p)
+{
+	struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
+
+	spin_lock_init(&cache->listxattr_lock);
+	atomic_long_set(&cache->nent, 0);
+	nfs4_xattr_hash_init(cache);
+	cache->listxattr = NULL;
+	INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
+	INIT_LIST_HEAD(&cache->lru);
+	INIT_LIST_HEAD(&cache->dispose);
+}
+
+int __init nfs4_xattr_cache_init(void)
+{
+	int ret = 0;
+
+	nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
+	    sizeof(struct nfs4_xattr_cache), 0,
+	    (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
+	    nfs4_xattr_cache_init_once);
+	if (nfs4_xattr_cache_cachep == NULL)
+		return -ENOMEM;
+
+	ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
+	    &nfs4_xattr_large_entry_shrinker);
+	if (ret)
+		goto out4;
+
+	ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
+	    &nfs4_xattr_entry_shrinker);
+	if (ret)
+		goto out3;
+
+	ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
+	    &nfs4_xattr_cache_shrinker);
+	if (ret)
+		goto out2;
+
+	nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
+	if (nfs4_xattr_cache_wq == NULL)
+		goto out1;
+
+	ret = register_shrinker(&nfs4_xattr_cache_shrinker);
+	if (ret)
+		goto out0;
+
+	ret = register_shrinker(&nfs4_xattr_entry_shrinker);
+	if (ret)
+		goto out;
+
+	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
+	if (!ret)
+		return 0;
+
+	unregister_shrinker(&nfs4_xattr_entry_shrinker);
+out:
+	unregister_shrinker(&nfs4_xattr_cache_shrinker);
+out0:
+	destroy_workqueue(nfs4_xattr_cache_wq);
+out1:
+	list_lru_destroy(&nfs4_xattr_cache_lru);
+out2:
+	list_lru_destroy(&nfs4_xattr_entry_lru);
+out3:
+	list_lru_destroy(&nfs4_xattr_large_entry_lru);
+out4:
+	kmem_cache_destroy(nfs4_xattr_cache_cachep);
+
+	return ret;
+}
+
+void nfs4_xattr_cache_exit(void)
+{
+	unregister_shrinker(&nfs4_xattr_entry_shrinker);
+	unregister_shrinker(&nfs4_xattr_cache_shrinker);
+	list_lru_destroy(&nfs4_xattr_entry_lru);
+	list_lru_destroy(&nfs4_xattr_cache_lru);
+	kmem_cache_destroy(nfs4_xattr_cache_cachep);
+	destroy_workqueue(nfs4_xattr_cache_wq);
+}
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6df94857f5bb..079c1ac84cee 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7459,6 +7459,7 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
 				    size_t buflen, int flags)
 {
 	struct nfs_access_entry cache;
+	int ret;
 
 	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
 		return -EOPNOTSUPP;
@@ -7477,10 +7478,17 @@ static int nfs4_xattr_set_nfs4_user(const struct xattr_handler *handler,
 			return -EACCES;
 	}
 
-	if (buf == NULL)
-		return nfs42_proc_removexattr(inode, key);
-	else
-		return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+	if (buf == NULL) {
+		ret = nfs42_proc_removexattr(inode, key);
+		if (!ret)
+			nfs4_xattr_cache_remove(inode, key);
+	} else {
+		ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
+		if (!ret)
+			nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
+	}
+
+	return ret;
 }
 
 static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
@@ -7488,6 +7496,7 @@ static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
 				    const char *key, void *buf, size_t buflen)
 {
 	struct nfs_access_entry cache;
+	ssize_t ret;
 
 	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
 		return -EOPNOTSUPP;
@@ -7497,7 +7506,17 @@ static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
 			return -EACCES;
 	}
 
-	return nfs42_proc_getxattr(inode, key, buf, buflen);
+	ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+	if (ret)
+		return ret;
+
+	ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
+	if (ret >= 0 || (ret < 0 && ret != -ENOENT))
+		return ret;
+
+	ret = nfs42_proc_getxattr(inode, key, buf, buflen);
+
+	return ret;
 }
 
 static ssize_t
@@ -7505,7 +7524,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
 {
 	u64 cookie;
 	bool eof;
-	int ret, size;
+	ssize_t ret, size;
 	char *buf;
 	size_t buflen;
 	struct nfs_access_entry cache;
@@ -7518,6 +7537,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
 			return 0;
 	}
 
+	ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
+	if (ret)
+		return ret;
+
+	ret = nfs4_xattr_cache_list(inode, list, list_len);
+	if (ret >= 0 || (ret < 0 && ret != -ENOENT))
+		return ret;
+
 	cookie = 0;
 	eof = false;
 	buflen = list_len ? list_len : XATTR_LIST_MAX;
@@ -7537,6 +7564,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char *list, size_t list_len)
 		size += ret;
 	}
 
+	if (list_len)
+		nfs4_xattr_cache_set_list(inode, list, size);
+
 	return size;
 }
 
diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
index 1475f932d7da..0c1ab846b83d 100644
--- a/fs/nfs/nfs4super.c
+++ b/fs/nfs/nfs4super.c
@@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
 	pnfs_destroy_layout(NFS_I(inode));
 	/* First call standard NFS clear_inode() code */
 	nfs_clear_inode(inode);
+	nfs4_xattr_cache_zap(inode);
 }
 
 struct nfs_referral_count {
@@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
 	if (err)
 		goto out1;
 
+#ifdef CONFIG_NFS_V4_2
+	err = nfs4_xattr_cache_init();
+	if (err)
+		goto out2;
+#endif
+
 	err = nfs4_register_sysctl();
 	if (err)
 		goto out2;
@@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
 	nfs4_pnfs_v3_ds_connect_unload();
 
 	unregister_nfs_version(&nfs_v4);
+#ifdef CONFIG_NFS_V4_2
+	nfs4_xattr_cache_exit();
+#endif
 	nfs4_unregister_sysctl();
 	nfs_idmap_quit();
 	nfs_dns_resolver_destroy();
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1fcfef670a4a..c08cc22d9c32 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -102,6 +102,8 @@ struct nfs_delegation;
 
 struct posix_acl;
 
+struct nfs4_xattr_cache;
+
 /*
  * nfs fs inode data in memory
  */
@@ -188,6 +190,10 @@ struct nfs_inode {
 	struct fscache_cookie	*fscache;
 #endif
 	struct inode		vfs_inode;
+
+#ifdef CONFIG_NFS_V4_2
+	struct nfs4_xattr_cache *xattr_cache;
+#endif
 };
 
 struct nfs4_copy_state {
diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
index 7bcc8cd6831d..3afe3767c55d 100644
--- a/include/uapi/linux/nfs_fs.h
+++ b/include/uapi/linux/nfs_fs.h
@@ -56,6 +56,7 @@
 #define NFSDBG_PNFS		0x1000
 #define NFSDBG_PNFS_LD		0x2000
 #define NFSDBG_STATE		0x4000
+#define NFSDBG_XATTRCACHE	0x8000
 #define NFSDBG_ALL		0xFFFF
 
 
-- 
2.16.6


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-11 19:56 ` [PATCH 03/13] NFSv4.2: query the server for extended attribute support Frank van der Linden
@ 2020-03-12 16:15   ` Mkrtchyan, Tigran
  2020-03-12 20:51     ` Frank van der Linden
  0 siblings, 1 reply; 33+ messages in thread
From: Mkrtchyan, Tigran @ 2020-03-12 16:15 UTC (permalink / raw)
  To: Frank van der Linden; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

Hi Frank,

----- Original Message -----
> From: "Frank van der Linden" <fllinden@amazon.com>
> To: "Trond Myklebust" <trond.myklebust@hammerspace.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "linux-nfs"
> <linux-nfs@vger.kernel.org>
> Cc: "Frank van der Linden" <fllinden@amazon.com>
> Sent: Wednesday, March 11, 2020 8:56:03 PM
> Subject: [PATCH 03/13] NFSv4.2: query the server for extended attribute support

> Query the server for extended attribute support, and record it
> as the NFS_CAP_XATTR flag in the server capabilities.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
> fs/nfs/nfs4proc.c         | 14 ++++++++++++--
> fs/nfs/nfs4xdr.c          | 23 +++++++++++++++++++++++
> include/linux/nfs_fs_sb.h |  1 +
> include/linux/nfs_xdr.h   |  1 +
> 4 files changed, 37 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 69b7ab7a5815..47bbd7db9d18 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -3742,6 +3742,7 @@ static void nfs4_close_context(struct nfs_open_context
> *ctx, int is_sync)
> static int _nfs4_server_capabilities(struct nfs_server *server, struct nfs_fh
> *fhandle)
> {
> 	u32 bitmask[3] = {}, minorversion = server->nfs_client->cl_minorversion;
> +	u32 fattr4_word2_nfs42_mask;
> 	struct nfs4_server_caps_arg args = {
> 		.fhandle = fhandle,
> 		.bitmask = bitmask,
> @@ -3763,6 +3764,13 @@ static int _nfs4_server_capabilities(struct nfs_server
> *server, struct nfs_fh *f
> 	if (minorversion)
> 		bitmask[2] = FATTR4_WORD2_SUPPATTR_EXCLCREAT;
> 
> +	fattr4_word2_nfs42_mask = FATTR4_WORD2_NFS42_MASK;
> +
> +	if (minorversion >= 2) {

I am not sure you need this extra check as by querying for  FATTR4_WORD0_SUPPORTED_ATTRS
server already will return FATTR4_WORD2_XATTR_SUPPORT if supported.

Tigran.

> +		bitmask[2] |= FATTR4_WORD2_XATTR_SUPPORT;
> +		fattr4_word2_nfs42_mask |= FATTR4_WORD2_XATTR_SUPPORT;
> +	}
> +
> 	status = nfs4_call_sync(server->client, server, &msg, &args.seq_args,
> 	&res.seq_res, 0);
> 	if (status == 0) {
> 		/* Sanity check the server answers */
> @@ -3775,7 +3783,7 @@ static int _nfs4_server_capabilities(struct nfs_server
> *server, struct nfs_fh *f
> 			res.attr_bitmask[2] &= FATTR4_WORD2_NFS41_MASK;
> 			break;
> 		case 2:
> -			res.attr_bitmask[2] &= FATTR4_WORD2_NFS42_MASK;
> +			res.attr_bitmask[2] &= fattr4_word2_nfs42_mask;
> 		}
> 		memcpy(server->attr_bitmask, res.attr_bitmask, sizeof(server->attr_bitmask));
> 		server->caps &= ~(NFS_CAP_ACLS|NFS_CAP_HARDLINKS|
> @@ -3783,7 +3791,7 @@ static int _nfs4_server_capabilities(struct nfs_server
> *server, struct nfs_fh *f
> 				NFS_CAP_MODE|NFS_CAP_NLINK|NFS_CAP_OWNER|
> 				NFS_CAP_OWNER_GROUP|NFS_CAP_ATIME|
> 				NFS_CAP_CTIME|NFS_CAP_MTIME|
> -				NFS_CAP_SECURITY_LABEL);
> +				NFS_CAP_SECURITY_LABEL|NFS_CAP_XATTR);
> 		if (res.attr_bitmask[0] & FATTR4_WORD0_ACL &&
> 				res.acl_bitmask & ACL4_SUPPORT_ALLOW_ACL)
> 			server->caps |= NFS_CAP_ACLS;
> @@ -3811,6 +3819,8 @@ static int _nfs4_server_capabilities(struct nfs_server
> *server, struct nfs_fh *f
> 		if (res.attr_bitmask[2] & FATTR4_WORD2_SECURITY_LABEL)
> 			server->caps |= NFS_CAP_SECURITY_LABEL;
> #endif
> +		if (res.has_xattr)
> +			server->caps |= NFS_CAP_XATTR;
> 		memcpy(server->attr_bitmask_nl, res.attr_bitmask,
> 				sizeof(server->attr_bitmask));
> 		server->attr_bitmask_nl[2] &= ~FATTR4_WORD2_SECURITY_LABEL;
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 47817ef0aadb..bebc087a1433 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -4201,6 +4201,26 @@ static int decode_attr_time_modify(struct xdr_stream
> *xdr, uint32_t *bitmap, str
> 	return status;
> }
> 
> +static int decode_attr_xattrsupport(struct xdr_stream *xdr, uint32_t *bitmap,
> +				    uint32_t *res)
> +{
> +	__be32 *p;
> +
> +	*res = 0;
> +	if (unlikely(bitmap[2] & (FATTR4_WORD2_XATTR_SUPPORT - 1U)))
> +		return -EIO;
> +	if (likely(bitmap[2] & FATTR4_WORD2_XATTR_SUPPORT)) {
> +		p = xdr_inline_decode(xdr, 4);
> +		if (unlikely(!p))
> +			return -EIO;
> +		*res = be32_to_cpup(p);
> +		bitmap[2] &= ~FATTR4_WORD2_XATTR_SUPPORT;
> +	}
> +	dprintk("%s: XATTR support=%s\n", __func__,
> +		*res == 0 ? "false" : "true");
> +	return 0;
> +}
> +
> static int verify_attr_len(struct xdr_stream *xdr, unsigned int savep, uint32_t
> attrlen)
> {
> 	unsigned int attrwords = XDR_QUADLEN(attrlen);
> @@ -4371,6 +4391,9 @@ static int decode_server_caps(struct xdr_stream *xdr,
> struct nfs4_server_caps_re
> 	if ((status = decode_attr_exclcreat_supported(xdr, bitmap,
> 				res->exclcreat_bitmask)) != 0)
> 		goto xdr_error;
> +	status = decode_attr_xattrsupport(xdr, bitmap, &res->has_xattr);
> +	if (status != 0)
> +		goto xdr_error;
> 	status = verify_attr_len(xdr, savep, attrlen);
> xdr_error:
> 	dprintk("%s: xdr returned %d!\n", __func__, -status);
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 465fa98258a3..d881f7a38bc9 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -281,5 +281,6 @@ struct nfs_server {
> #define NFS_CAP_OFFLOAD_CANCEL	(1U << 25)
> #define NFS_CAP_LAYOUTERROR	(1U << 26)
> #define NFS_CAP_COPY_NOTIFY	(1U << 27)
> +#define NFS_CAP_XATTR		(1U << 28)
> 
> #endif
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 94c77ed55ce1..5076fe42c693 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1178,6 +1178,7 @@ struct nfs4_server_caps_res {
> 	u32				has_links;
> 	u32				has_symlinks;
> 	u32				fh_expire_type;
> +	u32				has_xattr;
> };
> 
> #define NFS4_PATHNAME_MAXCOMPONENTS 512
> --
> 2.16.6

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/13] client side user xattr (RFC8276) support
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (12 preceding siblings ...)
  2020-03-11 19:56 ` [PATCH 13/13] NFSv4.2: add client side xattr caching Frank van der Linden
@ 2020-03-12 19:06 ` Mkrtchyan, Tigran
  2020-03-12 20:09 ` Anna Schumaker
  14 siblings, 0 replies; 33+ messages in thread
From: Mkrtchyan, Tigran @ 2020-03-12 19:06 UTC (permalink / raw)
  To: Frank van der Linden; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs


I have applied the patchset and run simple test against dCache nfs server:

root@anahit xattr-test]# df -h .
Filesystem      Size  Used Avail Use% Mounted on
ani:/           213G  179G   35G  84% /mnt
[root@anahit xattr-test]#
[root@anahit xattr-test]# touch file.txt
[root@anahit xattr-test]# attr -l file.txt
[root@anahit xattr-test]# attr -s key1 -V value1 file.txt
Attribute "key1" set to a 6 byte value for file.txt:
value1
[root@anahit xattr-test]# attr -s key2 -V value2 file.txt
Attribute "key2" set to a 6 byte value for file.txt:
value2
[root@anahit xattr-test]# attr -l file.txt
Attribute "user.key1" has a 6 byte value for file.txt
Attribute "user.key2" has a 6 byte value for file.txt
[root@anahit xattr-test]# attr -g key1 file.txt
Attribute "key1" had a 6 byte value for file.txt:
value1
[root@anahit xattr-test]# attr -g key2 file.txt
Attribute "key2" had a 6 byte value for file.txt:
value2
[root@anahit xattr-test]# getfattr -n user.key1 file.txt
# file: file.txt
user.key1="value1"

[root@anahit xattr-test]# getfattr -n user.key2 file.txt
# file: file.txt
user.key2="value2"

[root@anahit xattr-test]# attr -r key1 file.txt
[root@anahit xattr-test]# attr -r key2 file.txt
[root@anahit xattr-test]# attr -l file.txt
[root@anahit xattr-test]#


At lease a dirrerent implementation in addition to linux server works as expected.

Tested-by:  "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>


Tigran.

----- Original Message -----
> From: "Frank van der Linden" <fllinden@amazon.com>
> To: "Trond Myklebust" <trond.myklebust@hammerspace.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "linux-nfs"
> <linux-nfs@vger.kernel.org>
> Cc: "Frank van der Linden" <fllinden@amazon.com>
> Sent: Wednesday, March 11, 2020 8:56:00 PM
> Subject: [PATCH 00/13] client side user xattr (RFC8276) support

> This patchset implements the client side for NFS user extended attributes,
> as defined in RFC8726.
> 
> This was originally posted as an RFC in:
> 
> https://patchwork.kernel.org/cover/11143565/
> 
> Patch 1 is shared with the server side patch, posted
> separately.
> 
> Most comments in there still apply, except that:
> 
> 1. Client side caching is now included in this patch set.
> 2. As per the discussion, user extended attributes are enabled if
>   the client and server support them (e.g. they support 4.2 and
>   advertise the user extended attribute FATTR). There are no longer
>   options to switch them off on either the client or the server.
> 3. The code is no longer conditioned on a config option.
> 4. The number of patches has been reduced somewhat by merging
>   smaller, related ones.
> 
> The client side caching is implemented through a per-inode hash table,
> which is allocated on demand. See fs/nfs/nfs42xattr.c for details.
> 
> This has been tested as follows:
> 
> * Linux client and server:
>	* Test all corner cases (XATTR_SIZE_*)
>	* Test all failure cases (no xattr, setxattr with different or
>	  invalid flags, etc).
>	* Verify the content of xattrs across several operations.
>	* Use KASAN and KMEMLEAK for a longer mix of testruns to verify
>	  that there were no leaks (after unmounting the filesystem).
>	* Stress tested caching, trying to run the client out of memory.
> 
> * Tested against the FreeBSD-current implementation as well, which works
>  (after I fixed 2 bugs in that implementation, which I'm sending out to
>  them too).
> 
> * Not tested: RDMA (I couldn't get a setup going).
> 
> Frank van der Linden (13):
>  nfs,nfsd:  NFSv4.2 extended attribute protocol definitions
>  nfs: add client side only definitions for user xattrs
>  NFSv4.2: query the server for extended attribute support
>  NFSv4.2: define limits and sizes for user xattr handling
>  NFSv4.2: add client side XDR handling for extended attributes
>  nfs: define nfs_access_get_cached function
>  NFSv4.2: query the extended attribute access bits
>  nfs: modify update_changeattr to deal with regular files
>  nfs: define and use the NFS_INO_INVALID_XATTR flag
>  nfs: make the buf_to_pages_noslab function available to the nfs code
>  NFSv4.2: add the extended attribute proc functions.
>  NFSv4.2: hook in the user extended attribute handlers
>  NFSv4.2: add client side xattr caching.
> 
> fs/nfs/Makefile             |    1 +
> fs/nfs/client.c             |   19 +-
> fs/nfs/dir.c                |   24 +-
> fs/nfs/inode.c              |   16 +-
> fs/nfs/internal.h           |   28 ++
> fs/nfs/nfs42.h              |   24 +
> fs/nfs/nfs42proc.c          |  248 ++++++++++
> fs/nfs/nfs42xattr.c         | 1083 +++++++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfs42xdr.c           |  442 ++++++++++++++++++
> fs/nfs/nfs4_fs.h            |    5 +
> fs/nfs/nfs4client.c         |   31 ++
> fs/nfs/nfs4proc.c           |  248 ++++++++--
> fs/nfs/nfs4super.c          |   10 +
> fs/nfs/nfs4xdr.c            |   29 ++
> fs/nfs/nfstrace.h           |    3 +-
> include/linux/nfs4.h        |   25 +
> include/linux/nfs_fs.h      |   12 +
> include/linux/nfs_fs_sb.h   |    6 +
> include/linux/nfs_xdr.h     |   60 ++-
> include/uapi/linux/nfs4.h   |    3 +
> include/uapi/linux/nfs_fs.h |    1 +
> 21 files changed, 2276 insertions(+), 42 deletions(-)
> create mode 100644 fs/nfs/nfs42xattr.c
> 
> --
> 2.16.6

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/13] client side user xattr (RFC8276) support
  2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
                   ` (13 preceding siblings ...)
  2020-03-12 19:06 ` [PATCH 00/13] client side user xattr (RFC8276) support Mkrtchyan, Tigran
@ 2020-03-12 20:09 ` Anna Schumaker
  2020-03-16 15:50   ` Frank van der Linden
  2020-03-17 23:03   ` Frank van der Linden
  14 siblings, 2 replies; 33+ messages in thread
From: Anna Schumaker @ 2020-03-12 20:09 UTC (permalink / raw)
  To: Frank van der Linden; +Cc: trond.myklebust, Linux NFS Mailing List

Hi Frank,

On Wed, Mar 11, 2020 at 3:57 PM Frank van der Linden
<fllinden@amazon.com> wrote:
>
> This patchset implements the client side for NFS user extended attributes,
> as defined in RFC8726.
>
> This was originally posted as an RFC in:
>
> https://patchwork.kernel.org/cover/11143565/
>
> Patch 1 is shared with the server side patch, posted
> separately.
>
> Most comments in there still apply, except that:
>
> 1. Client side caching is now included in this patch set.
> 2. As per the discussion, user extended attributes are enabled if
>    the client and server support them (e.g. they support 4.2 and
>    advertise the user extended attribute FATTR). There are no longer
>    options to switch them off on either the client or the server.
> 3. The code is no longer conditioned on a config option.
> 4. The number of patches has been reduced somewhat by merging
>    smaller, related ones.
>
> The client side caching is implemented through a per-inode hash table,
> which is allocated on demand. See fs/nfs/nfs42xattr.c for details.
>
> This has been tested as follows:
>
> * Linux client and server:
>         * Test all corner cases (XATTR_SIZE_*)
>         * Test all failure cases (no xattr, setxattr with different or
>           invalid flags, etc).
>         * Verify the content of xattrs across several operations.
>         * Use KASAN and KMEMLEAK for a longer mix of testruns to verify
>           that there were no leaks (after unmounting the filesystem).
>         * Stress tested caching, trying to run the client out of memory.

I'm curious if you've tried xfstests with your patches? There are a
handful of tests using xattrs that might be good to check with, too:

anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
tests/generic/037
tests/generic/062
tests/generic/066
tests/generic/093
tests/generic/117
tests/generic/337
tests/generic/377
tests/generic/403
tests/generic/425
tests/generic/454
tests/generic/489
tests/generic/523
tests/generic/529
tests/generic/556

Thanks,
Anna


>
> * Tested against the FreeBSD-current implementation as well, which works
>   (after I fixed 2 bugs in that implementation, which I'm sending out to
>   them too).
>
> * Not tested: RDMA (I couldn't get a setup going).
>
> Frank van der Linden (13):
>   nfs,nfsd:  NFSv4.2 extended attribute protocol definitions
>   nfs: add client side only definitions for user xattrs
>   NFSv4.2: query the server for extended attribute support
>   NFSv4.2: define limits and sizes for user xattr handling
>   NFSv4.2: add client side XDR handling for extended attributes
>   nfs: define nfs_access_get_cached function
>   NFSv4.2: query the extended attribute access bits
>   nfs: modify update_changeattr to deal with regular files
>   nfs: define and use the NFS_INO_INVALID_XATTR flag
>   nfs: make the buf_to_pages_noslab function available to the nfs code
>   NFSv4.2: add the extended attribute proc functions.
>   NFSv4.2: hook in the user extended attribute handlers
>   NFSv4.2: add client side xattr caching.
>
>  fs/nfs/Makefile             |    1 +
>  fs/nfs/client.c             |   19 +-
>  fs/nfs/dir.c                |   24 +-
>  fs/nfs/inode.c              |   16 +-
>  fs/nfs/internal.h           |   28 ++
>  fs/nfs/nfs42.h              |   24 +
>  fs/nfs/nfs42proc.c          |  248 ++++++++++
>  fs/nfs/nfs42xattr.c         | 1083 +++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/nfs42xdr.c           |  442 ++++++++++++++++++
>  fs/nfs/nfs4_fs.h            |    5 +
>  fs/nfs/nfs4client.c         |   31 ++
>  fs/nfs/nfs4proc.c           |  248 ++++++++--
>  fs/nfs/nfs4super.c          |   10 +
>  fs/nfs/nfs4xdr.c            |   29 ++
>  fs/nfs/nfstrace.h           |    3 +-
>  include/linux/nfs4.h        |   25 +
>  include/linux/nfs_fs.h      |   12 +
>  include/linux/nfs_fs_sb.h   |    6 +
>  include/linux/nfs_xdr.h     |   60 ++-
>  include/uapi/linux/nfs4.h   |    3 +
>  include/uapi/linux/nfs_fs.h |    1 +
>  21 files changed, 2276 insertions(+), 42 deletions(-)
>  create mode 100644 fs/nfs/nfs42xattr.c
>
> --
> 2.16.6
>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/13] NFSv4.2: define limits and sizes for user xattr handling
  2020-03-11 19:56 ` [PATCH 04/13] NFSv4.2: define limits and sizes for user xattr handling Frank van der Linden
@ 2020-03-12 20:35   ` Schumaker, Anna
  0 siblings, 0 replies; 33+ messages in thread
From: Schumaker, Anna @ 2020-03-12 20:35 UTC (permalink / raw)
  To: fllinden, linux-nfs, trond.myklebust

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Set limits for extended attributes (attribute value size and listxattr
> buffer size), based on the fs-independent limits (XATTR_*_MAX).
> 
> Define the maximum XDR sizes for the RFC 8276 XATTR operations.
> In the case of operations that carry a larger payload (SETXATTR,
> GETXATTR, LISTXATTR), these exclude that payload, which is added
> as separate pages, like other operations do.
> 
> Define, much like for read and write operations, the maximum overhead
> sizes for get/set/listxattr, and use them to limit the maximum payload
> size for those operations, in combination with the channel attributes.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  fs/nfs/client.c           | 19 ++++++++++--
>  fs/nfs/internal.h         |  5 ++++
>  fs/nfs/nfs42.h            | 16 ++++++++++
>  fs/nfs/nfs42xdr.c         | 74
> +++++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/nfs4client.c       | 31 ++++++++++++++++++++
>  include/linux/nfs_fs_sb.h |  5 ++++
>  6 files changed, 148 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 989c30c98511..eef39a4ec114 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -50,6 +50,7 @@
>  #include "nfs.h"
>  #include "netns.h"
>  #include "sysfs.h"
> +#include "nfs42.h"
>  
>  #define NFSDBG_FACILITY		NFSDBG_CLIENT
>  
> @@ -748,7 +749,7 @@ static int nfs_init_server(struct nfs_server *server,
>  static void nfs_server_set_fsinfo(struct nfs_server *server,
>  				  struct nfs_fsinfo *fsinfo)
>  {
> -	unsigned long max_rpc_payload;
> +	unsigned long max_rpc_payload, raw_max_rpc_payload;
>  
>  	/* Work out a lot of parameters */
>  	if (server->rsize == 0)
> @@ -761,7 +762,9 @@ static void nfs_server_set_fsinfo(struct nfs_server
> *server,
>  	if (fsinfo->wtmax >= 512 && server->wsize > fsinfo->wtmax)
>  		server->wsize = nfs_block_size(fsinfo->wtmax, NULL);
>  
> -	max_rpc_payload = nfs_block_size(rpc_max_payload(server->client), NULL);
> +	raw_max_rpc_payload = rpc_max_payload(server->client);
> +	max_rpc_payload = nfs_block_size(raw_max_rpc_payload, NULL);
> +
>  	if (server->rsize > max_rpc_payload)
>  		server->rsize = max_rpc_payload;
>  	if (server->rsize > NFS_MAX_FILE_IO_SIZE)
> @@ -794,6 +797,18 @@ static void nfs_server_set_fsinfo(struct nfs_server
> *server,
>  	server->clone_blksize = fsinfo->clone_blksize;
>  	/* We're airborne Set socket buffersize */
>  	rpc_setbufsize(server->client, server->wsize + 100, server->rsize +
> 100);
> +
> +#ifdef CONFIG_NFS_V4_2
> +	/*
> +	 * Defaults until limited by the session parameters.
> +	 */
> +	server->gxasize = min_t(unsigned int, raw_max_rpc_payload,
> +				XATTR_SIZE_MAX);
> +	server->sxasize = min_t(unsigned int, raw_max_rpc_payload,
> +				XATTR_SIZE_MAX);
> +	server->lxasize = min_t(unsigned int, raw_max_rpc_payload,
> +				nfs42_listxattr_xdrsize(XATTR_LIST_MAX));
> +#endif
>  }
>  
>  /*
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index f80c47d5ff27..68f235a571e1 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -306,6 +306,11 @@ extern const u32 nfs41_maxread_overhead;
>  extern const u32 nfs41_maxwrite_overhead;
>  extern const u32 nfs41_maxgetdevinfo_overhead;
>  #endif
> +#ifdef CONFIG_NFS_V4_2
> +extern const u32 nfs42_maxsetxattr_overhead;
> +extern const u32 nfs42_maxgetxattr_overhead;
> +extern const u32 nfs42_maxlistxattrs_overhead;
> +#endif

As far as I can tell, these are only used by the NFS v4 code. Can you put these
definitions in nfs4_fs.h instead of internal.h, since the generic client doesn't
really need to know about it?

Thanks,
Anna
>  
>  /* nfs4proc.c */
>  #if IS_ENABLED(CONFIG_NFS_V4)
> diff --git a/fs/nfs/nfs42.h b/fs/nfs/nfs42.h
> index c891af949886..51de8ddc7d88 100644
> --- a/fs/nfs/nfs42.h
> +++ b/fs/nfs/nfs42.h
> @@ -6,6 +6,8 @@
>  #ifndef __LINUX_FS_NFS_NFS4_2_H
>  #define __LINUX_FS_NFS_NFS4_2_H
>  
> +#include <linux/xattr.h>
> +
>  /*
>   * FIXME:  four LAYOUTSTATS calls per compound at most! Do we need to support
>   * more? Need to consider not to pre-alloc too much for a compound.
> @@ -36,5 +38,19 @@ static inline bool nfs42_files_from_same_server(struct file
> *in,
>  	return nfs4_check_serverowner_major_id(c_in->cl_serverowner,
>  					       c_out->cl_serverowner);
>  }
> +
> +/*
> + * Maximum XDR buffer size needed for a listxattr buffer of buflen size.
> + *
> + * The upper boundary is a buffer with all 1-byte sized attribute names.
> + * They would be 7 bytes long in the eventual buffer ("user.x\0"), and
> + * 8 bytes long XDR-encoded.
> + *
> + * Include the trailing eof word as well.
> + */
> +static inline u32 nfs42_listxattr_xdrsize(u32 buflen)
> +{
> +	return ((buflen / (XATTR_USER_PREFIX_LEN + 2)) * 8) + 4;
> +}
>  #endif /* CONFIG_NFS_V4_2 */
>  #endif /* __LINUX_FS_NFS_NFS4_2_H */
> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> index c03f3246d6c5..6712daa9d85b 100644
> --- a/fs/nfs/nfs42xdr.c
> +++ b/fs/nfs/nfs42xdr.c
> @@ -169,6 +169,80 @@
>  					 decode_clone_maxsz + \
>  					 decode_getattr_maxsz)
>  
> +#ifdef CONFIG_NFS_V4_2
> +/* Not limited by NFS itself, limited by the generic xattr code */
> +#define nfs4_xattr_name_maxsz   XDR_QUADLEN(XATTR_NAME_MAX)
> +
> +#define encode_getxattr_maxsz   (op_encode_hdr_maxsz + 1 + \
> +				 nfs4_xattr_name_maxsz)
> +#define decode_getxattr_maxsz   (op_decode_hdr_maxsz + 1 + 1)
> +#define encode_setxattr_maxsz   (op_encode_hdr_maxsz + \
> +				 1 + nfs4_xattr_name_maxsz + 1)
> +#define decode_setxattr_maxsz   (op_decode_hdr_maxsz +
> decode_change_info_maxsz)
> +#define encode_listxattrs_maxsz  (op_encode_hdr_maxsz + 2 + 1)
> +#define decode_listxattrs_maxsz  (op_decode_hdr_maxsz + 2 + 1 + 1)
> +#define encode_removexattr_maxsz (op_encode_hdr_maxsz + 1 + \
> +				  nfs4_xattr_name_maxsz)
> +#define decode_removexattr_maxsz (op_decode_hdr_maxsz + \
> +				  decode_change_info_maxsz)
> +
> +#define NFS4_enc_getxattr_sz	(compound_encode_hdr_maxsz + \
> +				encode_sequence_maxsz + \
> +				encode_putfh_maxsz + \
> +				encode_getxattr_maxsz)
> +#define NFS4_dec_getxattr_sz	(compound_decode_hdr_maxsz + \
> +				decode_sequence_maxsz + \
> +				decode_putfh_maxsz + \
> +				decode_getxattr_maxsz)
> +#define NFS4_enc_setxattr_sz	(compound_encode_hdr_maxsz + \
> +				encode_sequence_maxsz + \
> +				encode_putfh_maxsz + \
> +				encode_setxattr_maxsz)
> +#define NFS4_dec_setxattr_sz	(compound_decode_hdr_maxsz + \
> +				decode_sequence_maxsz + \
> +				decode_putfh_maxsz + \
> +				decode_setxattr_maxsz)
> +#define NFS4_enc_listxattrs_sz	(compound_encode_hdr_maxsz + \
> +				encode_sequence_maxsz + \
> +				encode_putfh_maxsz + \
> +				encode_listxattrs_maxsz)
> +#define NFS4_dec_listxattrs_sz	(compound_decode_hdr_maxsz + \
> +				decode_sequence_maxsz + \
> +				decode_putfh_maxsz + \
> +				decode_listxattrs_maxsz)
> +#define NFS4_enc_removexattr_sz	(compound_encode_hdr_maxsz + \
> +				encode_sequence_maxsz + \
> +				encode_putfh_maxsz + \
> +				encode_removexattr_maxsz)
> +#define NFS4_dec_removexattr_sz	(compound_decode_hdr_maxsz + \
> +				decode_sequence_maxsz + \
> +				decode_putfh_maxsz + \
> +				decode_removexattr_maxsz)
> +
> +/*
> + * These values specify the maximum amount of data that is not
> + * associated with the extended attribute name or extended
> + * attribute list in the SETXATTR, GETXATTR and LISTXATTR
> + * respectively.
> + */
> +const u32 nfs42_maxsetxattr_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
> +					compound_encode_hdr_maxsz +
> +					encode_sequence_maxsz +
> +					encode_putfh_maxsz + 1 +
> +					nfs4_xattr_name_maxsz)
> +					* XDR_UNIT);
> +
> +const u32 nfs42_maxgetxattr_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
> +					compound_decode_hdr_maxsz +
> +					decode_sequence_maxsz +
> +					decode_putfh_maxsz + 1) * XDR_UNIT);
> +
> +const u32 nfs42_maxlistxattrs_overhead = ((RPC_MAX_HEADER_WITH_AUTH +
> +					compound_decode_hdr_maxsz +
> +					decode_sequence_maxsz +
> +					decode_putfh_maxsz + 3) * XDR_UNIT);
> +#endif
> +
>  static void encode_fallocate(struct xdr_stream *xdr,
>  			     const struct nfs42_falloc_args *args)
>  {
> diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
> index 0cd767e5c977..8b320ef0e8a3 100644
> --- a/fs/nfs/nfs4client.c
> +++ b/fs/nfs/nfs4client.c
> @@ -993,6 +993,36 @@ static void nfs4_session_limit_rwsize(struct nfs_server
> *server)
>  #endif /* CONFIG_NFS_V4_1 */
>  }
>  
> +/*
> + * Limit xattr sizes using the channel attributes.
> + */
> +static void nfs4_session_limit_xasize(struct nfs_server *server)
> +{
> +#ifdef CONFIG_NFS_V4_2
> +	struct nfs4_session *sess;
> +	u32 server_gxa_sz;
> +	u32 server_sxa_sz;
> +	u32 server_lxa_sz;
> +
> +	if (!nfs4_has_session(server->nfs_client))
> +		return;
> +
> +	sess = server->nfs_client->cl_session;
> +
> +	server_gxa_sz = sess->fc_attrs.max_resp_sz - nfs42_maxgetxattr_overhead;
> +	server_sxa_sz = sess->fc_attrs.max_rqst_sz - nfs42_maxsetxattr_overhead;
> +	server_lxa_sz = sess->fc_attrs.max_resp_sz -
> +	    nfs42_maxlistxattrs_overhead;
> +
> +	if (server->gxasize > server_gxa_sz)
> +		server->gxasize = server_gxa_sz;
> +	if (server->sxasize > server_sxa_sz)
> +		server->sxasize = server_sxa_sz;
> +	if (server->lxasize > server_lxa_sz)
> +		server->lxasize = server_lxa_sz;
> +#endif
> +}
> +
>  static int nfs4_server_common_setup(struct nfs_server *server,
>  		struct nfs_fh *mntfh, bool auth_probe)
>  {
> @@ -1040,6 +1070,7 @@ static int nfs4_server_common_setup(struct nfs_server
> *server,
>  		goto out;
>  
>  	nfs4_session_limit_rwsize(server);
> +	nfs4_session_limit_xasize(server);
>  
>  	if (server->namelen == 0 || server->namelen > NFS4_MAXNAMLEN)
>  		server->namelen = NFS4_MAXNAMLEN;
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index d881f7a38bc9..7eae72a8762e 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -163,6 +163,11 @@ struct nfs_server {
>  	unsigned int		dtsize;		/* readdir size */
>  	unsigned short		port;		/* "port=" setting */
>  	unsigned int		bsize;		/* server block size */
> +#ifdef CONFIG_NFS_V4_2
> +	unsigned int		gxasize;	/* getxattr size */
> +	unsigned int		sxasize;	/* setxattr size */
> +	unsigned int		lxasize;	/* listxattr size */
> +#endif
>  	unsigned int		acregmin;	/* attr cache timeouts */
>  	unsigned int		acregmax;
>  	unsigned int		acdirmin;

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code
  2020-03-11 19:56 ` [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code Frank van der Linden
@ 2020-03-12 20:36   ` Schumaker, Anna
  0 siblings, 0 replies; 33+ messages in thread
From: Schumaker, Anna @ 2020-03-12 20:36 UTC (permalink / raw)
  To: fllinden, linux-nfs, trond.myklebust

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Make the buf_to_pages_noslab function available to the rest of the NFS
> code. Rename it to nfs4_buf_to_pages_noslab to be consistent.
> 
> This will be used later in the NFSv4.2 xattr code.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  fs/nfs/internal.h | 3 +++
>  fs/nfs/nfs4proc.c | 4 ++--
>  2 files changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 68f235a571e1..1e3a7e119c93 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -317,6 +317,9 @@ extern const u32 nfs42_maxlistxattrs_overhead;
>  extern const struct rpc_procinfo nfs4_procedures[];
>  #endif
>  
> +extern int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
> +				    struct page **pages);
> +

Please put this in nfs4_fs.h, as its only callers are nfs4proc.c and nfs42proc.c

Thanks,
Anna

>  #ifdef CONFIG_NFS_V4_SECURITY_LABEL
>  extern struct nfs4_label *nfs4_label_alloc(struct nfs_server *server, gfp_t
> flags);
>  static inline struct nfs4_label *
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 725ae64f62f7..aee3a1c97def 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -5553,7 +5553,7 @@ static inline int nfs4_server_supports_acls(struct
> nfs_server *server)
>   */
>  #define NFS4ACL_MAXPAGES DIV_ROUND_UP(XATTR_SIZE_MAX, PAGE_SIZE)
>  
> -static int buf_to_pages_noslab(const void *buf, size_t buflen,
> +int nfs4_buf_to_pages_noslab(const void *buf, size_t buflen,
>  		struct page **pages)
>  {
>  	struct page *newpage, **spages;
> @@ -5795,7 +5795,7 @@ static int __nfs4_proc_set_acl(struct inode *inode,
> const void *buf, size_t bufl
>  		return -EOPNOTSUPP;
>  	if (npages > ARRAY_SIZE(pages))
>  		return -ERANGE;
> -	i = buf_to_pages_noslab(buf, buflen, arg.acl_pages);
> +	i = nfs4_buf_to_pages_noslab(buf, buflen, arg.acl_pages);
>  	if (i < 0)
>  		return i;
>  	nfs4_inode_make_writeable(inode);

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 13/13] NFSv4.2: add client side xattr caching.
  2020-03-11 19:56 ` [PATCH 13/13] NFSv4.2: add client side xattr caching Frank van der Linden
@ 2020-03-12 20:39   ` Schumaker, Anna
  2020-03-12 20:48   ` Schumaker, Anna
  1 sibling, 0 replies; 33+ messages in thread
From: Schumaker, Anna @ 2020-03-12 20:39 UTC (permalink / raw)
  To: fllinden, linux-nfs, trond.myklebust

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Implement client side caching for NFSv4.2 extended attributes. The cache
> is a per-inode hashtable, with name/value entries. There is one special
> entry for the listxattr cache.
> 
> NFS inodes have a pointer to a cache structure. The cache structure is
> allocated on demand, freed when the cache is invalidated.
> 
> Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
> are collected by a separate shrinker, and freed more aggressively
> than others.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  fs/nfs/Makefile             |    1 +
>  fs/nfs/inode.c              |    9 +-
>  fs/nfs/internal.h           |   20 +
>  fs/nfs/nfs42proc.c          |   12 +
>  fs/nfs/nfs42xattr.c         | 1083
> +++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/nfs4proc.c           |   42 +-
>  fs/nfs/nfs4super.c          |   10 +
>  include/linux/nfs_fs.h      |    6 +
>  include/uapi/linux/nfs_fs.h |    1 +
>  9 files changed, 1177 insertions(+), 7 deletions(-)
>  create mode 100644 fs/nfs/nfs42xattr.c
> 
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 2433c3e03cfa..191b3e9aa232 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -31,6 +31,7 @@ nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
>  nfsv4-$(CONFIG_SYSCTL)	+= nfs4sysctl.o
>  nfsv4-$(CONFIG_NFS_V4_1)	+= pnfs.o pnfs_dev.o pnfs_nfs.o
>  nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42proc.o
> +nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42xattr.o
>  
>  obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
>  obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index d2be152796ef..9d4952d2306b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned
> long flags)
>  
>  	return nfs_check_cache_invalid_not_delegated(inode, flags);
>  }
> +EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);
>  
>  static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
>  {
> @@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
>  					| NFS_INO_INVALID_DATA
>  					| NFS_INO_INVALID_ACCESS
>  					| NFS_INO_INVALID_ACL
> +					| NFS_INO_INVALID_XATTR
>  					| NFS_INO_REVAL_PAGECACHE);
>  	} else
>  		nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
>  					| NFS_INO_INVALID_ACCESS
>  					| NFS_INO_INVALID_ACL
> +					| NFS_INO_INVALID_XATTR
>  					| NFS_INO_REVAL_PAGECACHE);
>  	nfs_zap_label_cache_locked(nfsi);
>  }
> @@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct
> nfs_fattr *fattr)
>  			if (!(have_writers || have_delegation)) {
>  				invalid |= NFS_INO_INVALID_DATA
>  					| NFS_INO_INVALID_ACCESS
> -					| NFS_INO_INVALID_ACL;
> +					| NFS_INO_INVALID_ACL
> +					| NFS_INO_INVALID_XATTR;
>  				/* Force revalidate of all attributes */
>  				save_cache_validity |= NFS_INO_INVALID_CTIME
>  					| NFS_INO_INVALID_MTIME
> @@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
>  #if IS_ENABLED(CONFIG_NFS_V4)
>  	nfsi->nfs4_acl = NULL;
>  #endif /* CONFIG_NFS_V4 */
> +#ifdef CONFIG_NFS_V4_2
> +	nfsi->xattr_cache = NULL;
> +#endif
>  	return &nfsi->vfs_inode;
>  }
>  EXPORT_SYMBOL_GPL(nfs_alloc_inode);
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 1e3a7e119c93..67b8e4f7c554 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -575,6 +575,26 @@ extern void nfs4_test_session_trunk(struct rpc_clnt
> *clnt,
>  				struct rpc_xprt *xprt,
>  				void *data);
>  
> +#ifdef CONFIG_NFS_V4_2
> +extern int __init nfs4_xattr_cache_init(void);
> +extern void nfs4_xattr_cache_exit(void);
> +extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> +				 const char *buf, struct page **pages,
> +				 ssize_t buflen);
> +extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
> +extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
> +				char *buf, ssize_t buflen);
> +extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> +				      ssize_t buflen);
> +extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
> +				     ssize_t buflen);
> +extern void nfs4_xattr_cache_zap(struct inode *inode);
> +#else
> +static inline void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> +}
> +#endif
> +

Same thing with these functions. The generic client doesn't need to know about
them, so please move it into nfs4_fs.h instead.

Thanks,
Anna

>  static inline struct inode *nfs_igrab_and_active(struct inode *inode)
>  {
>  	inode = igrab(inode);
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index 8c2e52bc986a..e200522469af 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode
> *inode, const char *name,
>  	if (ret < 0)
>  		return ret;
>  
> +	/*
> +	 * Normally, the caching is done one layer up, but for successful
> +	 * RPCS, always cache the result here, even if the caller was
> +	 * just querying the length, or if the reply was too big for
> +	 * the caller. This avoids a second RPC in the case of the
> +	 * common query-alloc-retrieve cycle for xattrs.
> +	 *
> +	 * Note that xattr_len is always capped to XATTR_SIZE_MAX.
> +	 */
> +
> +	nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
> +
>  	if (buflen) {
>  		if (res.xattr_len > buflen)
>  			return -ERANGE;
> diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
> new file mode 100644
> index 000000000000..23fdab977a2a
> --- /dev/null
> +++ b/fs/nfs/nfs42xattr.c
> @@ -0,0 +1,1083 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights
> reserved.
> + *
> + * User extended attribute client side cache functions.
> + *
> + * Author: Frank van der Linden <fllinden@amazon.com>
> + */
> +#include <linux/errno.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/hashtable.h>
> +#include <linux/refcount.h>
> +#include <uapi/linux/xattr.h>
> +
> +#include "nfs4_fs.h"
> +#include "internal.h"
> +
> +/*
> + * User extended attributes client side caching is implemented by having
> + * a cache structure attached to NFS inodes. This structure is allocated
> + * when needed, and freed when the cache is zapped.
> + *
> + * The cache structure contains as hash table of entries, and a pointer
> + * to a special-cased entry for the listxattr cache.
> + *
> + * Accessing and allocating / freeing the caches is done via reference
> + * counting. The cache entries use a similar refcounting scheme.
> + *
> + * This makes freeing a cache, both from the shrinker and from the
> + * zap cache path, easy. It also means that, in current use cases,
> + * the large majority of inodes will not waste any memory, as they
> + * will never have any user extended attributes assigned to them.
> + *
> + * Attribute entries are hashed in to a simple hash table. They are
> + * also part of an LRU.
> + *
> + * There are three shrinkers.
> + *
> + * Two shrinkers deal with the cache entries themselves: one for
> + * large entries (> PAGE_SIZE), and one for smaller entries. The
> + * shrinker for the larger entries works more aggressively than
> + * those for the smaller entries.
> + *
> + * The other shrinker frees the cache structures themselves.
> + */
> +
> +/*
> + * 64 buckets is a good default. There is likely no reasonable
> + * workload that uses more than even 64 user extended attributes.
> + * You can certainly add a lot more - but you get what you ask for
> + * in those circumstances.
> + */
> +#define NFS4_XATTR_HASH_SIZE	64
> +
> +#define NFSDBG_FACILITY	NFSDBG_XATTRCACHE
> +
> +struct nfs4_xattr_cache;
> +struct nfs4_xattr_entry;
> +
> +struct nfs4_xattr_bucket {
> +	spinlock_t lock;
> +	struct hlist_head hlist;
> +	struct nfs4_xattr_cache *cache;
> +	bool draining;
> +};
> +
> +struct nfs4_xattr_cache {
> +	struct kref ref;
> +	spinlock_t hash_lock;	/* protects hashtable and lru */
> +	struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
> +	struct list_head lru;
> +	struct list_head dispose;
> +	atomic_long_t nent;
> +	spinlock_t listxattr_lock;
> +	struct inode *inode;
> +	struct nfs4_xattr_entry *listxattr;
> +	struct work_struct work;
> +};
> +
> +struct nfs4_xattr_entry {
> +	struct kref ref;
> +	struct hlist_node hnode;
> +	struct list_head lru;
> +	struct list_head dispose;
> +	char *xattr_name;
> +	void *xattr_value;
> +	size_t xattr_size;
> +	struct nfs4_xattr_bucket *bucket;
> +	uint32_t flags;
> +};
> +
> +#define	NFS4_XATTR_ENTRY_EXTVAL	0x0001
> +
> +/*
> + * LRU list of NFS inodes that have xattr caches.
> + */
> +static struct list_lru nfs4_xattr_cache_lru;
> +static struct list_lru nfs4_xattr_entry_lru;
> +static struct list_lru nfs4_xattr_large_entry_lru;
> +
> +static struct kmem_cache *nfs4_xattr_cache_cachep;
> +
> +static struct workqueue_struct *nfs4_xattr_cache_wq;
> +
> +/*
> + * Hashing helper functions.
> + */
> +static void
> +nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> +		INIT_HLIST_HEAD(&cache->buckets[i].hlist);
> +		spin_lock_init(&cache->buckets[i].lock);
> +		cache->buckets[i].cache = cache;
> +		cache->buckets[i].draining = false;
> +	}
> +}
> +
> +/*
> + * Locking order:
> + * 1. inode i_lock or bucket lock
> + * 2. list_lru lock (taken by list_lru_* functions)
> + */
> +
> +/*
> + * Wrapper functions to add a cache entry to the right LRU.
> + */
> +static bool
> +nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
> +{
> +	struct list_lru *lru;
> +
> +	lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	return list_lru_add(lru, &entry->lru);
> +}
> +
> +static bool
> +nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
> +{
> +	struct list_lru *lru;
> +
> +	lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	return list_lru_del(lru, &entry->lru);
> +}
> +
> +/*
> + * This function allocates cache entries. They are the normal
> + * extended attribute name/value pairs, but may also be a listxattr
> + * cache. Those allocations use the same entry so that they can be
> + * treated as one by the memory shrinker.
> + *
> + * xattr cache entries are allocated together with names. If the
> + * value fits in to one page with the entry structure and the name,
> + * it will also be part of the same allocation (kmalloc). This is
> + * expected to be the vast majority of cases. Larger allocations
> + * have a value pointer that is allocated separately by kvmalloc.
> + *
> + * Parameters:
> + *
> + * @name:  Name of the extended attribute. NULL for listxattr cache
> + *         entry.
> + * @value: Value of attribute, or listxattr cache. NULL if the
> + *         value is to be copied from pages instead.
> + * @pages: Pages to copy the value from, if not NULL. Passed in to
> + *	   make it easier to copy the value after an RPC, even if
> + *	   the value will not be passed up to application (e.g.
> + *	   for a 'query' getxattr with NULL buffer).
> + * @len:   Length of the value. Can be 0 for zero-length attribues.
> + *         @value and @pages will be NULL if @len is 0.
> + */
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_alloc_entry(const char *name, const void *value,
> +		       struct page **pages, size_t len)
> +{
> +	struct nfs4_xattr_entry *entry;
> +	void *valp;
> +	char *namep;
> +	size_t alloclen, slen;
> +	char *buf;
> +	uint32_t flags;
> +
> +	BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
> +	    XATTR_NAME_MAX + 1 > PAGE_SIZE);
> +
> +	alloclen = sizeof(struct nfs4_xattr_entry);
> +	if (name != NULL) {
> +		slen = strlen(name) + 1;
> +		alloclen += slen;
> +	} else
> +		slen = 0;
> +
> +	if (alloclen + len <= PAGE_SIZE) {
> +		alloclen += len;
> +		flags = 0;
> +	} else {
> +		flags = NFS4_XATTR_ENTRY_EXTVAL;
> +	}
> +
> +	buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> +	if (buf == NULL)
> +		return NULL;
> +	entry = (struct nfs4_xattr_entry *)buf;
> +
> +	if (name != NULL) {
> +		namep = buf + sizeof(struct nfs4_xattr_entry);
> +		memcpy(namep, name, slen);
> +	} else {
> +		namep = NULL;
> +	}
> +
> +
> +	if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
> +		valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> +		if (valp == NULL) {
> +			kfree(buf);
> +			return NULL;
> +		}
> +	} else if (len != 0) {
> +		valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
> +	} else
> +		valp = NULL;
> +
> +	if (valp != NULL) {
> +		if (value != NULL)
> +			memcpy(valp, value, len);
> +		else
> +			_copy_from_pages(valp, pages, 0, len);
> +	}
> +
> +	entry->flags = flags;
> +	entry->xattr_value = valp;
> +	kref_init(&entry->ref);
> +	entry->xattr_name = namep;
> +	entry->xattr_size = len;
> +	entry->bucket = NULL;
> +	INIT_LIST_HEAD(&entry->lru);
> +	INIT_LIST_HEAD(&entry->dispose);
> +	INIT_HLIST_NODE(&entry->hnode);
> +
> +	return entry;
> +}
> +
> +static void
> +nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
> +{
> +	if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
> +		kvfree(entry->xattr_value);
> +	kfree(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_entry_cb(struct kref *kref)
> +{
> +	struct nfs4_xattr_entry *entry;
> +
> +	entry = container_of(kref, struct nfs4_xattr_entry, ref);
> +
> +	if (WARN_ON(!list_empty(&entry->lru)))
> +		return;
> +
> +	nfs4_xattr_free_entry(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_cache_cb(struct kref *kref)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	int i;
> +
> +	cache = container_of(kref, struct nfs4_xattr_cache, ref);
> +
> +	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> +		if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
> +			return;
> +		cache->buckets[i].draining = false;
> +	}
> +
> +	cache->listxattr = NULL;
> +
> +	kmem_cache_free(nfs4_xattr_cache_cachep, cache);
> +
> +}
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_alloc_cache(void)
> +{
> +	struct nfs4_xattr_cache *cache;
> +
> +	cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
> +	    GFP_KERNEL_ACCOUNT | GFP_NOFS);
> +	if (cache == NULL)
> +		return NULL;
> +
> +	kref_init(&cache->ref);
> +	atomic_long_set(&cache->nent, 0);
> +
> +	return cache;
> +}
> +
> +/*
> + * Set the listxattr cache, which is a special-cased cache entry.
> + * The special value ERR_PTR(-ESTALE) is used to indicate that
> + * the cache is being drained - this prevents a new listxattr
> + * cache from being added to what is now a stale cache.
> + */
> +static int
> +nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
> +			 struct nfs4_xattr_entry *new)
> +{
> +	struct nfs4_xattr_entry *old;
> +	int ret = 1;
> +
> +	spin_lock(&cache->listxattr_lock);
> +
> +	old = cache->listxattr;
> +
> +	if (old == ERR_PTR(-ESTALE)) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	cache->listxattr = new;
> +	if (new != NULL && new != ERR_PTR(-ESTALE))
> +		nfs4_xattr_entry_lru_add(new);
> +
> +	if (old != NULL) {
> +		nfs4_xattr_entry_lru_del(old);
> +		kref_put(&old->ref, nfs4_xattr_free_entry_cb);
> +	}
> +out:
> +	spin_unlock(&cache->listxattr_lock);
> +
> +	return ret;
> +}
> +
> +/*
> + * Unlink a cache from its parent inode, clearing out an invalid
> + * cache. Must be called with i_lock held.
> + */
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_cache_unlink(struct inode *inode)
> +{
> +	struct nfs_inode *nfsi;
> +	struct nfs4_xattr_cache *oldcache;
> +
> +	nfsi = NFS_I(inode);
> +
> +	oldcache = nfsi->xattr_cache;
> +	if (oldcache != NULL) {
> +		list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
> +		oldcache->inode = NULL;
> +	}
> +	nfsi->xattr_cache = NULL;
> +	nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
> +
> +	return oldcache;
> +
> +}
> +
> +/*
> + * Discard a cache. Usually called by a worker, since walking all
> + * the entries can take up some cycles that we don't want to waste
> + * in the I/O path. Can also be called from the shrinker callback.
> + *
> + * The cache is dead, it has already been unlinked from its inode,
> + * and no longer appears on the cache LRU list.
> + *
> + * Mark all buckets as draining, so that no new entries are added. This
> + * could still happen in the unlikely, but possible case that another
> + * thread had grabbed a reference before it was unlinked from the inode,
> + * and is still holding it for an add operation.
> + *
> + * Remove all entries from the LRU lists, so that there is no longer
> + * any way to 'find' this cache. Then, remove the entries from the hash
> + * table.
> + *
> + * At that point, the cache will remain empty and can be freed when the final
> + * reference drops, which is very likely the kref_put at the end of
> + * this function, or the one called immediately afterwards in the
> + * shrinker callback.
> + */
> +static void
> +nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
> +{
> +	unsigned int i;
> +	struct nfs4_xattr_entry *entry;
> +	struct nfs4_xattr_bucket *bucket;
> +	struct hlist_node *n;
> +
> +	nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
> +
> +	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> +		bucket = &cache->buckets[i];
> +
> +		spin_lock(&bucket->lock);
> +		bucket->draining = true;
> +		hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
> +			nfs4_xattr_entry_lru_del(entry);
> +			hlist_del_init(&entry->hnode);
> +			kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +		}
> +		spin_unlock(&bucket->lock);
> +	}
> +
> +	atomic_long_set(&cache->nent, 0);
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +static void
> +nfs4_xattr_discard_cache_worker(struct work_struct *work)
> +{
> +	struct nfs4_xattr_cache *cache = container_of(work,
> +	    struct nfs4_xattr_cache, work);
> +
> +	nfs4_xattr_discard_cache(cache);
> +}
> +
> +static void
> +nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
> +{
> +	queue_work(nfs4_xattr_cache_wq, &cache->work);
> +}
> +
> +/*
> + * Get a referenced copy of the cache structure. Avoid doing allocs
> + * while holding i_lock. Which means that we do some optimistic allocation,
> + * and might have to free the result in rare cases.
> + *
> + * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
> + * and acts accordingly, replacing the cache when needed. For the read case
> + * (!add), this means that the caller must make sure that the cache
> + * is valid before caling this function. getxattr and listxattr call
> + * revalidate_inode to do this. The attribute cache timeout (for the
> + * non-delegated case) is expected to be dealt with in the revalidate
> + * call.
> + */
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_get_cache(struct inode *inode, int add)
> +{
> +	struct nfs_inode *nfsi;
> +	struct nfs4_xattr_cache *cache, *oldcache, *newcache;
> +
> +	nfsi = NFS_I(inode);
> +
> +	cache = oldcache = NULL;
> +
> +	spin_lock(&inode->i_lock);
> +
> +	if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
> +		oldcache = nfs4_xattr_cache_unlink(inode);
> +	else
> +		cache = nfsi->xattr_cache;
> +
> +	if (cache != NULL)
> +		kref_get(&cache->ref);
> +
> +	spin_unlock(&inode->i_lock);
> +
> +	if (add && cache == NULL) {
> +		newcache = NULL;
> +
> +		cache = nfs4_xattr_alloc_cache();
> +		if (cache == NULL)
> +			goto out;
> +
> +		spin_lock(&inode->i_lock);
> +		if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
> +			/*
> +			 * The cache was invalidated again. Give up,
> +			 * since what we want to enter is now likely
> +			 * outdated anyway.
> +			 */
> +			spin_unlock(&inode->i_lock);
> +			kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +			cache = NULL;
> +			goto out;
> +		}
> +
> +		/*
> +		 * Check if someone beat us to it.
> +		 */
> +		if (nfsi->xattr_cache != NULL) {
> +			newcache = nfsi->xattr_cache;
> +			kref_get(&newcache->ref);
> +		} else {
> +			kref_get(&cache->ref);
> +			nfsi->xattr_cache = cache;
> +			cache->inode = inode;
> +			list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
> +		}
> +
> +		spin_unlock(&inode->i_lock);
> +
> +		/*
> +		 * If there was a race, throw away the cache we just
> +		 * allocated, and use the new one allocated by someone
> +		 * else.
> +		 */
> +		if (newcache != NULL) {
> +			kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +			cache = newcache;
> +		}
> +	}
> +
> +out:
> +	/*
> +	 * Discarding an old cache is done via a workqueue.
> +	 */
> +	if (oldcache != NULL)
> +		nfs4_xattr_reap_cache(oldcache);
> +
> +	return cache;
> +}
> +
> +static inline struct nfs4_xattr_bucket *
> +nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
> +{
> +	return &cache->buckets[jhash(name, strlen(name), 0) &
> +	    (ARRAY_SIZE(cache->buckets) - 1)];
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
> +{
> +	struct nfs4_xattr_entry *entry;
> +
> +	entry = NULL;
> +
> +	hlist_for_each_entry(entry, &bucket->hlist, hnode) {
> +		if (!strcmp(entry->xattr_name, name))
> +			break;
> +	}
> +
> +	return entry;
> +}
> +
> +static int
> +nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
> +		    struct nfs4_xattr_entry *entry)
> +{
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_entry *oldentry = NULL;
> +	int ret = 1;
> +
> +	bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
> +	entry->bucket = bucket;
> +
> +	spin_lock(&bucket->lock);
> +
> +	if (bucket->draining) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
> +	if (oldentry != NULL) {
> +		hlist_del_init(&oldentry->hnode);
> +		nfs4_xattr_entry_lru_del(oldentry);
> +	} else {
> +		atomic_long_inc(&cache->nent);
> +	}
> +
> +	hlist_add_head(&entry->hnode, &bucket->hlist);
> +	nfs4_xattr_entry_lru_add(entry);
> +
> +out:
> +	spin_unlock(&bucket->lock);
> +
> +	if (oldentry != NULL)
> +		kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
> +
> +	return ret;
> +}
> +
> +static void
> +nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
> +{
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_entry *entry;
> +
> +	bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> +	spin_lock(&bucket->lock);
> +
> +	entry = nfs4_xattr_get_entry(bucket, name);
> +	if (entry != NULL) {
> +		hlist_del_init(&entry->hnode);
> +		nfs4_xattr_entry_lru_del(entry);
> +		atomic_long_dec(&cache->nent);
> +	}
> +
> +	spin_unlock(&bucket->lock);
> +
> +	if (entry != NULL)
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
> +{
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_entry *entry;
> +
> +	bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> +	spin_lock(&bucket->lock);
> +
> +	entry = nfs4_xattr_get_entry(bucket, name);
> +	if (entry != NULL)
> +		kref_get(&entry->ref);
> +
> +	spin_unlock(&bucket->lock);
> +
> +	return entry;
> +}
> +
> +/*
> + * Entry point to retrieve an entry from the cache.
> + */
> +ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char
> *buf,
> +			 ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +	ssize_t ret;
> +
> +	cache = nfs4_xattr_get_cache(inode, 0);
> +	if (cache == NULL)
> +		return -ENOENT;
> +
> +	ret = 0;
> +	entry = nfs4_xattr_hash_find(cache, name);
> +
> +	if (entry != NULL) {
> +		dprintk("%s: cache hit '%s', len %lu\n", __func__,
> +		    entry->xattr_name, (unsigned long)entry->xattr_size);
> +		if (buflen == 0) {
> +			/* Length probe only */
> +			ret = entry->xattr_size;
> +		} else if (buflen < entry->xattr_size)
> +			ret = -ERANGE;
> +		else {
> +			memcpy(buf, entry->xattr_value, entry->xattr_size);
> +			ret = entry->xattr_size;
> +		}
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +	} else {
> +		dprintk("%s: cache miss '%s'\n", __func__, name);
> +		ret = -ENOENT;
> +	}
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> +	return ret;
> +}
> +
> +/*
> + * Retrieve a cached list of xattrs from the cache.
> + */
> +ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +	ssize_t ret;
> +
> +	cache = nfs4_xattr_get_cache(inode, 0);
> +	if (cache == NULL)
> +		return -ENOENT;
> +
> +	spin_lock(&cache->listxattr_lock);
> +
> +	entry = cache->listxattr;
> +
> +	if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
> +		if (buflen == 0) {
> +			/* Length probe only */
> +			ret = entry->xattr_size;
> +		} else if (entry->xattr_size > buflen)
> +			ret = -ERANGE;
> +		else {
> +			memcpy(buf, entry->xattr_value, entry->xattr_size);
> +			ret = entry->xattr_size;
> +		}
> +	} else {
> +		ret = -ENOENT;
> +	}
> +
> +	spin_unlock(&cache->listxattr_lock);
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> +	return ret;
> +}
> +
> +/*
> + * Add an xattr to the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> +			  const char *buf, struct page **pages, ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +
> +	dprintk("%s: add '%s' len %lu\n", __func__,
> +	    name, (unsigned long)buflen);
> +
> +	cache = nfs4_xattr_get_cache(inode, 1);
> +	if (cache == NULL)
> +		return;
> +
> +	entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
> +	if (entry == NULL)
> +		goto out;
> +
> +	(void)nfs4_xattr_set_listcache(cache, NULL);
> +
> +	if (!nfs4_xattr_hash_add(cache, entry))
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +
> +/*
> + * Remove an xattr from the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
> +{
> +	struct nfs4_xattr_cache *cache;
> +
> +	dprintk("%s: remove '%s'\n", __func__, name);
> +
> +	cache = nfs4_xattr_get_cache(inode, 0);
> +	if (cache == NULL)
> +		return;
> +
> +	(void)nfs4_xattr_set_listcache(cache, NULL);
> +	nfs4_xattr_hash_remove(cache, name);
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Cache listxattr output, replacing any possible old one.
> + */
> +void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> +			       ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +
> +	cache = nfs4_xattr_get_cache(inode, 1);
> +	if (cache == NULL)
> +		return;
> +
> +	entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
> +	if (entry == NULL)
> +		goto out;
> +
> +	/*
> +	 * This is just there to be able to get to bucket->cache,
> +	 * which is obviously the same for all buckets, so just
> +	 * use bucket 0.
> +	 */
> +	entry->bucket = &cache->buckets[0];
> +
> +	if (!nfs4_xattr_set_listcache(cache, entry))
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Zap the entire cache. Called when an inode is evicted.
> + */
> +void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> +	struct nfs4_xattr_cache *oldcache;
> +
> +	spin_lock(&inode->i_lock);
> +	oldcache = nfs4_xattr_cache_unlink(inode);
> +	spin_unlock(&inode->i_lock);
> +
> +	if (oldcache)
> +		nfs4_xattr_discard_cache(oldcache);
> +}
> +
> +/*
> + * The entry LRU is shrunk more aggressively than the cache LRU,
> + * by settings @seeks to 1.
> + *
> + * Cache structures are freed only when they've become empty, after
> + * pruning all but one entry.
> + */
> +
> +static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
> +					    struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
> +					    struct shrink_control *sc);
> +static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
> +					   struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
> +					   struct shrink_control *sc);
> +
> +static struct shrinker nfs4_xattr_cache_shrinker = {
> +	.count_objects	= nfs4_xattr_cache_count,
> +	.scan_objects	= nfs4_xattr_cache_scan,
> +	.seeks		= DEFAULT_SEEKS,
> +	.flags		= SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_entry_shrinker = {
> +	.count_objects	= nfs4_xattr_entry_count,
> +	.scan_objects	= nfs4_xattr_entry_scan,
> +	.seeks		= DEFAULT_SEEKS,
> +	.batch		= 512,
> +	.flags		= SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_large_entry_shrinker = {
> +	.count_objects	= nfs4_xattr_entry_count,
> +	.scan_objects	= nfs4_xattr_entry_scan,
> +	.seeks		= 1,
> +	.batch		= 512,
> +	.flags		= SHRINKER_MEMCG_AWARE,
> +};
> +
> +static enum lru_status
> +cache_lru_isolate(struct list_head *item,
> +	struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> +	struct list_head *dispose = arg;
> +	struct inode *inode;
> +	struct nfs4_xattr_cache *cache = container_of(item,
> +	    struct nfs4_xattr_cache, lru);
> +
> +	if (atomic_long_read(&cache->nent) > 1)
> +		return LRU_SKIP;
> +
> +	/*
> +	 * If a cache structure is on the LRU list, we know that
> +	 * its inode is valid. Try to lock it to break the link.
> +	 * Since we're inverting the lock order here, only try.
> +	 */
> +	inode = cache->inode;
> +
> +	if (!spin_trylock(&inode->i_lock))
> +		return LRU_SKIP;
> +
> +	kref_get(&cache->ref);
> +
> +	cache->inode = NULL;
> +	NFS_I(inode)->xattr_cache = NULL;
> +	NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
> +	list_lru_isolate(lru, &cache->lru);
> +
> +	spin_unlock(&inode->i_lock);
> +
> +	list_add_tail(&cache->dispose, dispose);
> +	return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	LIST_HEAD(dispose);
> +	unsigned long freed;
> +	struct nfs4_xattr_cache *cache;
> +
> +	freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
> +	    cache_lru_isolate, &dispose);
> +	while (!list_empty(&dispose)) {
> +		cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
> +		    dispose);
> +		list_del_init(&cache->dispose);
> +		nfs4_xattr_discard_cache(cache);
> +		kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +	}
> +
> +	return freed;
> +}
> +
> +
> +static unsigned long
> +nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	unsigned long count;
> +
> +	count = list_lru_count(&nfs4_xattr_cache_lru);
> +	return vfs_pressure_ratio(count);
> +}
> +
> +static enum lru_status
> +entry_lru_isolate(struct list_head *item,
> +	struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> +	struct list_head *dispose = arg;
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry = container_of(item,
> +	    struct nfs4_xattr_entry, lru);
> +
> +	bucket = entry->bucket;
> +	cache = bucket->cache;
> +
> +	/*
> +	 * Unhook the entry from its parent (either a cache bucket
> +	 * or a cache structure if it's a listxattr buf), so that
> +	 * it's no longer found. Then add it to the isolate list,
> +	 * to be freed later.
> +	 *
> +	 * In both cases, we're reverting lock order, so use
> +	 * trylock and skip the entry if we can't get the lock.
> +	 */
> +	if (entry->xattr_name != NULL) {
> +		/* Regular cache entry */
> +		if (!spin_trylock(&bucket->lock))
> +			return LRU_SKIP;
> +
> +		kref_get(&entry->ref);
> +
> +		hlist_del_init(&entry->hnode);
> +		atomic_long_dec(&cache->nent);
> +		list_lru_isolate(lru, &entry->lru);
> +
> +		spin_unlock(&bucket->lock);
> +	} else {
> +		/* Listxattr cache entry */
> +		if (!spin_trylock(&cache->listxattr_lock))
> +			return LRU_SKIP;
> +
> +		kref_get(&entry->ref);
> +
> +		cache->listxattr = NULL;
> +		list_lru_isolate(lru, &entry->lru);
> +
> +		spin_unlock(&cache->listxattr_lock);
> +	}
> +
> +	list_add_tail(&entry->dispose, dispose);
> +	return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	LIST_HEAD(dispose);
> +	unsigned long freed;
> +	struct nfs4_xattr_entry *entry;
> +	struct list_lru *lru;
> +
> +	lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
> +
> +	while (!list_empty(&dispose)) {
> +		entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
> +		    dispose);
> +		list_del_init(&entry->dispose);
> +
> +		/*
> +		 * Drop two references: the one that we just grabbed
> +		 * in entry_lru_isolate, and the one that was set
> +		 * when the entry was first allocated.
> +		 */
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +	}
> +
> +	return freed;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	unsigned long count;
> +	struct list_lru *lru;
> +
> +	lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	count = list_lru_count(lru);
> +	return vfs_pressure_ratio(count);
> +}
> +
> +
> +static void nfs4_xattr_cache_init_once(void *p)
> +{
> +	struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
> +
> +	spin_lock_init(&cache->listxattr_lock);
> +	atomic_long_set(&cache->nent, 0);
> +	nfs4_xattr_hash_init(cache);
> +	cache->listxattr = NULL;
> +	INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
> +	INIT_LIST_HEAD(&cache->lru);
> +	INIT_LIST_HEAD(&cache->dispose);
> +}
> +
> +int __init nfs4_xattr_cache_init(void)
> +{
> +	int ret = 0;
> +
> +	nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
> +	    sizeof(struct nfs4_xattr_cache), 0,
> +	    (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> +	    nfs4_xattr_cache_init_once);
> +	if (nfs4_xattr_cache_cachep == NULL)
> +		return -ENOMEM;
> +
> +	ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
> +	    &nfs4_xattr_large_entry_shrinker);
> +	if (ret)
> +		goto out4;
> +
> +	ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
> +	    &nfs4_xattr_entry_shrinker);
> +	if (ret)
> +		goto out3;
> +
> +	ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
> +	    &nfs4_xattr_cache_shrinker);
> +	if (ret)
> +		goto out2;
> +
> +	nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
> +	if (nfs4_xattr_cache_wq == NULL)
> +		goto out1;
> +
> +	ret = register_shrinker(&nfs4_xattr_cache_shrinker);
> +	if (ret)
> +		goto out0;
> +
> +	ret = register_shrinker(&nfs4_xattr_entry_shrinker);
> +	if (ret)
> +		goto out;
> +
> +	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
> +	if (!ret)
> +		return 0;
> +
> +	unregister_shrinker(&nfs4_xattr_entry_shrinker);
> +out:
> +	unregister_shrinker(&nfs4_xattr_cache_shrinker);
> +out0:
> +	destroy_workqueue(nfs4_xattr_cache_wq);
> +out1:
> +	list_lru_destroy(&nfs4_xattr_cache_lru);
> +out2:
> +	list_lru_destroy(&nfs4_xattr_entry_lru);
> +out3:
> +	list_lru_destroy(&nfs4_xattr_large_entry_lru);
> +out4:
> +	kmem_cache_destroy(nfs4_xattr_cache_cachep);
> +
> +	return ret;
> +}
> +
> +void nfs4_xattr_cache_exit(void)
> +{
> +	unregister_shrinker(&nfs4_xattr_entry_shrinker);
> +	unregister_shrinker(&nfs4_xattr_cache_shrinker);
> +	list_lru_destroy(&nfs4_xattr_entry_lru);
> +	list_lru_destroy(&nfs4_xattr_cache_lru);
> +	kmem_cache_destroy(nfs4_xattr_cache_cachep);
> +	destroy_workqueue(nfs4_xattr_cache_wq);
> +}
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6df94857f5bb..079c1ac84cee 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7459,6 +7459,7 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
>  				    size_t buflen, int flags)
>  {
>  	struct nfs_access_entry cache;
> +	int ret;
>  
>  	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
>  		return -EOPNOTSUPP;
> @@ -7477,10 +7478,17 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
>  			return -EACCES;
>  	}
>  
> -	if (buf == NULL)
> -		return nfs42_proc_removexattr(inode, key);
> -	else
> -		return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> +	if (buf == NULL) {
> +		ret = nfs42_proc_removexattr(inode, key);
> +		if (!ret)
> +			nfs4_xattr_cache_remove(inode, key);
> +	} else {
> +		ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> +		if (!ret)
> +			nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
> +	}
> +
> +	return ret;
>  }
>  
>  static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
> @@ -7488,6 +7496,7 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
>  				    const char *key, void *buf, size_t buflen)
>  {
>  	struct nfs_access_entry cache;
> +	ssize_t ret;
>  
>  	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
>  		return -EOPNOTSUPP;
> @@ -7497,7 +7506,17 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
>  			return -EACCES;
>  	}
>  
> -	return nfs42_proc_getxattr(inode, key, buf, buflen);
> +	ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +	if (ret)
> +		return ret;
> +
> +	ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
> +	if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> +		return ret;
> +
> +	ret = nfs42_proc_getxattr(inode, key, buf, buflen);
> +
> +	return ret;
>  }
>  
>  static ssize_t
> @@ -7505,7 +7524,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
>  {
>  	u64 cookie;
>  	bool eof;
> -	int ret, size;
> +	ssize_t ret, size;
>  	char *buf;
>  	size_t buflen;
>  	struct nfs_access_entry cache;
> @@ -7518,6 +7537,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
>  			return 0;
>  	}
>  
> +	ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +	if (ret)
> +		return ret;
> +
> +	ret = nfs4_xattr_cache_list(inode, list, list_len);
> +	if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> +		return ret;
> +
>  	cookie = 0;
>  	eof = false;
>  	buflen = list_len ? list_len : XATTR_LIST_MAX;
> @@ -7537,6 +7564,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
>  		size += ret;
>  	}
>  
> +	if (list_len)
> +		nfs4_xattr_cache_set_list(inode, list, size);
> +
>  	return size;
>  }
>  
> diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
> index 1475f932d7da..0c1ab846b83d 100644
> --- a/fs/nfs/nfs4super.c
> +++ b/fs/nfs/nfs4super.c
> @@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
>  	pnfs_destroy_layout(NFS_I(inode));
>  	/* First call standard NFS clear_inode() code */
>  	nfs_clear_inode(inode);
> +	nfs4_xattr_cache_zap(inode);
>  }
>  
>  struct nfs_referral_count {
> @@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
>  	if (err)
>  		goto out1;
>  
> +#ifdef CONFIG_NFS_V4_2
> +	err = nfs4_xattr_cache_init();
> +	if (err)
> +		goto out2;
> +#endif
> +
>  	err = nfs4_register_sysctl();
>  	if (err)
>  		goto out2;
> @@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
>  	nfs4_pnfs_v3_ds_connect_unload();
>  
>  	unregister_nfs_version(&nfs_v4);
> +#ifdef CONFIG_NFS_V4_2
> +	nfs4_xattr_cache_exit();
> +#endif
>  	nfs4_unregister_sysctl();
>  	nfs_idmap_quit();
>  	nfs_dns_resolver_destroy();
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 1fcfef670a4a..c08cc22d9c32 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -102,6 +102,8 @@ struct nfs_delegation;
>  
>  struct posix_acl;
>  
> +struct nfs4_xattr_cache;
> +
>  /*
>   * nfs fs inode data in memory
>   */
> @@ -188,6 +190,10 @@ struct nfs_inode {
>  	struct fscache_cookie	*fscache;
>  #endif
>  	struct inode		vfs_inode;
> +
> +#ifdef CONFIG_NFS_V4_2
> +	struct nfs4_xattr_cache *xattr_cache;
> +#endif
>  };
>  
>  struct nfs4_copy_state {
> diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
> index 7bcc8cd6831d..3afe3767c55d 100644
> --- a/include/uapi/linux/nfs_fs.h
> +++ b/include/uapi/linux/nfs_fs.h
> @@ -56,6 +56,7 @@
>  #define NFSDBG_PNFS		0x1000
>  #define NFSDBG_PNFS_LD		0x2000
>  #define NFSDBG_STATE		0x4000
> +#define NFSDBG_XATTRCACHE	0x8000
>  #define NFSDBG_ALL		0xFFFF
>  
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 13/13] NFSv4.2: add client side xattr caching.
  2020-03-11 19:56 ` [PATCH 13/13] NFSv4.2: add client side xattr caching Frank van der Linden
  2020-03-12 20:39   ` Schumaker, Anna
@ 2020-03-12 20:48   ` Schumaker, Anna
  1 sibling, 0 replies; 33+ messages in thread
From: Schumaker, Anna @ 2020-03-12 20:48 UTC (permalink / raw)
  To: fllinden, linux-nfs, trond.myklebust

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Implement client side caching for NFSv4.2 extended attributes. The cache
> is a per-inode hashtable, with name/value entries. There is one special
> entry for the listxattr cache.
> 
> NFS inodes have a pointer to a cache structure. The cache structure is
> allocated on demand, freed when the cache is invalidated.
> 
> Memory shrinkers keep the size in check. Large entries (> PAGE_SIZE)
> are collected by a separate shrinker, and freed more aggressively
> than others.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  fs/nfs/Makefile             |    1 +
>  fs/nfs/inode.c              |    9 +-
>  fs/nfs/internal.h           |   20 +
>  fs/nfs/nfs42proc.c          |   12 +
>  fs/nfs/nfs42xattr.c         | 1083
> +++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/nfs4proc.c           |   42 +-
>  fs/nfs/nfs4super.c          |   10 +
>  include/linux/nfs_fs.h      |    6 +
>  include/uapi/linux/nfs_fs.h |    1 +
>  9 files changed, 1177 insertions(+), 7 deletions(-)
>  create mode 100644 fs/nfs/nfs42xattr.c
> 
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 2433c3e03cfa..191b3e9aa232 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -31,6 +31,7 @@ nfsv4-$(CONFIG_NFS_USE_LEGACY_DNS) += cache_lib.o
>  nfsv4-$(CONFIG_SYSCTL)	+= nfs4sysctl.o
>  nfsv4-$(CONFIG_NFS_V4_1)	+= pnfs.o pnfs_dev.o pnfs_nfs.o
>  nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42proc.o
> +nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42xattr.o

Oh, you should also be able to combine the two CONFIG_NFS_V4_2 lines here:
 nfsv4-$(CONFIG_NFS_V4_2)	+= nfs42proc.o nfs42xattr.o

>  
>  obj-$(CONFIG_PNFS_FILE_LAYOUT) += filelayout/
>  obj-$(CONFIG_PNFS_BLOCK) += blocklayout/
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index d2be152796ef..9d4952d2306b 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -194,6 +194,7 @@ bool nfs_check_cache_invalid(struct inode *inode, unsigned
> long flags)
>  
>  	return nfs_check_cache_invalid_not_delegated(inode, flags);
>  }
> +EXPORT_SYMBOL_GPL(nfs_check_cache_invalid);
>  
>  static void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
>  {
> @@ -235,11 +236,13 @@ static void nfs_zap_caches_locked(struct inode *inode)
>  					| NFS_INO_INVALID_DATA
>  					| NFS_INO_INVALID_ACCESS
>  					| NFS_INO_INVALID_ACL
> +					| NFS_INO_INVALID_XATTR
>  					| NFS_INO_REVAL_PAGECACHE);
>  	} else
>  		nfs_set_cache_invalid(inode, NFS_INO_INVALID_ATTR
>  					| NFS_INO_INVALID_ACCESS
>  					| NFS_INO_INVALID_ACL
> +					| NFS_INO_INVALID_XATTR
>  					| NFS_INO_REVAL_PAGECACHE);
>  	nfs_zap_label_cache_locked(nfsi);
>  }
> @@ -1885,7 +1888,8 @@ static int nfs_update_inode(struct inode *inode, struct
> nfs_fattr *fattr)
>  			if (!(have_writers || have_delegation)) {
>  				invalid |= NFS_INO_INVALID_DATA
>  					| NFS_INO_INVALID_ACCESS
> -					| NFS_INO_INVALID_ACL;
> +					| NFS_INO_INVALID_ACL
> +					| NFS_INO_INVALID_XATTR;
>  				/* Force revalidate of all attributes */
>  				save_cache_validity |= NFS_INO_INVALID_CTIME
>  					| NFS_INO_INVALID_MTIME
> @@ -2084,6 +2088,9 @@ struct inode *nfs_alloc_inode(struct super_block *sb)
>  #if IS_ENABLED(CONFIG_NFS_V4)
>  	nfsi->nfs4_acl = NULL;
>  #endif /* CONFIG_NFS_V4 */
> +#ifdef CONFIG_NFS_V4_2
> +	nfsi->xattr_cache = NULL;
> +#endif
>  	return &nfsi->vfs_inode;
>  }
>  EXPORT_SYMBOL_GPL(nfs_alloc_inode);
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 1e3a7e119c93..67b8e4f7c554 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -575,6 +575,26 @@ extern void nfs4_test_session_trunk(struct rpc_clnt
> *clnt,
>  				struct rpc_xprt *xprt,
>  				void *data);
>  
> +#ifdef CONFIG_NFS_V4_2
> +extern int __init nfs4_xattr_cache_init(void);
> +extern void nfs4_xattr_cache_exit(void);
> +extern void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> +				 const char *buf, struct page **pages,
> +				 ssize_t buflen);
> +extern void nfs4_xattr_cache_remove(struct inode *inode, const char *name);
> +extern ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name,
> +				char *buf, ssize_t buflen);
> +extern void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> +				      ssize_t buflen);
> +extern ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf,
> +				     ssize_t buflen);
> +extern void nfs4_xattr_cache_zap(struct inode *inode);
> +#else
> +static inline void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> +}
> +#endif
> +
>  static inline struct inode *nfs_igrab_and_active(struct inode *inode)
>  {
>  	inode = igrab(inode);
> diff --git a/fs/nfs/nfs42proc.c b/fs/nfs/nfs42proc.c
> index 8c2e52bc986a..e200522469af 100644
> --- a/fs/nfs/nfs42proc.c
> +++ b/fs/nfs/nfs42proc.c
> @@ -1182,6 +1182,18 @@ static ssize_t _nfs42_proc_getxattr(struct inode
> *inode, const char *name,
>  	if (ret < 0)
>  		return ret;
>  
> +	/*
> +	 * Normally, the caching is done one layer up, but for successful
> +	 * RPCS, always cache the result here, even if the caller was
> +	 * just querying the length, or if the reply was too big for
> +	 * the caller. This avoids a second RPC in the case of the
> +	 * common query-alloc-retrieve cycle for xattrs.
> +	 *
> +	 * Note that xattr_len is always capped to XATTR_SIZE_MAX.
> +	 */
> +
> +	nfs4_xattr_cache_add(inode, name, NULL, pages, res.xattr_len);
> +
>  	if (buflen) {
>  		if (res.xattr_len > buflen)
>  			return -ERANGE;
> diff --git a/fs/nfs/nfs42xattr.c b/fs/nfs/nfs42xattr.c
> new file mode 100644
> index 000000000000..23fdab977a2a
> --- /dev/null
> +++ b/fs/nfs/nfs42xattr.c
> @@ -0,0 +1,1083 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * Copyright 2019, 2020 Amazon.com, Inc. or its affiliates. All rights
> reserved.
> + *
> + * User extended attribute client side cache functions.
> + *
> + * Author: Frank van der Linden <fllinden@amazon.com>
> + */
> +#include <linux/errno.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/hashtable.h>
> +#include <linux/refcount.h>
> +#include <uapi/linux/xattr.h>
> +
> +#include "nfs4_fs.h"
> +#include "internal.h"
> +
> +/*
> + * User extended attributes client side caching is implemented by having
> + * a cache structure attached to NFS inodes. This structure is allocated
> + * when needed, and freed when the cache is zapped.
> + *
> + * The cache structure contains as hash table of entries, and a pointer
> + * to a special-cased entry for the listxattr cache.
> + *
> + * Accessing and allocating / freeing the caches is done via reference
> + * counting. The cache entries use a similar refcounting scheme.
> + *
> + * This makes freeing a cache, both from the shrinker and from the
> + * zap cache path, easy. It also means that, in current use cases,
> + * the large majority of inodes will not waste any memory, as they
> + * will never have any user extended attributes assigned to them.
> + *
> + * Attribute entries are hashed in to a simple hash table. They are
> + * also part of an LRU.
> + *
> + * There are three shrinkers.
> + *
> + * Two shrinkers deal with the cache entries themselves: one for
> + * large entries (> PAGE_SIZE), and one for smaller entries. The
> + * shrinker for the larger entries works more aggressively than
> + * those for the smaller entries.
> + *
> + * The other shrinker frees the cache structures themselves.
> + */
> +
> +/*
> + * 64 buckets is a good default. There is likely no reasonable
> + * workload that uses more than even 64 user extended attributes.
> + * You can certainly add a lot more - but you get what you ask for
> + * in those circumstances.
> + */
> +#define NFS4_XATTR_HASH_SIZE	64
> +
> +#define NFSDBG_FACILITY	NFSDBG_XATTRCACHE
> +
> +struct nfs4_xattr_cache;
> +struct nfs4_xattr_entry;
> +
> +struct nfs4_xattr_bucket {
> +	spinlock_t lock;
> +	struct hlist_head hlist;
> +	struct nfs4_xattr_cache *cache;
> +	bool draining;
> +};
> +
> +struct nfs4_xattr_cache {
> +	struct kref ref;
> +	spinlock_t hash_lock;	/* protects hashtable and lru */
> +	struct nfs4_xattr_bucket buckets[NFS4_XATTR_HASH_SIZE];
> +	struct list_head lru;
> +	struct list_head dispose;
> +	atomic_long_t nent;
> +	spinlock_t listxattr_lock;
> +	struct inode *inode;
> +	struct nfs4_xattr_entry *listxattr;
> +	struct work_struct work;
> +};
> +
> +struct nfs4_xattr_entry {
> +	struct kref ref;
> +	struct hlist_node hnode;
> +	struct list_head lru;
> +	struct list_head dispose;
> +	char *xattr_name;
> +	void *xattr_value;
> +	size_t xattr_size;
> +	struct nfs4_xattr_bucket *bucket;
> +	uint32_t flags;
> +};
> +
> +#define	NFS4_XATTR_ENTRY_EXTVAL	0x0001
> +
> +/*
> + * LRU list of NFS inodes that have xattr caches.
> + */
> +static struct list_lru nfs4_xattr_cache_lru;
> +static struct list_lru nfs4_xattr_entry_lru;
> +static struct list_lru nfs4_xattr_large_entry_lru;
> +
> +static struct kmem_cache *nfs4_xattr_cache_cachep;
> +
> +static struct workqueue_struct *nfs4_xattr_cache_wq;
> +
> +/*
> + * Hashing helper functions.
> + */
> +static void
> +nfs4_xattr_hash_init(struct nfs4_xattr_cache *cache)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> +		INIT_HLIST_HEAD(&cache->buckets[i].hlist);
> +		spin_lock_init(&cache->buckets[i].lock);
> +		cache->buckets[i].cache = cache;
> +		cache->buckets[i].draining = false;
> +	}
> +}
> +
> +/*
> + * Locking order:
> + * 1. inode i_lock or bucket lock
> + * 2. list_lru lock (taken by list_lru_* functions)
> + */
> +
> +/*
> + * Wrapper functions to add a cache entry to the right LRU.
> + */
> +static bool
> +nfs4_xattr_entry_lru_add(struct nfs4_xattr_entry *entry)
> +{
> +	struct list_lru *lru;
> +
> +	lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	return list_lru_add(lru, &entry->lru);
> +}
> +
> +static bool
> +nfs4_xattr_entry_lru_del(struct nfs4_xattr_entry *entry)
> +{
> +	struct list_lru *lru;
> +
> +	lru = (entry->flags & NFS4_XATTR_ENTRY_EXTVAL) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	return list_lru_del(lru, &entry->lru);
> +}
> +
> +/*
> + * This function allocates cache entries. They are the normal
> + * extended attribute name/value pairs, but may also be a listxattr
> + * cache. Those allocations use the same entry so that they can be
> + * treated as one by the memory shrinker.
> + *
> + * xattr cache entries are allocated together with names. If the
> + * value fits in to one page with the entry structure and the name,
> + * it will also be part of the same allocation (kmalloc). This is
> + * expected to be the vast majority of cases. Larger allocations
> + * have a value pointer that is allocated separately by kvmalloc.
> + *
> + * Parameters:
> + *
> + * @name:  Name of the extended attribute. NULL for listxattr cache
> + *         entry.
> + * @value: Value of attribute, or listxattr cache. NULL if the
> + *         value is to be copied from pages instead.
> + * @pages: Pages to copy the value from, if not NULL. Passed in to
> + *	   make it easier to copy the value after an RPC, even if
> + *	   the value will not be passed up to application (e.g.
> + *	   for a 'query' getxattr with NULL buffer).
> + * @len:   Length of the value. Can be 0 for zero-length attribues.
> + *         @value and @pages will be NULL if @len is 0.
> + */
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_alloc_entry(const char *name, const void *value,
> +		       struct page **pages, size_t len)
> +{
> +	struct nfs4_xattr_entry *entry;
> +	void *valp;
> +	char *namep;
> +	size_t alloclen, slen;
> +	char *buf;
> +	uint32_t flags;
> +
> +	BUILD_BUG_ON(sizeof(struct nfs4_xattr_entry) +
> +	    XATTR_NAME_MAX + 1 > PAGE_SIZE);
> +
> +	alloclen = sizeof(struct nfs4_xattr_entry);
> +	if (name != NULL) {
> +		slen = strlen(name) + 1;
> +		alloclen += slen;
> +	} else
> +		slen = 0;
> +
> +	if (alloclen + len <= PAGE_SIZE) {
> +		alloclen += len;
> +		flags = 0;
> +	} else {
> +		flags = NFS4_XATTR_ENTRY_EXTVAL;
> +	}
> +
> +	buf = kmalloc(alloclen, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> +	if (buf == NULL)
> +		return NULL;
> +	entry = (struct nfs4_xattr_entry *)buf;
> +
> +	if (name != NULL) {
> +		namep = buf + sizeof(struct nfs4_xattr_entry);
> +		memcpy(namep, name, slen);
> +	} else {
> +		namep = NULL;
> +	}
> +
> +
> +	if (flags & NFS4_XATTR_ENTRY_EXTVAL) {
> +		valp = kvmalloc(len, GFP_KERNEL_ACCOUNT | GFP_NOFS);
> +		if (valp == NULL) {
> +			kfree(buf);
> +			return NULL;
> +		}
> +	} else if (len != 0) {
> +		valp = buf + sizeof(struct nfs4_xattr_entry) + slen;
> +	} else
> +		valp = NULL;
> +
> +	if (valp != NULL) {
> +		if (value != NULL)
> +			memcpy(valp, value, len);
> +		else
> +			_copy_from_pages(valp, pages, 0, len);
> +	}
> +
> +	entry->flags = flags;
> +	entry->xattr_value = valp;
> +	kref_init(&entry->ref);
> +	entry->xattr_name = namep;
> +	entry->xattr_size = len;
> +	entry->bucket = NULL;
> +	INIT_LIST_HEAD(&entry->lru);
> +	INIT_LIST_HEAD(&entry->dispose);
> +	INIT_HLIST_NODE(&entry->hnode);
> +
> +	return entry;
> +}
> +
> +static void
> +nfs4_xattr_free_entry(struct nfs4_xattr_entry *entry)
> +{
> +	if (entry->flags & NFS4_XATTR_ENTRY_EXTVAL)
> +		kvfree(entry->xattr_value);
> +	kfree(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_entry_cb(struct kref *kref)
> +{
> +	struct nfs4_xattr_entry *entry;
> +
> +	entry = container_of(kref, struct nfs4_xattr_entry, ref);
> +
> +	if (WARN_ON(!list_empty(&entry->lru)))
> +		return;
> +
> +	nfs4_xattr_free_entry(entry);
> +}
> +
> +static void
> +nfs4_xattr_free_cache_cb(struct kref *kref)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	int i;
> +
> +	cache = container_of(kref, struct nfs4_xattr_cache, ref);
> +
> +	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> +		if (WARN_ON(!hlist_empty(&cache->buckets[i].hlist)))
> +			return;
> +		cache->buckets[i].draining = false;
> +	}
> +
> +	cache->listxattr = NULL;
> +
> +	kmem_cache_free(nfs4_xattr_cache_cachep, cache);
> +
> +}
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_alloc_cache(void)
> +{
> +	struct nfs4_xattr_cache *cache;
> +
> +	cache = kmem_cache_alloc(nfs4_xattr_cache_cachep,
> +	    GFP_KERNEL_ACCOUNT | GFP_NOFS);
> +	if (cache == NULL)
> +		return NULL;
> +
> +	kref_init(&cache->ref);
> +	atomic_long_set(&cache->nent, 0);
> +
> +	return cache;
> +}
> +
> +/*
> + * Set the listxattr cache, which is a special-cased cache entry.
> + * The special value ERR_PTR(-ESTALE) is used to indicate that
> + * the cache is being drained - this prevents a new listxattr
> + * cache from being added to what is now a stale cache.
> + */
> +static int
> +nfs4_xattr_set_listcache(struct nfs4_xattr_cache *cache,
> +			 struct nfs4_xattr_entry *new)
> +{
> +	struct nfs4_xattr_entry *old;
> +	int ret = 1;
> +
> +	spin_lock(&cache->listxattr_lock);
> +
> +	old = cache->listxattr;
> +
> +	if (old == ERR_PTR(-ESTALE)) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	cache->listxattr = new;
> +	if (new != NULL && new != ERR_PTR(-ESTALE))
> +		nfs4_xattr_entry_lru_add(new);
> +
> +	if (old != NULL) {
> +		nfs4_xattr_entry_lru_del(old);
> +		kref_put(&old->ref, nfs4_xattr_free_entry_cb);
> +	}
> +out:
> +	spin_unlock(&cache->listxattr_lock);
> +
> +	return ret;
> +}
> +
> +/*
> + * Unlink a cache from its parent inode, clearing out an invalid
> + * cache. Must be called with i_lock held.
> + */
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_cache_unlink(struct inode *inode)
> +{
> +	struct nfs_inode *nfsi;
> +	struct nfs4_xattr_cache *oldcache;
> +
> +	nfsi = NFS_I(inode);
> +
> +	oldcache = nfsi->xattr_cache;
> +	if (oldcache != NULL) {
> +		list_lru_del(&nfs4_xattr_cache_lru, &oldcache->lru);
> +		oldcache->inode = NULL;
> +	}
> +	nfsi->xattr_cache = NULL;
> +	nfsi->cache_validity &= ~NFS_INO_INVALID_XATTR;
> +
> +	return oldcache;
> +
> +}
> +
> +/*
> + * Discard a cache. Usually called by a worker, since walking all
> + * the entries can take up some cycles that we don't want to waste
> + * in the I/O path. Can also be called from the shrinker callback.
> + *
> + * The cache is dead, it has already been unlinked from its inode,
> + * and no longer appears on the cache LRU list.
> + *
> + * Mark all buckets as draining, so that no new entries are added. This
> + * could still happen in the unlikely, but possible case that another
> + * thread had grabbed a reference before it was unlinked from the inode,
> + * and is still holding it for an add operation.
> + *
> + * Remove all entries from the LRU lists, so that there is no longer
> + * any way to 'find' this cache. Then, remove the entries from the hash
> + * table.
> + *
> + * At that point, the cache will remain empty and can be freed when the final
> + * reference drops, which is very likely the kref_put at the end of
> + * this function, or the one called immediately afterwards in the
> + * shrinker callback.
> + */
> +static void
> +nfs4_xattr_discard_cache(struct nfs4_xattr_cache *cache)
> +{
> +	unsigned int i;
> +	struct nfs4_xattr_entry *entry;
> +	struct nfs4_xattr_bucket *bucket;
> +	struct hlist_node *n;
> +
> +	nfs4_xattr_set_listcache(cache, ERR_PTR(-ESTALE));
> +
> +	for (i = 0; i < NFS4_XATTR_HASH_SIZE; i++) {
> +		bucket = &cache->buckets[i];
> +
> +		spin_lock(&bucket->lock);
> +		bucket->draining = true;
> +		hlist_for_each_entry_safe(entry, n, &bucket->hlist, hnode) {
> +			nfs4_xattr_entry_lru_del(entry);
> +			hlist_del_init(&entry->hnode);
> +			kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +		}
> +		spin_unlock(&bucket->lock);
> +	}
> +
> +	atomic_long_set(&cache->nent, 0);
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +static void
> +nfs4_xattr_discard_cache_worker(struct work_struct *work)
> +{
> +	struct nfs4_xattr_cache *cache = container_of(work,
> +	    struct nfs4_xattr_cache, work);
> +
> +	nfs4_xattr_discard_cache(cache);
> +}
> +
> +static void
> +nfs4_xattr_reap_cache(struct nfs4_xattr_cache *cache)
> +{
> +	queue_work(nfs4_xattr_cache_wq, &cache->work);
> +}
> +
> +/*
> + * Get a referenced copy of the cache structure. Avoid doing allocs
> + * while holding i_lock. Which means that we do some optimistic allocation,
> + * and might have to free the result in rare cases.
> + *
> + * This function only checks the NFS_INO_INVALID_XATTR cache validity bit
> + * and acts accordingly, replacing the cache when needed. For the read case
> + * (!add), this means that the caller must make sure that the cache
> + * is valid before caling this function. getxattr and listxattr call
> + * revalidate_inode to do this. The attribute cache timeout (for the
> + * non-delegated case) is expected to be dealt with in the revalidate
> + * call.
> + */
> +
> +static struct nfs4_xattr_cache *
> +nfs4_xattr_get_cache(struct inode *inode, int add)
> +{
> +	struct nfs_inode *nfsi;
> +	struct nfs4_xattr_cache *cache, *oldcache, *newcache;
> +
> +	nfsi = NFS_I(inode);
> +
> +	cache = oldcache = NULL;
> +
> +	spin_lock(&inode->i_lock);
> +
> +	if (nfsi->cache_validity & NFS_INO_INVALID_XATTR)
> +		oldcache = nfs4_xattr_cache_unlink(inode);
> +	else
> +		cache = nfsi->xattr_cache;
> +
> +	if (cache != NULL)
> +		kref_get(&cache->ref);
> +
> +	spin_unlock(&inode->i_lock);
> +
> +	if (add && cache == NULL) {
> +		newcache = NULL;
> +
> +		cache = nfs4_xattr_alloc_cache();
> +		if (cache == NULL)
> +			goto out;
> +
> +		spin_lock(&inode->i_lock);
> +		if (nfsi->cache_validity & NFS_INO_INVALID_XATTR) {
> +			/*
> +			 * The cache was invalidated again. Give up,
> +			 * since what we want to enter is now likely
> +			 * outdated anyway.
> +			 */
> +			spin_unlock(&inode->i_lock);
> +			kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +			cache = NULL;
> +			goto out;
> +		}
> +
> +		/*
> +		 * Check if someone beat us to it.
> +		 */
> +		if (nfsi->xattr_cache != NULL) {
> +			newcache = nfsi->xattr_cache;
> +			kref_get(&newcache->ref);
> +		} else {
> +			kref_get(&cache->ref);
> +			nfsi->xattr_cache = cache;
> +			cache->inode = inode;
> +			list_lru_add(&nfs4_xattr_cache_lru, &cache->lru);
> +		}
> +
> +		spin_unlock(&inode->i_lock);
> +
> +		/*
> +		 * If there was a race, throw away the cache we just
> +		 * allocated, and use the new one allocated by someone
> +		 * else.
> +		 */
> +		if (newcache != NULL) {
> +			kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +			cache = newcache;
> +		}
> +	}
> +
> +out:
> +	/*
> +	 * Discarding an old cache is done via a workqueue.
> +	 */
> +	if (oldcache != NULL)
> +		nfs4_xattr_reap_cache(oldcache);
> +
> +	return cache;
> +}
> +
> +static inline struct nfs4_xattr_bucket *
> +nfs4_xattr_hash_bucket(struct nfs4_xattr_cache *cache, const char *name)
> +{
> +	return &cache->buckets[jhash(name, strlen(name), 0) &
> +	    (ARRAY_SIZE(cache->buckets) - 1)];
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_get_entry(struct nfs4_xattr_bucket *bucket, const char *name)
> +{
> +	struct nfs4_xattr_entry *entry;
> +
> +	entry = NULL;
> +
> +	hlist_for_each_entry(entry, &bucket->hlist, hnode) {
> +		if (!strcmp(entry->xattr_name, name))
> +			break;
> +	}
> +
> +	return entry;
> +}
> +
> +static int
> +nfs4_xattr_hash_add(struct nfs4_xattr_cache *cache,
> +		    struct nfs4_xattr_entry *entry)
> +{
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_entry *oldentry = NULL;
> +	int ret = 1;
> +
> +	bucket = nfs4_xattr_hash_bucket(cache, entry->xattr_name);
> +	entry->bucket = bucket;
> +
> +	spin_lock(&bucket->lock);
> +
> +	if (bucket->draining) {
> +		ret = 0;
> +		goto out;
> +	}
> +
> +	oldentry = nfs4_xattr_get_entry(bucket, entry->xattr_name);
> +	if (oldentry != NULL) {
> +		hlist_del_init(&oldentry->hnode);
> +		nfs4_xattr_entry_lru_del(oldentry);
> +	} else {
> +		atomic_long_inc(&cache->nent);
> +	}
> +
> +	hlist_add_head(&entry->hnode, &bucket->hlist);
> +	nfs4_xattr_entry_lru_add(entry);
> +
> +out:
> +	spin_unlock(&bucket->lock);
> +
> +	if (oldentry != NULL)
> +		kref_put(&oldentry->ref, nfs4_xattr_free_entry_cb);
> +
> +	return ret;
> +}
> +
> +static void
> +nfs4_xattr_hash_remove(struct nfs4_xattr_cache *cache, const char *name)
> +{
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_entry *entry;
> +
> +	bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> +	spin_lock(&bucket->lock);
> +
> +	entry = nfs4_xattr_get_entry(bucket, name);
> +	if (entry != NULL) {
> +		hlist_del_init(&entry->hnode);
> +		nfs4_xattr_entry_lru_del(entry);
> +		atomic_long_dec(&cache->nent);
> +	}
> +
> +	spin_unlock(&bucket->lock);
> +
> +	if (entry != NULL)
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +}
> +
> +static struct nfs4_xattr_entry *
> +nfs4_xattr_hash_find(struct nfs4_xattr_cache *cache, const char *name)
> +{
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_entry *entry;
> +
> +	bucket = nfs4_xattr_hash_bucket(cache, name);
> +
> +	spin_lock(&bucket->lock);
> +
> +	entry = nfs4_xattr_get_entry(bucket, name);
> +	if (entry != NULL)
> +		kref_get(&entry->ref);
> +
> +	spin_unlock(&bucket->lock);
> +
> +	return entry;
> +}
> +
> +/*
> + * Entry point to retrieve an entry from the cache.
> + */
> +ssize_t nfs4_xattr_cache_get(struct inode *inode, const char *name, char
> *buf,
> +			 ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +	ssize_t ret;
> +
> +	cache = nfs4_xattr_get_cache(inode, 0);
> +	if (cache == NULL)
> +		return -ENOENT;
> +
> +	ret = 0;
> +	entry = nfs4_xattr_hash_find(cache, name);
> +
> +	if (entry != NULL) {
> +		dprintk("%s: cache hit '%s', len %lu\n", __func__,
> +		    entry->xattr_name, (unsigned long)entry->xattr_size);
> +		if (buflen == 0) {
> +			/* Length probe only */
> +			ret = entry->xattr_size;
> +		} else if (buflen < entry->xattr_size)
> +			ret = -ERANGE;
> +		else {
> +			memcpy(buf, entry->xattr_value, entry->xattr_size);
> +			ret = entry->xattr_size;
> +		}
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +	} else {
> +		dprintk("%s: cache miss '%s'\n", __func__, name);
> +		ret = -ENOENT;
> +	}
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> +	return ret;
> +}
> +
> +/*
> + * Retrieve a cached list of xattrs from the cache.
> + */
> +ssize_t nfs4_xattr_cache_list(struct inode *inode, char *buf, ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +	ssize_t ret;
> +
> +	cache = nfs4_xattr_get_cache(inode, 0);
> +	if (cache == NULL)
> +		return -ENOENT;
> +
> +	spin_lock(&cache->listxattr_lock);
> +
> +	entry = cache->listxattr;
> +
> +	if (entry != NULL && entry != ERR_PTR(-ESTALE)) {
> +		if (buflen == 0) {
> +			/* Length probe only */
> +			ret = entry->xattr_size;
> +		} else if (entry->xattr_size > buflen)
> +			ret = -ERANGE;
> +		else {
> +			memcpy(buf, entry->xattr_value, entry->xattr_size);
> +			ret = entry->xattr_size;
> +		}
> +	} else {
> +		ret = -ENOENT;
> +	}
> +
> +	spin_unlock(&cache->listxattr_lock);
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +
> +	return ret;
> +}
> +
> +/*
> + * Add an xattr to the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_add(struct inode *inode, const char *name,
> +			  const char *buf, struct page **pages, ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +
> +	dprintk("%s: add '%s' len %lu\n", __func__,
> +	    name, (unsigned long)buflen);
> +
> +	cache = nfs4_xattr_get_cache(inode, 1);
> +	if (cache == NULL)
> +		return;
> +
> +	entry = nfs4_xattr_alloc_entry(name, buf, pages, buflen);
> +	if (entry == NULL)
> +		goto out;
> +
> +	(void)nfs4_xattr_set_listcache(cache, NULL);
> +
> +	if (!nfs4_xattr_hash_add(cache, entry))
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +
> +/*
> + * Remove an xattr from the cache.
> + *
> + * This also invalidates the xattr list cache.
> + */
> +void nfs4_xattr_cache_remove(struct inode *inode, const char *name)
> +{
> +	struct nfs4_xattr_cache *cache;
> +
> +	dprintk("%s: remove '%s'\n", __func__, name);
> +
> +	cache = nfs4_xattr_get_cache(inode, 0);
> +	if (cache == NULL)
> +		return;
> +
> +	(void)nfs4_xattr_set_listcache(cache, NULL);
> +	nfs4_xattr_hash_remove(cache, name);
> +
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Cache listxattr output, replacing any possible old one.
> + */
> +void nfs4_xattr_cache_set_list(struct inode *inode, const char *buf,
> +			       ssize_t buflen)
> +{
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry;
> +
> +	cache = nfs4_xattr_get_cache(inode, 1);
> +	if (cache == NULL)
> +		return;
> +
> +	entry = nfs4_xattr_alloc_entry(NULL, buf, NULL, buflen);
> +	if (entry == NULL)
> +		goto out;
> +
> +	/*
> +	 * This is just there to be able to get to bucket->cache,
> +	 * which is obviously the same for all buckets, so just
> +	 * use bucket 0.
> +	 */
> +	entry->bucket = &cache->buckets[0];
> +
> +	if (!nfs4_xattr_set_listcache(cache, entry))
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +
> +out:
> +	kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +}
> +
> +/*
> + * Zap the entire cache. Called when an inode is evicted.
> + */
> +void nfs4_xattr_cache_zap(struct inode *inode)
> +{
> +	struct nfs4_xattr_cache *oldcache;
> +
> +	spin_lock(&inode->i_lock);
> +	oldcache = nfs4_xattr_cache_unlink(inode);
> +	spin_unlock(&inode->i_lock);
> +
> +	if (oldcache)
> +		nfs4_xattr_discard_cache(oldcache);
> +}
> +
> +/*
> + * The entry LRU is shrunk more aggressively than the cache LRU,
> + * by settings @seeks to 1.
> + *
> + * Cache structures are freed only when they've become empty, after
> + * pruning all but one entry.
> + */
> +
> +static unsigned long nfs4_xattr_cache_count(struct shrinker *shrink,
> +					    struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_count(struct shrinker *shrink,
> +					    struct shrink_control *sc);
> +static unsigned long nfs4_xattr_cache_scan(struct shrinker *shrink,
> +					   struct shrink_control *sc);
> +static unsigned long nfs4_xattr_entry_scan(struct shrinker *shrink,
> +					   struct shrink_control *sc);
> +
> +static struct shrinker nfs4_xattr_cache_shrinker = {
> +	.count_objects	= nfs4_xattr_cache_count,
> +	.scan_objects	= nfs4_xattr_cache_scan,
> +	.seeks		= DEFAULT_SEEKS,
> +	.flags		= SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_entry_shrinker = {
> +	.count_objects	= nfs4_xattr_entry_count,
> +	.scan_objects	= nfs4_xattr_entry_scan,
> +	.seeks		= DEFAULT_SEEKS,
> +	.batch		= 512,
> +	.flags		= SHRINKER_MEMCG_AWARE,
> +};
> +
> +static struct shrinker nfs4_xattr_large_entry_shrinker = {
> +	.count_objects	= nfs4_xattr_entry_count,
> +	.scan_objects	= nfs4_xattr_entry_scan,
> +	.seeks		= 1,
> +	.batch		= 512,
> +	.flags		= SHRINKER_MEMCG_AWARE,
> +};
> +
> +static enum lru_status
> +cache_lru_isolate(struct list_head *item,
> +	struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> +	struct list_head *dispose = arg;
> +	struct inode *inode;
> +	struct nfs4_xattr_cache *cache = container_of(item,
> +	    struct nfs4_xattr_cache, lru);
> +
> +	if (atomic_long_read(&cache->nent) > 1)
> +		return LRU_SKIP;
> +
> +	/*
> +	 * If a cache structure is on the LRU list, we know that
> +	 * its inode is valid. Try to lock it to break the link.
> +	 * Since we're inverting the lock order here, only try.
> +	 */
> +	inode = cache->inode;
> +
> +	if (!spin_trylock(&inode->i_lock))
> +		return LRU_SKIP;
> +
> +	kref_get(&cache->ref);
> +
> +	cache->inode = NULL;
> +	NFS_I(inode)->xattr_cache = NULL;
> +	NFS_I(inode)->cache_validity &= ~NFS_INO_INVALID_XATTR;
> +	list_lru_isolate(lru, &cache->lru);
> +
> +	spin_unlock(&inode->i_lock);
> +
> +	list_add_tail(&cache->dispose, dispose);
> +	return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_cache_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	LIST_HEAD(dispose);
> +	unsigned long freed;
> +	struct nfs4_xattr_cache *cache;
> +
> +	freed = list_lru_shrink_walk(&nfs4_xattr_cache_lru, sc,
> +	    cache_lru_isolate, &dispose);
> +	while (!list_empty(&dispose)) {
> +		cache = list_first_entry(&dispose, struct nfs4_xattr_cache,
> +		    dispose);
> +		list_del_init(&cache->dispose);
> +		nfs4_xattr_discard_cache(cache);
> +		kref_put(&cache->ref, nfs4_xattr_free_cache_cb);
> +	}
> +
> +	return freed;
> +}
> +
> +
> +static unsigned long
> +nfs4_xattr_cache_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	unsigned long count;
> +
> +	count = list_lru_count(&nfs4_xattr_cache_lru);
> +	return vfs_pressure_ratio(count);
> +}
> +
> +static enum lru_status
> +entry_lru_isolate(struct list_head *item,
> +	struct list_lru_one *lru, spinlock_t *lru_lock, void *arg)
> +{
> +	struct list_head *dispose = arg;
> +	struct nfs4_xattr_bucket *bucket;
> +	struct nfs4_xattr_cache *cache;
> +	struct nfs4_xattr_entry *entry = container_of(item,
> +	    struct nfs4_xattr_entry, lru);
> +
> +	bucket = entry->bucket;
> +	cache = bucket->cache;
> +
> +	/*
> +	 * Unhook the entry from its parent (either a cache bucket
> +	 * or a cache structure if it's a listxattr buf), so that
> +	 * it's no longer found. Then add it to the isolate list,
> +	 * to be freed later.
> +	 *
> +	 * In both cases, we're reverting lock order, so use
> +	 * trylock and skip the entry if we can't get the lock.
> +	 */
> +	if (entry->xattr_name != NULL) {
> +		/* Regular cache entry */
> +		if (!spin_trylock(&bucket->lock))
> +			return LRU_SKIP;
> +
> +		kref_get(&entry->ref);
> +
> +		hlist_del_init(&entry->hnode);
> +		atomic_long_dec(&cache->nent);
> +		list_lru_isolate(lru, &entry->lru);
> +
> +		spin_unlock(&bucket->lock);
> +	} else {
> +		/* Listxattr cache entry */
> +		if (!spin_trylock(&cache->listxattr_lock))
> +			return LRU_SKIP;
> +
> +		kref_get(&entry->ref);
> +
> +		cache->listxattr = NULL;
> +		list_lru_isolate(lru, &entry->lru);
> +
> +		spin_unlock(&cache->listxattr_lock);
> +	}
> +
> +	list_add_tail(&entry->dispose, dispose);
> +	return LRU_REMOVED;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_scan(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	LIST_HEAD(dispose);
> +	unsigned long freed;
> +	struct nfs4_xattr_entry *entry;
> +	struct list_lru *lru;
> +
> +	lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	freed = list_lru_shrink_walk(lru, sc, entry_lru_isolate, &dispose);
> +
> +	while (!list_empty(&dispose)) {
> +		entry = list_first_entry(&dispose, struct nfs4_xattr_entry,
> +		    dispose);
> +		list_del_init(&entry->dispose);
> +
> +		/*
> +		 * Drop two references: the one that we just grabbed
> +		 * in entry_lru_isolate, and the one that was set
> +		 * when the entry was first allocated.
> +		 */
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +		kref_put(&entry->ref, nfs4_xattr_free_entry_cb);
> +	}
> +
> +	return freed;
> +}
> +
> +static unsigned long
> +nfs4_xattr_entry_count(struct shrinker *shrink, struct shrink_control *sc)
> +{
> +	unsigned long count;
> +	struct list_lru *lru;
> +
> +	lru = (shrink == &nfs4_xattr_large_entry_shrinker) ?
> +	    &nfs4_xattr_large_entry_lru : &nfs4_xattr_entry_lru;
> +
> +	count = list_lru_count(lru);
> +	return vfs_pressure_ratio(count);
> +}
> +
> +
> +static void nfs4_xattr_cache_init_once(void *p)
> +{
> +	struct nfs4_xattr_cache *cache = (struct nfs4_xattr_cache *)p;
> +
> +	spin_lock_init(&cache->listxattr_lock);
> +	atomic_long_set(&cache->nent, 0);
> +	nfs4_xattr_hash_init(cache);
> +	cache->listxattr = NULL;
> +	INIT_WORK(&cache->work, nfs4_xattr_discard_cache_worker);
> +	INIT_LIST_HEAD(&cache->lru);
> +	INIT_LIST_HEAD(&cache->dispose);
> +}
> +
> +int __init nfs4_xattr_cache_init(void)
> +{
> +	int ret = 0;
> +
> +	nfs4_xattr_cache_cachep = kmem_cache_create("nfs4_xattr_cache_cache",
> +	    sizeof(struct nfs4_xattr_cache), 0,
> +	    (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD|SLAB_ACCOUNT),
> +	    nfs4_xattr_cache_init_once);
> +	if (nfs4_xattr_cache_cachep == NULL)
> +		return -ENOMEM;
> +
> +	ret = list_lru_init_memcg(&nfs4_xattr_large_entry_lru,
> +	    &nfs4_xattr_large_entry_shrinker);
> +	if (ret)
> +		goto out4;
> +
> +	ret = list_lru_init_memcg(&nfs4_xattr_entry_lru,
> +	    &nfs4_xattr_entry_shrinker);
> +	if (ret)
> +		goto out3;
> +
> +	ret = list_lru_init_memcg(&nfs4_xattr_cache_lru,
> +	    &nfs4_xattr_cache_shrinker);
> +	if (ret)
> +		goto out2;
> +
> +	nfs4_xattr_cache_wq = alloc_workqueue("nfs4_xattr", WQ_MEM_RECLAIM, 0);
> +	if (nfs4_xattr_cache_wq == NULL)
> +		goto out1;
> +
> +	ret = register_shrinker(&nfs4_xattr_cache_shrinker);
> +	if (ret)
> +		goto out0;
> +
> +	ret = register_shrinker(&nfs4_xattr_entry_shrinker);
> +	if (ret)
> +		goto out;
> +
> +	ret = register_shrinker(&nfs4_xattr_large_entry_shrinker);
> +	if (!ret)
> +		return 0;
> +
> +	unregister_shrinker(&nfs4_xattr_entry_shrinker);
> +out:
> +	unregister_shrinker(&nfs4_xattr_cache_shrinker);
> +out0:
> +	destroy_workqueue(nfs4_xattr_cache_wq);
> +out1:
> +	list_lru_destroy(&nfs4_xattr_cache_lru);
> +out2:
> +	list_lru_destroy(&nfs4_xattr_entry_lru);
> +out3:
> +	list_lru_destroy(&nfs4_xattr_large_entry_lru);
> +out4:
> +	kmem_cache_destroy(nfs4_xattr_cache_cachep);
> +
> +	return ret;
> +}
> +
> +void nfs4_xattr_cache_exit(void)
> +{
> +	unregister_shrinker(&nfs4_xattr_entry_shrinker);
> +	unregister_shrinker(&nfs4_xattr_cache_shrinker);
> +	list_lru_destroy(&nfs4_xattr_entry_lru);
> +	list_lru_destroy(&nfs4_xattr_cache_lru);
> +	kmem_cache_destroy(nfs4_xattr_cache_cachep);
> +	destroy_workqueue(nfs4_xattr_cache_wq);
> +}
> diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
> index 6df94857f5bb..079c1ac84cee 100644
> --- a/fs/nfs/nfs4proc.c
> +++ b/fs/nfs/nfs4proc.c
> @@ -7459,6 +7459,7 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
>  				    size_t buflen, int flags)
>  {
>  	struct nfs_access_entry cache;
> +	int ret;
>  
>  	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
>  		return -EOPNOTSUPP;
> @@ -7477,10 +7478,17 @@ static int nfs4_xattr_set_nfs4_user(const struct
> xattr_handler *handler,
>  			return -EACCES;
>  	}
>  
> -	if (buf == NULL)
> -		return nfs42_proc_removexattr(inode, key);
> -	else
> -		return nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> +	if (buf == NULL) {
> +		ret = nfs42_proc_removexattr(inode, key);
> +		if (!ret)
> +			nfs4_xattr_cache_remove(inode, key);
> +	} else {
> +		ret = nfs42_proc_setxattr(inode, key, buf, buflen, flags);
> +		if (!ret)
> +			nfs4_xattr_cache_add(inode, key, buf, NULL, buflen);
> +	}
> +
> +	return ret;
>  }
>  
>  static int nfs4_xattr_get_nfs4_user(const struct xattr_handler *handler,
> @@ -7488,6 +7496,7 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
>  				    const char *key, void *buf, size_t buflen)
>  {
>  	struct nfs_access_entry cache;
> +	ssize_t ret;
>  
>  	if (!nfs_server_capable(inode, NFS_CAP_XATTR))
>  		return -EOPNOTSUPP;
> @@ -7497,7 +7506,17 @@ static int nfs4_xattr_get_nfs4_user(const struct
> xattr_handler *handler,
>  			return -EACCES;
>  	}
>  
> -	return nfs42_proc_getxattr(inode, key, buf, buflen);
> +	ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +	if (ret)
> +		return ret;
> +
> +	ret = nfs4_xattr_cache_get(inode, key, buf, buflen);
> +	if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> +		return ret;
> +
> +	ret = nfs42_proc_getxattr(inode, key, buf, buflen);
> +
> +	return ret;
>  }
>  
>  static ssize_t
> @@ -7505,7 +7524,7 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
>  {
>  	u64 cookie;
>  	bool eof;
> -	int ret, size;
> +	ssize_t ret, size;
>  	char *buf;
>  	size_t buflen;
>  	struct nfs_access_entry cache;
> @@ -7518,6 +7537,14 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
>  			return 0;
>  	}
>  
> +	ret = nfs_revalidate_inode(NFS_SERVER(inode), inode);
> +	if (ret)
> +		return ret;
> +
> +	ret = nfs4_xattr_cache_list(inode, list, list_len);
> +	if (ret >= 0 || (ret < 0 && ret != -ENOENT))
> +		return ret;
> +
>  	cookie = 0;
>  	eof = false;
>  	buflen = list_len ? list_len : XATTR_LIST_MAX;
> @@ -7537,6 +7564,9 @@ nfs4_listxattr_nfs4_user(struct inode *inode, char
> *list, size_t list_len)
>  		size += ret;
>  	}
>  
> +	if (list_len)
> +		nfs4_xattr_cache_set_list(inode, list, size);
> +
>  	return size;
>  }
>  
> diff --git a/fs/nfs/nfs4super.c b/fs/nfs/nfs4super.c
> index 1475f932d7da..0c1ab846b83d 100644
> --- a/fs/nfs/nfs4super.c
> +++ b/fs/nfs/nfs4super.c
> @@ -69,6 +69,7 @@ static void nfs4_evict_inode(struct inode *inode)
>  	pnfs_destroy_layout(NFS_I(inode));
>  	/* First call standard NFS clear_inode() code */
>  	nfs_clear_inode(inode);
> +	nfs4_xattr_cache_zap(inode);
>  }
>  
>  struct nfs_referral_count {
> @@ -268,6 +269,12 @@ static int __init init_nfs_v4(void)
>  	if (err)
>  		goto out1;
>  
> +#ifdef CONFIG_NFS_V4_2
> +	err = nfs4_xattr_cache_init();
> +	if (err)
> +		goto out2;
> +#endif
> +
>  	err = nfs4_register_sysctl();
>  	if (err)
>  		goto out2;
> @@ -288,6 +295,9 @@ static void __exit exit_nfs_v4(void)
>  	nfs4_pnfs_v3_ds_connect_unload();
>  
>  	unregister_nfs_version(&nfs_v4);
> +#ifdef CONFIG_NFS_V4_2
> +	nfs4_xattr_cache_exit();
> +#endif
>  	nfs4_unregister_sysctl();
>  	nfs_idmap_quit();
>  	nfs_dns_resolver_destroy();
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 1fcfef670a4a..c08cc22d9c32 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -102,6 +102,8 @@ struct nfs_delegation;
>  
>  struct posix_acl;
>  
> +struct nfs4_xattr_cache;
> +
>  /*
>   * nfs fs inode data in memory
>   */
> @@ -188,6 +190,10 @@ struct nfs_inode {
>  	struct fscache_cookie	*fscache;
>  #endif
>  	struct inode		vfs_inode;
> +
> +#ifdef CONFIG_NFS_V4_2
> +	struct nfs4_xattr_cache *xattr_cache;
> +#endif
>  };
>  
>  struct nfs4_copy_state {
> diff --git a/include/uapi/linux/nfs_fs.h b/include/uapi/linux/nfs_fs.h
> index 7bcc8cd6831d..3afe3767c55d 100644
> --- a/include/uapi/linux/nfs_fs.h
> +++ b/include/uapi/linux/nfs_fs.h
> @@ -56,6 +56,7 @@
>  #define NFSDBG_PNFS		0x1000
>  #define NFSDBG_PNFS_LD		0x2000
>  #define NFSDBG_STATE		0x4000
> +#define NFSDBG_XATTRCACHE	0x8000
>  #define NFSDBG_ALL		0xFFFF
>  
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes
  2020-03-11 19:56 ` [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes Frank van der Linden
@ 2020-03-12 20:49   ` Schumaker, Anna
  0 siblings, 0 replies; 33+ messages in thread
From: Schumaker, Anna @ 2020-03-12 20:49 UTC (permalink / raw)
  To: fllinden, linux-nfs, trond.myklebust

Hi Frank,

On Wed, 2020-03-11 at 19:56 +0000, Frank van der Linden wrote:
> Define the argument and response structures that will be used for
> RFC 8276 extended attribute RPC calls, and implement the necessary
> functions to encode/decode the extended attribute operations.
> 
> Signed-off-by: Frank van der Linden <fllinden@amazon.com>
> ---
>  fs/nfs/nfs42xdr.c       | 368
> ++++++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfs/nfs4xdr.c        |   6 +
>  include/linux/nfs_xdr.h |  59 +++++++-
>  3 files changed, 432 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/nfs42xdr.c b/fs/nfs/nfs42xdr.c
> index 6712daa9d85b..ac45a1027523 100644
> --- a/fs/nfs/nfs42xdr.c
> +++ b/fs/nfs/nfs42xdr.c
> @@ -407,6 +407,212 @@ static void encode_layouterror(struct xdr_stream *xdr,
>  	encode_device_error(xdr, &args->errors[0]);
>  }
>  
> +#ifdef CONFIG_NFS_V4_2

This file is only compiled when CONFIG_NFS_V4_2=y, so there is no need for the
#ifdef here.

Anna

> +static void encode_setxattr(struct xdr_stream *xdr,
> +			    const struct nfs42_setxattrargs *arg,
> +			    struct compound_hdr *hdr)
> +{
> +	__be32 *p;
> +
> +	BUILD_BUG_ON(XATTR_CREATE != SETXATTR4_CREATE);
> +	BUILD_BUG_ON(XATTR_REPLACE != SETXATTR4_REPLACE);
> +
> +	encode_op_hdr(xdr, OP_SETXATTR, decode_setxattr_maxsz, hdr);
> +	p = reserve_space(xdr, 4);
> +	*p = cpu_to_be32(arg->xattr_flags);
> +	encode_string(xdr, strlen(arg->xattr_name), arg->xattr_name);
> +	p = reserve_space(xdr, 4);
> +	*p = cpu_to_be32(arg->xattr_len);
> +	if (arg->xattr_len)
> +		xdr_write_pages(xdr, arg->xattr_pages, 0, arg->xattr_len);
> +}
> +
> +static int decode_setxattr(struct xdr_stream *xdr,
> +			   struct nfs4_change_info *cinfo)
> +{
> +	int status;
> +
> +	status = decode_op_hdr(xdr, OP_SETXATTR);
> +	if (status)
> +		goto out;
> +	status = decode_change_info(xdr, cinfo);
> +out:
> +	return status;
> +}
> +
> +
> +static void encode_getxattr(struct xdr_stream *xdr, const char *name,
> +			    struct compound_hdr *hdr)
> +{
> +	encode_op_hdr(xdr, OP_GETXATTR, decode_getxattr_maxsz, hdr);
> +	encode_string(xdr, strlen(name), name);
> +}
> +
> +static int decode_getxattr(struct xdr_stream *xdr,
> +			   struct nfs42_getxattrres *res,
> +			   struct rpc_rqst *req)
> +{
> +	int status;
> +	__be32 *p;
> +	u32 len, rdlen;
> +
> +	status = decode_op_hdr(xdr, OP_GETXATTR);
> +	if (status)
> +		return status;
> +
> +	p = xdr_inline_decode(xdr, 4);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	len = be32_to_cpup(p);
> +	if (len > req->rq_rcv_buf.page_len)
> +		return -ERANGE;
> +
> +	res->xattr_len = len;
> +
> +	if (len > 0) {
> +		rdlen = xdr_read_pages(xdr, len);
> +		if (rdlen < len)
> +			return -EIO;
> +	}
> +
> +	return 0;
> +}
> +
> +static void encode_removexattr(struct xdr_stream *xdr, const char *name,
> +			       struct compound_hdr *hdr)
> +{
> +	encode_op_hdr(xdr, OP_REMOVEXATTR, decode_removexattr_maxsz, hdr);
> +	encode_string(xdr, strlen(name), name);
> +}
> +
> +
> +static int decode_removexattr(struct xdr_stream *xdr,
> +			   struct nfs4_change_info *cinfo)
> +{
> +	int status;
> +
> +	status = decode_op_hdr(xdr, OP_REMOVEXATTR);
> +	if (status)
> +		goto out;
> +
> +	status = decode_change_info(xdr, cinfo);
> +out:
> +	return status;
> +}
> +
> +static void encode_listxattrs(struct xdr_stream *xdr,
> +			     const struct nfs42_listxattrsargs *arg,
> +			     struct compound_hdr *hdr)
> +{
> +	__be32 *p;
> +
> +	encode_op_hdr(xdr, OP_LISTXATTRS, decode_listxattrs_maxsz + 1, hdr);
> +
> +	p = reserve_space(xdr, 12);
> +	if (unlikely(!p))
> +		return;
> +
> +	p = xdr_encode_hyper(p, arg->cookie);
> +	/*
> +	 * RFC 8276 says to specify the full max length of the LISTXATTRS
> +	 * XDR reply. Count is set to the XDR length of the names array
> +	 * plus the EOF marker. So, add the cookie and the names count.
> +	 */
> +	*p = cpu_to_be32(arg->count + 8 + 4);
> +}
> +
> +static int decode_listxattrs(struct xdr_stream *xdr,
> +			    struct nfs42_listxattrsres *res)
> +{
> +	int status;
> +	__be32 *p;
> +	u32 count, len, ulen;
> +	size_t left, copied;
> +	char *buf;
> +
> +	status = decode_op_hdr(xdr, OP_LISTXATTRS);
> +	if (status) {
> +		/*
> +		 * Special case: for LISTXATTRS, NFS4ERR_TOOSMALL
> +		 * should be translated to ERANGE.
> +		 */
> +		if (status == -ETOOSMALL)
> +			status = -ERANGE;
> +		goto out;
> +	}
> +
> +	p = xdr_inline_decode(xdr, 8);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	xdr_decode_hyper(p, &res->cookie);
> +
> +	p = xdr_inline_decode(xdr, 4);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	left = res->xattr_len;
> +	buf = res->xattr_buf;
> +
> +	count = be32_to_cpup(p);
> +	copied = 0;
> +
> +	/*
> +	 * We have asked for enough room to encode the maximum number
> +	 * of possible attribute names, so everything should fit.
> +	 *
> +	 * But, don't rely on that assumption. Just decode entries
> +	 * until they don't fit anymore, just in case the server did
> +	 * something odd.
> +	 */
> +	while (count--) {
> +		p = xdr_inline_decode(xdr, 4);
> +		if (unlikely(!p))
> +			return -EIO;
> +
> +		len = be32_to_cpup(p);
> +		if (len > (XATTR_NAME_MAX - XATTR_USER_PREFIX_LEN)) {
> +			status = -ERANGE;
> +			goto out;
> +		}
> +
> +		p = xdr_inline_decode(xdr, len);
> +		if (unlikely(!p))
> +			return -EIO;
> +
> +		ulen = len + XATTR_USER_PREFIX_LEN + 1;
> +		if (buf) {
> +			if (ulen > left) {
> +				status = -ERANGE;
> +				goto out;
> +			}
> +
> +			memcpy(buf, XATTR_USER_PREFIX, XATTR_USER_PREFIX_LEN);
> +			memcpy(buf + XATTR_USER_PREFIX_LEN, p, len);
> +
> +			buf[ulen - 1] = 0;
> +			buf += ulen;
> +			left -= ulen;
> +		}
> +		copied += ulen;
> +	}
> +
> +	p = xdr_inline_decode(xdr, 4);
> +	if (unlikely(!p))
> +		return -EIO;
> +
> +	res->eof = be32_to_cpup(p);
> +	res->copied = copied;
> +
> +out:
> +	if (status == -ERANGE && res->xattr_len == XATTR_LIST_MAX)
> +		status = -E2BIG;
> +
> +	return status;
> +}
> +#endif
> +
>  /*
>   * Encode ALLOCATE request
>   */
> @@ -1062,4 +1268,166 @@ static int nfs4_xdr_dec_layouterror(struct rpc_rqst
> *rqstp,
>  	return status;
>  }
>  
> +#ifdef CONFIG_NFS_V4_2
> +static void nfs4_xdr_enc_setxattr(struct rpc_rqst *req, struct xdr_stream
> *xdr,
> +				  const void *data)
> +{
> +	const struct nfs42_setxattrargs *args = data;
> +	struct compound_hdr hdr = {
> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> +	};
> +
> +	encode_compound_hdr(xdr, req, &hdr);
> +	encode_sequence(xdr, &args->seq_args, &hdr);
> +	encode_putfh(xdr, args->fh, &hdr);
> +	encode_setxattr(xdr, args, &hdr);
> +	encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_setxattr(struct rpc_rqst *req, struct xdr_stream
> *xdr,
> +				 void *data)
> +{
> +	struct nfs42_setxattrres *res = data;
> +	struct compound_hdr hdr;
> +	int status;
> +
> +	status = decode_compound_hdr(xdr, &hdr);
> +	if (status)
> +		goto out;
> +	status = decode_sequence(xdr, &res->seq_res, req);
> +	if (status)
> +		goto out;
> +	status = decode_putfh(xdr);
> +	if (status)
> +		goto out;
> +
> +	status = decode_setxattr(xdr, &res->cinfo);
> +out:
> +	return status;
> +}
> +
> +static void nfs4_xdr_enc_getxattr(struct rpc_rqst *req, struct xdr_stream
> *xdr,
> +				  const void *data)
> +{
> +	const struct nfs42_getxattrargs *args = data;
> +	struct compound_hdr hdr = {
> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> +	};
> +	size_t plen;
> +
> +	encode_compound_hdr(xdr, req, &hdr);
> +	encode_sequence(xdr, &args->seq_args, &hdr);
> +	encode_putfh(xdr, args->fh, &hdr);
> +	encode_getxattr(xdr, args->xattr_name, &hdr);
> +
> +	plen = args->xattr_len ? args->xattr_len : XATTR_SIZE_MAX;
> +
> +	rpc_prepare_reply_pages(req, args->xattr_pages, 0, plen,
> +	    hdr.replen);
> +	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> +
> +	encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_getxattr(struct rpc_rqst *rqstp,
> +				 struct xdr_stream *xdr, void *data)
> +{
> +	struct nfs42_getxattrres *res = data;
> +	struct compound_hdr hdr;
> +	int status;
> +
> +	status = decode_compound_hdr(xdr, &hdr);
> +	if (status)
> +		goto out;
> +	status = decode_sequence(xdr, &res->seq_res, rqstp);
> +	if (status)
> +		goto out;
> +	status = decode_putfh(xdr);
> +	if (status)
> +		goto out;
> +	status = decode_getxattr(xdr, res, rqstp);
> +out:
> +	return status;
> +}
> +
> +static void nfs4_xdr_enc_listxattrs(struct rpc_rqst *req,
> +				    struct xdr_stream *xdr, const void *data)
> +{
> +	const struct nfs42_listxattrsargs *args = data;
> +	struct compound_hdr hdr = {
> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> +	};
> +
> +	encode_compound_hdr(xdr, req, &hdr);
> +	encode_sequence(xdr, &args->seq_args, &hdr);
> +	encode_putfh(xdr, args->fh, &hdr);
> +	encode_listxattrs(xdr, args, &hdr);
> +
> +	rpc_prepare_reply_pages(req, args->xattr_pages, 0, args->count,
> +	    hdr.replen);
> +	req->rq_rcv_buf.flags |= XDRBUF_SPARSE_PAGES;
> +
> +	encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_listxattrs(struct rpc_rqst *rqstp,
> +				   struct xdr_stream *xdr, void *data)
> +{
> +	struct nfs42_listxattrsres *res = data;
> +	struct compound_hdr hdr;
> +	int status;
> +
> +	xdr_set_scratch_buffer(xdr, page_address(res->scratch), PAGE_SIZE);
> +
> +	status = decode_compound_hdr(xdr, &hdr);
> +	if (status)
> +		goto out;
> +	status = decode_sequence(xdr, &res->seq_res, rqstp);
> +	if (status)
> +		goto out;
> +	status = decode_putfh(xdr);
> +	if (status)
> +		goto out;
> +	status = decode_listxattrs(xdr, res);
> +out:
> +	return status;
> +}
> +
> +static void nfs4_xdr_enc_removexattr(struct rpc_rqst *req,
> +				     struct xdr_stream *xdr, const void *data)
> +{
> +	const struct nfs42_removexattrargs *args = data;
> +	struct compound_hdr hdr = {
> +		.minorversion = nfs4_xdr_minorversion(&args->seq_args),
> +	};
> +
> +	encode_compound_hdr(xdr, req, &hdr);
> +	encode_sequence(xdr, &args->seq_args, &hdr);
> +	encode_putfh(xdr, args->fh, &hdr);
> +	encode_removexattr(xdr, args->xattr_name, &hdr);
> +	encode_nops(&hdr);
> +}
> +
> +static int nfs4_xdr_dec_removexattr(struct rpc_rqst *req,
> +				    struct xdr_stream *xdr, void *data)
> +{
> +	struct nfs42_removexattrres *res = data;
> +	struct compound_hdr hdr;
> +	int status;
> +
> +	status = decode_compound_hdr(xdr, &hdr);
> +	if (status)
> +		goto out;
> +	status = decode_sequence(xdr, &res->seq_res, req);
> +	if (status)
> +		goto out;
> +	status = decode_putfh(xdr);
> +	if (status)
> +		goto out;
> +
> +	status = decode_removexattr(xdr, &res->cinfo);
> +out:
> +	return status;
> +}
> +#endif
>  #endif /* __LINUX_FS_NFS_NFS4_2XDR_H */
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index bebc087a1433..9e1ced791789 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -7479,6 +7479,8 @@ static struct {
>  	{ NFS4ERR_SYMLINK,	-ELOOP		},
>  	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
>  	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
> +	{ NFS4ERR_NOXATTR,	-ENODATA	},
> +	{ NFS4ERR_XATTR2BIG,	-E2BIG		},
>  	{ -1,			-EIO		}
>  };
>  
> @@ -7607,6 +7609,10 @@ const struct rpc_procinfo nfs4_procedures[] = {
>  	PROC42(COPY_NOTIFY,	enc_copy_notify,	dec_copy_notify),
>  	PROC(LOOKUPP,		enc_lookupp,		dec_lookupp),
>  	PROC42(LAYOUTERROR,	enc_layouterror,	dec_layouterror),
> +	PROC42(GETXATTR,	enc_getxattr,		dec_getxattr),
> +	PROC42(SETXATTR,	enc_setxattr,		dec_setxattr),
> +	PROC42(LISTXATTRS,	enc_listxattrs,		dec_listxattrs),
> +	PROC42(REMOVEXATTR,	enc_removexattr,	dec_removexattr),
>  };
>  
>  static unsigned int nfs_version4_counts[ARRAY_SIZE(nfs4_procedures)];
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 5076fe42c693..685deed805e8 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1497,7 +1497,64 @@ struct nfs42_seek_res {
>  	u32	sr_eof;
>  	u64	sr_offset;
>  };
> -#endif
> +
> +struct nfs42_setxattrargs {
> +	struct nfs4_sequence_args	seq_args;
> +	struct nfs_fh			*fh;
> +	const char			*xattr_name;
> +	u32				xattr_flags;
> +	size_t				xattr_len;
> +	struct page			**xattr_pages;
> +};
> +
> +struct nfs42_setxattrres {
> +	struct nfs4_sequence_res	seq_res;
> +	struct nfs4_change_info		cinfo;
> +};
> +
> +struct nfs42_getxattrargs {
> +	struct nfs4_sequence_args	seq_args;
> +	struct nfs_fh			*fh;
> +	const char			*xattr_name;
> +	size_t				xattr_len;
> +	struct page			**xattr_pages;
> +};
> +
> +struct nfs42_getxattrres {
> +	struct nfs4_sequence_res	seq_res;
> +	size_t				xattr_len;
> +};
> +
> +struct nfs42_listxattrsargs {
> +	struct nfs4_sequence_args	seq_args;
> +	struct nfs_fh			*fh;
> +	u32				count;
> +	u64				cookie;
> +	struct page			**xattr_pages;
> +};
> +
> +struct nfs42_listxattrsres {
> +	struct nfs4_sequence_res	seq_res;
> +	struct page			*scratch;
> +	void				*xattr_buf;
> +	size_t				xattr_len;
> +	u64				cookie;
> +	bool				eof;
> +	size_t				copied;
> +};
> +
> +struct nfs42_removexattrargs {
> +	struct nfs4_sequence_args	seq_args;
> +	struct nfs_fh			*fh;
> +	const char			*xattr_name;
> +};
> +
> +struct nfs42_removexattrres {
> +	struct nfs4_sequence_res	seq_res;
> +	struct nfs4_change_info		cinfo;
> +};
> +
> +#endif /* CONFIG_NFS_V4_2 */
>  
>  struct nfs_page;
>  

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-12 16:15   ` Mkrtchyan, Tigran
@ 2020-03-12 20:51     ` Frank van der Linden
  2020-03-12 21:15       ` Frank van der Linden
  0 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-12 20:51 UTC (permalink / raw)
  To: Mkrtchyan, Tigran; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

On Thu, Mar 12, 2020 at 05:15:10PM +0100, Mkrtchyan, Tigran wrote:
> > +     fattr4_word2_nfs42_mask = FATTR4_WORD2_NFS42_MASK;
> > +
> > +     if (minorversion >= 2) {
> 
> I am not sure you need this extra check as by querying for  FATTR4_WORD0_SUPPORTED_ATTRS
> server already will return FATTR4_WORD2_XATTR_SUPPORT if supported.

There used to be a mount option check here, which I later removed. That
means that fattr4_word2_nfs42_mask is no longer needed, so I'll remove
that.

As for the attribute itself: I suppose the question here is what the client
should use to assume the server has RFC 8276 support:

1) The xattr_support attribute exists
2) The xattr support attribute exists *and* it's true for the root fh

Currently the code does 2) in one operation. That might not be 100%
correct - the RFC does mention that (section 8.2):

"Before interrogating this attribute using GETATTR, a client should
 determine whether it is a supported attribute by interrogating the
 supported_attrs attribute."

That's a "should", not a "MUST", but it's still waving its finger
at you not to do this.

Since 8.2.1 says:

"However, a client may reasonably assume that a server
 (or file system) that does not support the xattr_support attribute
 does not provide xattr support, and it acts on that basis."

..I think you're right, and the code should just use the existence
of the attribute as a signal that the server knows about xattrs -
operations should still error out correctly if it doesn't.

I'll make that change, thanks.

Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-12 20:51     ` Frank van der Linden
@ 2020-03-12 21:15       ` Frank van der Linden
  2020-03-13 11:11         ` Mkrtchyan, Tigran
  0 siblings, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-12 21:15 UTC (permalink / raw)
  To: Mkrtchyan, Tigran; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

On Thu, Mar 12, 2020 at 08:51:39PM +0000, Frank van der Linden wrote:
> 1) The xattr_support attribute exists
> 2) The xattr support attribute exists *and* it's true for the root fh
> 
> Currently the code does 2) in one operation. That might not be 100%
> correct - the RFC does mention that (section 8.2):
> 
> "Before interrogating this attribute using GETATTR, a client should
>  determine whether it is a supported attribute by interrogating the
>  supported_attrs attribute."
> 
> That's a "should", not a "MUST", but it's still waving its finger
> at you not to do this.
> 
> Since 8.2.1 says:
> 
> "However, a client may reasonably assume that a server
>  (or file system) that does not support the xattr_support attribute
>  does not provide xattr support, and it acts on that basis."
> 
> ..I think you're right, and the code should just use the existence
> of the attribute as a signal that the server knows about xattrs -
> operations should still error out correctly if it doesn't.
> 
> I'll make that change, thanks.

..or, alternatively, only query xattr_support in nfs4_server_capabilities,
and then its actual value, if it exists, in nfs4_fs_info.

Any opinions on this?

- Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-12 21:15       ` Frank van der Linden
@ 2020-03-13 11:11         ` Mkrtchyan, Tigran
  2020-03-13 13:50           ` Trond Myklebust
  0 siblings, 1 reply; 33+ messages in thread
From: Mkrtchyan, Tigran @ 2020-03-13 11:11 UTC (permalink / raw)
  To: Frank van der Linden; +Cc: Trond Myklebust, Anna Schumaker, linux-nfs

Hi Frank,

I think the way how you have implemented is almost correct. You query
server for supported attributes. As result client will get all attributes
supported bu the server and if FATTR4_XATTR_SUPPORT is returned, then client
adds xattr capability. This the way how I read rfc8276. Do you have a different
opinion?

Regards,
   Tigran.

----- Original Message -----
> From: "Frank van der Linden" <fllinden@amazon.com>
> To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "Trond Myklebust" <trond.myklebust@hammerspace.com>, "Anna Schumaker" <anna.schumaker@netapp.com>, "linux-nfs"
> <linux-nfs@vger.kernel.org>
> Sent: Thursday, March 12, 2020 10:15:55 PM
> Subject: Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support

> On Thu, Mar 12, 2020 at 08:51:39PM +0000, Frank van der Linden wrote:
>> 1) The xattr_support attribute exists
>> 2) The xattr support attribute exists *and* it's true for the root fh
>> 
>> Currently the code does 2) in one operation. That might not be 100%
>> correct - the RFC does mention that (section 8.2):
>> 
>> "Before interrogating this attribute using GETATTR, a client should
>>  determine whether it is a supported attribute by interrogating the
>>  supported_attrs attribute."
>> 
>> That's a "should", not a "MUST", but it's still waving its finger
>> at you not to do this.
>> 
>> Since 8.2.1 says:
>> 
>> "However, a client may reasonably assume that a server
>>  (or file system) that does not support the xattr_support attribute
>>  does not provide xattr support, and it acts on that basis."
>> 
>> ..I think you're right, and the code should just use the existence
>> of the attribute as a signal that the server knows about xattrs -
>> operations should still error out correctly if it doesn't.
>> 
>> I'll make that change, thanks.
> 
> ..or, alternatively, only query xattr_support in nfs4_server_capabilities,
> and then its actual value, if it exists, in nfs4_fs_info.
> 
> Any opinions on this?
> 
> - Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-13 11:11         ` Mkrtchyan, Tigran
@ 2020-03-13 13:50           ` Trond Myklebust
  2020-03-13 14:19             ` Mkrtchyan, Tigran
  2020-03-13 17:55             ` Frank van der Linden
  0 siblings, 2 replies; 33+ messages in thread
From: Trond Myklebust @ 2020-03-13 13:50 UTC (permalink / raw)
  To: fllinden, tigran.mkrtchyan; +Cc: linux-nfs, anna.schumaker

On Fri, 2020-03-13 at 12:11 +0100, Mkrtchyan, Tigran wrote:
> Hi Frank,
> 
> I think the way how you have implemented is almost correct. You query
> server for supported attributes. As result client will get all
> attributes
> supported bu the server and if FATTR4_XATTR_SUPPORT is returned, then
> client
> adds xattr capability. This the way how I read rfc8276. Do you have a
> different
> opinion?
> 

'xattr_support' seems like a protocol hack to allow the client to
determine whether or not the xattr operations are supported.

The reason why it is a hack is that 'supported_attrs' is also a per-
filesystem attribute, and there is no value in advertising
'xattr_support' there unless your filesystem also supports xattrs.

IOW: the protocol forces you to do 2 round trips to the server in order
to figure out something that really should be obvious with 1 round
trip.

> Regards,
>    Tigran.
> 
> ----- Original Message -----
> > From: "Frank van der Linden" <fllinden@amazon.com>
> > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> > Cc: "Trond Myklebust" <trond.myklebust@hammerspace.com>, "Anna
> > Schumaker" <anna.schumaker@netapp.com>, "linux-nfs"
> > <linux-nfs@vger.kernel.org>
> > Sent: Thursday, March 12, 2020 10:15:55 PM
> > Subject: Re: [PATCH 03/13] NFSv4.2: query the server for extended
> > attribute support
> > On Thu, Mar 12, 2020 at 08:51:39PM +0000, Frank van der Linden
> > wrote:
> > > 1) The xattr_support attribute exists
> > > 2) The xattr support attribute exists *and* it's true for the
> > > root fh
> > > 
> > > Currently the code does 2) in one operation. That might not be
> > > 100%
> > > correct - the RFC does mention that (section 8.2):
> > > 
> > > "Before interrogating this attribute using GETATTR, a client
> > > should
> > >  determine whether it is a supported attribute by interrogating
> > > the
> > >  supported_attrs attribute."
> > > 
> > > That's a "should", not a "MUST", but it's still waving its finger
> > > at you not to do this.
> > > 
> > > Since 8.2.1 says:
> > > 
> > > "However, a client may reasonably assume that a server
> > >  (or file system) that does not support the xattr_support
> > > attribute
> > >  does not provide xattr support, and it acts on that basis."
> > > 
> > > ..I think you're right, and the code should just use the
> > > existence
> > > of the attribute as a signal that the server knows about xattrs -
> > > operations should still error out correctly if it doesn't.
> > > 
> > > I'll make that change, thanks.
> > 
> > ..or, alternatively, only query xattr_support in
> > nfs4_server_capabilities,
> > and then its actual value, if it exists, in nfs4_fs_info.
> > 
> > Any opinions on this?
> > 
> > - Frank
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-13 13:50           ` Trond Myklebust
@ 2020-03-13 14:19             ` Mkrtchyan, Tigran
  2020-03-13 17:10               ` Trond Myklebust
  2020-03-13 17:55             ` Frank van der Linden
  1 sibling, 1 reply; 33+ messages in thread
From: Mkrtchyan, Tigran @ 2020-03-13 14:19 UTC (permalink / raw)
  To: trondmy; +Cc: Frank van der Linden, linux-nfs, Anna Schumaker



----- Original Message -----
> From: "trondmy" <trondmy@hammerspace.com>
> To: "Frank van der Linden" <fllinden@amazon.com>, "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Anna Schumaker" <anna.schumaker@netapp.com>
> Sent: Friday, March 13, 2020 2:50:38 PM
> Subject: Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support

> On Fri, 2020-03-13 at 12:11 +0100, Mkrtchyan, Tigran wrote:
>> Hi Frank,
>> 
>> I think the way how you have implemented is almost correct. You query
>> server for supported attributes. As result client will get all
>> attributes
>> supported bu the server and if FATTR4_XATTR_SUPPORT is returned, then
>> client
>> adds xattr capability. This the way how I read rfc8276. Do you have a
>> different
>> opinion?
>> 
> 
> 'xattr_support' seems like a protocol hack to allow the client to
> determine whether or not the xattr operations are supported.
> 
> The reason why it is a hack is that 'supported_attrs' is also a per-
> filesystem attribute, and there is no value in advertising
> 'xattr_support' there unless your filesystem also supports xattrs.
> 
> IOW: the protocol forces you to do 2 round trips to the server in order
> to figure out something that really should be obvious with 1 round
> trip.
> 


So you say  that client have to query for xattr_support every time the
fsid is changing?

Tigran.

>> Regards,
>>    Tigran.
>> 
>> ----- Original Message -----
>> > From: "Frank van der Linden" <fllinden@amazon.com>
>> > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
>> > Cc: "Trond Myklebust" <trond.myklebust@hammerspace.com>, "Anna
>> > Schumaker" <anna.schumaker@netapp.com>, "linux-nfs"
>> > <linux-nfs@vger.kernel.org>
>> > Sent: Thursday, March 12, 2020 10:15:55 PM
>> > Subject: Re: [PATCH 03/13] NFSv4.2: query the server for extended
>> > attribute support
>> > On Thu, Mar 12, 2020 at 08:51:39PM +0000, Frank van der Linden
>> > wrote:
>> > > 1) The xattr_support attribute exists
>> > > 2) The xattr support attribute exists *and* it's true for the
>> > > root fh
>> > > 
>> > > Currently the code does 2) in one operation. That might not be
>> > > 100%
>> > > correct - the RFC does mention that (section 8.2):
>> > > 
>> > > "Before interrogating this attribute using GETATTR, a client
>> > > should
>> > >  determine whether it is a supported attribute by interrogating
>> > > the
>> > >  supported_attrs attribute."
>> > > 
>> > > That's a "should", not a "MUST", but it's still waving its finger
>> > > at you not to do this.
>> > > 
>> > > Since 8.2.1 says:
>> > > 
>> > > "However, a client may reasonably assume that a server
>> > >  (or file system) that does not support the xattr_support
>> > > attribute
>> > >  does not provide xattr support, and it acts on that basis."
>> > > 
>> > > ..I think you're right, and the code should just use the
>> > > existence
>> > > of the attribute as a signal that the server knows about xattrs -
>> > > operations should still error out correctly if it doesn't.
>> > > 
>> > > I'll make that change, thanks.
>> > 
>> > ..or, alternatively, only query xattr_support in
>> > nfs4_server_capabilities,
>> > and then its actual value, if it exists, in nfs4_fs_info.
>> > 
>> > Any opinions on this?
>> > 
>> > - Frank
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-13 14:19             ` Mkrtchyan, Tigran
@ 2020-03-13 17:10               ` Trond Myklebust
  0 siblings, 0 replies; 33+ messages in thread
From: Trond Myklebust @ 2020-03-13 17:10 UTC (permalink / raw)
  To: tigran.mkrtchyan; +Cc: fllinden, linux-nfs, anna.schumaker

On Fri, 2020-03-13 at 15:19 +0100, Mkrtchyan, Tigran wrote:
> 
> ----- Original Message -----
> > From: "trondmy" <trondmy@hammerspace.com>
> > To: "Frank van der Linden" <fllinden@amazon.com>, "Tigran
> > Mkrtchyan" <tigran.mkrtchyan@desy.de>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>, "Anna Schumaker" <
> > anna.schumaker@netapp.com>
> > Sent: Friday, March 13, 2020 2:50:38 PM
> > Subject: Re: [PATCH 03/13] NFSv4.2: query the server for extended
> > attribute support
> > On Fri, 2020-03-13 at 12:11 +0100, Mkrtchyan, Tigran wrote:
> > > Hi Frank,
> > > 
> > > I think the way how you have implemented is almost correct. You
> > > query
> > > server for supported attributes. As result client will get all
> > > attributes
> > > supported bu the server and if FATTR4_XATTR_SUPPORT is returned,
> > > then
> > > client
> > > adds xattr capability. This the way how I read rfc8276. Do you
> > > have a
> > > different
> > > opinion?
> > > 
> > 
> > 'xattr_support' seems like a protocol hack to allow the client to
> > determine whether or not the xattr operations are supported.
> > 
> > The reason why it is a hack is that 'supported_attrs' is also a
> > per-
> > filesystem attribute, and there is no value in advertising
> > 'xattr_support' there unless your filesystem also supports xattrs.
> > 
> > IOW: the protocol forces you to do 2 round trips to the server in
> > order
> > to figure out something that really should be obvious with 1 round
> > trip.
> > 
> 
> So you say  that client have to query for xattr_support every time
> the
> fsid is changing?
> 

According to the spec it is a per-filesystem attribute, just like
supported_attrs...

Cheers
  Trond

> Tigran.
> 
> > > Regards,
> > >    Tigran.
> > > 
> > > ----- Original Message -----
> > > > From: "Frank van der Linden" <fllinden@amazon.com>
> > > > To: "Tigran Mkrtchyan" <tigran.mkrtchyan@desy.de>
> > > > Cc: "Trond Myklebust" <trond.myklebust@hammerspace.com>, "Anna
> > > > Schumaker" <anna.schumaker@netapp.com>, "linux-nfs"
> > > > <linux-nfs@vger.kernel.org>
> > > > Sent: Thursday, March 12, 2020 10:15:55 PM
> > > > Subject: Re: [PATCH 03/13] NFSv4.2: query the server for
> > > > extended
> > > > attribute support
> > > > On Thu, Mar 12, 2020 at 08:51:39PM +0000, Frank van der Linden
> > > > wrote:
> > > > > 1) The xattr_support attribute exists
> > > > > 2) The xattr support attribute exists *and* it's true for the
> > > > > root fh
> > > > > 
> > > > > Currently the code does 2) in one operation. That might not
> > > > > be
> > > > > 100%
> > > > > correct - the RFC does mention that (section 8.2):
> > > > > 
> > > > > "Before interrogating this attribute using GETATTR, a client
> > > > > should
> > > > >  determine whether it is a supported attribute by
> > > > > interrogating
> > > > > the
> > > > >  supported_attrs attribute."
> > > > > 
> > > > > That's a "should", not a "MUST", but it's still waving its
> > > > > finger
> > > > > at you not to do this.
> > > > > 
> > > > > Since 8.2.1 says:
> > > > > 
> > > > > "However, a client may reasonably assume that a server
> > > > >  (or file system) that does not support the xattr_support
> > > > > attribute
> > > > >  does not provide xattr support, and it acts on that basis."
> > > > > 
> > > > > ..I think you're right, and the code should just use the
> > > > > existence
> > > > > of the attribute as a signal that the server knows about
> > > > > xattrs -
> > > > > operations should still error out correctly if it doesn't.
> > > > > 
> > > > > I'll make that change, thanks.
> > > > 
> > > > ..or, alternatively, only query xattr_support in
> > > > nfs4_server_capabilities,
> > > > and then its actual value, if it exists, in nfs4_fs_info.
> > > > 
> > > > Any opinions on this?
> > > > 
> > > > - Frank
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > trond.myklebust@hammerspace.com
-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/13] NFSv4.2: query the server for extended attribute support
  2020-03-13 13:50           ` Trond Myklebust
  2020-03-13 14:19             ` Mkrtchyan, Tigran
@ 2020-03-13 17:55             ` Frank van der Linden
  1 sibling, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-13 17:55 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: tigran.mkrtchyan, linux-nfs, anna.schumaker

On Fri, Mar 13, 2020 at 01:50:38PM +0000, Trond Myklebust wrote:
> 'xattr_support' seems like a protocol hack to allow the client to
> determine whether or not the xattr operations are supported.
> 
> The reason why it is a hack is that 'supported_attrs' is also a per-
> filesystem attribute, and there is no value in advertising
> 'xattr_support' there unless your filesystem also supports xattrs.
> 
> IOW: the protocol forces you to do 2 round trips to the server in order
> to figure out something that really should be obvious with 1 round
> trip.

Right, that's the annoying part. I mean, theoretically, a server can
legally reject a GETATTR because you're asking for an unknown
attribute (e.g. xattr_support in this case). So then what I have
in my current patch (asking for both the supported_attrs and
xattr_support at once) might fail, and the mount would fail.

Which means I should split it up and have nfs4_do_fsinfo do
the 2nd part, just in case.

- Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/13] client side user xattr (RFC8276) support
  2020-03-12 20:09 ` Anna Schumaker
@ 2020-03-16 15:50   ` Frank van der Linden
  2020-03-17 23:03   ` Frank van der Linden
  1 sibling, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-16 15:50 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: trond.myklebust, Linux NFS Mailing List

Hi Anna,

On Thu, Mar 12, 2020 at 04:09:51PM -0400, Anna Schumaker wrote:
> I'm curious if you've tried xfstests with your patches? There are a
> handful of tests using xattrs that might be good to check with, too:
> 
> anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
> tests/generic/037
> tests/generic/062
> tests/generic/066
> tests/generic/093
> tests/generic/117
> tests/generic/337
> tests/generic/377
> tests/generic/403
> tests/generic/425
> tests/generic/454
> tests/generic/489
> tests/generic/523
> tests/generic/529
> tests/generic/556
> 
> Thanks,
> Anna

I did run xfstests, and it looks fine, meaning that the differences between
actual and expected output are expected. I can rerun the xattr
specific ones and post the report. Will do that with v2.

- Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/13] client side user xattr (RFC8276) support
  2020-03-12 20:09 ` Anna Schumaker
  2020-03-16 15:50   ` Frank van der Linden
@ 2020-03-17 23:03   ` Frank van der Linden
  2020-03-19 14:39     ` J. Bruce Fields
  1 sibling, 1 reply; 33+ messages in thread
From: Frank van der Linden @ 2020-03-17 23:03 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: trond.myklebust, Linux NFS Mailing List

On Thu, Mar 12, 2020 at 04:09:51PM -0400, Anna Schumaker wrote:
> I'm curious if you've tried xfstests with your patches? There are a
> handful of tests using xattrs that might be good to check with, too:
> 
> anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
> tests/generic/037
> tests/generic/062
> tests/generic/066
> tests/generic/093
> tests/generic/117
> tests/generic/337
> tests/generic/377
> tests/generic/403
> tests/generic/425
> tests/generic/454
> tests/generic/489
> tests/generic/523
> tests/generic/529
> tests/generic/556
> 
> Thanks,
> Anna

I ran did a "check -nfs -g quick" run of xfstests-dev. The following tests
were applicable to extended attributes:

generic/020     fail   Doesn't compute MAX_ATTR right for NFS, passes
                       after fixing that.
generic/037     pass
generic/062     fail   It unconditionally expects the "system" and
                       "trusted" namespaces to be there too, not
                       easily fixed.
generic/066     pass
generic/093     fail   Capabilities use the "security" namespace, can't
                       work on NFS.
generic/097     fail   "trusted" namespace explicitly used, can't work
                       on NFS.
generic/103     fail   fallocate fails on NFS, not xattr related
generic/117     pass
generic/377     fail   Doesn't expect the "system.nfs4acl" attribute to
                       show up in listxattr.  Can be fixed by filtering
                       out only "user" namespace xattrs.
generic/403     fail   Uses the "trusted" namespace, but does not really
                       need it. Works if converted to the "user" namespace.
generic/454     pass
generic/523     pass


In other words, there were no problems with the patches themselves, but
xfstests will need some work to work properly.

I can send a few simple fixes in for xfstests, but a few need a bit more
work, specifically the ones that expected certain xattr namespaces to be
there. Right now there is a "_require_attr" check function, that probably
needs to be split up in to "_require_attr_user, _require_attr_system", etc
functions, which is a bit more work.

- Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/13] client side user xattr (RFC8276) support
  2020-03-17 23:03   ` Frank van der Linden
@ 2020-03-19 14:39     ` J. Bruce Fields
  0 siblings, 0 replies; 33+ messages in thread
From: J. Bruce Fields @ 2020-03-19 14:39 UTC (permalink / raw)
  To: Frank van der Linden
  Cc: Anna Schumaker, trond.myklebust, Linux NFS Mailing List

On Tue, Mar 17, 2020 at 11:03:39PM +0000, Frank van der Linden wrote:
> On Thu, Mar 12, 2020 at 04:09:51PM -0400, Anna Schumaker wrote:
> > I'm curious if you've tried xfstests with your patches? There are a
> > handful of tests using xattrs that might be good to check with, too:
> > 
> > anna@gouda % grep xattr -l tests/generic/[0-9][0-9][0-9]
> > tests/generic/037
> > tests/generic/062
> > tests/generic/066
> > tests/generic/093
> > tests/generic/117
> > tests/generic/337
> > tests/generic/377
> > tests/generic/403
> > tests/generic/425
> > tests/generic/454
> > tests/generic/489
> > tests/generic/523
> > tests/generic/529
> > tests/generic/556
> > 
> > Thanks,
> > Anna
> 
> I ran did a "check -nfs -g quick" run of xfstests-dev. The following tests
> were applicable to extended attributes:
> 
> generic/020     fail   Doesn't compute MAX_ATTR right for NFS, passes
>                        after fixing that.
> generic/037     pass
> generic/062     fail   It unconditionally expects the "system" and
>                        "trusted" namespaces to be there too, not
>                        easily fixed.
> generic/066     pass
> generic/093     fail   Capabilities use the "security" namespace, can't
>                        work on NFS.
> generic/097     fail   "trusted" namespace explicitly used, can't work
>                        on NFS.
> generic/103     fail   fallocate fails on NFS, not xattr related
> generic/117     pass
> generic/377     fail   Doesn't expect the "system.nfs4acl" attribute to
>                        show up in listxattr.  Can be fixed by filtering
>                        out only "user" namespace xattrs.
> generic/403     fail   Uses the "trusted" namespace, but does not really
>                        need it. Works if converted to the "user" namespace.
> generic/454     pass
> generic/523     pass
> 
> 
> In other words, there were no problems with the patches themselves, but
> xfstests will need some work to work properly.
> 
> I can send a few simple fixes in for xfstests, but a few need a bit more
> work, specifically the ones that expected certain xattr namespaces to be
> there. Right now there is a "_require_attr" check function, that probably
> needs to be split up in to "_require_attr_user, _require_attr_system", etc
> functions, which is a bit more work.

I just took a quick look at common/attr and all I see in _require_attrs
is:

	attr -s "user.xfstests" -V "attr" $TEST_DIR/syscalltest

What am I missing?

--b.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [nfs] c5654df66d: stress-ng.msg.ops_per_sec 15.5% improvement
       [not found]   ` <20200324060215.GD11705@shao2-debian>
@ 2020-03-24 16:21     ` Frank van der Linden
  0 siblings, 0 replies; 33+ messages in thread
From: Frank van der Linden @ 2020-03-24 16:21 UTC (permalink / raw)
  To: kernel test robot; +Cc: trond.myklebust, anna.schumaker, linux-nfs, lkp

On Tue, Mar 24, 2020 at 02:02:15PM +0800, kernel test robot wrote:
> Greeting,
> 
> FYI, we noticed a 15.5% improvement of stress-ng.msg.ops_per_sec due to commit:
> 
> 
> commit: c5654df66d65f6b5f8967f15a0b61f89acb5941e ("[PATCH 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag")
> url: https://github.com/0day-ci/linux/commits/Frank-van-der-Linden/client-side-user-xattr-RFC8276-support/20200312-064740
> base: git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next

Hm, I think I have doubts about the validity of a test that sees a 15%
improvement in SysV msg ops after a change in the NFS client code :-)

- Frank

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2020-03-24 16:21 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-11 19:56 [PATCH 00/13] client side user xattr (RFC8276) support Frank van der Linden
2020-03-11 19:56 ` [PATCH 01/13] nfs,nfsd: NFSv4.2 extended attribute protocol definitions Frank van der Linden
2020-03-11 19:56 ` [PATCH 02/13] nfs: add client side only definitions for user xattrs Frank van der Linden
2020-03-11 19:56 ` [PATCH 03/13] NFSv4.2: query the server for extended attribute support Frank van der Linden
2020-03-12 16:15   ` Mkrtchyan, Tigran
2020-03-12 20:51     ` Frank van der Linden
2020-03-12 21:15       ` Frank van der Linden
2020-03-13 11:11         ` Mkrtchyan, Tigran
2020-03-13 13:50           ` Trond Myklebust
2020-03-13 14:19             ` Mkrtchyan, Tigran
2020-03-13 17:10               ` Trond Myklebust
2020-03-13 17:55             ` Frank van der Linden
2020-03-11 19:56 ` [PATCH 04/13] NFSv4.2: define limits and sizes for user xattr handling Frank van der Linden
2020-03-12 20:35   ` Schumaker, Anna
2020-03-11 19:56 ` [PATCH 05/13] NFSv4.2: add client side XDR handling for extended attributes Frank van der Linden
2020-03-12 20:49   ` Schumaker, Anna
2020-03-11 19:56 ` [PATCH 06/13] nfs: define nfs_access_get_cached function Frank van der Linden
2020-03-11 19:56 ` [PATCH 07/13] NFSv4.2: query the extended attribute access bits Frank van der Linden
2020-03-11 19:56 ` [PATCH 08/13] nfs: modify update_changeattr to deal with regular files Frank van der Linden
2020-03-11 19:56 ` [PATCH 09/13] nfs: define and use the NFS_INO_INVALID_XATTR flag Frank van der Linden
     [not found]   ` <20200324060215.GD11705@shao2-debian>
2020-03-24 16:21     ` [nfs] c5654df66d: stress-ng.msg.ops_per_sec 15.5% improvement Frank van der Linden
2020-03-11 19:56 ` [PATCH 10/13] nfs: make the buf_to_pages_noslab function available to the nfs code Frank van der Linden
2020-03-12 20:36   ` Schumaker, Anna
2020-03-11 19:56 ` [PATCH 11/13] NFSv4.2: add the extended attribute proc functions Frank van der Linden
2020-03-11 19:56 ` [PATCH 12/13] NFSv4.2: hook in the user extended attribute handlers Frank van der Linden
2020-03-11 19:56 ` [PATCH 13/13] NFSv4.2: add client side xattr caching Frank van der Linden
2020-03-12 20:39   ` Schumaker, Anna
2020-03-12 20:48   ` Schumaker, Anna
2020-03-12 19:06 ` [PATCH 00/13] client side user xattr (RFC8276) support Mkrtchyan, Tigran
2020-03-12 20:09 ` Anna Schumaker
2020-03-16 15:50   ` Frank van der Linden
2020-03-17 23:03   ` Frank van der Linden
2020-03-19 14:39     ` J. Bruce Fields

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).