* [PATCH v9 00/27] Readdir improvements
@ 2022-02-27 23:12 trondmy
  2022-02-27 23:12 ` [PATCH v9 01/27] NFS: Return valid errors from nfs2/3_decode_dirent() trondmy
  2022-03-09 21:32 ` [PATCH v9 00/27] Readdir improvements Benjamin Coddington
  0 siblings, 2 replies; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This series also implements Ben's page cache filter to improve the
sharing of cached data between processes that are reading the same
directory at the same time, and to avoid livelocks when the directory
is simultaneously changing.

--
v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
v3: Avoid excessive window shrinking in uncached_readdir case
v4: Track 'ls -l' cache hit/miss statistics
    Improved algorithm for falling back to uncached readdir
    Skip readdirplus when files are being written to
v5: Bugfixes
    Skip readdirplus when the acdirmax/acregmax values are low
    Request a full XDR buffer when doing READDIRPLUS
v6: Add tracing
    Don't have lookup request readdirplus when it won't help
v7: Implement Ben's page cache filter
    Reduce the use of uncached readdir
    Change indexing of the page cache to improve seekdir() performance.
v8: Reduce the page cache overhead by shrinking the cookie hashvalue size
    Incorporate other feedback from Anna, Ben and Neil
    Fix nfs2/3_decode_dirent return values
    Fix the change attribute value set in nfs_readdir_page_get_next()
v9: Address bugs that were hit when testing with large directories
    Misc cleanups

Trond Myklebust (27):
  NFS: Return valid errors from nfs2/3_decode_dirent()
  NFS: constify nfs_server_capable() and nfs_have_writebacks()
  NFS: Trace lookup revalidation failure
  NFS: Initialise the readdir verifier as best we can in nfs_opendir()
  NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
  NFS: Calculate page offsets algorithmically
  NFS: Store the change attribute in the directory page cache
  NFS: Don't re-read the entire page cache to find the next cookie
  NFS: Don't advance the page pointer unless the page is full
  NFS: Adjust the amount of readahead performed by NFS readdir
  NFS: If the cookie verifier changes, we must invalidate the page cache
  NFS: Simplify nfs_readdir_xdr_to_array()
  NFS: Reduce use of uncached readdir
  NFS: Improve heuristic for readdirplus
  NFS: Don't ask for readdirplus unless it can help nfs_getattr()
  NFSv4: Ask for a full XDR buffer of readdir goodness
  NFS: Readdirplus can't help lookup for case insensitive filesystems
  NFS: Don't request readdirplus when revalidation was forced
  NFS: Add basic readdir tracing
  NFS: Trace effects of readdirplus on the dcache
  NFS: Trace effects of the readdirplus heuristic
  NFS: Clean up page array initialisation/free
  NFS: Convert readdir page cache to use a cookie based index
  NFS: Fix up forced readdirplus
  NFS: Remove unnecessary cache invalidations for directories
  NFS: Optimise away the previous cookie field
  NFS: Cache all entries in the readdirplus reply

 fs/nfs/Kconfig          |   4 +
 fs/nfs/dir.c            | 602 ++++++++++++++++++++++++----------------
 fs/nfs/inode.c          |  46 ++-
 fs/nfs/internal.h       |   4 +-
 fs/nfs/nfs2xdr.c        |   3 +-
 fs/nfs/nfs3xdr.c        |  29 +-
 fs/nfs/nfs4proc.c       |   2 -
 fs/nfs/nfs4xdr.c        |   7 +-
 fs/nfs/nfstrace.h       | 123 +++++++-
 include/linux/nfs_fs.h  |  19 +-
 include/linux/nfs_xdr.h |   3 +-
 11 files changed, 532 insertions(+), 310 deletions(-)

-- 
2.35.1


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH v9 01/27] NFS: Return valid errors from nfs2/3_decode_dirent()
  2022-02-27 23:12 [PATCH v9 00/27] Readdir improvements trondmy
@ 2022-02-27 23:12 ` trondmy
  2022-02-27 23:12   ` [PATCH v9 02/27] NFS: constify nfs_server_capable() and nfs_have_writebacks() trondmy
  2022-03-09 21:32 ` [PATCH v9 00/27] Readdir improvements Benjamin Coddington
  1 sibling, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Valid return values for decode_dirent() callback functions are:
 0: Success
 -EBADCOOKIE: End of directory
 -EAGAIN: End of xdr_stream

All errors need to map into one of those three values.

Fixes: 573c4e1ef53a ("NFS: Simplify ->decode_dirent() calling sequence")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs2xdr.c |  2 +-
 fs/nfs/nfs3xdr.c | 21 ++++++---------------
 2 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 7fba7711e6b3..3d5ba43f44bb 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -949,7 +949,7 @@ int nfs2_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 
 	error = decode_filename_inline(xdr, &entry->name, &entry->len);
 	if (unlikely(error))
-		return error;
+		return -EAGAIN;
 
 	/*
 	 * The type (size and byte order) of nfscookie isn't defined in
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 54a1d21cbcc6..7ab60ad98776 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -1967,7 +1967,6 @@ int nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 		       bool plus)
 {
 	struct user_namespace *userns = rpc_userns(entry->server->client);
-	struct nfs_entry old = *entry;
 	__be32 *p;
 	int error;
 	u64 new_cookie;
@@ -1987,15 +1986,15 @@ int nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 
 	error = decode_fileid3(xdr, &entry->ino);
 	if (unlikely(error))
-		return error;
+		return -EAGAIN;
 
 	error = decode_inline_filename3(xdr, &entry->name, &entry->len);
 	if (unlikely(error))
-		return error;
+		return -EAGAIN;
 
 	error = decode_cookie3(xdr, &new_cookie);
 	if (unlikely(error))
-		return error;
+		return -EAGAIN;
 
 	entry->d_type = DT_UNKNOWN;
 
@@ -2003,7 +2002,7 @@ int nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 		entry->fattr->valid = 0;
 		error = decode_post_op_attr(xdr, entry->fattr, userns);
 		if (unlikely(error))
-			return error;
+			return -EAGAIN;
 		if (entry->fattr->valid & NFS_ATTR_FATTR_V3)
 			entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);
 
@@ -2018,11 +2017,8 @@ int nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 			return -EAGAIN;
 		if (*p != xdr_zero) {
 			error = decode_nfs_fh3(xdr, entry->fh);
-			if (unlikely(error)) {
-				if (error == -E2BIG)
-					goto out_truncated;
-				return error;
-			}
+			if (unlikely(error))
+				return -EAGAIN;
 		} else
 			zero_nfs_fh3(entry->fh);
 	}
@@ -2031,11 +2027,6 @@ int nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	entry->cookie = new_cookie;
 
 	return 0;
-
-out_truncated:
-	dprintk("NFS: directory entry contains invalid file handle\n");
-	*entry = old;
-	return -EAGAIN;
 }
 
 /*
-- 
2.35.1



* [PATCH v9 02/27] NFS: constify nfs_server_capable() and nfs_have_writebacks()
  2022-02-27 23:12 ` [PATCH v9 01/27] NFS: Return valid errors from nfs2/3_decode_dirent() trondmy
@ 2022-02-27 23:12   ` trondmy
  2022-02-27 23:12     ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 include/linux/nfs_fs.h | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 72a732a5103c..6e10725887d1 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -363,7 +363,7 @@ static inline void nfs_mark_for_revalidate(struct inode *inode)
 	spin_unlock(&inode->i_lock);
 }
 
-static inline int nfs_server_capable(struct inode *inode, int cap)
+static inline int nfs_server_capable(const struct inode *inode, int cap)
 {
 	return NFS_SERVER(inode)->caps & cap;
 }
@@ -587,12 +587,11 @@ extern struct nfs_commit_data *nfs_commitdata_alloc(bool never_fail);
 extern void nfs_commit_free(struct nfs_commit_data *data);
 bool nfs_commit_end(struct nfs_mds_commit_info *cinfo);
 
-static inline int
-nfs_have_writebacks(struct inode *inode)
+static inline bool nfs_have_writebacks(const struct inode *inode)
 {
 	if (S_ISREG(inode->i_mode))
 		return atomic_long_read(&NFS_I(inode)->nrequests) != 0;
-	return 0;
+	return false;
 }
 
 /*
-- 
2.35.1



* [PATCH v9 03/27] NFS: Trace lookup revalidation failure
  2022-02-27 23:12   ` [PATCH v9 02/27] NFS: constify nfs_server_capable() and nfs_have_writebacks() trondmy
@ 2022-02-27 23:12     ` trondmy
  2022-02-27 23:12       ` [PATCH v9 04/27] NFS: Initialise the readdir verifier as best we can in nfs_opendir() trondmy
  2022-03-09 13:42       ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure Benjamin Coddington
  0 siblings, 2 replies; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Enable tracing of lookup revalidation failures.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 17 +++++------------
 1 file changed, 5 insertions(+), 12 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ebddc736eac2..1aa55cac9d9a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1474,9 +1474,7 @@ nfs_lookup_revalidate_done(struct inode *dir, struct dentry *dentry,
 {
 	switch (error) {
 	case 1:
-		dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is valid\n",
-			__func__, dentry);
-		return 1;
+		break;
 	case 0:
 		/*
 		 * We can't d_drop the root of a disconnected tree:
@@ -1485,13 +1483,10 @@ nfs_lookup_revalidate_done(struct inode *dir, struct dentry *dentry,
 		 * inodes on unmount and further oopses.
 		 */
 		if (inode && IS_ROOT(dentry))
-			return 1;
-		dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is invalid\n",
-				__func__, dentry);
-		return 0;
+			error = 1;
+		break;
 	}
-	dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) lookup returned error %d\n",
-				__func__, dentry, error);
+	trace_nfs_lookup_revalidate_exit(dir, dentry, 0, error);
 	return error;
 }
 
@@ -1623,9 +1618,7 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 		goto out_bad;
 
 	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
-	error = nfs_lookup_revalidate_dentry(dir, dentry, inode);
-	trace_nfs_lookup_revalidate_exit(dir, dentry, flags, error);
-	return error;
+	return nfs_lookup_revalidate_dentry(dir, dentry, inode);
 out_valid:
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 out_bad:
-- 
2.35.1



* [PATCH v9 04/27] NFS: Initialise the readdir verifier as best we can in nfs_opendir()
  2022-02-27 23:12     ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure trondmy
@ 2022-02-27 23:12       ` trondmy
  2022-02-27 23:12         ` [PATCH v9 05/27] NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context trondmy
  2022-03-09 13:42       ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure Benjamin Coddington
  1 sibling, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

To ensure that opendir() followed by seekdir() behaves as correctly as
possible, try to initialise the readdir verifier in nfs_opendir().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 1aa55cac9d9a..1dfbd05081ad 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -89,6 +89,7 @@ static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir
 						      NFS_INO_REVAL_FORCED);
 		list_add(&ctx->list, &nfsi->open_files);
 		clear_bit(NFS_INO_FORCE_READDIR, &nfsi->flags);
+		memcpy(ctx->verf, nfsi->cookieverf, sizeof(ctx->verf));
 		spin_unlock(&dir->i_lock);
 		return ctx;
 	}
-- 
2.35.1



* [PATCH v9 05/27] NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context
  2022-02-27 23:12       ` [PATCH v9 04/27] NFS: Initialise the readdir verifier as best we can in nfs_opendir() trondmy
@ 2022-02-27 23:12         ` trondmy
  2022-02-27 23:12           ` [PATCH v9 06/27] NFS: Calculate page offsets algorithmically trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 1dfbd05081ad..379f88b158fb 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -69,18 +69,15 @@ const struct address_space_operations nfs_dir_aops = {
 	.freepage = nfs_readdir_clear_array,
 };
 
-static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir)
+static struct nfs_open_dir_context *
+alloc_nfs_open_dir_context(struct inode *dir)
 {
 	struct nfs_inode *nfsi = NFS_I(dir);
 	struct nfs_open_dir_context *ctx;
-	ctx = kmalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
 	if (ctx != NULL) {
-		ctx->duped = 0;
 		ctx->attr_gencount = nfsi->attr_gencount;
-		ctx->dir_cookie = 0;
-		ctx->dup_cookie = 0;
-		ctx->page_index = 0;
-		ctx->eof = false;
 		spin_lock(&dir->i_lock);
 		if (list_empty(&nfsi->open_files) &&
 		    (nfsi->cache_validity & NFS_INO_DATA_INVAL_DEFER))
-- 
2.35.1



* [PATCH v9 06/27] NFS: Calculate page offsets algorithmically
  2022-02-27 23:12         ` [PATCH v9 05/27] NFS: Use kzalloc() to avoid initialising the nfs_open_dir_context trondmy
@ 2022-02-27 23:12           ` trondmy
  2022-02-27 23:12             ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Instead of counting page offsets as we walk through the page cache,
calculate them algorithmically.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 379f88b158fb..6f0a38db6c37 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -249,17 +249,20 @@ static const char *nfs_readdir_copy_name(const char *name, unsigned int len)
 	return ret;
 }
 
+static size_t nfs_readdir_array_maxentries(void)
+{
+	return (PAGE_SIZE - sizeof(struct nfs_cache_array)) /
+	       sizeof(struct nfs_cache_array_entry);
+}
+
 /*
  * Check that the next array entry lies entirely within the page bounds
  */
 static int nfs_readdir_array_can_expand(struct nfs_cache_array *array)
 {
-	struct nfs_cache_array_entry *cache_entry;
-
 	if (array->page_full)
 		return -ENOSPC;
-	cache_entry = &array->array[array->size + 1];
-	if ((char *)cache_entry - (char *)array > PAGE_SIZE) {
+	if (array->size == nfs_readdir_array_maxentries()) {
 		array->page_full = 1;
 		return -ENOSPC;
 	}
@@ -318,6 +321,11 @@ static struct page *nfs_readdir_page_get_locked(struct address_space *mapping,
 	return page;
 }
 
+static loff_t nfs_readdir_page_offset(struct page *page)
+{
+	return (loff_t)page->index * (loff_t)nfs_readdir_array_maxentries();
+}
+
 static u64 nfs_readdir_page_last_cookie(struct page *page)
 {
 	struct nfs_cache_array *array;
@@ -448,7 +456,7 @@ static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
 		if (array->array[i].cookie == desc->dir_cookie) {
 			struct nfs_inode *nfsi = NFS_I(file_inode(desc->file));
 
-			new_pos = desc->current_index + i;
+			new_pos = nfs_readdir_page_offset(desc->page) + i;
 			if (desc->attr_gencount != nfsi->attr_gencount ||
 			    !nfs_readdir_inode_mapping_valid(nfsi)) {
 				desc->duped = 0;
-- 
2.35.1



* [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache
  2022-02-27 23:12           ` [PATCH v9 06/27] NFS: Calculate page offsets algorithmically trondmy
@ 2022-02-27 23:12             ` trondmy
  2022-02-27 23:12               ` [PATCH v9 08/27] NFS: Don't re-read the entire page cache to find the next cookie trondmy
  2022-03-01 19:09               ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache Anna Schumaker
  0 siblings, 2 replies; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Use the change attribute and the first cookie in a directory page cache
entry to validate that the page is up to date.

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 68 ++++++++++++++++++++++++++++------------------------
 1 file changed, 37 insertions(+), 31 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 6f0a38db6c37..bfb553c57274 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -140,6 +140,7 @@ struct nfs_cache_array_entry {
 };
 
 struct nfs_cache_array {
+	u64 change_attr;
 	u64 last_cookie;
 	unsigned int size;
 	unsigned char page_full : 1,
@@ -176,12 +177,14 @@ static void nfs_readdir_array_init(struct nfs_cache_array *array)
 	memset(array, 0, sizeof(struct nfs_cache_array));
 }
 
-static void nfs_readdir_page_init_array(struct page *page, u64 last_cookie)
+static void nfs_readdir_page_init_array(struct page *page, u64 last_cookie,
+					u64 change_attr)
 {
 	struct nfs_cache_array *array;
 
 	array = kmap_atomic(page);
 	nfs_readdir_array_init(array);
+	array->change_attr = change_attr;
 	array->last_cookie = last_cookie;
 	array->cookies_are_ordered = 1;
 	kunmap_atomic(array);
@@ -208,7 +211,7 @@ nfs_readdir_page_array_alloc(u64 last_cookie, gfp_t gfp_flags)
 {
 	struct page *page = alloc_page(gfp_flags);
 	if (page)
-		nfs_readdir_page_init_array(page, last_cookie);
+		nfs_readdir_page_init_array(page, last_cookie, 0);
 	return page;
 }
 
@@ -305,19 +308,43 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
 	return ret;
 }
 
+static bool nfs_readdir_page_validate(struct page *page, u64 last_cookie,
+				      u64 change_attr)
+{
+	struct nfs_cache_array *array = kmap_atomic(page);
+	int ret = true;
+
+	if (array->change_attr != change_attr)
+		ret = false;
+	if (array->size > 0 && array->array[0].cookie != last_cookie)
+		ret = false;
+	kunmap_atomic(array);
+	return ret;
+}
+
+static void nfs_readdir_page_unlock_and_put(struct page *page)
+{
+	unlock_page(page);
+	put_page(page);
+}
+
 static struct page *nfs_readdir_page_get_locked(struct address_space *mapping,
 						pgoff_t index, u64 last_cookie)
 {
 	struct page *page;
+	u64 change_attr;
 
 	page = grab_cache_page(mapping, index);
-	if (page && !PageUptodate(page)) {
-		nfs_readdir_page_init_array(page, last_cookie);
-		if (invalidate_inode_pages2_range(mapping, index + 1, -1) < 0)
-			nfs_zap_mapping(mapping->host, mapping);
-		SetPageUptodate(page);
+	if (!page)
+		return NULL;
+	change_attr = inode_peek_iversion_raw(mapping->host);
+	if (PageUptodate(page)) {
+		if (nfs_readdir_page_validate(page, last_cookie, change_attr))
+			return page;
+		nfs_readdir_clear_array(page);
 	}
-
+	nfs_readdir_page_init_array(page, last_cookie, change_attr);
+	SetPageUptodate(page);
 	return page;
 }
 
@@ -357,12 +384,6 @@ static void nfs_readdir_page_set_eof(struct page *page)
 	kunmap_atomic(array);
 }
 
-static void nfs_readdir_page_unlock_and_put(struct page *page)
-{
-	unlock_page(page);
-	put_page(page);
-}
-
 static struct page *nfs_readdir_page_get_next(struct address_space *mapping,
 					      pgoff_t index, u64 cookie)
 {
@@ -419,16 +440,6 @@ static int nfs_readdir_search_for_pos(struct nfs_cache_array *array,
 	return -EBADCOOKIE;
 }
 
-static bool
-nfs_readdir_inode_mapping_valid(struct nfs_inode *nfsi)
-{
-	if (nfsi->cache_validity & (NFS_INO_INVALID_CHANGE |
-				    NFS_INO_INVALID_DATA))
-		return false;
-	smp_rmb();
-	return !test_bit(NFS_INO_INVALIDATING, &nfsi->flags);
-}
-
 static bool nfs_readdir_array_cookie_in_range(struct nfs_cache_array *array,
 					      u64 cookie)
 {
@@ -457,8 +468,7 @@ static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
 			struct nfs_inode *nfsi = NFS_I(file_inode(desc->file));
 
 			new_pos = nfs_readdir_page_offset(desc->page) + i;
-			if (desc->attr_gencount != nfsi->attr_gencount ||
-			    !nfs_readdir_inode_mapping_valid(nfsi)) {
+			if (desc->attr_gencount != nfsi->attr_gencount) {
 				desc->duped = 0;
 				desc->attr_gencount = nfsi->attr_gencount;
 			} else if (new_pos < desc->prev_index) {
@@ -1095,11 +1105,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	 * to either find the entry with the appropriate number or
 	 * revalidate the cookie.
 	 */
-	if (ctx->pos == 0 || nfs_attribute_cache_expired(inode)) {
-		res = nfs_revalidate_mapping(inode, file->f_mapping);
-		if (res < 0)
-			goto out;
-	}
+	nfs_revalidate_inode(inode, NFS_INO_INVALID_CHANGE);
 
 	res = -ENOMEM;
 	desc = kzalloc(sizeof(*desc), GFP_KERNEL);
-- 
2.35.1



* [PATCH v9 08/27] NFS: Don't re-read the entire page cache to find the next cookie
  2022-02-27 23:12             ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache trondmy
@ 2022-02-27 23:12               ` trondmy
  2022-02-27 23:12                 ` [PATCH v9 09/27] NFS: Don't advance the page pointer unless the page is full trondmy
  2022-03-01 19:09               ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache Anna Schumaker
  1 sibling, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the page cache entry that was last read gets invalidated for some
reason, then make sure we can re-create it on the next call to readdir.
This, combined with the cache page validation, allows us to reuse the
cached value of page-index on successive calls to nfs_readdir.

Credit is due to Benjamin Coddington for showing that the concept works,
and that it allows for improved cache sharing between processes even in
the case where pages are lost due to LRU or active invalidation.

Suggested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c           | 10 +++++++---
 include/linux/nfs_fs.h |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index bfb553c57274..37f78b0ebc40 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1120,6 +1120,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	desc->dup_cookie = dir_ctx->dup_cookie;
 	desc->duped = dir_ctx->duped;
 	page_index = dir_ctx->page_index;
+	desc->page_index = page_index;
+	desc->last_cookie = dir_ctx->last_cookie;
 	desc->attr_gencount = dir_ctx->attr_gencount;
 	desc->eof = dir_ctx->eof;
 	memcpy(desc->verf, dir_ctx->verf, sizeof(desc->verf));
@@ -1168,6 +1170,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	spin_lock(&file->f_lock);
 	dir_ctx->dir_cookie = desc->dir_cookie;
 	dir_ctx->dup_cookie = desc->dup_cookie;
+	dir_ctx->last_cookie = desc->last_cookie;
 	dir_ctx->duped = desc->duped;
 	dir_ctx->attr_gencount = desc->attr_gencount;
 	dir_ctx->page_index = desc->page_index;
@@ -1209,10 +1212,11 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 	}
 	if (offset != filp->f_pos) {
 		filp->f_pos = offset;
-		if (nfs_readdir_use_cookie(filp))
-			dir_ctx->dir_cookie = offset;
-		else
+		if (!nfs_readdir_use_cookie(filp)) {
 			dir_ctx->dir_cookie = 0;
+			dir_ctx->page_index = 0;
+		} else
+			dir_ctx->dir_cookie = offset;
 		if (offset == 0)
 			memset(dir_ctx->verf, 0, sizeof(dir_ctx->verf));
 		dir_ctx->duped = 0;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 6e10725887d1..1c533f2c1f36 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -105,6 +105,7 @@ struct nfs_open_dir_context {
 	__be32	verf[NFS_DIR_VERIFIER_SIZE];
 	__u64 dir_cookie;
 	__u64 dup_cookie;
+	__u64 last_cookie;
 	pgoff_t page_index;
 	signed char duped;
 	bool eof;
-- 
2.35.1



* [PATCH v9 09/27] NFS: Don't advance the page pointer unless the page is full
  2022-02-27 23:12               ` [PATCH v9 08/27] NFS: Don't re-read the entire page cache to find the next cookie trondmy
@ 2022-02-27 23:12                 ` trondmy
  2022-02-27 23:12                   ` [PATCH v9 10/27] NFS: Adjust the amount of readahead performed by NFS readdir trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When we hit the end of the data in the readdir page, don't start
filling a new page unless the current one is full.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 32 ++++++++++++++++++++++----------
 1 file changed, 22 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 37f78b0ebc40..e2aafc7263d5 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -417,6 +417,18 @@ bool nfs_readdir_use_cookie(const struct file *filp)
 	return true;
 }
 
+static void nfs_readdir_seek_next_array(struct nfs_cache_array *array,
+					struct nfs_readdir_descriptor *desc)
+{
+	if (array->page_full) {
+		desc->last_cookie = array->last_cookie;
+		desc->current_index += array->size;
+		desc->cache_entry_index = 0;
+		desc->page_index++;
+	} else
+		desc->last_cookie = array->array[0].cookie;
+}
+
 static int nfs_readdir_search_for_pos(struct nfs_cache_array *array,
 				      struct nfs_readdir_descriptor *desc)
 {
@@ -428,6 +440,7 @@ static int nfs_readdir_search_for_pos(struct nfs_cache_array *array,
 	if (diff >= array->size) {
 		if (array->page_is_eof)
 			goto out_eof;
+		nfs_readdir_seek_next_array(array, desc);
 		return -EAGAIN;
 	}
 
@@ -500,7 +513,8 @@ static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
 		status = -EBADCOOKIE;
 		if (desc->dir_cookie == array->last_cookie)
 			desc->eof = true;
-	}
+	} else
+		nfs_readdir_seek_next_array(array, desc);
 out:
 	return status;
 }
@@ -517,11 +531,6 @@ static int nfs_readdir_search_array(struct nfs_readdir_descriptor *desc)
 	else
 		status = nfs_readdir_search_for_cookie(array, desc);
 
-	if (status == -EAGAIN) {
-		desc->last_cookie = array->last_cookie;
-		desc->current_index += array->size;
-		desc->page_index++;
-	}
 	kunmap_atomic(array);
 	return status;
 }
@@ -998,7 +1007,7 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
 {
 	struct file	*file = desc->file;
 	struct nfs_cache_array *array;
-	unsigned int i = 0;
+	unsigned int i;
 
 	array = kmap(desc->page);
 	for (i = desc->cache_entry_index; i < array->size; i++) {
@@ -1011,10 +1020,13 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
 			break;
 		}
 		memcpy(desc->verf, verf, sizeof(desc->verf));
-		if (i < (array->size-1))
-			desc->dir_cookie = array->array[i+1].cookie;
-		else
+		if (i == array->size - 1) {
 			desc->dir_cookie = array->last_cookie;
+			nfs_readdir_seek_next_array(array, desc);
+		} else {
+			desc->dir_cookie = array->array[i + 1].cookie;
+			desc->last_cookie = array->array[0].cookie;
+		}
 		if (nfs_readdir_use_cookie(file))
 			desc->ctx->pos = desc->dir_cookie;
 		else
-- 
2.35.1



* [PATCH v9 10/27] NFS: Adjust the amount of readahead performed by NFS readdir
  2022-02-27 23:12                 ` [PATCH v9 09/27] NFS: Don't advance the page pointer unless the page is full trondmy
@ 2022-02-27 23:12                   ` trondmy
  2022-02-27 23:12                     ` [PATCH v9 11/27] NFS: If the cookie verifier changes, we must invalidate the page cache trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The current NFS readdir code will always try to maximise the amount of
readahead it performs on the assumption that we can cache anything that
isn't immediately read by the process.
There are several cases where this assumption breaks down, including
when the 'ls -l' heuristic kicks in to try to force use of readdirplus
as a batch replacement for lookup/getattr.

This patch therefore tries to tone down the amount of readahead we
perform and to adjust it to match the amount of data being requested
by user space.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c           | 53 +++++++++++++++++++++++++++++++++++++++++-
 include/linux/nfs_fs.h |  1 +
 2 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e2aafc7263d5..ca71271b5c62 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -69,6 +69,8 @@ const struct address_space_operations nfs_dir_aops = {
 	.freepage = nfs_readdir_clear_array,
 };
 
+#define NFS_INIT_DTSIZE PAGE_SIZE
+
 static struct nfs_open_dir_context *
 alloc_nfs_open_dir_context(struct inode *dir)
 {
@@ -78,6 +80,7 @@ alloc_nfs_open_dir_context(struct inode *dir)
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL_ACCOUNT);
 	if (ctx != NULL) {
 		ctx->attr_gencount = nfsi->attr_gencount;
+		ctx->dtsize = NFS_INIT_DTSIZE;
 		spin_lock(&dir->i_lock);
 		if (list_empty(&nfsi->open_files) &&
 		    (nfsi->cache_validity & NFS_INO_DATA_INVAL_DEFER))
@@ -154,6 +157,7 @@ struct nfs_readdir_descriptor {
 	struct page	*page;
 	struct dir_context *ctx;
 	pgoff_t		page_index;
+	pgoff_t		page_index_max;
 	u64		dir_cookie;
 	u64		last_cookie;
 	u64		dup_cookie;
@@ -166,12 +170,36 @@ struct nfs_readdir_descriptor {
 	unsigned long	gencount;
 	unsigned long	attr_gencount;
 	unsigned int	cache_entry_index;
+	unsigned int	buffer_fills;
+	unsigned int	dtsize;
 	signed char duped;
 	bool plus;
 	bool eob;
 	bool eof;
 };
 
+static void nfs_set_dtsize(struct nfs_readdir_descriptor *desc, unsigned int sz)
+{
+	struct nfs_server *server = NFS_SERVER(file_inode(desc->file));
+	unsigned int maxsize = server->dtsize;
+
+	if (sz > maxsize)
+		sz = maxsize;
+	if (sz < NFS_MIN_FILE_IO_SIZE)
+		sz = NFS_MIN_FILE_IO_SIZE;
+	desc->dtsize = sz;
+}
+
+static void nfs_shrink_dtsize(struct nfs_readdir_descriptor *desc)
+{
+	nfs_set_dtsize(desc, desc->dtsize >> 1);
+}
+
+static void nfs_grow_dtsize(struct nfs_readdir_descriptor *desc)
+{
+	nfs_set_dtsize(desc, desc->dtsize << 1);
+}
+
 static void nfs_readdir_array_init(struct nfs_cache_array *array)
 {
 	memset(array, 0, sizeof(struct nfs_cache_array));
@@ -784,6 +812,7 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 				break;
 			arrays++;
 			*arrays = page = new;
+			desc->page_index_max++;
 		} else {
 			new = nfs_readdir_page_get_next(mapping,
 							page->index + 1,
@@ -793,6 +822,7 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 			if (page != *arrays)
 				nfs_readdir_page_unlock_and_put(page);
 			page = new;
+			desc->page_index_max = new->index;
 		}
 		status = nfs_readdir_add_to_array(entry, page);
 	} while (!status && !entry->eof);
@@ -858,7 +888,7 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 	struct nfs_entry *entry;
 	size_t array_size;
 	struct inode *inode = file_inode(desc->file);
-	size_t dtsize = NFS_SERVER(inode)->dtsize;
+	unsigned int dtsize = desc->dtsize;
 	int status = -ENOMEM;
 
 	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
@@ -894,6 +924,7 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 
 		status = nfs_readdir_page_filler(desc, entry, pages, pglen,
 						 arrays, narrays);
+		desc->buffer_fills++;
 	} while (!status && nfs_readdir_page_needs_filling(page) &&
 		page_mapping(page));
 
@@ -941,6 +972,10 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 	if (!desc->page)
 		return -ENOMEM;
 	if (nfs_readdir_page_needs_filling(desc->page)) {
+		/* Grow the dtsize if we had to go back for more pages */
+		if (desc->page_index == desc->page_index_max)
+			nfs_grow_dtsize(desc);
+		desc->page_index_max = desc->page_index;
 		res = nfs_readdir_xdr_to_array(desc, nfsi->cookieverf, verf,
 					       &desc->page, 1);
 		if (res < 0) {
@@ -1075,6 +1110,7 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	desc->cache_entry_index = 0;
 	desc->last_cookie = desc->dir_cookie;
 	desc->duped = 0;
+	desc->page_index_max = 0;
 
 	status = nfs_readdir_xdr_to_array(desc, desc->verf, verf, arrays, sz);
 
@@ -1084,10 +1120,22 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	}
 	desc->page = NULL;
 
+	/*
+	 * Grow the dtsize if we have to go back for more pages,
+	 * or shrink it if we're reading too many.
+	 */
+	if (!desc->eof) {
+		if (!desc->eob)
+			nfs_grow_dtsize(desc);
+		else if (desc->buffer_fills == 1 &&
+			 i < (desc->page_index_max >> 1))
+			nfs_shrink_dtsize(desc);
+	}
 
 	for (i = 0; i < sz && arrays[i]; i++)
 		nfs_readdir_page_array_free(arrays[i]);
 out:
+	desc->page_index_max = -1;
 	kfree(arrays);
 	dfprintk(DIRCACHE, "NFS: %s: returns %d\n", __func__, status);
 	return status;
@@ -1126,6 +1174,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	desc->file = file;
 	desc->ctx = ctx;
 	desc->plus = nfs_use_readdirplus(inode, ctx);
+	desc->page_index_max = -1;
 
 	spin_lock(&file->f_lock);
 	desc->dir_cookie = dir_ctx->dir_cookie;
@@ -1136,6 +1185,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	desc->last_cookie = dir_ctx->last_cookie;
 	desc->attr_gencount = dir_ctx->attr_gencount;
 	desc->eof = dir_ctx->eof;
+	nfs_set_dtsize(desc, dir_ctx->dtsize);
 	memcpy(desc->verf, dir_ctx->verf, sizeof(desc->verf));
 	spin_unlock(&file->f_lock);
 
@@ -1187,6 +1237,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	dir_ctx->attr_gencount = desc->attr_gencount;
 	dir_ctx->page_index = desc->page_index;
 	dir_ctx->eof = desc->eof;
+	dir_ctx->dtsize = desc->dtsize;
 	memcpy(dir_ctx->verf, desc->verf, sizeof(dir_ctx->verf));
 	spin_unlock(&file->f_lock);
 out_free:
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 1c533f2c1f36..691a27936849 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -107,6 +107,7 @@ struct nfs_open_dir_context {
 	__u64 dup_cookie;
 	__u64 last_cookie;
 	pgoff_t page_index;
+	unsigned int dtsize;
 	signed char duped;
 	bool eof;
 };
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread
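[Editorial sketch] The dtsize handling in the patch above amounts to a multiplicative grow/shrink policy clamped between a minimum I/O size and the server's maximum. A minimal user-space sketch of that policy follows; the constants and function names here are illustrative stand-ins, not the kernel's (the kernel clamps to NFS_MIN_FILE_IO_SIZE and the per-server dtsize):

```c
#include <assert.h>

/* Illustrative bounds standing in for NFS_MIN_FILE_IO_SIZE and
 * the server-advertised dtsize. */
#define MIN_DTSIZE 1024u
#define MAX_DTSIZE 65536u

/* Clamp a requested transfer size into the allowed window. */
static unsigned int set_dtsize(unsigned int sz)
{
	if (sz > MAX_DTSIZE)
		sz = MAX_DTSIZE;
	if (sz < MIN_DTSIZE)
		sz = MIN_DTSIZE;
	return sz;
}

/* Double the transfer size when the reader keeps coming back
 * for more pages than a single fill produced. */
static unsigned int grow_dtsize(unsigned int sz)
{
	return set_dtsize(sz << 1);
}

/* Halve it when one fill produced far more than was consumed. */
static unsigned int shrink_dtsize(unsigned int sz)
{
	return set_dtsize(sz >> 1);
}
```

Doubling on undershoot and halving on overshoot converges quickly on a window size matching the consumer, which is why the patch carries dtsize across readdir calls in the open directory context.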

* [PATCH v9 11/27] NFS: If the cookie verifier changes, we must invalidate the page cache
  2022-02-27 23:12                   ` [PATCH v9 10/27] NFS: Adjust the amount of readahead performed by NFS readdir trondmy
@ 2022-02-27 23:12                     ` trondmy
  2022-02-27 23:12                       ` [PATCH v9 12/27] NFS: Simplify nfs_readdir_xdr_to_array() trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Ensure that if the cookie verifier changes when we use the zero-valued
cookie, we invalidate any cached pages.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index ca71271b5c62..eaf8d5cddb0f 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -990,9 +990,14 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 		/*
 		 * Set the cookie verifier if the page cache was empty
 		 */
-		if (desc->page_index == 0)
+		if (desc->last_cookie == 0 &&
+		    memcmp(nfsi->cookieverf, verf, sizeof(nfsi->cookieverf))) {
 			memcpy(nfsi->cookieverf, verf,
 			       sizeof(nfsi->cookieverf));
+			invalidate_inode_pages2_range(desc->file->f_mapping,
+						      desc->page_index_max + 1,
+						      -1);
+		}
 	}
 	res = nfs_readdir_search_array(desc);
 	if (res == 0)
-- 
2.35.1


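[Editorial sketch] The invalidation rule in the patch above can be stated on its own: only a reply to a zero-cookie READDIR is authoritative for the start of the directory stream, so only then does a changed verifier replace the cached one and trigger invalidation of the trailing pages. A hedged user-space sketch, with illustrative names and a fixed verifier size rather than the kernel's types:

```c
#include <assert.h>
#include <string.h>

#define VERF_SIZE 8	/* illustrative; the kernel uses NFS_DIR_VERIFIER_SIZE words */

/* Returns 1 if the cached verifier was replaced, meaning the caller
 * should invalidate page-cache entries beyond the one just filled;
 * returns 0 if nothing needs to change. */
static int update_cookie_verifier(unsigned char *cached,
				  const unsigned char *fresh,
				  unsigned long long last_cookie)
{
	/* Replies for non-zero cookies may legitimately carry a
	 * different verifier mid-stream; ignore those here. */
	if (last_cookie != 0)
		return 0;
	if (memcmp(cached, fresh, VERF_SIZE) == 0)
		return 0;
	memcpy(cached, fresh, VERF_SIZE);
	return 1;
}
```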

* [PATCH v9 12/27] NFS: Simplify nfs_readdir_xdr_to_array()
  2022-02-27 23:12                     ` [PATCH v9 11/27] NFS: If the cookie verifier changes, we must invalidate the page cache trondmy
@ 2022-02-27 23:12                       ` trondmy
  2022-02-27 23:12                         ` [PATCH v9 13/27] NFS: Reduce use of uncached readdir trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Recent changes to readdir mean that we can cope with partially filled
page cache entries, so we no longer need to rely on looping in
nfs_readdir_xdr_to_array().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 29 +++++++++++------------------
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index eaf8d5cddb0f..0c190c93901e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -889,6 +889,7 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 	size_t array_size;
 	struct inode *inode = file_inode(desc->file);
 	unsigned int dtsize = desc->dtsize;
+	unsigned int pglen;
 	int status = -ENOMEM;
 
 	entry = kzalloc(sizeof(*entry), GFP_KERNEL);
@@ -906,28 +907,20 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 	if (!pages)
 		goto out;
 
-	do {
-		unsigned int pglen;
-		status = nfs_readdir_xdr_filler(desc, verf_arg, entry->cookie,
-						pages, dtsize,
-						verf_res);
-		if (status < 0)
-			break;
-
-		pglen = status;
-		if (pglen == 0) {
-			nfs_readdir_page_set_eof(page);
-			break;
-		}
-
-		verf_arg = verf_res;
+	status = nfs_readdir_xdr_filler(desc, verf_arg, entry->cookie, pages,
+					dtsize, verf_res);
+	if (status < 0)
+		goto free_pages;
 
+	pglen = status;
+	if (pglen != 0)
 		status = nfs_readdir_page_filler(desc, entry, pages, pglen,
 						 arrays, narrays);
-		desc->buffer_fills++;
-	} while (!status && nfs_readdir_page_needs_filling(page) &&
-		page_mapping(page));
+	else
+		nfs_readdir_page_set_eof(page);
+	desc->buffer_fills++;
 
+free_pages:
 	nfs_readdir_free_pages(pages, array_size);
 out:
 	nfs_free_fattr(entry->fattr);
-- 
2.35.1



* [PATCH v9 13/27] NFS: Reduce use of uncached readdir
  2022-02-27 23:12                       ` [PATCH v9 12/27] NFS: Simplify nfs_readdir_xdr_to_array() trondmy
@ 2022-02-27 23:12                         ` trondmy
  2022-02-27 23:12                           ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

When reading a very large directory, we want to try to keep the page
cache up to date if doing so is inexpensive. With the change to allow
readdir to continue reading even when the cache is incomplete, we no
longer need to fall back to uncached readdir in order to scale to large
directories.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 23 +++--------------------
 1 file changed, 3 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 0c190c93901e..0b7d4be38452 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -999,28 +999,11 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 	return res;
 }
 
-static bool nfs_readdir_dont_search_cache(struct nfs_readdir_descriptor *desc)
-{
-	struct address_space *mapping = desc->file->f_mapping;
-	struct inode *dir = file_inode(desc->file);
-	unsigned int dtsize = NFS_SERVER(dir)->dtsize;
-	loff_t size = i_size_read(dir);
-
-	/*
-	 * Default to uncached readdir if the page cache is empty, and
-	 * we're looking for a non-zero cookie in a large directory.
-	 */
-	return desc->dir_cookie != 0 && mapping->nrpages == 0 && size > dtsize;
-}
-
 /* Search for desc->dir_cookie from the beginning of the page cache */
 static int readdir_search_pagecache(struct nfs_readdir_descriptor *desc)
 {
 	int res;
 
-	if (nfs_readdir_dont_search_cache(desc))
-		return -EBADCOOKIE;
-
 	do {
 		if (desc->page_index == 0) {
 			desc->current_index = 0;
@@ -1273,10 +1256,10 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 	}
 	if (offset != filp->f_pos) {
 		filp->f_pos = offset;
-		if (!nfs_readdir_use_cookie(filp)) {
+		dir_ctx->page_index = 0;
+		if (!nfs_readdir_use_cookie(filp))
 			dir_ctx->dir_cookie = 0;
-			dir_ctx->page_index = 0;
-		} else
+		else
 			dir_ctx->dir_cookie = offset;
 		if (offset == 0)
 			memset(dir_ctx->verf, 0, sizeof(dir_ctx->verf));
-- 
2.35.1



* [PATCH v9 14/27] NFS: Improve heuristic for readdirplus
  2022-02-27 23:12                         ` [PATCH v9 13/27] NFS: Reduce use of uncached readdir trondmy
@ 2022-02-27 23:12                           ` trondmy
  2022-02-27 23:12                             ` [PATCH v9 15/27] NFS: Don't ask for readdirplus unless it can help nfs_getattr() trondmy
  2022-03-09 17:39                             ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Benjamin Coddington
  0 siblings, 2 replies; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The heuristic for readdirplus is designed to detect 'ls -l' and
similar patterns. It does so by looking for cache hit/miss patterns in
both the attribute cache and the dcache of the files in a given
directory, and then sets a flag for the readdirplus code to interpret.

The problem with this approach is that a single attribute or dcache miss
can cause the NFS code to force a refresh of the attributes for the
entire set of files contained in the directory.

To be able to make a more nuanced decision, let's sample the number of
hits and misses in the set of open directory descriptors. That allows us
to set thresholds at which we start preferring READDIRPLUS over regular
READDIR, or at which we start to force a re-read of the remaining
readdir cache using READDIRPLUS.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c           | 82 ++++++++++++++++++++++++++----------------
 fs/nfs/inode.c         |  4 +--
 fs/nfs/internal.h      |  4 +--
 fs/nfs/nfstrace.h      |  1 -
 include/linux/nfs_fs.h |  5 +--
 5 files changed, 58 insertions(+), 38 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 0b7d4be38452..c5c7175a257c 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -87,8 +87,7 @@ alloc_nfs_open_dir_context(struct inode *dir)
 			nfs_set_cache_invalid(dir,
 					      NFS_INO_INVALID_DATA |
 						      NFS_INO_REVAL_FORCED);
-		list_add(&ctx->list, &nfsi->open_files);
-		clear_bit(NFS_INO_FORCE_READDIR, &nfsi->flags);
+		list_add_tail_rcu(&ctx->list, &nfsi->open_files);
 		memcpy(ctx->verf, nfsi->cookieverf, sizeof(ctx->verf));
 		spin_unlock(&dir->i_lock);
 		return ctx;
@@ -99,9 +98,9 @@ alloc_nfs_open_dir_context(struct inode *dir)
 static void put_nfs_open_dir_context(struct inode *dir, struct nfs_open_dir_context *ctx)
 {
 	spin_lock(&dir->i_lock);
-	list_del(&ctx->list);
+	list_del_rcu(&ctx->list);
 	spin_unlock(&dir->i_lock);
-	kfree(ctx);
+	kfree_rcu(ctx, rcu_head);
 }
 
 /*
@@ -594,7 +593,6 @@ static int nfs_readdir_xdr_filler(struct nfs_readdir_descriptor *desc,
 		/* We requested READDIRPLUS, but the server doesn't grok it */
 		if (error == -ENOTSUPP && desc->plus) {
 			NFS_SERVER(inode)->caps &= ~NFS_CAP_READDIRPLUS;
-			clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(inode)->flags);
 			desc->plus = arg.plus = false;
 			goto again;
 		}
@@ -644,51 +642,61 @@ int nfs_same_file(struct dentry *dentry, struct nfs_entry *entry)
 	return 1;
 }
 
-static
-bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx)
+#define NFS_READDIR_CACHE_USAGE_THRESHOLD (8UL)
+
+static bool nfs_use_readdirplus(struct inode *dir, struct dir_context *ctx,
+				unsigned int cache_hits,
+				unsigned int cache_misses)
 {
 	if (!nfs_server_capable(dir, NFS_CAP_READDIRPLUS))
 		return false;
-	if (test_and_clear_bit(NFS_INO_ADVISE_RDPLUS, &NFS_I(dir)->flags))
-		return true;
-	if (ctx->pos == 0)
+	if (ctx->pos == 0 ||
+	    cache_hits + cache_misses > NFS_READDIR_CACHE_USAGE_THRESHOLD)
 		return true;
 	return false;
 }
 
 /*
- * This function is called by the lookup and getattr code to request the
+ * This function is called by the getattr code to request the
  * use of readdirplus to accelerate any future lookups in the same
  * directory.
  */
-void nfs_advise_use_readdirplus(struct inode *dir)
+void nfs_readdir_record_entry_cache_hit(struct inode *dir)
 {
 	struct nfs_inode *nfsi = NFS_I(dir);
+	struct nfs_open_dir_context *ctx;
 
-	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
-	    !list_empty(&nfsi->open_files))
-		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
+	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) {
+		rcu_read_lock();
+		list_for_each_entry_rcu (ctx, &nfsi->open_files, list)
+			atomic_inc(&ctx->cache_hits);
+		rcu_read_unlock();
+	}
 }
 
 /*
  * This function is mainly for use by nfs_getattr().
  *
  * If this is an 'ls -l', we want to force use of readdirplus.
- * Do this by checking if there is an active file descriptor
- * and calling nfs_advise_use_readdirplus, then forcing a
- * cache flush.
  */
-void nfs_force_use_readdirplus(struct inode *dir)
+void nfs_readdir_record_entry_cache_miss(struct inode *dir)
 {
 	struct nfs_inode *nfsi = NFS_I(dir);
+	struct nfs_open_dir_context *ctx;
 
-	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
-	    !list_empty(&nfsi->open_files)) {
-		set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
-		set_bit(NFS_INO_FORCE_READDIR, &nfsi->flags);
+	if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS)) {
+		rcu_read_lock();
+		list_for_each_entry_rcu (ctx, &nfsi->open_files, list)
+			atomic_inc(&ctx->cache_misses);
+		rcu_read_unlock();
 	}
 }
 
+static void nfs_lookup_advise_force_readdirplus(struct inode *dir)
+{
+	nfs_readdir_record_entry_cache_miss(dir);
+}
+
 static
 void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		unsigned long dir_verifier)
@@ -1122,6 +1130,19 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	return status;
 }
 
+#define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
+
+static void nfs_readdir_handle_cache_misses(struct inode *inode,
+					    struct nfs_readdir_descriptor *desc,
+					    pgoff_t page_index,
+					    unsigned int cache_misses)
+{
+	if (desc->ctx->pos == 0 ||
+	    cache_misses <= NFS_READDIR_CACHE_MISS_THRESHOLD)
+		return;
+	invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1);
+}
+
 /* The file offset position represents the dirent entry number.  A
    last cookie cache takes care of the common case of reading the
    whole directory.
@@ -1133,6 +1154,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	struct nfs_inode *nfsi = NFS_I(inode);
 	struct nfs_open_dir_context *dir_ctx = file->private_data;
 	struct nfs_readdir_descriptor *desc;
+	unsigned int cache_hits, cache_misses;
 	pgoff_t page_index;
 	int res;
 
@@ -1154,7 +1176,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 		goto out;
 	desc->file = file;
 	desc->ctx = ctx;
-	desc->plus = nfs_use_readdirplus(inode, ctx);
 	desc->page_index_max = -1;
 
 	spin_lock(&file->f_lock);
@@ -1168,6 +1189,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	desc->eof = dir_ctx->eof;
 	nfs_set_dtsize(desc, dir_ctx->dtsize);
 	memcpy(desc->verf, dir_ctx->verf, sizeof(desc->verf));
+	cache_hits = atomic_xchg(&dir_ctx->cache_hits, 0);
+	cache_misses = atomic_xchg(&dir_ctx->cache_misses, 0);
 	spin_unlock(&file->f_lock);
 
 	if (desc->eof) {
@@ -1175,9 +1198,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 		goto out_free;
 	}
 
-	if (test_and_clear_bit(NFS_INO_FORCE_READDIR, &nfsi->flags) &&
-	    list_is_singular(&nfsi->open_files))
-		invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1);
+	desc->plus = nfs_use_readdirplus(inode, ctx, cache_hits, cache_misses);
+	nfs_readdir_handle_cache_misses(inode, desc, page_index, cache_misses);
 
 	do {
 		res = readdir_search_pagecache(desc);
@@ -1196,7 +1218,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 			break;
 		}
 		if (res == -ETOOSMALL && desc->plus) {
-			clear_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
 			nfs_zap_caches(inode);
 			desc->page_index = 0;
 			desc->plus = false;
@@ -1610,7 +1631,7 @@ nfs_lookup_revalidate_dentry(struct inode *dir, struct dentry *dentry,
 	nfs_set_verifier(dentry, dir_verifier);
 
 	/* set a readdirplus hint that we had a cache miss */
-	nfs_force_use_readdirplus(dir);
+	nfs_lookup_advise_force_readdirplus(dir);
 	ret = 1;
 out:
 	nfs_free_fattr(fattr);
@@ -1667,7 +1688,6 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 				nfs_mark_dir_for_revalidate(dir);
 			goto out_bad;
 		}
-		nfs_advise_use_readdirplus(dir);
 		goto out_valid;
 	}
 
@@ -1872,7 +1892,7 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
 		goto out;
 
 	/* Notify readdir to use READDIRPLUS */
-	nfs_force_use_readdirplus(dir);
+	nfs_lookup_advise_force_readdirplus(dir);
 
 no_entry:
 	res = d_splice_alias(inode, dentry);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 7cecabf57b95..bbf4357ff727 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -787,7 +787,7 @@ static void nfs_readdirplus_parent_cache_miss(struct dentry *dentry)
 	if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
 		return;
 	parent = dget_parent(dentry);
-	nfs_force_use_readdirplus(d_inode(parent));
+	nfs_readdir_record_entry_cache_miss(d_inode(parent));
 	dput(parent);
 }
 
@@ -798,7 +798,7 @@ static void nfs_readdirplus_parent_cache_hit(struct dentry *dentry)
 	if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
 		return;
 	parent = dget_parent(dentry);
-	nfs_advise_use_readdirplus(d_inode(parent));
+	nfs_readdir_record_entry_cache_hit(d_inode(parent));
 	dput(parent);
 }
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index b5398af53c7f..194840a97e3a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -366,8 +366,8 @@ extern struct nfs_client *nfs_init_client(struct nfs_client *clp,
 			   const struct nfs_client_initdata *);
 
 /* dir.c */
-extern void nfs_advise_use_readdirplus(struct inode *dir);
-extern void nfs_force_use_readdirplus(struct inode *dir);
+extern void nfs_readdir_record_entry_cache_hit(struct inode *dir);
+extern void nfs_readdir_record_entry_cache_miss(struct inode *dir);
 extern unsigned long nfs_access_cache_count(struct shrinker *shrink,
 					    struct shrink_control *sc);
 extern unsigned long nfs_access_cache_scan(struct shrinker *shrink,
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 45a310b586ce..3672f6703ee7 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -36,7 +36,6 @@
 
 #define nfs_show_nfsi_flags(v) \
 	__print_flags(v, "|", \
-			{ BIT(NFS_INO_ADVISE_RDPLUS), "ADVISE_RDPLUS" }, \
 			{ BIT(NFS_INO_STALE), "STALE" }, \
 			{ BIT(NFS_INO_ACL_LRU_SET), "ACL_LRU_SET" }, \
 			{ BIT(NFS_INO_INVALIDATING), "INVALIDATING" }, \
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 691a27936849..20a4cf0acad2 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -101,6 +101,8 @@ struct nfs_open_context {
 
 struct nfs_open_dir_context {
 	struct list_head list;
+	atomic_t cache_hits;
+	atomic_t cache_misses;
 	unsigned long attr_gencount;
 	__be32	verf[NFS_DIR_VERIFIER_SIZE];
 	__u64 dir_cookie;
@@ -110,6 +112,7 @@ struct nfs_open_dir_context {
 	unsigned int dtsize;
 	signed char duped;
 	bool eof;
+	struct rcu_head rcu_head;
 };
 
 /*
@@ -274,13 +277,11 @@ struct nfs4_copy_state {
 /*
  * Bit offsets in flags field
  */
-#define NFS_INO_ADVISE_RDPLUS	(0)		/* advise readdirplus */
 #define NFS_INO_STALE		(1)		/* possible stale inode */
 #define NFS_INO_ACL_LRU_SET	(2)		/* Inode is on the LRU list */
 #define NFS_INO_INVALIDATING	(3)		/* inode is being invalidated */
 #define NFS_INO_PRESERVE_UNLINKED (4)		/* preserve file if removed while open */
 #define NFS_INO_FSCACHE		(5)		/* inode can be cached by FS-Cache */
-#define NFS_INO_FORCE_READDIR	(7)		/* force readdirplus */
 #define NFS_INO_LAYOUTCOMMIT	(9)		/* layoutcommit required */
 #define NFS_INO_LAYOUTCOMMITTING (10)		/* layoutcommit inflight */
 #define NFS_INO_LAYOUTSTATS	(11)		/* layoutstats inflight */
-- 
2.35.1


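[Editorial sketch] The two thresholds introduced above split the decision in two: whether to prefer READDIRPLUS at all, and whether the miss rate is high enough to justify invalidating and re-reading the tail of the readdir cache. A standalone sketch of both predicates, using the patch's constants but otherwise illustrative names:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors NFS_READDIR_CACHE_USAGE_THRESHOLD / _MISS_THRESHOLD. */
#define CACHE_USAGE_THRESHOLD 8u
#define CACHE_MISS_THRESHOLD 16u

/* Prefer READDIRPLUS when starting at the head of the directory,
 * or when getattr/lookup traffic shows the per-descriptor entry
 * caches are being exercised heavily. */
static bool use_readdirplus(long long pos, unsigned int hits,
			    unsigned int misses)
{
	return pos == 0 ||
	       hits + misses > CACHE_USAGE_THRESHOLD;
}

/* Force a re-read of the cached tail only when misses alone are
 * frequent enough to make the extra round trips worthwhile. */
static bool invalidate_cached_tail(long long pos, unsigned int misses)
{
	return pos != 0 && misses > CACHE_MISS_THRESHOLD;
}
```

Sampling-and-reset (the atomic_xchg in nfs_readdir()) means each readdir call judges only the activity since the previous call, so one stray miss no longer forces a refresh of the whole directory.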

* [PATCH v9 15/27] NFS: Don't ask for readdirplus unless it can help nfs_getattr()
  2022-02-27 23:12                           ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus trondmy
@ 2022-02-27 23:12                             ` trondmy
  2022-02-27 23:12                               ` [PATCH v9 16/27] NFSv4: Ask for a full XDR buffer of readdir goodness trondmy
  2022-03-09 17:39                             ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Benjamin Coddington
  1 sibling, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If attribute caching is turned off, then use of readdirplus is not going
to help stat() performance.
Readdirplus also doesn't help if a file is being written to, since we
will have to flush those writes in order to sync the mtime/ctime.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/inode.c | 33 +++++++++++++++++----------------
 1 file changed, 17 insertions(+), 16 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index bbf4357ff727..10d17cfb8639 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -780,24 +780,26 @@ void nfs_setattr_update_inode(struct inode *inode, struct iattr *attr,
 }
 EXPORT_SYMBOL_GPL(nfs_setattr_update_inode);
 
-static void nfs_readdirplus_parent_cache_miss(struct dentry *dentry)
+/*
+ * Don't request help from readdirplus if the file is being written to,
+ * or if attribute caching is turned off
+ */
+static bool nfs_getattr_readdirplus_enable(const struct inode *inode)
 {
-	struct dentry *parent;
+	return nfs_server_capable(inode, NFS_CAP_READDIRPLUS) &&
+	       !nfs_have_writebacks(inode) && NFS_MAXATTRTIMEO(inode) > 5 * HZ;
+}
 
-	if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
-		return;
-	parent = dget_parent(dentry);
+static void nfs_readdirplus_parent_cache_miss(struct dentry *dentry)
+{
+	struct dentry *parent = dget_parent(dentry);
 	nfs_readdir_record_entry_cache_miss(d_inode(parent));
 	dput(parent);
 }
 
 static void nfs_readdirplus_parent_cache_hit(struct dentry *dentry)
 {
-	struct dentry *parent;
-
-	if (!nfs_server_capable(d_inode(dentry), NFS_CAP_READDIRPLUS))
-		return;
-	parent = dget_parent(dentry);
+	struct dentry *parent = dget_parent(dentry);
 	nfs_readdir_record_entry_cache_hit(d_inode(parent));
 	dput(parent);
 }
@@ -835,6 +837,7 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 	int err = 0;
 	bool force_sync = query_flags & AT_STATX_FORCE_SYNC;
 	bool do_update = false;
+	bool readdirplus_enabled = nfs_getattr_readdirplus_enable(inode);
 
 	trace_nfs_getattr_enter(inode);
 
@@ -843,7 +846,8 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 			STATX_INO | STATX_SIZE | STATX_BLOCKS;
 
 	if ((query_flags & AT_STATX_DONT_SYNC) && !force_sync) {
-		nfs_readdirplus_parent_cache_hit(path->dentry);
+		if (readdirplus_enabled)
+			nfs_readdirplus_parent_cache_hit(path->dentry);
 		goto out_no_revalidate;
 	}
 
@@ -893,15 +897,12 @@ int nfs_getattr(struct user_namespace *mnt_userns, const struct path *path,
 		do_update |= cache_validity & NFS_INO_INVALID_BLOCKS;
 
 	if (do_update) {
-		/* Update the attribute cache */
-		if (!(server->flags & NFS_MOUNT_NOAC))
+		if (readdirplus_enabled)
 			nfs_readdirplus_parent_cache_miss(path->dentry);
-		else
-			nfs_readdirplus_parent_cache_hit(path->dentry);
 		err = __nfs_revalidate_inode(server, inode);
 		if (err)
 			goto out;
-	} else
+	} else if (readdirplus_enabled)
 		nfs_readdirplus_parent_cache_hit(path->dentry);
 out_no_revalidate:
 	/* Only return attributes that were revalidated. */
-- 
2.35.1


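[Editorial sketch] The enable test above combines three independent conditions. The sketch below restates it over a plain struct; the struct and field names are illustrative stand-ins for the inode state the kernel consults (NFS_CAP_READDIRPLUS, nfs_have_writebacks(), NFS_MAXATTRTIMEO), and the timeout is in seconds rather than jiffies:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the inode state consulted by nfs_getattr(). */
struct inode_state {
	bool server_supports_rdplus;	/* server grants READDIRPLUS */
	bool has_writebacks;		/* dirty pages awaiting flush */
	unsigned int max_attr_timeo;	/* attr cache lifetime, seconds */
};

/* Readdirplus only pays off for stat() when the attributes it
 * prefetches can be cached for a useful interval and no pending
 * writeback would immediately invalidate mtime/ctime. */
static bool getattr_readdirplus_enable(const struct inode_state *st)
{
	return st->server_supports_rdplus &&
	       !st->has_writebacks &&
	       st->max_attr_timeo > 5;
}
```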

* [PATCH v9 16/27] NFSv4: Ask for a full XDR buffer of readdir goodness
  2022-02-27 23:12                             ` [PATCH v9 15/27] NFS: Don't ask for readdirplus unless it can help nfs_getattr() trondmy
@ 2022-02-27 23:12                               ` trondmy
  2022-02-27 23:12                                 ` [PATCH v9 17/27] NFS: Readdirplus can't help lookup for case insensitive filesystems trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Instead of pretending that we know the ratio of directory info vs
readdirplus attribute info, just set the 'dircount' field to the same
value as the 'maxcount' field.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/nfs3xdr.c | 7 ++++---
 fs/nfs/nfs4xdr.c | 6 +++---
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 7ab60ad98776..d6779ceeb39e 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -1261,6 +1261,8 @@ static void nfs3_xdr_enc_readdir3args(struct rpc_rqst *req,
 static void encode_readdirplus3args(struct xdr_stream *xdr,
 				    const struct nfs3_readdirargs *args)
 {
+	uint32_t dircount = args->count;
+	uint32_t maxcount = args->count;
 	__be32 *p;
 
 	encode_nfs_fh3(xdr, args->fh);
@@ -1273,9 +1275,8 @@ static void encode_readdirplus3args(struct xdr_stream *xdr,
 	 * readdirplus: need dircount + buffer size.
 	 * We just make sure we make dircount big enough
 	 */
-	*p++ = cpu_to_be32(args->count >> 3);
-
-	*p = cpu_to_be32(args->count);
+	*p++ = cpu_to_be32(dircount);
+	*p = cpu_to_be32(maxcount);
 }
 
 static void nfs3_xdr_enc_readdirplus3args(struct rpc_rqst *req,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 8e70b92df4cc..b7780b97dc4d 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -1605,7 +1605,8 @@ static void encode_readdir(struct xdr_stream *xdr, const struct nfs4_readdir_arg
 		FATTR4_WORD0_RDATTR_ERROR,
 		FATTR4_WORD1_MOUNTED_ON_FILEID,
 	};
-	uint32_t dircount = readdir->count >> 1;
+	uint32_t dircount = readdir->count;
+	uint32_t maxcount = readdir->count;
 	__be32 *p, verf[2];
 	uint32_t attrlen = 0;
 	unsigned int i;
@@ -1618,7 +1619,6 @@ static void encode_readdir(struct xdr_stream *xdr, const struct nfs4_readdir_arg
 			FATTR4_WORD1_SPACE_USED|FATTR4_WORD1_TIME_ACCESS|
 			FATTR4_WORD1_TIME_METADATA|FATTR4_WORD1_TIME_MODIFY;
 		attrs[2] |= FATTR4_WORD2_SECURITY_LABEL;
-		dircount >>= 1;
 	}
 	/* Use mounted_on_fileid only if the server supports it */
 	if (!(readdir->bitmask[1] & FATTR4_WORD1_MOUNTED_ON_FILEID))
@@ -1634,7 +1634,7 @@ static void encode_readdir(struct xdr_stream *xdr, const struct nfs4_readdir_arg
 	encode_nfs4_verifier(xdr, &readdir->verifier);
 	p = reserve_space(xdr, 12 + (attrlen << 2));
 	*p++ = cpu_to_be32(dircount);
-	*p++ = cpu_to_be32(readdir->count);
+	*p++ = cpu_to_be32(maxcount);
 	*p++ = cpu_to_be32(attrlen);
 	for (i = 0; i < attrlen; i++)
 		*p++ = cpu_to_be32(attrs[i]);
-- 
2.35.1



* [PATCH v9 17/27] NFS: Readdirplus can't help lookup for case insensitive filesystems
  2022-02-27 23:12                               ` [PATCH v9 16/27] NFSv4: Ask for a full XDR buffer of readdir goodness trondmy
@ 2022-02-27 23:12                                 ` trondmy
  2022-02-27 23:12                                   ` [PATCH v9 18/27] NFS: Don't request readdirplus when revalidation was forced trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the filesystem is case insensitive, then readdirplus can't help with
cache misses, since it won't return case folded variants of the filename.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c5c7175a257c..5892c4ee3a6d 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -694,6 +694,8 @@ void nfs_readdir_record_entry_cache_miss(struct inode *dir)
 
 static void nfs_lookup_advise_force_readdirplus(struct inode *dir)
 {
+	if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
+		return;
 	nfs_readdir_record_entry_cache_miss(dir);
 }
 
-- 
2.35.1



* [PATCH v9 18/27] NFS: Don't request readdirplus when revalidation was forced
  2022-02-27 23:12                                 ` [PATCH v9 17/27] NFS: Readdirplus can't help lookup for case insensitive filesystems trondmy
@ 2022-02-27 23:12                                   ` trondmy
  2022-02-27 23:12                                     ` [PATCH v9 19/27] NFS: Add basic readdir tracing trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the revalidation was forced due to the presence of a LOOKUP_EXCL or
LOOKUP_REVAL flag, then readdirplus won't help. It also can't help
when we're doing a path component lookup.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 5892c4ee3a6d..1da741dd2135 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -692,10 +692,13 @@ void nfs_readdir_record_entry_cache_miss(struct inode *dir)
 	}
 }
 
-static void nfs_lookup_advise_force_readdirplus(struct inode *dir)
+static void nfs_lookup_advise_force_readdirplus(struct inode *dir,
+						unsigned int flags)
 {
 	if (nfs_server_capable(dir, NFS_CAP_CASE_INSENSITIVE))
 		return;
+	if (flags & (LOOKUP_EXCL | LOOKUP_PARENT | LOOKUP_REVAL))
+		return;
 	nfs_readdir_record_entry_cache_miss(dir);
 }
 
@@ -1594,15 +1597,17 @@ nfs_lookup_revalidate_delegated(struct inode *dir, struct dentry *dentry,
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 }
 
-static int
-nfs_lookup_revalidate_dentry(struct inode *dir, struct dentry *dentry,
-			     struct inode *inode)
+static int nfs_lookup_revalidate_dentry(struct inode *dir,
+					struct dentry *dentry,
+					struct inode *inode, unsigned int flags)
 {
 	struct nfs_fh *fhandle;
 	struct nfs_fattr *fattr;
 	unsigned long dir_verifier;
 	int ret;
 
+	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
+
 	ret = -ENOMEM;
 	fhandle = nfs_alloc_fhandle();
 	fattr = nfs_alloc_fattr_with_label(NFS_SERVER(inode));
@@ -1623,6 +1628,10 @@ nfs_lookup_revalidate_dentry(struct inode *dir, struct dentry *dentry,
 		}
 		goto out;
 	}
+
+	/* Request help from readdirplus */
+	nfs_lookup_advise_force_readdirplus(dir, flags);
+
 	ret = 0;
 	if (nfs_compare_fh(NFS_FH(inode), fhandle))
 		goto out;
@@ -1632,8 +1641,6 @@ nfs_lookup_revalidate_dentry(struct inode *dir, struct dentry *dentry,
 	nfs_setsecurity(inode, fattr);
 	nfs_set_verifier(dentry, dir_verifier);
 
-	/* set a readdirplus hint that we had a cache miss */
-	nfs_lookup_advise_force_readdirplus(dir);
 	ret = 1;
 out:
 	nfs_free_fattr(fattr);
@@ -1699,8 +1706,7 @@ nfs_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 	if (NFS_STALE(inode))
 		goto out_bad;
 
-	trace_nfs_lookup_revalidate_enter(dir, dentry, flags);
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode);
+	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
 out_valid:
 	return nfs_lookup_revalidate_done(dir, dentry, inode, 1);
 out_bad:
@@ -1894,7 +1900,7 @@ struct dentry *nfs_lookup(struct inode *dir, struct dentry * dentry, unsigned in
 		goto out;
 
 	/* Notify readdir to use READDIRPLUS */
-	nfs_lookup_advise_force_readdirplus(dir);
+	nfs_lookup_advise_force_readdirplus(dir, flags);
 
 no_entry:
 	res = d_splice_alias(inode, dentry);
@@ -2157,7 +2163,7 @@ nfs4_do_lookup_revalidate(struct inode *dir, struct dentry *dentry,
 reval_dentry:
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
-	return nfs_lookup_revalidate_dentry(dir, dentry, inode);
+	return nfs_lookup_revalidate_dentry(dir, dentry, inode, flags);
 
 full_reval:
 	return nfs_do_lookup_revalidate(dir, dentry, flags);
-- 
2.35.1



* [PATCH v9 19/27] NFS: Add basic readdir tracing
  2022-02-27 23:12                                   ` [PATCH v9 18/27] NFS: Don't request readdirplus when revalidation was forced trondmy
@ 2022-02-27 23:12                                     ` trondmy
  2022-02-27 23:12                                       ` [PATCH v9 20/27] NFS: Trace effects of readdirplus on the dcache trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Add tracing to track how often the client goes to the server for updated
readdir information.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c      | 13 ++++++++-
 fs/nfs/nfstrace.h | 68 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 80 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 1da741dd2135..0dda082610cc 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -982,10 +982,14 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 		if (desc->page_index == desc->page_index_max)
 			nfs_grow_dtsize(desc);
 		desc->page_index_max = desc->page_index;
+		trace_nfs_readdir_cache_fill(desc->file, nfsi->cookieverf,
+					     desc->last_cookie,
+					     desc->page->index, desc->dtsize);
 		res = nfs_readdir_xdr_to_array(desc, nfsi->cookieverf, verf,
 					       &desc->page, 1);
 		if (res < 0) {
 			nfs_readdir_page_unlock_and_put_cached(desc);
+			trace_nfs_readdir_cache_fill_done(inode, res);
 			if (res == -EBADCOOKIE || res == -ENOTSYNC) {
 				invalidate_inode_pages2(desc->file->f_mapping);
 				desc->page_index = 0;
@@ -1106,7 +1110,14 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	desc->duped = 0;
 	desc->page_index_max = 0;
 
+	trace_nfs_readdir_uncached(desc->file, desc->verf, desc->last_cookie,
+				   -1, desc->dtsize);
+
 	status = nfs_readdir_xdr_to_array(desc, desc->verf, verf, arrays, sz);
+	if (status < 0) {
+		trace_nfs_readdir_uncached_done(file_inode(desc->file), status);
+		goto out_free;
+	}
 
 	for (i = 0; !desc->eob && i < sz && arrays[i]; i++) {
 		desc->page = arrays[i];
@@ -1125,7 +1136,7 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 			 i < (desc->page_index_max >> 1))
 			nfs_shrink_dtsize(desc);
 	}
-
+out_free:
 	for (i = 0; i < sz && arrays[i]; i++)
 		nfs_readdir_page_array_free(arrays[i]);
 out:
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 3672f6703ee7..c2d0543ecb2d 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -160,6 +160,8 @@ DEFINE_NFS_INODE_EVENT(nfs_fsync_enter);
 DEFINE_NFS_INODE_EVENT_DONE(nfs_fsync_exit);
 DEFINE_NFS_INODE_EVENT(nfs_access_enter);
 DEFINE_NFS_INODE_EVENT_DONE(nfs_set_cache_invalid);
+DEFINE_NFS_INODE_EVENT_DONE(nfs_readdir_cache_fill_done);
+DEFINE_NFS_INODE_EVENT_DONE(nfs_readdir_uncached_done);
 
 TRACE_EVENT(nfs_access_exit,
 		TP_PROTO(
@@ -271,6 +273,72 @@ DEFINE_NFS_UPDATE_SIZE_EVENT(wcc);
 DEFINE_NFS_UPDATE_SIZE_EVENT(update);
 DEFINE_NFS_UPDATE_SIZE_EVENT(grow);
 
+DECLARE_EVENT_CLASS(nfs_readdir_event,
+		TP_PROTO(
+			const struct file *file,
+			const __be32 *verifier,
+			u64 cookie,
+			pgoff_t page_index,
+			unsigned int dtsize
+		),
+
+		TP_ARGS(file, verifier, cookie, page_index, dtsize),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(u64, version)
+			__array(char, verifier, NFS4_VERIFIER_SIZE)
+			__field(u64, cookie)
+			__field(pgoff_t, index)
+			__field(unsigned int, dtsize)
+		),
+
+		TP_fast_assign(
+			const struct inode *dir = file_inode(file);
+			const struct nfs_inode *nfsi = NFS_I(dir);
+
+			__entry->dev = dir->i_sb->s_dev;
+			__entry->fileid = nfsi->fileid;
+			__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
+			__entry->version = inode_peek_iversion_raw(dir);
+			if (cookie != 0)
+				memcpy(__entry->verifier, verifier,
+				       NFS4_VERIFIER_SIZE);
+			else
+				memset(__entry->verifier, 0,
+				       NFS4_VERIFIER_SIZE);
+			__entry->cookie = cookie;
+			__entry->index = page_index;
+			__entry->dtsize = dtsize;
+		),
+
+		TP_printk(
+			"fileid=%02x:%02x:%llu fhandle=0x%08x version=%llu "
+			"cookie=%s:0x%llx cache_index=%lu dtsize=%u",
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid, __entry->fhandle,
+			__entry->version, show_nfs4_verifier(__entry->verifier),
+			(unsigned long long)__entry->cookie, __entry->index,
+			__entry->dtsize
+		)
+);
+
+#define DEFINE_NFS_READDIR_EVENT(name) \
+	DEFINE_EVENT(nfs_readdir_event, name, \
+			TP_PROTO( \
+				const struct file *file, \
+				const __be32 *verifier, \
+				u64 cookie, \
+				pgoff_t page_index, \
+				unsigned int dtsize \
+				), \
+			TP_ARGS(file, verifier, cookie, page_index, dtsize))
+
+DEFINE_NFS_READDIR_EVENT(nfs_readdir_cache_fill);
+DEFINE_NFS_READDIR_EVENT(nfs_readdir_uncached);
+
 DECLARE_EVENT_CLASS(nfs_lookup_event,
 		TP_PROTO(
 			const struct inode *dir,
-- 
2.35.1



* [PATCH v9 20/27] NFS: Trace effects of readdirplus on the dcache
  2022-02-27 23:12                                     ` [PATCH v9 19/27] NFS: Add basic readdir tracing trondmy
@ 2022-02-27 23:12                                       ` trondmy
  2022-02-27 23:12                                         ` [PATCH v9 21/27] NFS: Trace effects of the readdirplus heuristic trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Trace the effects of readdirplus on attribute and dentry revalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c      | 5 +++++
 fs/nfs/nfstrace.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 0dda082610cc..9a2415b5be73 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -752,8 +752,12 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 			status = nfs_refresh_inode(d_inode(dentry), entry->fattr);
 			if (!status)
 				nfs_setsecurity(d_inode(dentry), entry->fattr);
+			trace_nfs_readdir_lookup_revalidate(d_inode(parent),
+							    dentry, 0, status);
 			goto out;
 		} else {
+			trace_nfs_readdir_lookup_revalidate_failed(
+				d_inode(parent), dentry, 0);
 			d_invalidate(dentry);
 			dput(dentry);
 			dentry = NULL;
@@ -775,6 +779,7 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 		dentry = alias;
 	}
 	nfs_set_verifier(dentry, dir_verifier);
+	trace_nfs_readdir_lookup(d_inode(parent), dentry, 0);
 out:
 	dput(dentry);
 }
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index c2d0543ecb2d..7c1102b991d0 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -432,6 +432,9 @@ DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_enter);
 DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_lookup_exit);
 DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_revalidate_enter);
 DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_lookup_revalidate_exit);
+DEFINE_NFS_LOOKUP_EVENT(nfs_readdir_lookup);
+DEFINE_NFS_LOOKUP_EVENT(nfs_readdir_lookup_revalidate_failed);
+DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_readdir_lookup_revalidate);
 
 TRACE_EVENT(nfs_atomic_open_enter,
 		TP_PROTO(
-- 
2.35.1



* [PATCH v9 21/27] NFS: Trace effects of the readdirplus heuristic
  2022-02-27 23:12                                       ` [PATCH v9 20/27] NFS: Trace effects of readdirplus on the dcache trondmy
@ 2022-02-27 23:12                                         ` trondmy
  2022-02-27 23:12                                           ` [PATCH v9 22/27] NFS: Clean up page array initialisation/free trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Enable tracking of when the readdirplus heuristic causes a page cache
invalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c      | 11 ++++++++++-
 fs/nfs/nfstrace.h | 50 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 9a2415b5be73..483bb67d2ace 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -998,6 +998,8 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 			if (res == -EBADCOOKIE || res == -ENOTSYNC) {
 				invalidate_inode_pages2(desc->file->f_mapping);
 				desc->page_index = 0;
+				trace_nfs_readdir_invalidate_cache_range(
+					inode, 0, MAX_LFS_FILESIZE);
 				return -EAGAIN;
 			}
 			return res;
@@ -1012,6 +1014,9 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 			invalidate_inode_pages2_range(desc->file->f_mapping,
 						      desc->page_index_max + 1,
 						      -1);
+			trace_nfs_readdir_invalidate_cache_range(
+				inode, desc->page_index_max + 1,
+				MAX_LFS_FILESIZE);
 		}
 	}
 	res = nfs_readdir_search_array(desc);
@@ -1161,7 +1166,11 @@ static void nfs_readdir_handle_cache_misses(struct inode *inode,
 	if (desc->ctx->pos == 0 ||
 	    cache_misses <= NFS_READDIR_CACHE_MISS_THRESHOLD)
 		return;
-	invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1);
+	if (invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1) == 0)
+		return;
+	trace_nfs_readdir_invalidate_cache_range(
+		inode, (loff_t)(page_index + 1) << PAGE_SHIFT,
+		MAX_LFS_FILESIZE);
 }
 
 /* The file offset position represents the dirent entry number.  A
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 7c1102b991d0..ec2645d20abf 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -273,6 +273,56 @@ DEFINE_NFS_UPDATE_SIZE_EVENT(wcc);
 DEFINE_NFS_UPDATE_SIZE_EVENT(update);
 DEFINE_NFS_UPDATE_SIZE_EVENT(grow);
 
+DECLARE_EVENT_CLASS(nfs_inode_range_event,
+		TP_PROTO(
+			const struct inode *inode,
+			loff_t range_start,
+			loff_t range_end
+		),
+
+		TP_ARGS(inode, range_start, range_end),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(u64, version)
+			__field(loff_t, range_start)
+			__field(loff_t, range_end)
+		),
+
+		TP_fast_assign(
+			const struct nfs_inode *nfsi = NFS_I(inode);
+
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->fhandle = nfs_fhandle_hash(&nfsi->fh);
+			__entry->fileid = nfsi->fileid;
+			__entry->version = inode_peek_iversion_raw(inode);
+			__entry->range_start = range_start;
+			__entry->range_end = range_end;
+		),
+
+		TP_printk(
+			"fileid=%02x:%02x:%llu fhandle=0x%08x version=%llu "
+			"range=[%lld, %lld]",
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid,
+			__entry->fhandle, __entry->version,
+			__entry->range_start, __entry->range_end
+		)
+);
+
+#define DEFINE_NFS_INODE_RANGE_EVENT(name) \
+	DEFINE_EVENT(nfs_inode_range_event, name, \
+			TP_PROTO( \
+				const struct inode *inode, \
+				loff_t range_start, \
+				loff_t range_end \
+			), \
+			TP_ARGS(inode, range_start, range_end))
+
+DEFINE_NFS_INODE_RANGE_EVENT(nfs_readdir_invalidate_cache_range);
+
 DECLARE_EVENT_CLASS(nfs_readdir_event,
 		TP_PROTO(
 			const struct file *file,
-- 
2.35.1



* [PATCH v9 22/27] NFS: Clean up page array initialisation/free
  2022-02-27 23:12                                         ` [PATCH v9 21/27] NFS: Trace effects of the readdirplus heuristic trondmy
@ 2022-02-27 23:12                                           ` trondmy
  2022-02-27 23:12                                             ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 483bb67d2ace..95a29a973dc8 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -199,20 +199,17 @@ static void nfs_grow_dtsize(struct nfs_readdir_descriptor *desc)
 	nfs_set_dtsize(desc, desc->dtsize << 1);
 }
 
-static void nfs_readdir_array_init(struct nfs_cache_array *array)
-{
-	memset(array, 0, sizeof(struct nfs_cache_array));
-}
-
 static void nfs_readdir_page_init_array(struct page *page, u64 last_cookie,
 					u64 change_attr)
 {
 	struct nfs_cache_array *array;
 
 	array = kmap_atomic(page);
-	nfs_readdir_array_init(array);
 	array->change_attr = change_attr;
 	array->last_cookie = last_cookie;
+	array->size = 0;
+	array->page_full = 0;
+	array->page_is_eof = 0;
 	array->cookies_are_ordered = 1;
 	kunmap_atomic(array);
 }
@@ -220,16 +217,15 @@ static void nfs_readdir_page_init_array(struct page *page, u64 last_cookie,
 /*
  * we are freeing strings created by nfs_add_to_readdir_array()
  */
-static
-void nfs_readdir_clear_array(struct page *page)
+static void nfs_readdir_clear_array(struct page *page)
 {
 	struct nfs_cache_array *array;
-	int i;
+	unsigned int i;
 
 	array = kmap_atomic(page);
 	for (i = 0; i < array->size; i++)
 		kfree(array->array[i].name);
-	nfs_readdir_array_init(array);
+	array->size = 0;
 	kunmap_atomic(array);
 }
 
-- 
2.35.1



* [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-02-27 23:12                                           ` [PATCH v9 22/27] NFS: Clean up page array initialisation/free trondmy
@ 2022-02-27 23:12                                             ` trondmy
  2022-02-27 23:12                                               ` [PATCH v9 24/27] NFS: Fix up forced readdirplus trondmy
  2022-03-09 20:01                                               ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index Benjamin Coddington
  0 siblings, 2 replies; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Instead of using a linear index to address the pages, use the cookie of
the first entry, since that is what we use to match the page anyway.

This allows us to avoid re-reading the entire cache on a seekdir() type
of operation. The latter is very common when re-exporting NFS, and is a
major performance drain.

The change does affect our duplicate cookie detection, since we can no
longer rely on the page index as a linear offset for detecting whether
we looped backwards. However, since we no longer do a linear search
through all the pages on each call to nfs_readdir(), this is less of a
concern than it was previously.

The other downside is that invalidate_mapping_pages() can no longer use
the page index to avoid clearing pages that have already been read. A
subsequent patch will restore the functionality this provides to the
'ls -l' heuristic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/Kconfig         |   4 ++
 fs/nfs/dir.c           | 149 ++++++++++++++++++-----------------------
 include/linux/nfs_fs.h |   2 -
 3 files changed, 69 insertions(+), 86 deletions(-)

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 14a72224b657..47a53b3362b6 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -4,6 +4,10 @@ config NFS_FS
 	depends on INET && FILE_LOCKING && MULTIUSER
 	select LOCKD
 	select SUNRPC
+	select CRYPTO
+	select CRYPTO_HASH
+	select XXHASH
+	select CRYPTO_XXHASH
 	select NFS_ACL_SUPPORT if NFS_V3_ACL
 	help
 	  Choose Y here if you want to access files residing on other
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 95a29a973dc8..707ad0fd5a4e 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -39,6 +39,7 @@
 #include <linux/sched.h>
 #include <linux/kmemleak.h>
 #include <linux/xattr.h>
+#include <linux/xxhash.h>
 
 #include "delegation.h"
 #include "iostat.h"
@@ -159,9 +160,7 @@ struct nfs_readdir_descriptor {
 	pgoff_t		page_index_max;
 	u64		dir_cookie;
 	u64		last_cookie;
-	u64		dup_cookie;
 	loff_t		current_index;
-	loff_t		prev_index;
 
 	__be32		verf[NFS_DIR_VERIFIER_SIZE];
 	unsigned long	dir_verifier;
@@ -171,7 +170,6 @@ struct nfs_readdir_descriptor {
 	unsigned int	cache_entry_index;
 	unsigned int	buffer_fills;
 	unsigned int	dtsize;
-	signed char duped;
 	bool plus;
 	bool eob;
 	bool eof;
@@ -331,6 +329,28 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
 	return ret;
 }
 
+#define NFS_READDIR_COOKIE_MASK (U32_MAX >> 14)
+/*
+ * Hash algorithm allowing content-addressable access to sequences
+ * of directory cookies. Content is addressed by the value of the
+ * cookie index of the first readdir entry in a page.
+ *
+ * The xxhash algorithm is chosen because it is fast, and is supposed
+ * to result in a decent flat distribution of hashes.
+ *
+ * We then select only the first 18 bits to avoid issues with excessive
+ * memory use for the page cache XArray. 18 bits should allow the caching
+ * of 262144 pages of sequences of readdir entries. Since each page holds
+ * 127 readdir entries for a typical 64-bit system, that works out to a
+ * cache of ~ 33 million entries per directory.
+ */
+static pgoff_t nfs_readdir_page_cookie_hash(u64 cookie)
+{
+	if (cookie == 0)
+		return 0;
+	return xxhash(&cookie, sizeof(cookie), 0) & NFS_READDIR_COOKIE_MASK;
+}
+
 static bool nfs_readdir_page_validate(struct page *page, u64 last_cookie,
 				      u64 change_attr)
 {
@@ -352,15 +372,15 @@ static void nfs_readdir_page_unlock_and_put(struct page *page)
 }
 
 static struct page *nfs_readdir_page_get_locked(struct address_space *mapping,
-						pgoff_t index, u64 last_cookie)
+						u64 last_cookie,
+						u64 change_attr)
 {
+	pgoff_t index = nfs_readdir_page_cookie_hash(last_cookie);
 	struct page *page;
-	u64 change_attr;
 
 	page = grab_cache_page(mapping, index);
 	if (!page)
 		return NULL;
-	change_attr = inode_peek_iversion_raw(mapping->host);
 	if (PageUptodate(page)) {
 		if (nfs_readdir_page_validate(page, last_cookie, change_attr))
 			return page;
@@ -371,11 +391,6 @@ static struct page *nfs_readdir_page_get_locked(struct address_space *mapping,
 	return page;
 }
 
-static loff_t nfs_readdir_page_offset(struct page *page)
-{
-	return (loff_t)page->index * (loff_t)nfs_readdir_array_maxentries();
-}
-
 static u64 nfs_readdir_page_last_cookie(struct page *page)
 {
 	struct nfs_cache_array *array;
@@ -408,11 +423,11 @@ static void nfs_readdir_page_set_eof(struct page *page)
 }
 
 static struct page *nfs_readdir_page_get_next(struct address_space *mapping,
-					      pgoff_t index, u64 cookie)
+					      u64 cookie, u64 change_attr)
 {
 	struct page *page;
 
-	page = nfs_readdir_page_get_locked(mapping, index, cookie);
+	page = nfs_readdir_page_get_locked(mapping, cookie, change_attr);
 	if (page) {
 		if (nfs_readdir_page_last_cookie(page) == cookie)
 			return page;
@@ -452,6 +467,13 @@ static void nfs_readdir_seek_next_array(struct nfs_cache_array *array,
 		desc->last_cookie = array->array[0].cookie;
 }
 
+static void nfs_readdir_rewind_search(struct nfs_readdir_descriptor *desc)
+{
+	desc->current_index = 0;
+	desc->last_cookie = 0;
+	desc->page_index = 0;
+}
+
 static int nfs_readdir_search_for_pos(struct nfs_cache_array *array,
 				      struct nfs_readdir_descriptor *desc)
 {
@@ -492,8 +514,7 @@ static bool nfs_readdir_array_cookie_in_range(struct nfs_cache_array *array,
 static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
 					 struct nfs_readdir_descriptor *desc)
 {
-	int i;
-	loff_t new_pos;
+	unsigned int i;
 	int status = -EAGAIN;
 
 	if (!nfs_readdir_array_cookie_in_range(array, desc->dir_cookie))
@@ -501,32 +522,10 @@ static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
 
 	for (i = 0; i < array->size; i++) {
 		if (array->array[i].cookie == desc->dir_cookie) {
-			struct nfs_inode *nfsi = NFS_I(file_inode(desc->file));
-
-			new_pos = nfs_readdir_page_offset(desc->page) + i;
-			if (desc->attr_gencount != nfsi->attr_gencount) {
-				desc->duped = 0;
-				desc->attr_gencount = nfsi->attr_gencount;
-			} else if (new_pos < desc->prev_index) {
-				if (desc->duped > 0
-				    && desc->dup_cookie == desc->dir_cookie) {
-					if (printk_ratelimit()) {
-						pr_notice("NFS: directory %pD2 contains a readdir loop."
-								"Please contact your server vendor.  "
-								"The file: %s has duplicate cookie %llu\n",
-								desc->file, array->array[i].name, desc->dir_cookie);
-					}
-					status = -ELOOP;
-					goto out;
-				}
-				desc->dup_cookie = desc->dir_cookie;
-				desc->duped = -1;
-			}
 			if (nfs_readdir_use_cookie(desc->file))
 				desc->ctx->pos = desc->dir_cookie;
 			else
-				desc->ctx->pos = new_pos;
-			desc->prev_index = new_pos;
+				desc->ctx->pos = desc->current_index + i;
 			desc->cache_entry_index = i;
 			return 0;
 		}
@@ -538,7 +537,6 @@ static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
 			desc->eof = true;
 	} else
 		nfs_readdir_seek_next_array(array, desc);
-out:
 	return status;
 }
 
@@ -783,10 +781,9 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 /* Perform conversion from xdr to cache array */
 static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 				   struct nfs_entry *entry,
-				   struct page **xdr_pages,
-				   unsigned int buflen,
-				   struct page **arrays,
-				   size_t narrays)
+				   struct page **xdr_pages, unsigned int buflen,
+				   struct page **arrays, size_t narrays,
+				   u64 change_attr)
 {
 	struct address_space *mapping = desc->file->f_mapping;
 	struct xdr_stream stream;
@@ -826,18 +823,16 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 				break;
 			arrays++;
 			*arrays = page = new;
-			desc->page_index_max++;
 		} else {
-			new = nfs_readdir_page_get_next(mapping,
-							page->index + 1,
-							entry->prev_cookie);
+			new = nfs_readdir_page_get_next(
+				mapping, entry->prev_cookie, change_attr);
 			if (!new)
 				break;
 			if (page != *arrays)
 				nfs_readdir_page_unlock_and_put(page);
 			page = new;
-			desc->page_index_max = new->index;
 		}
+		desc->page_index_max++;
 		status = nfs_readdir_add_to_array(entry, page);
 	} while (!status && !entry->eof);
 
@@ -897,6 +892,7 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 				    __be32 *verf_arg, __be32 *verf_res,
 				    struct page **arrays, size_t narrays)
 {
+	u64 change_attr;
 	struct page **pages;
 	struct page *page = *arrays;
 	struct nfs_entry *entry;
@@ -921,6 +917,7 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 	if (!pages)
 		goto out;
 
+	change_attr = inode_peek_iversion_raw(inode);
 	status = nfs_readdir_xdr_filler(desc, verf_arg, entry->cookie, pages,
 					dtsize, verf_res);
 	if (status < 0)
@@ -929,7 +926,7 @@ static int nfs_readdir_xdr_to_array(struct nfs_readdir_descriptor *desc,
 	pglen = status;
 	if (pglen != 0)
 		status = nfs_readdir_page_filler(desc, entry, pages, pglen,
-						 arrays, narrays);
+						 arrays, narrays, change_attr);
 	else
 		nfs_readdir_page_set_eof(page);
 	desc->buffer_fills++;
@@ -959,9 +956,11 @@ nfs_readdir_page_unlock_and_put_cached(struct nfs_readdir_descriptor *desc)
 static struct page *
 nfs_readdir_page_get_cached(struct nfs_readdir_descriptor *desc)
 {
-	return nfs_readdir_page_get_locked(desc->file->f_mapping,
-					   desc->page_index,
-					   desc->last_cookie);
+	struct address_space *mapping = desc->file->f_mapping;
+	u64 change_attr = inode_peek_iversion_raw(mapping->host);
+
+	return nfs_readdir_page_get_locked(mapping, desc->last_cookie,
+					   change_attr);
 }
 
 /*
@@ -993,7 +992,7 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 			trace_nfs_readdir_cache_fill_done(inode, res);
 			if (res == -EBADCOOKIE || res == -ENOTSYNC) {
 				invalidate_inode_pages2(desc->file->f_mapping);
-				desc->page_index = 0;
+				nfs_readdir_rewind_search(desc);
 				trace_nfs_readdir_invalidate_cache_range(
 					inode, 0, MAX_LFS_FILESIZE);
 				return -EAGAIN;
@@ -1007,12 +1006,10 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 		    memcmp(nfsi->cookieverf, verf, sizeof(nfsi->cookieverf))) {
 			memcpy(nfsi->cookieverf, verf,
 			       sizeof(nfsi->cookieverf));
-			invalidate_inode_pages2_range(desc->file->f_mapping,
-						      desc->page_index_max + 1,
+			invalidate_inode_pages2_range(desc->file->f_mapping, 1,
 						      -1);
 			trace_nfs_readdir_invalidate_cache_range(
-				inode, desc->page_index_max + 1,
-				MAX_LFS_FILESIZE);
+				inode, 1, MAX_LFS_FILESIZE);
 		}
 	}
 	res = nfs_readdir_search_array(desc);
@@ -1028,11 +1025,6 @@ static int readdir_search_pagecache(struct nfs_readdir_descriptor *desc)
 	int res;
 
 	do {
-		if (desc->page_index == 0) {
-			desc->current_index = 0;
-			desc->prev_index = 0;
-			desc->last_cookie = 0;
-		}
 		res = find_and_lock_cache_page(desc);
 	} while (res == -EAGAIN);
 	return res;
@@ -1070,8 +1062,6 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
 			desc->ctx->pos = desc->dir_cookie;
 		else
 			desc->ctx->pos++;
-		if (desc->duped != 0)
-			desc->duped = 1;
 	}
 	if (array->page_is_eof)
 		desc->eof = !desc->eob;
@@ -1113,7 +1103,6 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	desc->page_index = 0;
 	desc->cache_entry_index = 0;
 	desc->last_cookie = desc->dir_cookie;
-	desc->duped = 0;
 	desc->page_index_max = 0;
 
 	trace_nfs_readdir_uncached(desc->file, desc->verf, desc->last_cookie,
@@ -1146,6 +1135,8 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	for (i = 0; i < sz && arrays[i]; i++)
 		nfs_readdir_page_array_free(arrays[i]);
 out:
+	if (!nfs_readdir_use_cookie(desc->file))
+		nfs_readdir_rewind_search(desc);
 	desc->page_index_max = -1;
 	kfree(arrays);
 	dfprintk(DIRCACHE, "NFS: %s: returns %d\n", __func__, status);
@@ -1156,17 +1147,14 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 
 static void nfs_readdir_handle_cache_misses(struct inode *inode,
 					    struct nfs_readdir_descriptor *desc,
-					    pgoff_t page_index,
 					    unsigned int cache_misses)
 {
 	if (desc->ctx->pos == 0 ||
 	    cache_misses <= NFS_READDIR_CACHE_MISS_THRESHOLD)
 		return;
-	if (invalidate_mapping_pages(inode->i_mapping, page_index + 1, -1) == 0)
+	if (invalidate_mapping_pages(inode->i_mapping, 0, -1) == 0)
 		return;
-	trace_nfs_readdir_invalidate_cache_range(
-		inode, (loff_t)(page_index + 1) << PAGE_SHIFT,
-		MAX_LFS_FILESIZE);
+	trace_nfs_readdir_invalidate_cache_range(inode, 0, MAX_LFS_FILESIZE);
 }
 
 /* The file offset position represents the dirent entry number.  A
@@ -1181,7 +1169,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	struct nfs_open_dir_context *dir_ctx = file->private_data;
 	struct nfs_readdir_descriptor *desc;
 	unsigned int cache_hits, cache_misses;
-	pgoff_t page_index;
 	int res;
 
 	dfprintk(FILE, "NFS: readdir(%pD2) starting at cookie %llu\n",
@@ -1206,10 +1193,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 
 	spin_lock(&file->f_lock);
 	desc->dir_cookie = dir_ctx->dir_cookie;
-	desc->dup_cookie = dir_ctx->dup_cookie;
-	desc->duped = dir_ctx->duped;
-	page_index = dir_ctx->page_index;
-	desc->page_index = page_index;
+	desc->page_index = dir_ctx->page_index;
 	desc->last_cookie = dir_ctx->last_cookie;
 	desc->attr_gencount = dir_ctx->attr_gencount;
 	desc->eof = dir_ctx->eof;
@@ -1225,7 +1209,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	}
 
 	desc->plus = nfs_use_readdirplus(inode, ctx, cache_hits, cache_misses);
-	nfs_readdir_handle_cache_misses(inode, desc, page_index, cache_misses);
+	nfs_readdir_handle_cache_misses(inode, desc, cache_misses);
 
 	do {
 		res = readdir_search_pagecache(desc);
@@ -1245,7 +1229,6 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 		}
 		if (res == -ETOOSMALL && desc->plus) {
 			nfs_zap_caches(inode);
-			desc->page_index = 0;
 			desc->plus = false;
 			desc->eof = false;
 			continue;
@@ -1259,9 +1242,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 
 	spin_lock(&file->f_lock);
 	dir_ctx->dir_cookie = desc->dir_cookie;
-	dir_ctx->dup_cookie = desc->dup_cookie;
 	dir_ctx->last_cookie = desc->last_cookie;
-	dir_ctx->duped = desc->duped;
 	dir_ctx->attr_gencount = desc->attr_gencount;
 	dir_ctx->page_index = desc->page_index;
 	dir_ctx->eof = desc->eof;
@@ -1304,13 +1285,13 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 	if (offset != filp->f_pos) {
 		filp->f_pos = offset;
 		dir_ctx->page_index = 0;
-		if (!nfs_readdir_use_cookie(filp))
+		if (!nfs_readdir_use_cookie(filp)) {
 			dir_ctx->dir_cookie = 0;
-		else
+			dir_ctx->last_cookie = 0;
+		} else {
 			dir_ctx->dir_cookie = offset;
-		if (offset == 0)
-			memset(dir_ctx->verf, 0, sizeof(dir_ctx->verf));
-		dir_ctx->duped = 0;
+			dir_ctx->last_cookie = offset;
+		}
 		dir_ctx->eof = false;
 	}
 	spin_unlock(&filp->f_lock);
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 20a4cf0acad2..42aad886d3c0 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -106,11 +106,9 @@ struct nfs_open_dir_context {
 	unsigned long attr_gencount;
 	__be32	verf[NFS_DIR_VERIFIER_SIZE];
 	__u64 dir_cookie;
-	__u64 dup_cookie;
 	__u64 last_cookie;
 	pgoff_t page_index;
 	unsigned int dtsize;
-	signed char duped;
 	bool eof;
 	struct rcu_head rcu_head;
 };
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH v9 24/27] NFS: Fix up forced readdirplus
  2022-02-27 23:12                                             ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index trondmy
@ 2022-02-27 23:12                                               ` trondmy
  2022-02-27 23:12                                                 ` [PATCH v9 25/27] NFS: Remove unnecessary cache invalidations for directories trondmy
  2022-03-09 20:01                                               ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index Benjamin Coddington
  1 sibling, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Avoid clearing the entire readdir page cache if we're just doing forced
readdirplus for the 'ls -l' heuristic.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c           | 56 +++++++++++++++++++++++++++++-------------
 fs/nfs/nfstrace.h      |  1 +
 include/linux/nfs_fs.h |  1 +
 3 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 707ad0fd5a4e..68b0f19053ac 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -170,6 +170,7 @@ struct nfs_readdir_descriptor {
 	unsigned int	cache_entry_index;
 	unsigned int	buffer_fills;
 	unsigned int	dtsize;
+	bool clear_cache;
 	bool plus;
 	bool eob;
 	bool eof;
@@ -227,6 +228,13 @@ static void nfs_readdir_clear_array(struct page *page)
 	kunmap_atomic(array);
 }
 
+static void nfs_readdir_page_reinit_array(struct page *page, u64 last_cookie,
+					  u64 change_attr)
+{
+	nfs_readdir_clear_array(page);
+	nfs_readdir_page_init_array(page, last_cookie, change_attr);
+}
+
 static struct page *
 nfs_readdir_page_array_alloc(u64 last_cookie, gfp_t gfp_flags)
 {
@@ -428,12 +436,11 @@ static struct page *nfs_readdir_page_get_next(struct address_space *mapping,
 	struct page *page;
 
 	page = nfs_readdir_page_get_locked(mapping, cookie, change_attr);
-	if (page) {
-		if (nfs_readdir_page_last_cookie(page) == cookie)
-			return page;
-		nfs_readdir_page_unlock_and_put(page);
-	}
-	return NULL;
+	if (!page)
+		return NULL;
+	if (nfs_readdir_page_last_cookie(page) != cookie)
+		nfs_readdir_page_reinit_array(page, cookie, change_attr);
+	return page;
 }
 
 static inline
@@ -958,9 +965,15 @@ nfs_readdir_page_get_cached(struct nfs_readdir_descriptor *desc)
 {
 	struct address_space *mapping = desc->file->f_mapping;
 	u64 change_attr = inode_peek_iversion_raw(mapping->host);
+	u64 cookie = desc->last_cookie;
+	struct page *page;
 
-	return nfs_readdir_page_get_locked(mapping, desc->last_cookie,
-					   change_attr);
+	page = nfs_readdir_page_get_locked(mapping, cookie, change_attr);
+	if (!page)
+		return NULL;
+	if (desc->clear_cache && !nfs_readdir_page_needs_filling(page))
+		nfs_readdir_page_reinit_array(page, cookie, change_attr);
+	return page;
 }
 
 /*
@@ -1011,6 +1024,7 @@ static int find_and_lock_cache_page(struct nfs_readdir_descriptor *desc)
 			trace_nfs_readdir_invalidate_cache_range(
 				inode, 1, MAX_LFS_FILESIZE);
 		}
+		desc->clear_cache = false;
 	}
 	res = nfs_readdir_search_array(desc);
 	if (res == 0)
@@ -1145,16 +1159,17 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 
 #define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
 
-static void nfs_readdir_handle_cache_misses(struct inode *inode,
+static bool nfs_readdir_handle_cache_misses(struct inode *inode,
 					    struct nfs_readdir_descriptor *desc,
-					    unsigned int cache_misses)
+					    unsigned int cache_misses,
+					    bool force_clear)
 {
-	if (desc->ctx->pos == 0 ||
-	    cache_misses <= NFS_READDIR_CACHE_MISS_THRESHOLD)
-		return;
-	if (invalidate_mapping_pages(inode->i_mapping, 0, -1) == 0)
-		return;
-	trace_nfs_readdir_invalidate_cache_range(inode, 0, MAX_LFS_FILESIZE);
+	if (desc->ctx->pos == 0 || !desc->plus)
+		return false;
+	if (cache_misses <= NFS_READDIR_CACHE_MISS_THRESHOLD && !force_clear)
+		return false;
+	trace_nfs_readdir_force_readdirplus(inode);
+	return true;
 }
 
 /* The file offset position represents the dirent entry number.  A
@@ -1169,6 +1184,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	struct nfs_open_dir_context *dir_ctx = file->private_data;
 	struct nfs_readdir_descriptor *desc;
 	unsigned int cache_hits, cache_misses;
+	bool force_clear;
 	int res;
 
 	dfprintk(FILE, "NFS: readdir(%pD2) starting at cookie %llu\n",
@@ -1201,6 +1217,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	memcpy(desc->verf, dir_ctx->verf, sizeof(desc->verf));
 	cache_hits = atomic_xchg(&dir_ctx->cache_hits, 0);
 	cache_misses = atomic_xchg(&dir_ctx->cache_misses, 0);
+	force_clear = dir_ctx->force_clear;
 	spin_unlock(&file->f_lock);
 
 	if (desc->eof) {
@@ -1209,7 +1226,9 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	}
 
 	desc->plus = nfs_use_readdirplus(inode, ctx, cache_hits, cache_misses);
-	nfs_readdir_handle_cache_misses(inode, desc, cache_misses);
+	force_clear = nfs_readdir_handle_cache_misses(inode, desc, cache_misses,
+						      force_clear);
+	desc->clear_cache = force_clear;
 
 	do {
 		res = readdir_search_pagecache(desc);
@@ -1238,6 +1257,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 
 		nfs_do_filldir(desc, nfsi->cookieverf);
 		nfs_readdir_page_unlock_and_put_cached(desc);
+		if (desc->page_index == desc->page_index_max)
+			desc->clear_cache = force_clear;
 	} while (!desc->eob && !desc->eof);
 
 	spin_lock(&file->f_lock);
@@ -1245,6 +1266,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	dir_ctx->last_cookie = desc->last_cookie;
 	dir_ctx->attr_gencount = desc->attr_gencount;
 	dir_ctx->page_index = desc->page_index;
+	dir_ctx->force_clear = force_clear;
 	dir_ctx->eof = desc->eof;
 	dir_ctx->dtsize = desc->dtsize;
 	memcpy(dir_ctx->verf, desc->verf, sizeof(dir_ctx->verf));
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index ec2645d20abf..59f4ca803fd0 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -160,6 +160,7 @@ DEFINE_NFS_INODE_EVENT(nfs_fsync_enter);
 DEFINE_NFS_INODE_EVENT_DONE(nfs_fsync_exit);
 DEFINE_NFS_INODE_EVENT(nfs_access_enter);
 DEFINE_NFS_INODE_EVENT_DONE(nfs_set_cache_invalid);
+DEFINE_NFS_INODE_EVENT(nfs_readdir_force_readdirplus);
 DEFINE_NFS_INODE_EVENT_DONE(nfs_readdir_cache_fill_done);
 DEFINE_NFS_INODE_EVENT_DONE(nfs_readdir_uncached_done);
 
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 42aad886d3c0..3893386ceaed 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -109,6 +109,7 @@ struct nfs_open_dir_context {
 	__u64 last_cookie;
 	pgoff_t page_index;
 	unsigned int dtsize;
+	bool force_clear;
 	bool eof;
 	struct rcu_head rcu_head;
 };
-- 
2.35.1



* [PATCH v9 25/27] NFS: Remove unnecessary cache invalidations for directories
  2022-02-27 23:12                                               ` [PATCH v9 24/27] NFS: Fix up forced readdirplus trondmy
@ 2022-02-27 23:12                                                 ` trondmy
  2022-02-27 23:12                                                   ` [PATCH v9 26/27] NFS: Optimise away the previous cookie field trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Now that the directory page cache entries police themselves, don't
bother with marking the page cache for invalidation.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c           | 5 -----
 fs/nfs/inode.c         | 9 +++------
 fs/nfs/nfs4proc.c      | 2 --
 include/linux/nfs_fs.h | 2 --
 4 files changed, 3 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 68b0f19053ac..5a2c98b2cc15 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -83,11 +83,6 @@ alloc_nfs_open_dir_context(struct inode *dir)
 		ctx->attr_gencount = nfsi->attr_gencount;
 		ctx->dtsize = NFS_INIT_DTSIZE;
 		spin_lock(&dir->i_lock);
-		if (list_empty(&nfsi->open_files) &&
-		    (nfsi->cache_validity & NFS_INO_DATA_INVAL_DEFER))
-			nfs_set_cache_invalid(dir,
-					      NFS_INO_INVALID_DATA |
-						      NFS_INO_REVAL_FORCED);
 		list_add_tail_rcu(&ctx->list, &nfsi->open_files);
 		memcpy(ctx->verf, nfsi->cookieverf, sizeof(ctx->verf));
 		spin_unlock(&dir->i_lock);
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 10d17cfb8639..43af1b6de5a6 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -210,6 +210,8 @@ void nfs_set_cache_invalid(struct inode *inode, unsigned long flags)
 	if (flags & NFS_INO_INVALID_DATA)
 		nfs_fscache_invalidate(inode, 0);
 	flags &= ~NFS_INO_REVAL_FORCED;
+	if (S_ISDIR(inode->i_mode))
+		flags &= ~(NFS_INO_INVALID_DATA | NFS_INO_DATA_INVAL_DEFER);
 
 	nfsi->cache_validity |= flags;
 
@@ -1429,10 +1431,7 @@ static void nfs_wcc_update_inode(struct inode *inode, struct nfs_fattr *fattr)
 			&& (fattr->valid & NFS_ATTR_FATTR_CHANGE)
 			&& inode_eq_iversion_raw(inode, fattr->pre_change_attr)) {
 		inode_set_iversion_raw(inode, fattr->change_attr);
-		if (S_ISDIR(inode->i_mode))
-			nfs_set_cache_invalid(inode, NFS_INO_INVALID_DATA);
-		else if (nfs_server_capable(inode, NFS_CAP_XATTR))
-			nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
+		nfs_set_cache_invalid(inode, NFS_INO_INVALID_XATTR);
 	}
 	/* If we have atomic WCC data, we may update some attributes */
 	ts = inode->i_ctime;
@@ -1851,8 +1850,6 @@ EXPORT_SYMBOL_GPL(nfs_refresh_inode);
 static int nfs_post_op_update_inode_locked(struct inode *inode,
 		struct nfs_fattr *fattr, unsigned int invalid)
 {
-	if (S_ISDIR(inode->i_mode))
-		invalid |= NFS_INO_INVALID_DATA;
 	nfs_set_cache_invalid(inode, invalid);
 	if ((fattr->valid & NFS_ATTR_FATTR) == 0)
 		return 0;
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 8b875355824b..f1aa6b3c8523 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -1206,8 +1206,6 @@ nfs4_update_changeattr_locked(struct inode *inode,
 	u64 change_attr = inode_peek_iversion_raw(inode);
 
 	cache_validity |= NFS_INO_INVALID_CTIME | NFS_INO_INVALID_MTIME;
-	if (S_ISDIR(inode->i_mode))
-		cache_validity |= NFS_INO_INVALID_DATA;
 
 	switch (NFS_SERVER(inode)->change_attr_type) {
 	case NFS4_CHANGE_TYPE_IS_UNDEFINED:
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 3893386ceaed..72f42b1d0d3c 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -360,8 +360,6 @@ static inline void nfs_mark_for_revalidate(struct inode *inode)
 	nfsi->cache_validity |= NFS_INO_INVALID_ACCESS | NFS_INO_INVALID_ACL |
 				NFS_INO_INVALID_CHANGE | NFS_INO_INVALID_CTIME |
 				NFS_INO_INVALID_SIZE;
-	if (S_ISDIR(inode->i_mode))
-		nfsi->cache_validity |= NFS_INO_INVALID_DATA;
 	spin_unlock(&inode->i_lock);
 }
 
-- 
2.35.1



* [PATCH v9 26/27] NFS: Optimise away the previous cookie field
  2022-02-27 23:12                                                 ` [PATCH v9 25/27] NFS: Remove unnecessary cache invalidations for directories trondmy
@ 2022-02-27 23:12                                                   ` trondmy
  2022-02-27 23:12                                                     ` [PATCH v9 27/27] NFS: Cache all entries in the readdirplus reply trondmy
  0 siblings, 1 reply; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Replace the 'previous cookie' field in struct nfs_entry with the cookie
value that the cache array already tracks in array->last_cookie.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c            | 26 ++++++++++++++------------
 fs/nfs/nfs2xdr.c        |  1 -
 fs/nfs/nfs3xdr.c        |  1 -
 fs/nfs/nfs4xdr.c        |  1 -
 include/linux/nfs_xdr.h |  3 +--
 5 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 5a2c98b2cc15..c2c847845464 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -296,19 +296,20 @@ static int nfs_readdir_array_can_expand(struct nfs_cache_array *array)
 	return 0;
 }
 
-static
-int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
+static int nfs_readdir_page_array_append(struct page *page,
+					 const struct nfs_entry *entry,
+					 u64 *cookie)
 {
 	struct nfs_cache_array *array;
 	struct nfs_cache_array_entry *cache_entry;
 	const char *name;
-	int ret;
+	int ret = -ENOMEM;
 
 	name = nfs_readdir_copy_name(entry->name, entry->len);
-	if (!name)
-		return -ENOMEM;
 
 	array = kmap_atomic(page);
+	if (!name)
+		goto out;
 	ret = nfs_readdir_array_can_expand(array);
 	if (ret) {
 		kfree(name);
@@ -316,7 +317,7 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
 	}
 
 	cache_entry = &array->array[array->size];
-	cache_entry->cookie = entry->prev_cookie;
+	cache_entry->cookie = array->last_cookie;
 	cache_entry->ino = entry->ino;
 	cache_entry->d_type = entry->d_type;
 	cache_entry->name_len = entry->len;
@@ -328,6 +329,7 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
 	if (entry->eof != 0)
 		nfs_readdir_array_set_eof(array);
 out:
+	*cookie = array->last_cookie;
 	kunmap_atomic(array);
 	return ret;
 }
@@ -791,6 +793,7 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 	struct xdr_stream stream;
 	struct xdr_buf buf;
 	struct page *scratch, *new, *page = *arrays;
+	u64 cookie;
 	int status;
 
 	scratch = alloc_page(GFP_KERNEL);
@@ -812,22 +815,21 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 			nfs_prime_dcache(file_dentry(desc->file), entry,
 					desc->dir_verifier);
 
-		status = nfs_readdir_add_to_array(entry, page);
+		status = nfs_readdir_page_array_append(page, entry, &cookie);
 		if (status != -ENOSPC)
 			continue;
 
 		if (page->mapping != mapping) {
 			if (!--narrays)
 				break;
-			new = nfs_readdir_page_array_alloc(entry->prev_cookie,
-							   GFP_KERNEL);
+			new = nfs_readdir_page_array_alloc(cookie, GFP_KERNEL);
 			if (!new)
 				break;
 			arrays++;
 			*arrays = page = new;
 		} else {
-			new = nfs_readdir_page_get_next(
-				mapping, entry->prev_cookie, change_attr);
+			new = nfs_readdir_page_get_next(mapping, cookie,
+							change_attr);
 			if (!new)
 				break;
 			if (page != *arrays)
@@ -835,7 +837,7 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 			page = new;
 		}
 		desc->page_index_max++;
-		status = nfs_readdir_add_to_array(entry, page);
+		status = nfs_readdir_page_array_append(page, entry, &cookie);
 	} while (!status && !entry->eof);
 
 	switch (status) {
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index 3d5ba43f44bb..05c3b4b2b3dd 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -955,7 +955,6 @@ int nfs2_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	 * The type (size and byte order) of nfscookie isn't defined in
 	 * RFC 1094.  This implementation assumes that it's an XDR uint32.
 	 */
-	entry->prev_cookie = entry->cookie;
 	p = xdr_inline_decode(xdr, 4);
 	if (unlikely(!p))
 		return -EAGAIN;
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index d6779ceeb39e..3b0b650c9c5a 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2024,7 +2024,6 @@ int nfs3_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 			zero_nfs_fh3(entry->fh);
 	}
 
-	entry->prev_cookie = entry->cookie;
 	entry->cookie = new_cookie;
 
 	return 0;
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index b7780b97dc4d..86a5f6516928 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7508,7 +7508,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	if (entry->fattr->valid & NFS_ATTR_FATTR_TYPE)
 		entry->d_type = nfs_umode_to_dtype(entry->fattr->mode);
 
-	entry->prev_cookie = entry->cookie;
 	entry->cookie = new_cookie;
 
 	return 0;
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 728cb0c1f0b6..82f7c2730b9a 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -745,8 +745,7 @@ struct nfs_auth_info {
  */
 struct nfs_entry {
 	__u64			ino;
-	__u64			cookie,
-				prev_cookie;
+	__u64			cookie;
 	const char *		name;
 	unsigned int		len;
 	int			eof;
-- 
2.35.1



* [PATCH v9 27/27] NFS: Cache all entries in the readdirplus reply
  2022-02-27 23:12                                                   ` [PATCH v9 26/27] NFS: Optimise away the previous cookie field trondmy
@ 2022-02-27 23:12                                                     ` trondmy
  0 siblings, 0 replies; 46+ messages in thread
From: trondmy @ 2022-02-27 23:12 UTC (permalink / raw)
  To: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Even if we're not able to cache all the entries in the readdir buffer,
let's ensure that we do prime the dcache.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/dir.c | 40 ++++++++++++++++++++++++++--------------
 1 file changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index c2c847845464..4cd77bc5022d 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -782,6 +782,21 @@ void nfs_prime_dcache(struct dentry *parent, struct nfs_entry *entry,
 	dput(dentry);
 }
 
+static int nfs_readdir_entry_decode(struct nfs_readdir_descriptor *desc,
+				    struct nfs_entry *entry,
+				    struct xdr_stream *stream)
+{
+	int ret;
+
+	if (entry->fattr->label)
+		entry->fattr->label->len = NFS4_MAXLABELLEN;
+	ret = xdr_decode(desc, entry, stream);
+	if (ret || !desc->plus)
+		return ret;
+	nfs_prime_dcache(file_dentry(desc->file), entry, desc->dir_verifier);
+	return 0;
+}
+
 /* Perform conversion from xdr to cache array */
 static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 				   struct nfs_entry *entry,
@@ -804,17 +819,10 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 	xdr_set_scratch_page(&stream, scratch);
 
 	do {
-		if (entry->fattr->label)
-			entry->fattr->label->len = NFS4_MAXLABELLEN;
-
-		status = xdr_decode(desc, entry, &stream);
+		status = nfs_readdir_entry_decode(desc, entry, &stream);
 		if (status != 0)
 			break;
 
-		if (desc->plus)
-			nfs_prime_dcache(file_dentry(desc->file), entry,
-					desc->dir_verifier);
-
 		status = nfs_readdir_page_array_append(page, entry, &cookie);
 		if (status != -ENOSPC)
 			continue;
@@ -842,15 +850,19 @@ static int nfs_readdir_page_filler(struct nfs_readdir_descriptor *desc,
 
 	switch (status) {
 	case -EBADCOOKIE:
-		if (entry->eof) {
-			nfs_readdir_page_set_eof(page);
-			status = 0;
-		}
-		break;
-	case -ENOSPC:
+		if (!entry->eof)
+			break;
+		nfs_readdir_page_set_eof(page);
+		fallthrough;
 	case -EAGAIN:
 		status = 0;
 		break;
+	case -ENOSPC:
+		status = 0;
+		if (!desc->plus)
+			break;
+		while (!nfs_readdir_entry_decode(desc, entry, &stream))
+			;
 	}
 
 	if (page != *arrays)
-- 
2.35.1



* Re: [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache
  2022-02-27 23:12             ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache trondmy
  2022-02-27 23:12               ` [PATCH v9 08/27] NFS: Don't re-read the entire page cache to find the next cookie trondmy
@ 2022-03-01 19:09               ` Anna Schumaker
  2022-03-01 23:11                 ` Trond Myklebust
  1 sibling, 1 reply; 46+ messages in thread
From: Anna Schumaker @ 2022-03-01 19:09 UTC (permalink / raw)
  To: trondmy; +Cc: Linux NFS Mailing List

Hi Trond,

On Mon, Feb 28, 2022 at 5:51 AM <trondmy@kernel.org> wrote:
>
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> Use the change attribute and the first cookie in a directory page cache
> entry to validate that the page is up to date.

Starting with this patch I'm seeing cthon basic tests fail on NFS v3:

Tue Mar  1 14:08:39 EST 2022
./server -b -o tcp,v3,sec=sys -m /mnt/nfsv3tcp -p /srv/test/anna/nfsv3tcp server
./server -b -o proto=tcp,sec=sys,v4.0 -m /mnt/nfsv4tcp -p /srv/test/anna/nfsv4tcp server
./server -b -o proto=tcp,sec=sys,v4.1 -m /mnt/nfsv41tcp -p /srv/test/anna/nfsv41tcp server
./server -b -o proto=tcp,sec=sys,v4.2 -m /mnt/nfsv42tcp -p /srv/test/anna/nfsv42tcp server
Waiting for 'b' to finish...
The '-b' test using '-o tcp,v3,sec=sys' args to server: Failed!!
 Done: 14:08:41

Anna
>
> Suggested-by: Benjamin Coddington <bcodding@redhat.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/dir.c | 68 ++++++++++++++++++++++++++++------------------------
>  1 file changed, 37 insertions(+), 31 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 6f0a38db6c37..bfb553c57274 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -140,6 +140,7 @@ struct nfs_cache_array_entry {
>  };
>
>  struct nfs_cache_array {
> +       u64 change_attr;
>         u64 last_cookie;
>         unsigned int size;
>         unsigned char page_full : 1,
> @@ -176,12 +177,14 @@ static void nfs_readdir_array_init(struct nfs_cache_array *array)
>         memset(array, 0, sizeof(struct nfs_cache_array));
>  }
>
> -static void nfs_readdir_page_init_array(struct page *page, u64 last_cookie)
> +static void nfs_readdir_page_init_array(struct page *page, u64 last_cookie,
> +                                       u64 change_attr)
>  {
>         struct nfs_cache_array *array;
>
>         array = kmap_atomic(page);
>         nfs_readdir_array_init(array);
> +       array->change_attr = change_attr;
>         array->last_cookie = last_cookie;
>         array->cookies_are_ordered = 1;
>         kunmap_atomic(array);
> @@ -208,7 +211,7 @@ nfs_readdir_page_array_alloc(u64 last_cookie, gfp_t gfp_flags)
>  {
>         struct page *page = alloc_page(gfp_flags);
>         if (page)
> -               nfs_readdir_page_init_array(page, last_cookie);
> +               nfs_readdir_page_init_array(page, last_cookie, 0);
>         return page;
>  }
>
> @@ -305,19 +308,43 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page)
>         return ret;
>  }
>
> +static bool nfs_readdir_page_validate(struct page *page, u64 last_cookie,
> +                                     u64 change_attr)
> +{
> +       struct nfs_cache_array *array = kmap_atomic(page);
> +       int ret = true;
> +
> +       if (array->change_attr != change_attr)
> +               ret = false;
> +       if (array->size > 0 && array->array[0].cookie != last_cookie)
> +               ret = false;
> +       kunmap_atomic(array);
> +       return ret;
> +}
> +
> +static void nfs_readdir_page_unlock_and_put(struct page *page)
> +{
> +       unlock_page(page);
> +       put_page(page);
> +}
> +
>  static struct page *nfs_readdir_page_get_locked(struct address_space *mapping,
>                                                 pgoff_t index, u64 last_cookie)
>  {
>         struct page *page;
> +       u64 change_attr;
>
>         page = grab_cache_page(mapping, index);
> -       if (page && !PageUptodate(page)) {
> -               nfs_readdir_page_init_array(page, last_cookie);
> -               if (invalidate_inode_pages2_range(mapping, index + 1, -1) < 0)
> -                       nfs_zap_mapping(mapping->host, mapping);
> -               SetPageUptodate(page);
> +       if (!page)
> +               return NULL;
> +       change_attr = inode_peek_iversion_raw(mapping->host);
> +       if (PageUptodate(page)) {
> +               if (nfs_readdir_page_validate(page, last_cookie, change_attr))
> +                       return page;
> +               nfs_readdir_clear_array(page);
>         }
> -
> +       nfs_readdir_page_init_array(page, last_cookie, change_attr);
> +       SetPageUptodate(page);
>         return page;
>  }
>
> @@ -357,12 +384,6 @@ static void nfs_readdir_page_set_eof(struct page *page)
>         kunmap_atomic(array);
>  }
>
> -static void nfs_readdir_page_unlock_and_put(struct page *page)
> -{
> -       unlock_page(page);
> -       put_page(page);
> -}
> -
>  static struct page *nfs_readdir_page_get_next(struct address_space *mapping,
>                                               pgoff_t index, u64 cookie)
>  {
> @@ -419,16 +440,6 @@ static int nfs_readdir_search_for_pos(struct nfs_cache_array *array,
>         return -EBADCOOKIE;
>  }
>
> -static bool
> -nfs_readdir_inode_mapping_valid(struct nfs_inode *nfsi)
> -{
> -       if (nfsi->cache_validity & (NFS_INO_INVALID_CHANGE |
> -                                   NFS_INO_INVALID_DATA))
> -               return false;
> -       smp_rmb();
> -       return !test_bit(NFS_INO_INVALIDATING, &nfsi->flags);
> -}
> -
>  static bool nfs_readdir_array_cookie_in_range(struct nfs_cache_array *array,
>                                               u64 cookie)
>  {
> @@ -457,8 +468,7 @@ static int nfs_readdir_search_for_cookie(struct nfs_cache_array *array,
>                         struct nfs_inode *nfsi = NFS_I(file_inode(desc->file));
>
>                         new_pos = nfs_readdir_page_offset(desc->page) + i;
> -                       if (desc->attr_gencount != nfsi->attr_gencount ||
> -                           !nfs_readdir_inode_mapping_valid(nfsi)) {
> +                       if (desc->attr_gencount != nfsi->attr_gencount) {
>                                 desc->duped = 0;
>                                 desc->attr_gencount = nfsi->attr_gencount;
>                         } else if (new_pos < desc->prev_index) {
> @@ -1095,11 +1105,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
>          * to either find the entry with the appropriate number or
>          * revalidate the cookie.
>          */
> -       if (ctx->pos == 0 || nfs_attribute_cache_expired(inode)) {
> -               res = nfs_revalidate_mapping(inode, file->f_mapping);
> -               if (res < 0)
> -                       goto out;
> -       }
> +       nfs_revalidate_inode(inode, NFS_INO_INVALID_CHANGE);
>
>         res = -ENOMEM;
>         desc = kzalloc(sizeof(*desc), GFP_KERNEL);
> --
> 2.35.1
>


* Re: [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache
  2022-03-01 19:09               ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache Anna Schumaker
@ 2022-03-01 23:11                 ` Trond Myklebust
  0 siblings, 0 replies; 46+ messages in thread
From: Trond Myklebust @ 2022-03-01 23:11 UTC (permalink / raw)
  To: schumaker.anna, trondmy; +Cc: linux-nfs

On Tue, 2022-03-01 at 14:09 -0500, Anna Schumaker wrote:
> Hi Trond,
> 
> On Mon, Feb 28, 2022 at 5:51 AM <trondmy@kernel.org> wrote:
> > 
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > Use the change attribute and the first cookie in a directory page
> > cache
> > entry to validate that the page is up to date.
> 
> Starting with this patch I'm seeing cthon basic tests fail on NFS v3:
> 
> Tue Mar  1 14:08:39 EST 2022
> ./server -b -o tcp,v3,sec=sys -m /mnt/nfsv3tcp -p /srv/test/anna/nfsv3tcp server
> ./server -b -o proto=tcp,sec=sys,v4.0 -m /mnt/nfsv4tcp -p /srv/test/anna/nfsv4tcp server
> ./server -b -o proto=tcp,sec=sys,v4.1 -m /mnt/nfsv41tcp -p /srv/test/anna/nfsv41tcp server
> ./server -b -o proto=tcp,sec=sys,v4.2 -m /mnt/nfsv42tcp -p /srv/test/anna/nfsv42tcp server
> Waiting for 'b' to finish...
> The '-b' test using '-o tcp,v3,sec=sys' args to server: Failed!!
>  Done: 14:08:41

Which tests are failing, and what is the server configuration that
you're testing against?
I've not been seeing issues with either connectathon or xfstests on the
platforms I've tested.

> 
> Anna
> > 
> > Suggested-by: Benjamin Coddington <bcodding@redhat.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/dir.c | 68 ++++++++++++++++++++++++++++--------------------
> > ----
> >  1 file changed, 37 insertions(+), 31 deletions(-)
> > 
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 6f0a38db6c37..bfb553c57274 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -140,6 +140,7 @@ struct nfs_cache_array_entry {
> >  };
> > 
> >  struct nfs_cache_array {
> > +       u64 change_attr;
> >         u64 last_cookie;
> >         unsigned int size;
> >         unsigned char page_full : 1,
> > @@ -176,12 +177,14 @@ static void nfs_readdir_array_init(struct
> > nfs_cache_array *array)
> >         memset(array, 0, sizeof(struct nfs_cache_array));
> >  }
> > 
> > -static void nfs_readdir_page_init_array(struct page *page, u64
> > last_cookie)
> > +static void nfs_readdir_page_init_array(struct page *page, u64
> > last_cookie,
> > +                                       u64 change_attr)
> >  {
> >         struct nfs_cache_array *array;
> > 
> >         array = kmap_atomic(page);
> >         nfs_readdir_array_init(array);
> > +       array->change_attr = change_attr;
> >         array->last_cookie = last_cookie;
> >         array->cookies_are_ordered = 1;
> >         kunmap_atomic(array);
> > @@ -208,7 +211,7 @@ nfs_readdir_page_array_alloc(u64 last_cookie,
> > gfp_t gfp_flags)
> >  {
> >         struct page *page = alloc_page(gfp_flags);
> >         if (page)
> > -               nfs_readdir_page_init_array(page, last_cookie);
> > +               nfs_readdir_page_init_array(page, last_cookie, 0);
> >         return page;
> >  }
> > 
> > @@ -305,19 +308,43 @@ int nfs_readdir_add_to_array(struct nfs_entry
> > *entry, struct page *page)
> >         return ret;
> >  }
> > 
> > +static bool nfs_readdir_page_validate(struct page *page, u64
> > last_cookie,
> > +                                     u64 change_attr)
> > +{
> > +       struct nfs_cache_array *array = kmap_atomic(page);
> > +       int ret = true;
> > +
> > +       if (array->change_attr != change_attr)
> > +               ret = false;
> > +       if (array->size > 0 && array->array[0].cookie !=
> > last_cookie)
> > +               ret = false;
> > +       kunmap_atomic(array);
> > +       return ret;
> > +}
> > +
> > +static void nfs_readdir_page_unlock_and_put(struct page *page)
> > +{
> > +       unlock_page(page);
> > +       put_page(page);
> > +}
> > +
> >  static struct page *nfs_readdir_page_get_locked(struct
> > address_space *mapping,
> >                                                 pgoff_t index, u64
> > last_cookie)
> >  {
> >         struct page *page;
> > +       u64 change_attr;
> > 
> >         page = grab_cache_page(mapping, index);
> > -       if (page && !PageUptodate(page)) {
> > -               nfs_readdir_page_init_array(page, last_cookie);
> > -               if (invalidate_inode_pages2_range(mapping, index +
> > 1, -1) < 0)
> > -                       nfs_zap_mapping(mapping->host, mapping);
> > -               SetPageUptodate(page);
> > +       if (!page)
> > +               return NULL;
> > +       change_attr = inode_peek_iversion_raw(mapping->host);
> > +       if (PageUptodate(page)) {
> > +               if (nfs_readdir_page_validate(page, last_cookie,
> > change_attr))
> > +                       return page;
> > +               nfs_readdir_clear_array(page);
> >         }
> > -
> > +       nfs_readdir_page_init_array(page, last_cookie,
> > change_attr);
> > +       SetPageUptodate(page);
> >         return page;
> >  }
> > 
> > @@ -357,12 +384,6 @@ static void nfs_readdir_page_set_eof(struct
> > page *page)
> >         kunmap_atomic(array);
> >  }
> > 
> > -static void nfs_readdir_page_unlock_and_put(struct page *page)
> > -{
> > -       unlock_page(page);
> > -       put_page(page);
> > -}
> > -
> >  static struct page *nfs_readdir_page_get_next(struct address_space
> > *mapping,
> >                                               pgoff_t index, u64
> > cookie)
> >  {
> > @@ -419,16 +440,6 @@ static int nfs_readdir_search_for_pos(struct
> > nfs_cache_array *array,
> >         return -EBADCOOKIE;
> >  }
> > 
> > -static bool
> > -nfs_readdir_inode_mapping_valid(struct nfs_inode *nfsi)
> > -{
> > -       if (nfsi->cache_validity & (NFS_INO_INVALID_CHANGE |
> > -                                   NFS_INO_INVALID_DATA))
> > -               return false;
> > -       smp_rmb();
> > -       return !test_bit(NFS_INO_INVALIDATING, &nfsi->flags);
> > -}
> > -
> >  static bool nfs_readdir_array_cookie_in_range(struct
> > nfs_cache_array *array,
> >                                               u64 cookie)
> >  {
> > @@ -457,8 +468,7 @@ static int nfs_readdir_search_for_cookie(struct
> > nfs_cache_array *array,
> >                         struct nfs_inode *nfsi =
> > NFS_I(file_inode(desc->file));
> > 
> >                         new_pos = nfs_readdir_page_offset(desc-
> > >page) + i;
> > -                       if (desc->attr_gencount != nfsi-
> > >attr_gencount ||
> > -                           !nfs_readdir_inode_mapping_valid(nfsi))
> > {
> > +                       if (desc->attr_gencount != nfsi-
> > >attr_gencount) {
> >                                 desc->duped = 0;
> >                                 desc->attr_gencount = nfsi-
> > >attr_gencount;
> >                         } else if (new_pos < desc->prev_index) {
> > @@ -1095,11 +1105,7 @@ static int nfs_readdir(struct file *file,
> > struct dir_context *ctx)
> >          * to either find the entry with the appropriate number or
> >          * revalidate the cookie.
> >          */
> > -       if (ctx->pos == 0 || nfs_attribute_cache_expired(inode)) {
> > -               res = nfs_revalidate_mapping(inode, file-
> > >f_mapping);
> > -               if (res < 0)
> > -                       goto out;
> > -       }
> > +       nfs_revalidate_inode(inode, NFS_INO_INVALID_CHANGE);
> > 
> >         res = -ENOMEM;
> >         desc = kzalloc(sizeof(*desc), GFP_KERNEL);
> > --
> > 2.35.1
> > 

-- 
Trond Myklebust
CTO, Hammerspace Inc
4984 El Camino Real, Suite 208
Los Altos, CA 94022
www.hammer.space


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH v9 03/27] NFS: Trace lookup revalidation failure
  2022-02-27 23:12     ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure trondmy
  2022-02-27 23:12       ` [PATCH v9 04/27] NFS: Initialise the readdir verifier as best we can in nfs_opendir() trondmy
@ 2022-03-09 13:42       ` Benjamin Coddington
  2022-03-09 15:28         ` Chuck Lever III
  1 sibling, 1 reply; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-09 13:42 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On 24 Feb 2022, at 21:09, Trond Myklebust wrote:
> On Thu, 2022-02-24 at 09:14 -0500, Benjamin Coddington wrote:
>> There's a path through nfs4_lookup_revalidate that will now only produce
>> this exit tracepoint.  Does it need the _enter tracepoint added?
>
> You're thinking about the nfs_lookup_revalidate_delegated() path? The
> _enter() tracepoint doesn't provide any useful information that isn't
> already provided by the _exit(), AFAICS.

No, the path through nfs4_do_lookup_revalidate(), reval_dentry: jump.  But I
agree there's not much value in the _enter() tracepoint.  Maybe we can
remove it, and make _exit more like _done.

I am thinking about hearing back from folks about mis-matched _enter() and
_exit() results, but also realize this is nit-picking.

Ben



* Re: [PATCH v9 03/27] NFS: Trace lookup revalidation failure
  2022-03-09 13:42       ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure Benjamin Coddington
@ 2022-03-09 15:28         ` Chuck Lever III
  2022-03-09 21:35           ` Benjamin Coddington
  0 siblings, 1 reply; 46+ messages in thread
From: Chuck Lever III @ 2022-03-09 15:28 UTC (permalink / raw)
  To: Benjamin Coddington, trondmy; +Cc: Linux NFS Mailing List



> On Mar 9, 2022, at 8:42 AM, Benjamin Coddington <bcodding@redhat.com> wrote:
> 
> On 24 Feb 2022, at 21:09, Trond Myklebust wrote:
>> On Thu, 2022-02-24 at 09:14 -0500, Benjamin Coddington wrote:
>>> There's a path through nfs4_lookup_revalidate that will now only produce
>>> this exit tracepoint.  Does it need the _enter tracepoint added?
>> 
>> You're thinking about the nfs_lookup_revalidate_delegated() path? The
>> _enter() tracepoint doesn't provide any useful information that isn't
>> already provided by the _exit(), AFAICS.
> 
> No, the path through nfs4_do_lookup_revalidate(), reval_dentry: jump.  But I
> agree there's not much value in the _enter() tracepoint.  Maybe we can
> remove it, and _exit more like _done.
> 
> I am thinking about hearing back from folks about mis-matched _enter() and
> _exit() results, but also realize this is nit-picking.

I think the _enter / _exit trace points simply replaced
dprintk call sites which did much the same reporting.
Maybe we should consider replacing some of these because
we can rely on function call tracing instead.

But generally we like to see trace points that report
exceptional events rather than "I made it to this point".
The latter category of trace points is interesting
while code is under development but often loses its
value once the code is in the field.


--
Chuck Lever





* Re: [PATCH v9 14/27] NFS: Improve heuristic for readdirplus
  2022-02-27 23:12                           ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus trondmy
  2022-02-27 23:12                             ` [PATCH v9 15/27] NFS: Don't ask for readdirplus unless it can help nfs_getattr() trondmy
@ 2022-03-09 17:39                             ` Benjamin Coddington
  2022-03-10 14:31                               ` [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner Benjamin Coddington
  2022-03-10 20:15                               ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Trond Myklebust
  1 sibling, 2 replies; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-09 17:39 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:

> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The heuristic for readdirplus is designed to try to detect 'ls -l' and
> similar patterns. It does so by looking for cache hit/miss patterns in
> both the attribute cache and in the dcache of the files in a given
> directory, and then sets a flag for the readdirplus code to interpret.
>
> The problem with this approach is that a single attribute or dcache miss
> can cause the NFS code to force a refresh of the attributes for the
> entire set of files contained in the directory.
>
> To be able to make a more nuanced decision, let's sample the number of
> hits and misses in the set of open directory descriptors. That allows us
> to set thresholds at which we start preferring READDIRPLUS over regular
> READDIR, or at which we start to force a re-read of the remaining
> readdir cache using READDIRPLUS.

I like this patch very much.

The heuristic doesn't kick in until "ls -l" makes its second call into
nfs_readdir(), and for my filenames with 8 chars, that means that there are
about 5800 GETATTRs generated before we clean the cache to do more
READDIRPLUS.  That's a large number to compound on connection latency.

We've already got some complaints that folks' 2nd "ls -l" takes "so much
longer" after 1a34c8c9a49e.

Can we possibly limit our first pass through nfs_readdir() so that the
heuristic takes effect sooner?

Ben



* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-02-27 23:12                                             ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index trondmy
  2022-02-27 23:12                                               ` [PATCH v9 24/27] NFS: Fix up forced readdirplus trondmy
@ 2022-03-09 20:01                                               ` Benjamin Coddington
  2022-03-09 21:03                                                 ` Benjamin Coddington
  2022-03-10 21:07                                                 ` Trond Myklebust
  1 sibling, 2 replies; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-09 20:01 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:

> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> Instead of using a linear index to address the pages, use the cookie of
> the first entry, since that is what we use to match the page anyway.
>
> This allows us to avoid re-reading the entire cache on a seekdir() type
> of operation. The latter is very common when re-exporting NFS, and is a
> major performance drain.
>
> The change does affect our duplicate cookie detection, since we can no
> longer rely on the page index as a linear offset for detecting whether
> we looped backwards. However since we no longer do a linear search
> through all the pages on each call to nfs_readdir(), this is less of a
> concern than it was previously.
> The other downside is that invalidate_mapping_pages() no longer can use
> the page index to avoid clearing pages that have been read. A subsequent
> patch will restore the functionality this provides to the 'ls -l'
> heuristic.

I didn't realize the approach was to also hash out the linearly-cached
entries.  I thought we'd do something like flag the context for hashed page
indexes after a seekdir event, and if there are collisions with the linear
entries, they'll get fixed up when found.

Doesn't that mean that with this approach seekdir() only hits the same pages
when the entry offset is page-aligned?  That's 1 in 127 odds.

It also means we're amplifying the pagecache's usage for slightly changing
directories - rather than re-using the same pages we're scattering our usage
across the index.  Eh, maybe not a big deal if we just expect the page
cache's LRU to do the work.

Ben



* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-03-09 20:01                                               ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index Benjamin Coddington
@ 2022-03-09 21:03                                                 ` Benjamin Coddington
  2022-03-10 21:07                                                 ` Trond Myklebust
  1 sibling, 0 replies; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-09 21:03 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On 9 Mar 2022, at 15:01, Benjamin Coddington wrote:

> On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
>
>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>
>> Instead of using a linear index to address the pages, use the cookie
>> of the first entry, since that is what we use to match the page
>> anyway.
>>
>> This allows us to avoid re-reading the entire cache on a seekdir()
>> type of operation. The latter is very common when re-exporting NFS,
>> and is a major performance drain.
>>
>> The change does affect our duplicate cookie detection, since we can
>> no longer rely on the page index as a linear offset for detecting
>> whether we looped backwards. However since we no longer do a linear
>> search through all the pages on each call to nfs_readdir(), this is
>> less of a concern than it was previously.
>> The other downside is that invalidate_mapping_pages() no longer can
>> use the page index to avoid clearing pages that have been read. A
>> subsequent patch will restore the functionality this provides to the
>> 'ls -l' heuristic.
>
> I didn't realize the approach was to also hash out the linearly-cached
> entries.  I thought we'd do something like flag the context for hashed
> page indexes after a seekdir event, and if there are collisions with
> the linear entries, they'll get fixed up when found.
>
> Doesn't that mean that with this approach seekdir() only hits the same
> pages when the entry offset is page-aligned?  That's 1 in 127 odds.
>
> It also means we're amplifying the pagecache's usage for slightly
> changing directories - rather than re-using the same pages we're
> scattering our usage across the index.  Eh, maybe not a big deal if we
> just expect the page cache's LRU to do the work.

I don't have a better idea, though.. have you tested this performance?

..

maybe.. the hash divided the u64 cookie space into 262144 buckets, each
being a page the cookie could fall into.  So cookies 1 - 70368744177663
map into page 1.. bah.  That won't work.

I was worried that I was wrong about this, but this program shows the
problem by requiring a full READDIR for each entry if we walk the entries
one-by-one with lseek().  I don't understand how the re-export seekdir()
case is helped by this unless you're hitting the exact same offsets all
the time.

I think that a hash of the page index for seekdir is no better than
picking an arbitrary offset, or just using the lowest pages in the cache.

Ben

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <string.h>

#define NFSDIR "/mnt/fedora/127_dentries"

int main(int argc, char **argv)
{
     int i, dir_fd, bpos, total = 0;
     ssize_t nread;
     struct linux_dirent {
             long           d_ino;
             off_t          d_off;
             unsigned short d_reclen;
             char           d_name[];
     };
     struct linux_dirent *d;
     int buf_size = sizeof(struct linux_dirent) + sizeof("file_000");
     char buf[buf_size];

     /* create files: */
     for (i = 0; i < 127; i++) {
         sprintf(buf, NFSDIR "/file_%03d", i);
         close(open(buf, O_CREAT, 0666));
     }

     dir_fd = open(NFSDIR, O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
     if (dir_fd < 0) {
             perror("cannot open dir");
             return 1;
     }

     /* no cache pls */
     posix_fadvise(dir_fd, 0, 0, POSIX_FADV_DONTNEED);

     while (1) {
         nread = syscall(SYS_getdents, dir_fd, buf, buf_size);
         if (nread == 0 || nread == -1)
             break;
         for (bpos = 0; bpos < nread;) {
             d = (struct linux_dirent *) (buf + bpos);
             printf("%s offset %ld\n", d->d_name, (long) d->d_off);

             lseek(dir_fd, 0, SEEK_SET);
             lseek(dir_fd, d->d_off, SEEK_SET);
             total++;
             bpos += d->d_reclen;
         }
     }
     printf("Listing 1: %d total dirents\n", total);

     close(dir_fd);
     return 0;
}



* Re: [PATCH v9 00/27] Readdir improvements
  2022-02-27 23:12 [PATCH v9 00/27] Readdir improvements trondmy
  2022-02-27 23:12 ` [PATCH v9 01/27] NFS: Return valid errors from nfs2/3_decode_dirent() trondmy
@ 2022-03-09 21:32 ` Benjamin Coddington
  1 sibling, 0 replies; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-09 21:32 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:

> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The current NFS readdir code will always try to maximise the amount of
> readahead it performs on the assumption that we can cache anything that
> isn't immediately read by the process.
> There are several cases where this assumption breaks down, including
> when the 'ls -l' heuristic kicks in to try to force use of readdirplus
> as a batch replacement for lookup/getattr.
>
> This series also implement Ben's page cache filter to ensure that we can
> improve the ability to share cached data between processes that are
> reading the same directory at the same time, and to avoid live-locks
> when the directory is simultaneously changing.
>
> --
> v2: Remove reset of dtsize when NFS_INO_FORCE_READDIR is set
> v3: Avoid excessive window shrinking in uncached_readdir case
> v4: Track 'ls -l' cache hit/miss statistics
>     Improved algorithm for falling back to uncached readdir
>     Skip readdirplus when files are being written to
> v5: bugfixes
>     Skip readdirplus when the acdirmax/acregmax values are low
>     Request a full XDR buffer when doing READDIRPLUS
> v6: Add tracing
>     Don't have lookup request readdirplus when it won't help
> v7: Implement Ben's page cache filter
>     Reduce the use of uncached readdir
>     Change indexing of the page cache to improve seekdir() performance.
> v8: Reduce the page cache overhead by shrinking the cookie hashvalue size
>     Incorporate other feedback from Anna, Ben and Neil
>     Fix nfs2/3_decode_dirent return values
>     Fix the change attribute value set in nfs_readdir_page_get_next()
> v9: Address bugs that were hit when testing with large directories
>     Misc cleanups

Hi Trond, thanks for all this work!  I went through these from your testing
branch (612896ec5a4e) rather than the posting.

If it pleases, these can all get marked with my

Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
and/or
Tested-by: Benjamin Coddington <bcodding@redhat.com>

.. except for 25/27, which is missing from the testing branch.

As I replied to 23/27, I don't understand how the page index hashing is
going to help out the re-export seekdir case, I think it might make it
worse, and I think its unnecessary extra complication.

I did test extensively total directory listing correctness, and it appears
to me that you are correct, we are not regressing.  We're in a similar place
as before.  I think we can be even more correct with directory verifiers or
post-op updates with GETATTR in the READDIR compound for very little cost,
but I've already made those arguments a few weeks ago.

Thanks again,
Ben



* Re: [PATCH v9 03/27] NFS: Trace lookup revalidation failure
  2022-03-09 15:28         ` Chuck Lever III
@ 2022-03-09 21:35           ` Benjamin Coddington
  0 siblings, 0 replies; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-09 21:35 UTC (permalink / raw)
  To: Chuck Lever III; +Cc: trondmy, Linux NFS Mailing List

On 9 Mar 2022, at 10:28, Chuck Lever III wrote:

>> On Mar 9, 2022, at 8:42 AM, Benjamin Coddington <bcodding@redhat.com>
>> wrote:
>>
>> On 24 Feb 2022, at 21:09, Trond Myklebust wrote:
>>> On Thu, 2022-02-24 at 09:14 -0500, Benjamin Coddington wrote:
>>>> There's a path through nfs4_lookup_revalidate that will now only
>>>> produce this exit tracepoint.  Does it need the _enter tracepoint
>>>> added?
>>>
>>> You're thinking about the nfs_lookup_revalidate_delegated() path?
>>> The _enter() tracepoint doesn't provide any useful information that
>>> isn't already provided by the _exit(), AFAICS.
>>
>> No, the path through nfs4_do_lookup_revalidate(), reval_dentry: jump.
>> But I agree there's not much value in the _enter() tracepoint.  Maybe
>> we can remove it, and _exit more like _done.
>>
>> I am thinking about hearing back from folks about mis-matched
>> _enter() and _exit() results, but also realize this is nit-picking.
>
> I think the _enter / _exit trace points simply replaced
> dprintk call sites which did much the same reporting.
> Maybe we should consider replacing some of these because
> we can rely on function call tracing instead.
>
> But generally we like to see trace points that report
> exceptional events rather than "I made it to this point".
> The latter category of trace points are interesting
> while code is under development but often loses its
> value once the code is in the field.

Instead of "hearing back from folks" I should have said "I worry our QE 
team
is going to discover and possibly report as a bug". :P  Thanks for 
filling
in Chuck!

Ben



* [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner
  2022-03-09 17:39                             ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Benjamin Coddington
@ 2022-03-10 14:31                               ` Benjamin Coddington
  2022-03-16 22:25                                 ` Olga Kornievskaia
  2022-03-10 20:15                               ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Trond Myklebust
  1 sibling, 1 reply; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-10 14:31 UTC (permalink / raw)
  To: trondmy; +Cc: linux-nfs

.. Something like this does the trick in my testing, but yes will have an
impact on regular workloads:

8<------------------------------------------------------------------------

Since commit 1a34c8c9a49e ("NFS: Support larger readdir buffers") updated
dtsize, and with the recent improvements to the READDIRPLUS helper
heuristic, the heuristic may not trigger until many dentries have been
emitted to userspace, which may cause many thousands of GETATTR calls for
"ls -l" when the directory's pagecache has already been populated.  This
typically manifests as a much slower total runtime for a _second_
invocation of "ls -l" within the directory attribute cache timeouts.

Fix this by emitting only 17 entries for any first pass through the NFS
directory's ->iterate_shared(), which will allow userspace to prime the
counters for the heuristic.

Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
---
 fs/nfs/dir.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 7e12102b29e7..dc5fc9ba2c49 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1060,6 +1060,8 @@ static int readdir_search_pagecache(struct nfs_readdir_descriptor *desc)
 	return res;
 }
 
+#define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
+
 /*
  * Once we've found the start of the dirent within a page: fill 'er up...
  */
@@ -1069,6 +1071,7 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
 	struct file	*file = desc->file;
 	struct nfs_cache_array *array;
 	unsigned int i;
+	bool first_emit = !desc->dir_cookie;
 
 	array = kmap(desc->page);
 	for (i = desc->cache_entry_index; i < array->size; i++) {
@@ -1092,6 +1095,10 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
 			desc->ctx->pos = desc->dir_cookie;
 		else
 			desc->ctx->pos++;
+		if (first_emit && i > NFS_READDIR_CACHE_MISS_THRESHOLD + 1) {
+			desc->eob = true;
+			break;
+		}
 	}
 	if (array->page_is_eof)
 		desc->eof = !desc->eob;
@@ -1173,8 +1180,6 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
 	return status;
 }
 
-#define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
-
 static bool nfs_readdir_handle_cache_misses(struct inode *inode,
 					    struct nfs_readdir_descriptor *desc,
 					    unsigned int cache_misses,
-- 
2.31.1



* Re: [PATCH v9 14/27] NFS: Improve heuristic for readdirplus
  2022-03-09 17:39                             ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Benjamin Coddington
  2022-03-10 14:31                               ` [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner Benjamin Coddington
@ 2022-03-10 20:15                               ` Trond Myklebust
  2022-03-11 11:28                                 ` Benjamin Coddington
  1 sibling, 1 reply; 46+ messages in thread
From: Trond Myklebust @ 2022-03-10 20:15 UTC (permalink / raw)
  To: bcodding; +Cc: linux-nfs

On Wed, 2022-03-09 at 12:39 -0500, Benjamin Coddington wrote:
> On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
> 
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The heuristic for readdirplus is designed to try to detect 'ls -l'
> > and similar patterns. It does so by looking for cache hit/miss
> > patterns in both the attribute cache and in the dcache of the files
> > in a given directory, and then sets a flag for the readdirplus code
> > to interpret.
> > 
> > The problem with this approach is that a single attribute or dcache
> > miss can cause the NFS code to force a refresh of the attributes for
> > the entire set of files contained in the directory.
> > 
> > To be able to make a more nuanced decision, let's sample the number
> > of hits and misses in the set of open directory descriptors. That
> > allows us to set thresholds at which we start preferring READDIRPLUS
> > over regular READDIR, or at which we start to force a re-read of the
> > remaining readdir cache using READDIRPLUS.
> 
> I like this patch very much.
> 
> The heuristic doesn't kick in until "ls -l" makes its second call into
> nfs_readdir(), and for my filenames with 8 chars, that means that
> there are about 5800 GETATTRs generated before we clean the cache to
> do more READDIRPLUS.  That's a large number to compound on connection
> latency.
> 
> We've already got some complaints that folk's 2nd "ls -l" takes "so
> much longer" after 1a34c8c9a49e.
> 
> Can we possibly limit our first pass through nfs_readdir() so that the
> heuristic takes effect sooner?
> 

The problem is really that 'ls' (or possibly glibc) is passing in a
pretty huge buffer to the getdents() system call.

On my setup, that buffer appears to be 80K in size. So what happens is
that we get that first getdents() call, and so we fill the 80K buffer
with as many files as will fit. That can quickly run into several
thousand entries, if the filenames are relatively short.

Then 'ls' goes through the contents and does a stat() (or a statx()) on
each entry, and so we record the statistics. However that means those
first several thousand entries are indeed going to use cached data, or
force GETATTR to go on the wire. We only start using forced readdirplus
on the second pass.
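(A standalone illustration of that point, not code from the series: the
program below counts how many entries a single getdents64() call returns
when handed an 80K buffer like the one glibc uses. The helper name and
the kernel-ABI struct definition are local to this sketch.)

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Local copy of the kernel's linux_dirent64 layout, see getdents64(2). */
struct linux_dirent64 {
	unsigned long long d_ino;
	long long          d_off;
	unsigned short     d_reclen;
	unsigned char      d_type;
	char               d_name[];
};

/*
 * Count the entries returned by ONE getdents64() call.  With an 80K
 * buffer and short filenames, a single syscall can drain the whole
 * directory, so userspace has every entry in hand before any per-entry
 * stat() statistics can accumulate.
 */
static int entries_in_one_getdents(const char *dirpath, size_t bufsize)
{
	char *buf = malloc(bufsize);
	int fd = open(dirpath, O_RDONLY | O_DIRECTORY);
	long nread;
	int bpos, count = 0;

	if (fd < 0 || buf == NULL)
		return -1;
	nread = syscall(SYS_getdents64, fd, buf, bufsize);
	for (bpos = 0; bpos < nread; ) {
		struct linux_dirent64 *d =
			(struct linux_dirent64 *)(buf + bpos);
		count++;
		bpos += d->d_reclen;
	}
	close(fd);
	free(buf);
	return count;
}
```

On a directory of a few hundred short names, the single call returns
every entry, which is the behaviour that keeps the heuristic dormant
until the second getdents() pass.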

Yes, I suppose we could limit getdents() to ignore the buffer size, and
just return fewer entries, however what's the "right" size in that
case?
More to the point, how much pain are we going to accept before we give
up trying these assorted heuristics, and just define a readdirplus()
system call modelled on statx()?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-03-09 20:01                                               ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index Benjamin Coddington
  2022-03-09 21:03                                                 ` Benjamin Coddington
@ 2022-03-10 21:07                                                 ` Trond Myklebust
  2022-03-11 11:58                                                   ` Benjamin Coddington
  1 sibling, 1 reply; 46+ messages in thread
From: Trond Myklebust @ 2022-03-10 21:07 UTC (permalink / raw)
  To: bcodding; +Cc: linux-nfs

On Wed, 2022-03-09 at 15:01 -0500, Benjamin Coddington wrote:
> On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
> 
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > Instead of using a linear index to address the pages, use the cookie
> > of the first entry, since that is what we use to match the page
> > anyway.
> > 
> > This allows us to avoid re-reading the entire cache on a seekdir()
> > type of operation. The latter is very common when re-exporting NFS,
> > and is a major performance drain.
> > 
> > The change does affect our duplicate cookie detection, since we can
> > no longer rely on the page index as a linear offset for detecting
> > whether we looped backwards. However since we no longer do a linear
> > search through all the pages on each call to nfs_readdir(), this is
> > less of a concern than it was previously.
> > The other downside is that invalidate_mapping_pages() no longer can
> > use the page index to avoid clearing pages that have been read. A
> > subsequent patch will restore the functionality this provides to the
> > 'ls -l' heuristic.
> 
> I didn't realize the approach was to also hash out the linearly-cached
> entries.  I thought we'd do something like flag the context for hashed
> page indexes after a seekdir event, and if there are collisions with
> the linear entries, they'll get fixed up when found.

Why? What's the point of using 2 models where 1 will do?

> 
> Doesn't that mean that with this approach seekdir() only hits the same
> pages when the entry offset is page-aligned?  That's 1 in 127 odds.

The point is not to stomp all over the pages that contain aligned data
when the application does call seekdir().

IOW: we always optimise for the case where we do a linear read of the
directory, but we support random seekdir() + read too.

> 
> It also means we're amplifying the pagecache's usage for slightly
> changing directories - rather than re-using the same pages we're
> scattering our usage across the index.  Eh, maybe not a big deal if we
> just expect the page cache's LRU to do the work.
> 

I don't understand your point about 'not reusing'. If the user seeks to
the same cookie, we reuse the page. However I don't know how you would
go about setting up a schema that allows you to seek to an arbitrary
cookie without doing a linear search.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH v9 14/27] NFS: Improve heuristic for readdirplus
  2022-03-10 20:15                               ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Trond Myklebust
@ 2022-03-11 11:28                                 ` Benjamin Coddington
  0 siblings, 0 replies; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-11 11:28 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On 10 Mar 2022, at 15:15, Trond Myklebust wrote:
>
> The problem is really that 'ls' (or possibly glibc) is passing in a
> pretty huge buffer to the getdents() system call.
>
> On my setup, that buffer appears to be 80K in size. So what happens is
> that we get that first getdents() call, and so we fill the 80K buffer
> with as many files as will fit. That can quickly run into several
> thousand entries, if the filenames are relatively short.
>
> Then 'ls' goes through the contents and does a stat() (or a statx()) on
> each entry, and so we record the statistics. However that means those
> first several thousand entries are indeed going to use cached data, or
> force GETATTR to go on the wire. We only start using forced readdirplus
> on the second pass.
>
> Yes, I suppose we could limit getdents() to ignore the buffer size, and
> just return fewer entries, however what's the "right" size in that
> case?

We can return fewer entries on the first call, so for the first pass the
right size is NFS_READDIR_CACHE_MISS_THRESHOLD + 1.  I sent a patch.

> More to the point, how much pain are we going to accept before we give
> up trying these assorted heuristics, and just define a readdirplus()
> system call modelled on statx()?

We cursed ourselves by creating the heuristic, and now we've had to maintain
it and try to make everyone happy.  The pain for us is when the behavior
keeps changing after sites have come to rely on previous performance.

I hope you can take a look at the patch.

Ben



* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-03-10 21:07                                                 ` Trond Myklebust
@ 2022-03-11 11:58                                                   ` Benjamin Coddington
  2022-03-11 14:02                                                     ` Trond Myklebust
  0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-11 11:58 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On 10 Mar 2022, at 16:07, Trond Myklebust wrote:

> On Wed, 2022-03-09 at 15:01 -0500, Benjamin Coddington wrote:
>> On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
>>
>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>
>>> Instead of using a linear index to address the pages, use the
>>> cookie of
>>> the first entry, since that is what we use to match the page
>>> anyway.
>>>
>>> This allows us to avoid re-reading the entire cache on a seekdir()
>>> type
>>> of operation. The latter is very common when re-exporting NFS, and
>>> is a
>>> major performance drain.
>>>
>>> The change does affect our duplicate cookie detection, since we can
>>> no
>>> longer rely on the page index as a linear offset for detecting
>>> whether
>>> we looped backwards. However since we no longer do a linear search
>>> through all the pages on each call to nfs_readdir(), this is less
>>> of a
>>> concern than it was previously.
>>> The other downside is that invalidate_mapping_pages() no longer can
>>> use
>>> the page index to avoid clearing pages that have been read. A
>>> subsequent
>>> patch will restore the functionality this provides to the 'ls -l'
>>> heuristic.
>>
>> I didn't realize the approach was to also hash out the linearly-
>> cached
>> entries.  I thought we'd do something like flag the context for
>> hashed page
>> indexes after a seekdir event, and if there are collisions with the
>> linear
>> entries, they'll get fixed up when found.
>
> Why? What's the point of using 2 models where 1 will do?

I don't think the hashed model is quite as simple and efficient overall, and
it may have impacts on the system beyond NFS.

>>
>> Doesn't that mean that with this approach seekdir() only hits the
>> same pages
>> when the entry offset is page-aligned?  That's 1 in 127 odds.
>
> The point is not to stomp all over the pages that contain aligned data
> when the application does call seekdir().
>
> IOW: we always optimise for the case where we do a linear read of the
> directory, but we support random seekdir() + read too.

And that could be done just by bumping the seekdir users to some constant
offset (index 262144 ?), or something else equally dead-nuts simple.  That
keeps tightly clustered page indexes, so walking the cache is faster.  That
reduces the "buckshot" effect the hashing has of eating up pagecache pages
they'll never use again.  That doesn't cap our caching ability at 33 million
entries.

It's weird to me that we're doing exactly what XArray says not to do, hash
the index, when we don't have to.

>> It also means we're amplifying the pagecache's usage for slightly
>> changing
>> directories - rather than re-using the same pages we're scattering
>> our usage
>> across the index.  Eh, maybe not a big deal if we just expect the
>> page
>> cache's LRU to do the work.
>>
>
> I don't understand your point about 'not reusing'. If the user seeks to
> the same cookie, we reuse the page. However I don't know how you would
> go about setting up a schema that allows you to seek to an arbitrary
> cookie without doing a linear search.

So when I was talking about 'reusing' a page, that's about re-filling
same pages rather than constantly conjuring new ones, which requires less of
the pagecache's resources in total.  Maybe the pagecache can handle that
without it negatively impacting other users of the cache that /will/ re-use
their cached pages, but I worry it might be irresponsible of us to fill the
pagecache with pages we know we're never going to find again.

Ben



* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-03-11 11:58                                                   ` Benjamin Coddington
@ 2022-03-11 14:02                                                     ` Trond Myklebust
  2022-03-11 16:14                                                       ` Benjamin Coddington
  0 siblings, 1 reply; 46+ messages in thread
From: Trond Myklebust @ 2022-03-11 14:02 UTC (permalink / raw)
  To: bcodding; +Cc: linux-nfs

On Fri, 2022-03-11 at 06:58 -0500, Benjamin Coddington wrote:
> On 10 Mar 2022, at 16:07, Trond Myklebust wrote:
> 
> > On Wed, 2022-03-09 at 15:01 -0500, Benjamin Coddington wrote:
> > > On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
> > > 
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > 
> > > > Instead of using a linear index to address the pages, use the
> > > > cookie of
> > > > the first entry, since that is what we use to match the page
> > > > anyway.
> > > > 
> > > > This allows us to avoid re-reading the entire cache on a
> > > > seekdir()
> > > > type
> > > > of operation. The latter is very common when re-exporting NFS,
> > > > and
> > > > is a
> > > > major performance drain.
> > > > 
> > > > The change does affect our duplicate cookie detection, since we
> > > > can
> > > > no
> > > > longer rely on the page index as a linear offset for detecting
> > > > whether
> > > > we looped backwards. However since we no longer do a linear
> > > > search
> > > > through all the pages on each call to nfs_readdir(), this is
> > > > less
> > > > of a
> > > > concern than it was previously.
> > > > The other downside is that invalidate_mapping_pages() no longer
> > > > can
> > > > use
> > > > the page index to avoid clearing pages that have been read. A
> > > > subsequent
> > > > patch will restore the functionality this provides to the 'ls -
> > > > l'
> > > > heuristic.
> > > 
> > > I didn't realize the approach was to also hash out the linearly-
> > > cached
> > > entries.  I thought we'd do something like flag the context for
> > > hashed page
> > > indexes after a seekdir event, and if there are collisions with
> > > the
> > > linear
> > > entries, they'll get fixed up when found.
> > 
> > Why? What's the point of using 2 models where 1 will do?
> 
> I don't think the hashed model is quite as simple and efficient
> overall, and
> may produce impacts to a system beyond NFS.
> 
> > > 
> > > Doesn't that mean that with this approach seekdir() only hits the
> > > same pages
> > > when the entry offset is page-aligned?  That's 1 in 127 odds.
> > 
> > The point is not to stomp all over the pages that contain aligned
> > data
> > when the application does call seekdir().
> > 
> > IOW: we always optimise for the case where we do a linear read of
> > the
> > directory, but we support random seekdir() + read too.
> 
> And that could be done just by bumping the seekdir users to some
> constant
> offset (index 262144 ?), or something else equally dead-nuts simple. 
> That
> keeps tightly clustered page indexes, so walking the cache is
> faster.  That
> reduces the "buckshot" effect the hashing has of eating up pagecache
> pages
> they'll never use again.  That doesn't cap our caching ability at 33
> million
> entries.
> 

What you say would make sense if readdir cookies truly were offsets,
but in general they're not. Cookies are unstructured data, and should
be treated as unstructured data.

Let's say I do cache more than 33 million entries and I have to find a
cookie. I have to linearly read through at least 1GB of cached data
before I can give up and start a new readdir. Either that, or I need to
have a heuristic that tells me when to stop searching, and then another
heuristic that tells me where to store the data in a way that doesn't
trash the page cache.

With the hashing, I seek to the page matching the hash, and I either
immediately find what I need, or I immediately know to start a readdir.
There is no need for any additional heuristic.

> It's weird to me that we're doing exactly what XArray says not to do,
> hash
> the index, when we don't have to.
> 
> > > It also means we're amplifying the pagecache's usage for
> > > slightly
> > > changing
> > > directories - rather than re-using the same pages we're
> > > scattering
> > > our usage
> > > across the index.  Eh, maybe not a big deal if we just expect the
> > > page
> > > cache's LRU to do the work.
> > > 
> > 
> > I don't understand your point about 'not reusing'. If the user
> > seeks to
> > the same cookie, we reuse the page. However I don't know how you
> > would
> > go about setting up a schema that allows you to seek to an
> > arbitrary
> > cookie without doing a linear search.
> 
> So when I was talking about 'reusing' a page, that's about re-filling
> the
> same pages rather than constantly conjuring new ones, which requires
> less of
> the pagecache's resources in total.  Maybe the pagecache can handle
> that
> without it negatively impacting other users of the cache that /will/
> re-use
> their cached pages, but I worry it might be irresponsible of us to
> fill the
> pagecache with pages we know we're never going to find again.
> 

In the case where the processes are reading linearly through a
directory that is not changing (or at least where the beginning of the
directory is not changing), we will reuse the cached data, because just
like in the linearly indexed case, each process ends up reading the
exact same sequence of cookies, and looking up the exact same sequence
of hashes.

The sequences start to diverge only if they hit a part of the directory
that is being modified. At that point, we're going to be invalidating
page cache entries anyway with the last reader being more likely to be
following the new sequence of cookies.

The hashed indexing does come with a cost, thanks to the XArray, but that
cost is limited to a max of 8MB with the current scheme.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-03-11 14:02                                                     ` Trond Myklebust
@ 2022-03-11 16:14                                                       ` Benjamin Coddington
  2022-03-11 16:51                                                         ` Trond Myklebust
  0 siblings, 1 reply; 46+ messages in thread
From: Benjamin Coddington @ 2022-03-11 16:14 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

On 11 Mar 2022, at 9:02, Trond Myklebust wrote:

> On Fri, 2022-03-11 at 06:58 -0500, Benjamin Coddington wrote:
>> On 10 Mar 2022, at 16:07, Trond Myklebust wrote:
>>
>>> On Wed, 2022-03-09 at 15:01 -0500, Benjamin Coddington wrote:
>>>> On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
>>>>
>>>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>>>
>>>>> Instead of using a linear index to address the pages, use the
>>>>> cookie of
>>>>> the first entry, since that is what we use to match the page
>>>>> anyway.
>>>>>
>>>>> This allows us to avoid re-reading the entire cache on a
>>>>> seekdir()
>>>>> type
>>>>> of operation. The latter is very common when re-exporting NFS,
>>>>> and
>>>>> is a
>>>>> major performance drain.
>>>>>
>>>>> The change does affect our duplicate cookie detection, since we
>>>>> can
>>>>> no
>>>>> longer rely on the page index as a linear offset for detecting
>>>>> whether
>>>>> we looped backwards. However since we no longer do a linear
>>>>> search
>>>>> through all the pages on each call to nfs_readdir(), this is
>>>>> less
>>>>> of a
>>>>> concern than it was previously.
>>>>> The other downside is that invalidate_mapping_pages() no longer
>>>>> can
>>>>> use
>>>>> the page index to avoid clearing pages that have been read. A
>>>>> subsequent
>>>>> patch will restore the functionality this provides to the 'ls -
>>>>> l'
>>>>> heuristic.
>>>>
>>>> I didn't realize the approach was to also hash out the linearly-
>>>> cached
>>>> entries.  I thought we'd do something like flag the context for
>>>> hashed page
>>>> indexes after a seekdir event, and if there are collisions with
>>>> the
>>>> linear
>>>> entries, they'll get fixed up when found.
>>>
>>> Why? What's the point of using 2 models where 1 will do?
>>
>> I don't think the hashed model is quite as simple and efficient
>> overall, and
>> may produce impacts to a system beyond NFS.
>>
>>>>
>>>> Doesn't that mean that with this approach seekdir() only hits the
>>>> same pages
>>>> when the entry offset is page-aligned?  That's 1 in 127 odds.
>>>
>>> The point is not to stomp all over the pages that contain aligned
>>> data
>>> when the application does call seekdir().
>>>
>>> IOW: we always optimise for the case where we do a linear read of
>>> the
>>> directory, but we support random seekdir() + read too.
>>
>> And that could be done just by bumping the seekdir users to some
>> constant
>> offset (index 262144 ?), or something else equally dead-nuts simple. 
>> That
>> keeps tightly clustered page indexes, so walking the cache is
>> faster.  That
>> reduces the "buckshot" effect the hashing has of eating up pagecache
>> pages
>> they'll never use again.  That doesn't cap our caching ability at 33
>> million
>> entries.
>>
>
> What you say would make sense if readdir cookies truly were offsets,
> but in general they're not. Cookies are unstructured data, and should
> be treated as unstructured data.
>
> Let's say I do cache more than 33 million entries and I have to find a
> cookie. I have to linearly read through at least 1GB of cached data
> before I can give up and start a new readdir. Either that, or I need to
> have a heuristic that tells me when to stop searching, and then another
> heuristic that tells me where to store the data in a way that doesn't
> trash the page cache.
>
> With the hashing, I seek to the page matching the hash, and I either
> immediately find what I need, or I immediately know to start a readdir.
> There is no need for any additional heuristic.

The scenario where we want to find a cookie while not doing a linear pass
through the directory will be the seekdir() case.  In a linear walk, we have
the cached page index to help.  So in the seekdir case, the chances of
having someone already fill a page and also having the cookie be the 1 in
127 that are page-aligned (and so match an already cached page) are small, I
think.  Unless your use-case will often hit the exact same offsets over and
over.

So with the hashing and seekdir case, I think that the cache will be pretty
heavily filled with the same duplicated data at various offsets and rarely
useful.  That's why I wondered if you'd tested your use-case for it and found
it to be advantageous.  I think what we've got is going to work fine, but I
wonder if you've seen it to work well.

The major pain point most of our users complain about is not being able to
perform a complete walk in linear time with respect to size with
invalidations at play.  This series fixes that, and is a huge bonus.  Other
smaller performance improvements are pale in comparison for us, and might
just get us forever chasing one or two minor optimizations that have
trade-offs.

There's a lot of variables at play.  For some client/server setups (like
some low-latency RDMA), and very large directories and cache sizes, it might
be more performant to just do the READDIR every time, walking local caches
be damned.

>> It's weird to me that we're doing exactly what XArray says not to do,
>> hash
>> the index, when we don't have to.
>>
>>>> It also means we're amplifying the pagecache's usage for
>>>> slightly
>>>> changing
>>>> directories - rather than re-using the same pages we're
>>>> scattering
>>>> our usage
>>>> across the index.  Eh, maybe not a big deal if we just expect the
>>>> page
>>>> cache's LRU to do the work.
>>>>
>>>
>>> I don't understand your point about 'not reusing'. If the user
>>> seeks to
>>> the same cookie, we reuse the page. However I don't know how you
>>> would
>>> go about setting up a schema that allows you to seek to an
>>> arbitrary
>>> cookie without doing a linear search.
>>
>> So when I was talking about 'reusing' a page, that's about re-filling
>> the
>> same pages rather than constantly conjuring new ones, which requires
>> less of
>> the pagecache's resources in total.  Maybe the pagecache can handle
>> that
>> without it negatively impacting other users of the cache that /will/
>> re-use
>> their cached pages, but I worry it might be irresponsible of us to
>> fill the
>> pagecache with pages we know we're never going to find again.
>>
>
> In the case where the processes are reading linearly through a
> directory that is not changing (or at least where the beginning of the
> directory is not changing), we will reuse the cached data, because just
> like in the linearly indexed case, each process ends up reading the
> exact same sequence of cookies, and looking up the exact same sequence
> of hashes.
>
> The sequences start to diverge only if they hit a part of the directory
> that is being modified. At that point, we're going to be invalidating
> page cache entries anyway with the last reader being more likely to be
> following the new sequence of cookies.

I don't think we clean up behind ourselves anymore.  Now that we are going
to validate each page before using it, we don't invalidate the whole cache
at any point.  That means that a divergence duplicates the pagecache usage
beyond the divergence.

Ben



* Re: [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index
  2022-03-11 16:14                                                       ` Benjamin Coddington
@ 2022-03-11 16:51                                                         ` Trond Myklebust
  0 siblings, 0 replies; 46+ messages in thread
From: Trond Myklebust @ 2022-03-11 16:51 UTC (permalink / raw)
  To: bcodding; +Cc: linux-nfs

On Fri, 2022-03-11 at 11:14 -0500, Benjamin Coddington wrote:
> On 11 Mar 2022, at 9:02, Trond Myklebust wrote:
> 
> > On Fri, 2022-03-11 at 06:58 -0500, Benjamin Coddington wrote:
> > > On 10 Mar 2022, at 16:07, Trond Myklebust wrote:
> > > 
> > > > On Wed, 2022-03-09 at 15:01 -0500, Benjamin Coddington wrote:
> > > > > On 27 Feb 2022, at 18:12, trondmy@kernel.org wrote:
> > > > > 
> > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > 
> > > > > > Instead of using a linear index to address the pages, use
> > > > > > the
> > > > > > cookie of
> > > > > > the first entry, since that is what we use to match the
> > > > > > page
> > > > > > anyway.
> > > > > > 
> > > > > > This allows us to avoid re-reading the entire cache on a
> > > > > > seekdir()
> > > > > > type
> > > > > > of operation. The latter is very common when re-exporting
> > > > > > NFS,
> > > > > > and
> > > > > > is a
> > > > > > major performance drain.
> > > > > > 
> > > > > > The change does affect our duplicate cookie detection,
> > > > > > since we
> > > > > > can
> > > > > > no
> > > > > > longer rely on the page index as a linear offset for
> > > > > > detecting
> > > > > > whether
> > > > > > we looped backwards. However since we no longer do a linear
> > > > > > search
> > > > > > through all the pages on each call to nfs_readdir(), this
> > > > > > is
> > > > > > less
> > > > > > of a
> > > > > > concern than it was previously.
> > > > > > The other downside is that invalidate_mapping_pages() no
> > > > > > longer
> > > > > > can
> > > > > > use
> > > > > > the page index to avoid clearing pages that have been read.
> > > > > > A
> > > > > > subsequent
> > > > > > patch will restore the functionality this provides to the
> > > > > > 'ls -
> > > > > > l'
> > > > > > heuristic.
> > > > > 
> > > > > I didn't realize the approach was to also hash out the
> > > > > linearly-
> > > > > cached
> > > > > entries.  I thought we'd do something like flag the context
> > > > > for
> > > > > hashed page
> > > > > indexes after a seekdir event, and if there are collisions
> > > > > with
> > > > > the
> > > > > linear
> > > > > entries, they'll get fixed up when found.
> > > > 
> > > > Why? What's the point of using 2 models where 1 will do?
> > > 
> > > I don't think the hashed model is quite as simple and efficient
> > > overall, and
> > > may produce impacts to a system beyond NFS.
> > > 
> > > > > 
> > > > > Doesn't that mean that with this approach seekdir() only hits
> > > > > the
> > > > > same pages
> > > > > when the entry offset is page-aligned?  That's 1 in 127 odds.
> > > > 
> > > > The point is not to stomp all over the pages that contain
> > > > aligned
> > > > data
> > > > when the application does call seekdir().
> > > > 
> > > > IOW: we always optimise for the case where we do a linear read
> > > > of
> > > > the
> > > > directory, but we support random seekdir() + read too.
> > > 
> > > And that could be done just by bumping the seekdir users to some
> > > constant
> > > offset (index 262144 ?), or something else equally dead-nuts
> > > simple. 
> > > That
> > > keeps tightly clustered page indexes, so walking the cache is
> > > faster.  That
> > > reduces the "buckshot" effect the hashing has of eating up
> > > pagecache
> > > pages
> > > they'll never use again.  That doesn't cap our caching ability at
> > > 33
> > > million
> > > entries.
> > > 
> > 
> > What you say would make sense if readdir cookies truly were
> > offsets,
> > but in general they're not. Cookies are unstructured data, and
> > should
> > be treated as unstructured data.
> > 
> > Let's say I do cache more than 33 million entries and I have to
> > find a
> > cookie. I have to linearly read through at least 1GB of cached data
> > before I can give up and start a new readdir. Either that, or I
> > need to
> > have a heuristic that tells me when to stop searching, and then
> > another
> > heuristic that tells me where to store the data in a way that
> > doesn't
> > trash the page cache.
> > 
> > With the hashing, I seek to the page matching the hash, and I
> > either
> > immediately find what I need, or I immediately know to start a
> > readdir.
> > There is no need for any additional heuristic.
> 
> The scenario where we want to find a cookie while not doing a linear
> pass
> through the directory will be the seekdir() case.  In a linear walk,
> we have
> the cached page index to help.  So in the seekdir case, the chances
> of
> having someone already fill a page and also having the cookie be the
> 1 in
> 127 that are page-aligned (and so match an already cached page) are
> small, I
> think.  Unless your use-case will often hit the exact same offsets
> over and
> over.

For the use case where we are reexporting NFS, it can definitely
happen.
Firstly, the clients usually are reading the reexported directory
linearly, so they will tend to follow the same cookie request patterns.
Secondly, we're not going to replay the readdir from the duplicate
reply cache if the client resends the request. So even if you only have
one client, there can be a benefit.

> 
> So with the hashing and seekdir case, I think that the cache will be
> pretty
> heavily filled with the same duplicated data at various offsets and
> rarely
> useful.  That's why I wondered if you'd tested your use-case for it
> and found
> it to be advantageous.  I think what we've got is going to work fine,
> but I
> wonder if you've seen it to work well.
> 
> The major pain point most of our users complain about is not being
> able to
> perform a complete walk in linear time with respect to size with
> invalidations at play.  This series fixes that, and is a huge bonus. 
> Other
> smaller performance improvements are pale in comparison for us, and
> might
> just get us forever chasing one or two minor optimizations that have
> trade-offs.
> 
> There's a lot of variables at play.  For some client/server setups
> (like
> some low-latency RDMA), and very large directories and cache sizes,
> it might
> be more performant to just do the READDIR every time, walking local
> caches
> be damned.
> 

Sure, so a dedicated readdirplus() system call could help by providing
the same kind of guidance that statx() does today.

> > > It's weird to me that we're doing exactly what XArray says not to
> > > do,
> > > hash
> > > the index, when we don't have to.
> > > 
> > > > > It also means we're amplifying the pagecache's usage for
> > > > > slightly
> > > > > changing
> > > > > directories - rather than re-using the same pages we're
> > > > > scattering
> > > > > our usage
> > > > > across the index.  Eh, maybe not a big deal if we just expect
> > > > > the
> > > > > page
> > > > > cache's LRU to do the work.
> > > > > 
> > > > 
> > > > I don't understand your point about 'not reusing'. If the user
> > > > seeks to
> > > > the same cookie, we reuse the page. However I don't know how
> > > > you
> > > > would
> > > > go about setting up a schema that allows you to seek to an
> > > > arbitrary
> > > > cookie without doing a linear search.
> > > 
> > > So when I was talking about 'reusing' a page, that's about re-
> > > filling
> > > the
> > > same pages rather than constantly conjuring new ones, which
> > > requires
> > > less of
> > > the pagecache's resources in total.  Maybe the pagecache can
> > > handle
> > > that
> > > without it negatively impacting other users of the cache that
> > > /will/
> > > re-use
> > > their cached pages, but I worry it might be irresponsible of us
> > > to
> > > fill the
> > > pagecache with pages we know we're never going to find again.
> > > 
> > 
> > In the case where the processes are reading linearly through a
> > directory that is not changing (or at least where the beginning of
> > the
> > directory is not changing), we will reuse the cached data, because
> > just
> > like in the linearly indexed case, each process ends up reading the
> > exact same sequence of cookies, and looking up the exact same
> > sequence
> > of hashes.
> > 
> > The sequences start to diverge only if they hit a part of the
> > directory
> > that is being modified. At that point, we're going to be
> > invalidating
> > page cache entries anyway with the last reader being more likely to
> > be
> > following the new sequence of cookies.
> 
> I don't think we clean up behind ourselves anymore.  Now that we are
> going
> to validate each page before using it, we don't invalidate the whole
> cache
> at any point.  That means that a divergence duplicates the pagecache
> usage
> beyond the divergence.
> 

No. I reinstated the call to nfs_revalidate_mapping() in the linux-next
branch after Olga demonstrated that NFSv3 is still troubled with crappy
mtime/ctime resolutions on the server causing directory changes to not
be reflected in the readdir cache.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com




* Re: [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner
  2022-03-10 14:31                               ` [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner Benjamin Coddington
@ 2022-03-16 22:25                                 ` Olga Kornievskaia
  0 siblings, 0 replies; 46+ messages in thread
From: Olga Kornievskaia @ 2022-03-16 22:25 UTC (permalink / raw)
  To: Benjamin Coddington; +Cc: trondmy, linux-nfs

On Fri, Mar 11, 2022 at 9:40 AM Benjamin Coddington <bcodding@redhat.com> wrote:
>
> .. Something like this does the trick in my testing, but yes will have an
> impact on regular workloads:
>
> 8<------------------------------------------------------------------------
>
> Since commit 1a34c8c9a49e ("NFS: Support larger readdir buffers") updated
> dtsize, and with recent improvements to the READDIRPLUS helper heuristic,
> the heuristic may not trigger until many dentries are emitted to userspace,
> which may cause many thousands of GETATTR calls for "ls -l" when the
> directory's pagecache has already been populated.  This typically manifests
> as a much slower total runtime for a _second_ invocation of "ls -l" within
> the directory attribute cache timeouts.
>
> Fix this by emitting only 17 entries for any first pass through the NFS
> directory's ->iterate_shared(), which will allow userspace to prime the
> counters for the heuristic.

Here's for what it's worth. An experiment between linux to linux where
the linux server had a "small" directory structure of 57274
directories, 5727390 files in total where each directory had ~100
files each.
With this patch:

date; time tree vol1 > tree.out && date; time tree vol1 > tree.out
Wed Mar 16 12:21:30 EDT 2022

real    11m7.923s
user    0m20.507s
sys     0m39.683s
Wed Mar 16 12:32:38 EDT 2022

real    40m1.751s
user    0m23.477s
sys     0m45.663s

Without the patch:
date; time tree vol1 > tree.out && date; time tree vol1 > tree.out
Wed Mar 16 13:49:12 EDT 2022

real    10m52.909s
user    0m21.342s
sys     0m39.198s
Wed Mar 16 14:00:05 EDT 2022

real    222m56.990s
user    0m30.392s
sys     2m25.202s


>
> Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
> ---
>  fs/nfs/dir.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 7e12102b29e7..dc5fc9ba2c49 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -1060,6 +1060,8 @@ static int readdir_search_pagecache(struct nfs_readdir_descriptor *desc)
>         return res;
>  }
>
> +#define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
> +
>  /*
>   * Once we've found the start of the dirent within a page: fill 'er up...
>   */
> @@ -1069,6 +1071,7 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
>         struct file     *file = desc->file;
>         struct nfs_cache_array *array;
>         unsigned int i;
> +       bool first_emit = !desc->dir_cookie;
>
>         array = kmap(desc->page);
>         for (i = desc->cache_entry_index; i < array->size; i++) {
> @@ -1092,6 +1095,10 @@ static void nfs_do_filldir(struct nfs_readdir_descriptor *desc,
>                         desc->ctx->pos = desc->dir_cookie;
>                 else
>                         desc->ctx->pos++;
> +               if (first_emit && i > NFS_READDIR_CACHE_MISS_THRESHOLD + 1) {
> +                       desc->eob = true;
> +                       break;
> +               }
>         }
>         if (array->page_is_eof)
>                 desc->eof = !desc->eob;
> @@ -1173,8 +1180,6 @@ static int uncached_readdir(struct nfs_readdir_descriptor *desc)
>         return status;
>  }
>
> -#define NFS_READDIR_CACHE_MISS_THRESHOLD (16UL)
> -
>  static bool nfs_readdir_handle_cache_misses(struct inode *inode,
>                                             struct nfs_readdir_descriptor *desc,
>                                             unsigned int cache_misses,
> --
> 2.31.1
>


end of thread, other threads:[~2022-03-16 22:25 UTC | newest]

2022-02-27 23:12                                                     ` [PATCH v9 27/27] NFS: Cache all entries in the readdirplus reply trondmy
2022-03-09 20:01                                               ` [PATCH v9 23/27] NFS: Convert readdir page cache to use a cookie based index Benjamin Coddington
2022-03-09 21:03                                                 ` Benjamin Coddington
2022-03-10 21:07                                                 ` Trond Myklebust
2022-03-11 11:58                                                   ` Benjamin Coddington
2022-03-11 14:02                                                     ` Trond Myklebust
2022-03-11 16:14                                                       ` Benjamin Coddington
2022-03-11 16:51                                                         ` Trond Myklebust
2022-03-09 17:39                             ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Benjamin Coddington
2022-03-10 14:31                               ` [PATCH] NFS: Trigger "ls -l" readdir heuristic sooner Benjamin Coddington
2022-03-16 22:25                                 ` Olga Kornievskaia
2022-03-10 20:15                               ` [PATCH v9 14/27] NFS: Improve heuristic for readdirplus Trond Myklebust
2022-03-11 11:28                                 ` Benjamin Coddington
2022-03-01 19:09               ` [PATCH v9 07/27] NFS: Store the change attribute in the directory page cache Anna Schumaker
2022-03-01 23:11                 ` Trond Myklebust
2022-03-09 13:42       ` [PATCH v9 03/27] NFS: Trace lookup revalidation failure Benjamin Coddington
2022-03-09 15:28         ` Chuck Lever III
2022-03-09 21:35           ` Benjamin Coddington
2022-03-09 21:32 ` [PATCH v9 00/27] Readdir improvements Benjamin Coddington
