linux-nfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories
@ 2020-11-02 13:50 Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 01/11] NFSv4: Improve nfs4_readdir tracepoint by adding additional fields Dave Wysochanski
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Note these patches are also on github at:
https://github.com/DaveWysochanskiRH/kernel/tree/5.9-nfs-readdir

This patchset is mostly tracepoints but patch 9 fixes a performance
problem with NFSv4.x when listing a directory that is being modified.
After the NFSv4.x patch, for consistency, patch 11 is included which
makes NFSv3 behave the same as NFSv4, and given the inclusion of
patch 9 does not re-introduce the regression fixed in previous commit
79f687a3de9e.  As described in patch 11, NFSv3 does not currently have
the NFSv4 problem because it does not drop the pagecache when a
large directory read is in progress.

As background, currently any process that is listing a directory
must search starting at cookie 0 at each entry to nfs_readdir(),
regardless of the previous cookie returned by the server, regardless of
NFS version, and regardless of whether the directory is being modified
or cache expired.  This means, for example, if a large directory listing
requires many getdents64/nfs_readdir calls, the kernel will search
through the pagecache with an increasing penalty as the size of the
directory increases and the process attempts to obtain the latter
entries of the directory.  When the directory is being modified, and
when acdirmax is exceeded, and nfs_attribute_cache_expired() returns
true, with NFSv4.x we drop the pagecache for the directory entries,
so the process cannot make forward progress.

I investigated using a similar approach as was done with NFSv3 in
commit 79f687a3de9e, but for NFSv4, READDIR does not return the directory
attributes and thus _nfs4_proc_readdir does not call nfs_refresh_inode()
unlike nfs3_proc_readdir(). With NFSv3, the calling of nfs_refresh_inode()
is what updates nfsi->read_cache_jiffies causing nfs_attribute_cache_expired()
to always return false, leaving the pagecache in tact despite
NFS_INO_INVALID_DATA being set.  Thus it's not clear whether the NFSv3
approach could be done for NFSv4 to achieve the same behavior as with
NFSv3.  In addition, the current NFSv3 approach leaves in place the
aforementioned penalty, which for large directories can be substantial.
So rather than this approach, the approach taken with NFSv4 leaves
in place the dropping of the pagecache when acdirmax expires, and
relies on the fact that a process can trust the last cookie returned
by the server and continue at that point in the pagecache without
forcing the current process to start over from cookie 0 upon the
next entry to nfs_readdir().  As far as I could tell there is no
POSIX requirement to start from 0 again when a directory is being
modified during an in-progress directory read, and so the current
behavior is only an implementation limitation.

The NFSv4 performance problem can be seen by exporting a directory
with a larger number of files such that the uncached time to list the
directory exceeds acdirmax.  I have an explicit reproducer script
I can provide, but a simple reproducer outline is as follows:

1. First export a directory that contains enough files that the 
listing exceeds acdirmax.  In my testing, the NFS server and client
were the same system, there were 500,000 files in the directory,
and I set acdirmin=10,acdirmax=20.

2. Mount client1 (writer) and client2 (reader).  Note below I use
the loopback IP address, and a 'wsize' parameter on the writer mount
so we get different superblocks:
mount -o vers=4.1 127.0.0.1:/export/bz1889478 $MNT1
mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 $MNT2

3. Start client1 (writer):
echo 3 > /proc/sys/vm/drop_caches
i=500000; while true; do touch $MNT2/file$i.bin; i=$((i + 1)); sleep 1; done > /dev/null 2>&1 &

4. Start client2 (reader):
while true; do time ls -1f $MNT1 | wc -l; done &

The output from my reproducer script is:
./t0_bz1889478.sh 4.1 127.0.0.1 500000
Mon 02 Nov 2020 07:54:22 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory is idle
Mon 02 Nov 2020 07:54:22 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
Mon 02 Nov 2020 07:54:22 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
Mon 02 Nov 2020 07:54:24 AM EST: idle directory: skipping client2 adding files
Mon 02 Nov 2020 07:54:24 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
Mon 02 Nov 2020 07:54:24 AM EST: waiting for listing to complete
Mon 02 Nov 2020 07:54:46 AM EST: client1 pid took 22 seconds to list the directory of 500000 files
Mon 02 Nov 2020 07:54:46 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
Mon 02 Nov 2020 07:54:47 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
Mon 02 Nov 2020 07:54:47 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory being modified
Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt2
Mon 02 Nov 2020 07:54:47 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
Mon 02 Nov 2020 07:54:50 AM EST: changing directory: client2 start adding 1 file/sec at /mnt/nfs/bz1889478-mnt2
Mon 02 Nov 2020 07:54:51 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
Mon 02 Nov 2020 07:54:51 AM EST: waiting for listing to complete
Mon 02 Nov 2020 07:55:12 AM EST: client1 pid took 21 seconds to list the directory of 500000 files
Mon 02 Nov 2020 07:55:12 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
./t0_bz1889478.sh: line 124:  5973 Killed                  while true; do
    echo $i; touch $NFS_CLIENT_MOUNTPOINT2/file$i.bin; i=$((i + 1)); sleep 1;
done > /dev/null 2>&1
Mon 02 Nov 2020 07:55:20 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
PASSED TEST ./t0_bz1889478.sh on kernel 5.9.0-nfs-readdir+ with NFS vers=4.1

For diagnostics and verification, of course a tcpdump can be
used, or even READDIR ops and time can be compared as in the
reproducer, but also the included tracepoints can be used.  For
the tracepoints, before step #2 above use the below trace-cmd
to trace the listing and see whether the problem occurs or not,
evidenced by either the presence of nfs_invalidate_mapping*
trace events or multiple nfs_readdir_enter calls with
"cookie=0x00000000":
trace-cmd start -e nfs:nfs_readdir_enter -e nfs4:nfs4_readdir -e nfs:nfs_readdir_exit -e nfs:nfs_invalidate_mapping_*


Dave Wysochanski (11):
  NFSv4: Improve nfs4_readdir tracepoint by adding additional fields
  NFS: Replace dfprintk statements with trace events in nfs_readdir
  NFS: Move nfs_readdir_descriptor_t into internal header
  NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
  NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir
  NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit
  NFS: Add tracepoint to entry and exit of nfs_do_filldir
  NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints
  NFS: Improve performance of listing directories being modified
  NFS: Add page_index to nfs_readdir enter and exit tracepoints
  NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()

 fs/nfs/dir.c           | 101 +++++++++---------
 fs/nfs/internal.h      |  18 ++++
 fs/nfs/nfs3xdr.c       |   2 +-
 fs/nfs/nfs4proc.c      |   2 +-
 fs/nfs/nfs4trace.h     |  44 +++++++-
 fs/nfs/nfstrace.h      | 277 +++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/nfs_fs.h |   1 +
 7 files changed, 394 insertions(+), 51 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH 01/11] NFSv4: Improve nfs4_readdir tracepoint by adding additional fields
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 02/11] NFS: Replace dfprintk statements with trace events in nfs_readdir Dave Wysochanski
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

When tracing NFSv4 readdir there are fields beyond the inode based
information that are useful, such as the cookie, count, and whether
readdirplus is enabled.  Add these fields to the nfs4_readdir
tracepoint which executes after a NFS4 readdir completes.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/nfs4proc.c  |  2 +-
 fs/nfs/nfs4trace.h | 44 +++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 6e95c85fe395..4f324ad5c5b0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -4983,7 +4983,7 @@ static int nfs4_proc_readdir(struct dentry *dentry, const struct cred *cred,
 	do {
 		err = _nfs4_proc_readdir(dentry, cred, cookie,
 				pages, count, plus);
-		trace_nfs4_readdir(d_inode(dentry), err);
+		trace_nfs4_readdir(d_inode(dentry), cookie, count, plus, err);
 		err = nfs4_handle_exception(NFS_SERVER(d_inode(dentry)), err,
 				&exception);
 	} while (exception.retry);
diff --git a/fs/nfs/nfs4trace.h b/fs/nfs/nfs4trace.h
index b4f852d4d099..c62e46451b16 100644
--- a/fs/nfs/nfs4trace.h
+++ b/fs/nfs/nfs4trace.h
@@ -1449,9 +1449,51 @@
 
 DEFINE_NFS4_INODE_EVENT(nfs4_access);
 DEFINE_NFS4_INODE_EVENT(nfs4_readlink);
-DEFINE_NFS4_INODE_EVENT(nfs4_readdir);
 DEFINE_NFS4_INODE_EVENT(nfs4_get_acl);
 DEFINE_NFS4_INODE_EVENT(nfs4_set_acl);
+
+TRACE_EVENT(nfs4_readdir,
+		TP_PROTO(
+			const struct inode *inode,
+			u64 cookie,
+			unsigned int count,
+			bool plus,
+			int error
+		),
+
+		TP_ARGS(inode, cookie, count, plus, error),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(unsigned long, error)
+			__field(u64, cookie)
+			__field(u64, count)
+			__field(bool, plus)
+		),
+
+		TP_fast_assign(
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(inode);
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
+			__entry->error = error < 0 ? -error : 0;
+			__entry->cookie = cookie;
+			__entry->count = count;
+			__entry->plus = plus;
+		),
+
+		TP_printk(
+			"error=%ld (%s) fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx count=%llu plus=%s",
+			-__entry->error,
+			show_nfsv4_errors(__entry->error),
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid,
+			__entry->fhandle, __entry->cookie,
+			__entry->count, __entry->plus ? "true" : "false"
+		)
+);
+
 #ifdef CONFIG_NFS_V4_SECURITY_LABEL
 DEFINE_NFS4_INODE_EVENT(nfs4_get_security_label);
 DEFINE_NFS4_INODE_EVENT(nfs4_set_security_label);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 02/11] NFS: Replace dfprintk statements with trace events in nfs_readdir
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 01/11] NFSv4: Improve nfs4_readdir tracepoint by adding additional fields Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 03/11] NFS: Move nfs_readdir_descriptor_t into internal header Dave Wysochanski
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Remove a couple dfprintks in nfs_readdir and replace them with
trace events on entry/exit that contain more information.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      |  7 +++--
 fs/nfs/nfstrace.h | 84 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 88 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index cb52db9a0cfb..116493e66922 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -900,9 +900,9 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 			*desc = &my_desc;
 	int res = 0;
 
-	dfprintk(FILE, "NFS: readdir(%pD2) starting at cookie %llu\n",
-			file, (long long)ctx->pos);
 	nfs_inc_stats(inode, NFSIOS_VFSGETDENTS);
+	trace_nfs_readdir_enter(inode, ctx->pos, dir_ctx->dir_cookie,
+				NFS_SERVER(inode)->dtsize, my_desc.plus);
 
 	/*
 	 * ctx->pos points to the dirent entry number.
@@ -949,7 +949,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 out:
 	if (res > 0)
 		res = 0;
-	dfprintk(FILE, "NFS: readdir(%pD2) returns %d\n", file, res);
+	trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
+			       NFS_SERVER(inode)->dtsize, my_desc.plus, res);
 	return res;
 }
 
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 5a59dcdce0b2..6bbe0aa221f2 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -668,6 +668,90 @@
 DEFINE_NFS_DIRECTORY_EVENT(nfs_symlink_enter);
 DEFINE_NFS_DIRECTORY_EVENT_DONE(nfs_symlink_exit);
 
+TRACE_EVENT(nfs_readdir_enter,
+		TP_PROTO(
+			const struct inode *inode,
+			u64 cookie,
+			u64 dir_cookie,
+			unsigned int count,
+			bool plus
+		),
+
+		TP_ARGS(inode, cookie, dir_cookie, count, plus),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(u64, cookie)
+			__field(u64, dir_cookie)
+			__field(u64, count)
+			__field(bool, plus)
+		),
+
+		TP_fast_assign(
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(inode);
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
+			__entry->cookie = cookie;
+			__entry->dir_cookie = dir_cookie;
+			__entry->count = count;
+			__entry->plus = plus;
+		),
+
+		TP_printk(
+			"fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx dir_cookie=0x%08llx count=%llu plus=%s",
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid,
+			__entry->fhandle, __entry->cookie, __entry->dir_cookie,
+			__entry->count, __entry->plus ? "true" : "false"
+		)
+);
+
+TRACE_EVENT(nfs_readdir_exit,
+		TP_PROTO(
+			const struct inode *inode,
+			u64 cookie,
+			u64 dir_cookie,
+			unsigned int count,
+			bool plus,
+			int error
+		),
+
+		TP_ARGS(inode, cookie, dir_cookie, count, plus, error),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(unsigned long, error)
+			__field(u64, cookie)
+			__field(u64, dir_cookie)
+			__field(u64, count)
+			__field(bool, plus)
+		),
+
+		TP_fast_assign(
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(inode);
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
+			__entry->error = error;
+			__entry->cookie = cookie;
+			__entry->dir_cookie = dir_cookie;
+			__entry->count = count;
+			__entry->plus = plus;
+		),
+
+		TP_printk(
+			"error=%ld fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx dir_cookie=0x%08llx count=%llu plus=%s",
+			__entry->error,
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid,
+			__entry->fhandle, __entry->cookie, __entry->dir_cookie,
+			__entry->count, __entry->plus ? "true" : "false"
+		)
+);
+
 TRACE_EVENT(nfs_link_enter,
 		TP_PROTO(
 			const struct inode *inode,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 03/11] NFS: Move nfs_readdir_descriptor_t into internal header
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 01/11] NFSv4: Improve nfs4_readdir tracepoint by adding additional fields Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 02/11] NFS: Replace dfprintk statements with trace events in nfs_readdir Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t Dave Wysochanski
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

In preparation for readdir tracing involving nfs_readdir_descriptor_t
move the definition into the "internal.h" header file, and ensure
internal.h is included before nfstrace.h to avoid build problems.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      | 18 ------------------
 fs/nfs/internal.h | 18 ++++++++++++++++++
 fs/nfs/nfs3xdr.c  |  2 +-
 3 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 116493e66922..145393188f6a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -144,24 +144,6 @@ struct nfs_cache_array {
 	struct nfs_cache_array_entry array[];
 };
 
-typedef struct {
-	struct file	*file;
-	struct page	*page;
-	struct dir_context *ctx;
-	unsigned long	page_index;
-	u64		*dir_cookie;
-	u64		last_cookie;
-	loff_t		current_index;
-	loff_t		prev_index;
-
-	unsigned long	dir_verifier;
-	unsigned long	timestamp;
-	unsigned long	gencount;
-	unsigned int	cache_entry_index;
-	bool plus;
-	bool eof;
-} nfs_readdir_descriptor_t;
-
 static
 void nfs_readdir_init_array(struct page *page)
 {
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 6673a77884d9..68ade987370d 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -62,6 +62,24 @@ static inline bool nfs_lookup_is_soft_revalidate(const struct dentry *dentry)
  */
 #define NFS_MAX_READDIR_PAGES 8
 
+typedef struct {
+	struct file	*file;
+	struct page	*page;
+	struct dir_context *ctx;
+	unsigned long	page_index;
+	u64		*dir_cookie;
+	u64		last_cookie;
+	loff_t		current_index;
+	loff_t		prev_index;
+
+	unsigned long	dir_verifier;
+	unsigned long	timestamp;
+	unsigned long	gencount;
+	unsigned int	cache_entry_index;
+	bool plus;
+	bool eof;
+} nfs_readdir_descriptor_t;
+
 struct nfs_client_initdata {
 	unsigned long init_flags;
 	const char *hostname;			/* Hostname of the server */
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 69971f6c840d..972759325de0 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -21,8 +21,8 @@
 #include <linux/nfs3.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfsacl.h>
-#include "nfstrace.h"
 #include "internal.h"
+#include "nfstrace.h"
 
 #define NFSDBG_FACILITY		NFSDBG_XDR
 
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (2 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 03/11] NFS: Move nfs_readdir_descriptor_t into internal header Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 17:32   ` kernel test robot
  2020-11-02 13:50 ` [PATCH 05/11] NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir Dave Wysochanski
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Add more tracepoints in the NFS readdir code to trace functions
which change members of nfs_readdir_descriptor_t.  In the process,
remove two more dfprintks inside uncached_readdir().

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      |  10 +++---
 fs/nfs/nfstrace.h | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 145393188f6a..227cddc12983 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -329,6 +329,7 @@ int nfs_readdir_search_array(nfs_readdir_descriptor_t *desc)
 	struct nfs_cache_array *array;
 	int status;
 
+	trace_nfs_readdir_search_array_enter(desc);
 	array = kmap(desc->page);
 
 	if (*desc->dir_cookie == 0)
@@ -342,6 +343,7 @@ int nfs_readdir_search_array(nfs_readdir_descriptor_t *desc)
 		desc->page_index++;
 	}
 	kunmap(desc->page);
+	trace_nfs_readdir_search_array_exit(desc, status);
 	return status;
 }
 
@@ -762,6 +764,7 @@ int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
 {
 	int res;
 
+	trace_nfs_readdir_search_pagecache_enter(desc);
 	if (desc->page_index == 0) {
 		desc->current_index = 0;
 		desc->prev_index = 0;
@@ -770,6 +773,7 @@ int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
 	do {
 		res = find_and_lock_cache_page(desc);
 	} while (res == -EAGAIN);
+	trace_nfs_readdir_search_pagecache_exit(desc, res);
 	return res;
 }
 
@@ -835,8 +839,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
 	struct inode *inode = file_inode(desc->file);
 	struct nfs_open_dir_context *ctx = desc->file->private_data;
 
-	dfprintk(DIRCACHE, "NFS: uncached_readdir() searching for cookie %Lu\n",
-			(unsigned long long)*desc->dir_cookie);
+	trace_nfs_uncached_readdir_enter(desc);
 
 	page = alloc_page(GFP_HIGHUSER);
 	if (!page) {
@@ -859,8 +862,7 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
 	nfs_readdir_clear_array(desc->page);
 	cache_page_release(desc);
  out:
-	dfprintk(DIRCACHE, "NFS: %s: returns %d\n",
-			__func__, status);
+	trace_nfs_uncached_readdir_exit(desc, status);
 	return status;
 }
 
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 6bbe0aa221f2..e6a946b83330 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -752,6 +752,108 @@
 		)
 );
 
+DECLARE_EVENT_CLASS(nfs_readdir_descriptor_event_enter,
+		TP_PROTO(
+				const nfs_readdir_descriptor_t *desc
+			),
+
+		TP_ARGS(desc),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u64, fileid)
+			__field(u32, fhandle)
+			__field(u64, dir_cookie)
+			__field(u64, page_index)
+			__field(u64, last_cookie)
+			__field(u64, current_index)
+			__field(u64, prev_index)
+		),
+
+		TP_fast_assign(
+			__entry->dev = file_inode(desc->file)->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(file_inode(desc->file));
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(file_inode(desc->file)));
+			__entry->dir_cookie = *desc->dir_cookie;
+			__entry->page_index = desc->page_index;
+			__entry->last_cookie = desc->last_cookie;
+			__entry->current_index = desc->current_index;
+			__entry->prev_index = desc->prev_index;
+		),
+
+		TP_printk(
+			"fileid=%02x:%02x:%llu fhandle=0x%08x dir_cookie=0x%08llx last_cookie=0x%08llx page_index=0x%08llx current_index=0x%08llu prev_index=0x%08llu",
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid, __entry->fhandle,
+			__entry->dir_cookie, __entry->last_cookie, __entry->page_index,
+			__entry->current_index, __entry->prev_index
+		)
+);
+
+#define DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(name) \
+	DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
+			TP_PROTO( \
+				const nfs_readdir_descriptor_t *desc \
+			), \
+			TP_ARGS(desc))
+
+DECLARE_EVENT_CLASS(nfs_readdir_descriptor_event_exit,
+		TP_PROTO(
+				const nfs_readdir_descriptor_t *desc,
+				int error
+			),
+
+		TP_ARGS(desc, error),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u64, fileid)
+			__field(u32, fhandle)
+			__field(u64, dir_cookie)
+			__field(unsigned long, error)
+			__field(u64, page_index)
+			__field(u64, last_cookie)
+			__field(u64, current_index)
+			__field(u64, prev_index)
+		),
+
+		TP_fast_assign(
+			__entry->dev = file_inode(desc->file)->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(file_inode(desc->file));
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(file_inode(desc->file)));
+			__entry->dir_cookie = *desc->dir_cookie;
+			__entry->error = error;
+			__entry->page_index = desc->page_index;
+			__entry->last_cookie = desc->last_cookie;
+			__entry->current_index = desc->current_index;
+			__entry->prev_index = desc->prev_index;
+		),
+
+		TP_printk(
+			"error=%ld fileid=%02x:%02x:%llu fhandle=0x%08x dir_cookie=0x%08llx last_cookie=0x%08llx page_index=0x%08llx current_index=0x%08llu prev_index=0x%08llu",
+			__entry->error,
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid, __entry->fhandle,
+			__entry->dir_cookie, __entry->last_cookie, __entry->page_index,
+			__entry->current_index, __entry->prev_index
+		)
+);
+
+#define DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(name) \
+	DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
+			TP_PROTO( \
+				const nfs_readdir_descriptor_t *desc, \
+				int error \
+			), \
+			TP_ARGS(desc, error))
+
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_array_exit);
+
 TRACE_EVENT(nfs_link_enter,
 		TP_PROTO(
 			const struct inode *inode,
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 05/11] NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (3 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 06/11] NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit Dave Wysochanski
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Add a few more directory operation tracepoints and remove a few
more dfprintks.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      | 10 +++++---
 fs/nfs/nfstrace.h | 73 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 79 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 227cddc12983..e4cd9789ebb5 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -109,7 +109,7 @@ static void put_nfs_open_dir_context(struct inode *dir, struct nfs_open_dir_cont
 	int res = 0;
 	struct nfs_open_dir_context *ctx;
 
-	dfprintk(FILE, "NFS: open dir(%pD2)\n", filp);
+	trace_nfs_opendir_enter(file_inode(filp), file_dentry(filp));
 
 	nfs_inc_stats(inode, NFSIOS_VFSOPEN);
 
@@ -120,13 +120,16 @@ static void put_nfs_open_dir_context(struct inode *dir, struct nfs_open_dir_cont
 	}
 	filp->private_data = ctx;
 out:
+	trace_nfs_opendir_exit(file_inode(filp), file_dentry(filp), res);
 	return res;
 }
 
 static int
 nfs_closedir(struct inode *inode, struct file *filp)
 {
+	trace_nfs_closedir_enter(file_inode(filp), file_dentry(filp));
 	put_nfs_open_dir_context(file_inode(filp), filp->private_data);
+	trace_nfs_closedir_exit(file_inode(filp), file_dentry(filp), 0);
 	return 0;
 }
 
@@ -943,8 +946,7 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 	struct inode *inode = file_inode(filp);
 	struct nfs_open_dir_context *dir_ctx = filp->private_data;
 
-	dfprintk(FILE, "NFS: llseek dir(%pD2, %lld, %d)\n",
-			filp, offset, whence);
+	trace_nfs_llseek_dir_enter(inode, offset, whence);
 
 	switch (whence) {
 	default:
@@ -985,7 +987,7 @@ static int nfs_fsync_dir(struct file *filp, loff_t start, loff_t end,
 {
 	struct inode *inode = file_inode(filp);
 
-	dfprintk(FILE, "NFS: fsync dir(%pD2) datasync %d\n", filp, datasync);
+	trace_nfs_fsync_dir_enter(inode, start, end, datasync);
 
 	inode_lock(inode);
 	nfs_inc_stats(inode, NFSIOS_VFSFSYNC);
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index e6a946b83330..0ed330f323bb 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -667,6 +667,79 @@
 DEFINE_NFS_DIRECTORY_EVENT_DONE(nfs_unlink_exit);
 DEFINE_NFS_DIRECTORY_EVENT(nfs_symlink_enter);
 DEFINE_NFS_DIRECTORY_EVENT_DONE(nfs_symlink_exit);
+DEFINE_NFS_DIRECTORY_EVENT(nfs_opendir_enter);
+DEFINE_NFS_DIRECTORY_EVENT_DONE(nfs_opendir_exit);
+DEFINE_NFS_DIRECTORY_EVENT(nfs_closedir_enter);
+DEFINE_NFS_DIRECTORY_EVENT_DONE(nfs_closedir_exit);
+
+TRACE_EVENT(nfs_llseek_dir_enter,
+		TP_PROTO(
+			const struct inode *inode,
+			u64 offset,
+			int whence
+		),
+
+		TP_ARGS(inode, offset, whence),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(u64, offset)
+			__field(unsigned long, whence)
+		),
+
+		TP_fast_assign(
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(inode);
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
+			__entry->offset = offset;
+			__entry->whence = whence;
+		),
+
+		TP_printk(
+			"fileid=%02x:%02x:%llu handle=0x%08x offset=0x%08llx whence=%lu",
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid,
+			__entry->fhandle, __entry->offset, __entry->whence
+		)
+);
+
+TRACE_EVENT(nfs_fsync_dir_enter,
+		TP_PROTO(
+			const struct inode *inode,
+			u64 start,
+			u64 end,
+			int datasync
+		),
+
+		TP_ARGS(inode, start, end, datasync),
+
+		TP_STRUCT__entry(
+			__field(dev_t, dev)
+			__field(u32, fhandle)
+			__field(u64, fileid)
+			__field(u64, start)
+			__field(u64, end)
+			__field(unsigned long, datasync)
+		),
+
+		TP_fast_assign(
+			__entry->dev = inode->i_sb->s_dev;
+			__entry->fileid = NFS_FILEID(inode);
+			__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
+			__entry->start = start;
+			__entry->end = end;
+			__entry->datasync = datasync;
+		),
+
+		TP_printk(
+			"fileid=%02x:%02x:%llu handle=0x%08x start=0x%08llx end=0x%08llx datasync=%lu",
+			MAJOR(__entry->dev), MINOR(__entry->dev),
+			(unsigned long long)__entry->fileid,
+			__entry->fhandle, __entry->start, __entry->end, __entry->datasync
+		)
+);
 
 TRACE_EVENT(nfs_readdir_enter,
 		TP_PROTO(
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 06/11] NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (4 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 05/11] NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 07/11] NFS: Add tracepoint to entry and exit of nfs_do_filldir Dave Wysochanski
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Add tracepoints for entry and exit to nfs_readdir_xdr_filler.
Note the exit trace event captures the results of over the wire
READDIR operations, regardless of NFS version.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      | 2 ++
 fs/nfs/nfstrace.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index e4cd9789ebb5..d1e9ba28c4a0 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -361,6 +361,7 @@ int nfs_readdir_xdr_filler(struct page **pages, nfs_readdir_descriptor_t *desc,
 	int		error;
 
  again:
+	trace_nfs_readdir_xdr_filler_enter(desc);
 	timestamp = jiffies;
 	gencount = nfs_inc_attr_generation_counter();
 	desc->dir_verifier = nfs_save_change_attribute(inode);
@@ -379,6 +380,7 @@ int nfs_readdir_xdr_filler(struct page **pages, nfs_readdir_descriptor_t *desc,
 	desc->timestamp = timestamp;
 	desc->gencount = gencount;
 error:
+	trace_nfs_readdir_xdr_filler_exit(desc, error);
 	return error;
 }
 
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 0ed330f323bb..5dbadd2718e3 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -926,6 +926,8 @@
 DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
 DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
 DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_array_exit);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_xdr_filler_enter);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_xdr_filler_exit);
 
 TRACE_EVENT(nfs_link_enter,
 		TP_PROTO(
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 07/11] NFS: Add tracepoint to entry and exit of nfs_do_filldir
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (5 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 06/11] NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 08/11] NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints Dave Wysochanski
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Add tracepoints to nfs_do_filldir and remove one more dfprintk.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      | 4 ++--
 fs/nfs/nfstrace.h | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d1e9ba28c4a0..d8c3c3fdea75 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -794,6 +794,7 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc)
 	struct nfs_cache_array *array = NULL;
 	struct nfs_open_dir_context *ctx = file->private_data;
 
+	trace_nfs_do_filldir_enter(desc);
 	array = kmap(desc->page);
 	for (i = desc->cache_entry_index; i < array->size; i++) {
 		struct nfs_cache_array_entry *ent;
@@ -819,8 +820,7 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc)
 		desc->eof = true;
 
 	kunmap(desc->page);
-	dfprintk(DIRCACHE, "NFS: nfs_do_filldir() filling ended @ cookie %Lu; returning = %d\n",
-			(unsigned long long)*desc->dir_cookie, res);
+	trace_nfs_do_filldir_exit(desc, res);
 	return res;
 }
 
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 5dbadd2718e3..e28551f70eab 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -928,6 +928,8 @@
 DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_array_exit);
 DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_xdr_filler_enter);
 DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_xdr_filler_exit);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_do_filldir_enter);
+DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_do_filldir_exit);
 
 TRACE_EVENT(nfs_link_enter,
 		TP_PROTO(
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 08/11] NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (6 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 07/11] NFS: Add tracepoint to entry and exit of nfs_do_filldir Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 09/11] NFS: Improve performance of listing directories being modified Dave Wysochanski
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Remove all the LOOKUPCACHE dfprintk statements in favor of
tracepoints based on existing tracing templates
DEFINE_NFS_LOOKUP_EVENT and DEFINE_NFS_LOOKUP_EVENT_DONE.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      | 27 ++++++++++++---------------
 fs/nfs/nfstrace.h |  8 ++++++++
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index d8c3c3fdea75..52e06c8fc7cd 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1226,8 +1226,7 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
 {
 	switch (error) {
 	case 1:
-		dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is valid\n",
-			__func__, dentry);
+		trace_nfs_lookup_revalidate_done_valid(dir, dentry, 0);
 		return 1;
 	case 0:
 		nfs_mark_for_revalidate(dir);
@@ -1243,12 +1242,10 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
 			if (IS_ROOT(dentry))
 				return 1;
 		}
-		dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) is invalid\n",
-				__func__, dentry);
+		trace_nfs_lookup_revalidate_done_invalid(dir, dentry, 0);
 		return 0;
 	}
-	dfprintk(LOOKUPCACHE, "NFS: %s(%pd2) lookup returned error %d\n",
-				__func__, dentry, error);
+	trace_nfs_lookup_revalidate_done_exit(dir, dentry, 0, error);
 	return error;
 }
 
@@ -1344,12 +1341,14 @@ int nfs_neg_need_reval(struct inode *dir, struct dentry *dentry,
 	nfs_inc_stats(dir, NFSIOS_DENTRYREVALIDATE);
 	inode = d_inode(dentry);
 
-	if (!inode)
+	if (!inode) {
+		trace_nfs_lookup_revalidate_negative_inode(dir, dentry,
+							   flags);
 		return nfs_lookup_revalidate_negative(dir, dentry, flags);
+	}
 
 	if (is_bad_inode(inode)) {
-		dfprintk(LOOKUPCACHE, "%s: %pd2 has dud inode\n",
-				__func__, dentry);
+		trace_nfs_lookup_revalidate_bad_inode(dir, dentry, flags);
 		goto out_bad;
 	}
 
@@ -1436,20 +1435,18 @@ static int nfs_weak_revalidate(struct dentry *dentry, unsigned int flags)
 	 * eventually need to do something more here.
 	 */
 	if (!inode) {
-		dfprintk(LOOKUPCACHE, "%s: %pd2 has negative inode\n",
-				__func__, dentry);
+		trace_nfs_weak_revalidate_negative_inode(inode, dentry,
+							 flags);
 		return 1;
 	}
 
 	if (is_bad_inode(inode)) {
-		dfprintk(LOOKUPCACHE, "%s: %pd2 has dud inode\n",
-				__func__, dentry);
+		trace_nfs_weak_revalidate_bad_inode(inode, dentry, flags);
 		return 0;
 	}
 
 	error = nfs_lookup_verify_inode(inode, flags);
-	dfprintk(LOOKUPCACHE, "NFS: %s: inode %lu is %s\n",
-			__func__, inode->i_ino, error ? "invalid" : "valid");
+	trace_nfs_weak_revalidate_exit(inode, dentry, flags, error);
 	return !error;
 }
 
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index e28551f70eab..06b301da85a2 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -386,6 +386,14 @@
 DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_lookup_exit);
 DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_revalidate_enter);
 DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_lookup_revalidate_exit);
+DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_revalidate_negative_inode);
+DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_revalidate_bad_inode);
+DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_revalidate_done_valid);
+DEFINE_NFS_LOOKUP_EVENT(nfs_lookup_revalidate_done_invalid);
+DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_lookup_revalidate_done_exit);
+DEFINE_NFS_LOOKUP_EVENT(nfs_weak_revalidate_negative_inode);
+DEFINE_NFS_LOOKUP_EVENT(nfs_weak_revalidate_bad_inode);
+DEFINE_NFS_LOOKUP_EVENT_DONE(nfs_weak_revalidate_exit);
 
 TRACE_DEFINE_ENUM(O_WRONLY);
 TRACE_DEFINE_ENUM(O_RDWR);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (7 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 08/11] NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 16:21   ` Trond Myklebust
  2020-11-02 13:50 ` [PATCH 10/11] NFS: Add page_index to nfs_readdir enter and exit tracepoints Dave Wysochanski
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

A process can hang forever to 'ls -l' a directory while the directory
is being modified such as another NFS client adding files to the
directory.  The problem is seen specifically with larger directories
(I tested with 1 million) and/or slower NFS server responses to
READDIR.  If a combination of the NFS directory size, the NFS server
responses to READDIR is such that the 'ls' process gets partially
through the listing before the attribute cache expires (time
exceeds acdirmax), we drop the pagecache and have to re-fill it,
and as a result, the process may never complete.  One could argue
for larger directories the acdirmin/acdirmax should be increased,
but it's not always possible to tune this effectively.

The root cause of this problem is due to how the NFS readdir cache
currently works.  The main search function, readdir_search_pagecache(),
always starts searching at page_index and cookie == 0, and for any
page not in the cache, fills in the page with entries obtained in
a READDIR NFS call.  If a page already exists, we proceed to
nfs_readdir_search_for_cookie(), which searches for the cookie
(pos) of the readdir call.  The search is O(n), where n is the
directory size before the cookie in question is found, and every
entry to nfs_readdir() pays this penalty, irrespective of the
current directory position (dir_context.pos).  The search is
expensive due to the opaque nature of readdir cookies, and the fact
that no mapping (hash) exists from cookies to pages.  In the case
of a directory being modified, the above behavior can become an
excessive penalty, since the same process is forced to fill pages it
may be no longer interested in (the entries were passed in a previous
nfs_readdir call), and this can essentially lead no forward progress.

To fix this problem, at the end of nfs_readdir(), save the page_index
corresponding to the directory position (cookie) inside the process's
nfs_open_dir_context.  Then at the next entry of nfs_readdir(), use
the saved page_index as the starting search point rather than starting
at page_index == 0.  Not only does this fix the problem of listing
a directory being modified, it also significantly improves performance
in the unmodified case since no extra search penalty is paid at each
entry to nfs_readdir().

In the case of lseek, since there is no hash or other mapping from a
cookie value to the page->index, just reset nfs_open_dir_context.page_index
to 0, which will reset the search to the old behavior.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c           | 8 +++++++-
 include/linux/nfs_fs.h | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 52e06c8fc7cd..b266f505b521 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -78,6 +78,7 @@ static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir
 		ctx->attr_gencount = nfsi->attr_gencount;
 		ctx->dir_cookie = 0;
 		ctx->dup_cookie = 0;
+		ctx->page_index = 0;
 		ctx->cred = get_cred(cred);
 		spin_lock(&dir->i_lock);
 		if (list_empty(&nfsi->open_files) &&
@@ -763,7 +764,7 @@ int find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
 	return res;
 }
 
-/* Search for desc->dir_cookie from the beginning of the page cache */
+/* Search for desc->dir_cookie starting at desc->page_index */
 static inline
 int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
 {
@@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 		.ctx = ctx,
 		.dir_cookie = &dir_ctx->dir_cookie,
 		.plus = nfs_use_readdirplus(inode, ctx),
+		.page_index = dir_ctx->page_index,
+		.last_cookie = nfs_readdir_use_cookie(file) ? ctx->pos : 0,
 	},
 			*desc = &my_desc;
 	int res = 0;
@@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 out:
 	if (res > 0)
 		res = 0;
+	dir_ctx->page_index = desc->page_index;
 	trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
 			       NFS_SERVER(inode)->dtsize, my_desc.plus, res);
 	return res;
@@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 		else
 			dir_ctx->dir_cookie = 0;
 		dir_ctx->duped = 0;
+		/* Force readdir_search_pagecache to start over */
+		dir_ctx->page_index = 0;
 	}
 	inode_unlock(inode);
 	return offset;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a2c6455ea3fa..0e55c0154ccd 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -93,6 +93,7 @@ struct nfs_open_dir_context {
 	__u64 dir_cookie;
 	__u64 dup_cookie;
 	signed char duped;
+	unsigned long	page_index;
 };
 
 /*
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/11] NFS: Add page_index to nfs_readdir enter and exit tracepoints
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (8 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 09/11] NFS: Improve performance of listing directories being modified Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 13:50 ` [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir() Dave Wysochanski
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

Add nfs_open_dir_context.page_index to enter and exit tracepoints
since this affects searching the cache.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c      |  2 ++
 fs/nfs/nfstrace.h | 22 ++++++++++++++--------
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index b266f505b521..cbd74cbdbb9f 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -894,6 +894,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 
 	nfs_inc_stats(inode, NFSIOS_VFSGETDENTS);
 	trace_nfs_readdir_enter(inode, ctx->pos, dir_ctx->dir_cookie,
+				dir_ctx->page_index,
 				NFS_SERVER(inode)->dtsize, my_desc.plus);
 
 	/*
@@ -943,6 +944,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 		res = 0;
 	dir_ctx->page_index = desc->page_index;
 	trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
+			       dir_ctx->page_index,
 			       NFS_SERVER(inode)->dtsize, my_desc.plus, res);
 	return res;
 }
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 06b301da85a2..12869b0c3f70 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -754,11 +754,12 @@
 			const struct inode *inode,
 			u64 cookie,
 			u64 dir_cookie,
+			unsigned long page_index,
 			unsigned int count,
 			bool plus
 		),
 
-		TP_ARGS(inode, cookie, dir_cookie, count, plus),
+		TP_ARGS(inode, cookie, dir_cookie, page_index, count, plus),
 
 		TP_STRUCT__entry(
 			__field(dev_t, dev)
@@ -766,6 +767,7 @@
 			__field(u64, fileid)
 			__field(u64, cookie)
 			__field(u64, dir_cookie)
+			__field(unsigned long, page_index)
 			__field(u64, count)
 			__field(bool, plus)
 		),
@@ -776,15 +778,16 @@
 			__entry->fhandle = nfs_fhandle_hash(NFS_FH(inode));
 			__entry->cookie = cookie;
 			__entry->dir_cookie = dir_cookie;
+			__entry->page_index = page_index;
 			__entry->count = count;
 			__entry->plus = plus;
 		),
 
 		TP_printk(
-			"fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx dir_cookie=0x%08llx count=%llu plus=%s",
+			"fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx dir_cookie=0x%08llx page_index=0x%08lx count=%llu plus=%s",
 			MAJOR(__entry->dev), MINOR(__entry->dev),
-			(unsigned long long)__entry->fileid,
-			__entry->fhandle, __entry->cookie, __entry->dir_cookie,
+			(unsigned long long)__entry->fileid, __entry->fhandle,
+			__entry->cookie, __entry->dir_cookie, __entry->page_index,
 			__entry->count, __entry->plus ? "true" : "false"
 		)
 );
@@ -794,12 +797,13 @@
 			const struct inode *inode,
 			u64 cookie,
 			u64 dir_cookie,
+			unsigned long page_index,
 			unsigned int count,
 			bool plus,
 			int error
 		),
 
-		TP_ARGS(inode, cookie, dir_cookie, count, plus, error),
+		TP_ARGS(inode, cookie, dir_cookie, page_index, count, plus, error),
 
 		TP_STRUCT__entry(
 			__field(dev_t, dev)
@@ -808,6 +812,7 @@
 			__field(unsigned long, error)
 			__field(u64, cookie)
 			__field(u64, dir_cookie)
+			__field(unsigned long, page_index)
 			__field(u64, count)
 			__field(bool, plus)
 		),
@@ -819,16 +824,17 @@
 			__entry->error = error;
 			__entry->cookie = cookie;
 			__entry->dir_cookie = dir_cookie;
+			__entry->page_index = page_index;
 			__entry->count = count;
 			__entry->plus = plus;
 		),
 
 		TP_printk(
-			"error=%ld fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx dir_cookie=0x%08llx count=%llu plus=%s",
+			"error=%ld fileid=%02x:%02x:%llu fhandle=0x%08x cookie=0x%08llx dir_cookie=0x%08llx page_index=0x%08lx count=%llu plus=%s",
 			__entry->error,
 			MAJOR(__entry->dev), MINOR(__entry->dev),
-			(unsigned long long)__entry->fileid,
-			__entry->fhandle, __entry->cookie, __entry->dir_cookie,
+			(unsigned long long)__entry->fileid, __entry->fhandle,
+			__entry->cookie, __entry->dir_cookie, __entry->page_index,
 			__entry->count, __entry->plus ? "true" : "false"
 		)
 );
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (9 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 10/11] NFS: Add page_index to nfs_readdir enter and exit tracepoints Dave Wysochanski
@ 2020-11-02 13:50 ` Dave Wysochanski
  2020-11-02 15:38   ` Mkrtchyan, Tigran
  2020-11-02 14:27 ` [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Chuck Lever
  2020-11-02 15:58 ` Mkrtchyan, Tigran
  12 siblings, 1 reply; 29+ messages in thread
From: Dave Wysochanski @ 2020-11-02 13:50 UTC (permalink / raw)
  To: Trond Myklebust, Anna Schumaker; +Cc: linux-nfs

commit 79f687a3de9e ("NFS: Fix a performance regression in readdir")
removed nfs_dir_mapping_need_revalidate() in an attempt to fix a
performance regression, but had the effect that with NFSv3 the
pagecache would never expire while a process was reading a directory.
An earlier patch addressed this problem by allowing the pagecache
to expire but the current process to continue using the last cookie
returned by the server when searching the pagecache, rather than
always starting at 0.  Thus, bring back nfs_dir_mapping_need_revalidate()
so that nfsi->cache_validity & NFS_INO_INVALID_DATA will be seen
and nfs_revalidate_mapping() will be called with NFSv3 as well,
making it behave the same as NFSv4.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index cbd74cbdbb9f..aeb086fb3d0a 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -872,6 +872,17 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
 	return status;
 }
 
+static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
+{
+	struct nfs_inode *nfsi = NFS_I(dir);
+
+	if (nfs_attribute_cache_expired(dir))
+		return true;
+	if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
+		return true;
+	return false;
+}
+
 /* The file offset position represents the dirent entry number.  A
    last cookie cache takes care of the common case of reading the
    whole directory.
@@ -903,7 +914,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 	 * to either find the entry with the appropriate number or
 	 * revalidate the cookie.
 	 */
-	if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
+	if (ctx->pos == 0 || nfs_dir_mapping_need_revalidate(inode))
 		res = nfs_revalidate_mapping(inode, file->f_mapping);
 	if (res < 0)
 		goto out;
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (10 preceding siblings ...)
  2020-11-02 13:50 ` [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir() Dave Wysochanski
@ 2020-11-02 14:27 ` Chuck Lever
  2020-11-02 15:07   ` David Wysochanski
  2020-11-02 15:58 ` Mkrtchyan, Tigran
  12 siblings, 1 reply; 29+ messages in thread
From: Chuck Lever @ 2020-11-02 14:27 UTC (permalink / raw)
  To: Dave Wysochanski; +Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List



> On Nov 2, 2020, at 8:50 AM, Dave Wysochanski <dwysocha@redhat.com> wrote:
> 
> Note these patches are also on github at:
> https://github.com/DaveWysochanskiRH/kernel/tree/5.9-nfs-readdir
> 
> This patchset is mostly tracepoints but patch 9 fixes a performance
> problem with NFSv4.x when listing a directory that is being modified.
> After the NFSv4.x patch, for consistency, patch 11 is included which
> makes NFSv3 behave the same as NFSv4, and given the inclusion of
> patch 9 does not re-introduce the regression fixed in previous commit
> 79f687a3de9e.  As described in patch 11, NFSv3 does not currently have
> the NFSv4 problem because it does not drop the pagecache when a
> large directory read is in progress.

Hi Dave-

Replacing our community's deep dependence on wire capture is a good
thing.

My initial concern with the tracepoint patches is that there are
so many new tracepoints. In cases where we might want to enable all
of them on an intensive workload, that generates an enormous amount
of tracepoint traffic, which might overwhelm local file storage or
analysis tools.

Also, much of this infrastructure is useful for developers, but
how much will be used in the field? The point of adding permanent 
tracepoints should be to capture just enough to enable sustaining
engineers to diagnose problems remotely. Developers can add very
specific tracepoints as they need to, but also they can use tools
like systemtap or eBPF.

Boiled down, I'm wondering if there's a way to prune some of this.
Are the dprintk call sites you're replacing all that useful, for
instance?

And: Will it be easy to backport the fixes at the end? Maybe those
should go earlier in the patch set.


> As background, currently any process that is listing a directory
> must search starting at cookie 0 at each entry to nfs_readdir(),
> regardless of the previous cookie returned by the server, regardless of
> NFS version, and regardless of whether the directory is being modified
> or cache expired.  This means, for example, if a large directory listing
> requires many getdents64/nfs_readdir calls, the kernel will search
> through the pagecache with an increasing penalty as the size of the
> directory increases and the process attempts to obtain the latter
> entries of the directory.  When the directory is being modified, and
> when acdirmax is exceeded, and nfs_attribute_cache_expired() returns
> true, with NFSv4.x we drop the pagecache for the directory entries,
> so the process cannot make forward progress.
> 
> I investigated using a similar approach as was done with NFSv3 in
> commit 79f687a3de9e, but for NFSv4, READDIR does not return the directory
> attributes and thus _nfs4_proc_readdir does not call nfs_refresh_inode()
> unlike nfs3_proc_readdir(). With NFSv3, the calling of nfs_refresh_inode()
> is what updates nfsi->read_cache_jiffies causing nfs_attribute_cache_expired()
> to always return false, leaving the pagecache in tact despite
> NFS_INO_INVALID_DATA being set.  Thus it's not clear whether the NFSv3
> approach could be done for NFSv4 to achieve the same behavior as with
> NFSv3.  In addition, the current NFSv3 approach leaves in place the
> aforementioned penalty, which for large directories can be substantial.
> So rather than this approach, the approach taken with NFSv4 leaves
> in place the dropping of the pagecache when acdirmax expires, and
> relies on the fact that a process can trust the last cookie returned
> by the server and continue at that point in the pagecache without
> forcing the current process to start over from cookie 0 upon the
> next entry to nfs_readdir().  As far as I could tell there is no
> POSIX requirement to start from 0 again when a directory is being
> modified during an in-progress directory read, and so the current
> behavior is only an implementation limitation.
> 
> The NFSv4 performance problem can be seen by exporting a directory
> with a larger number of files such that the uncached time to list the
> directory exceeds acdirmax.  I have an explicit reproducer script
> I can provide, but a simple reproducer outline is as follows:
> 
> 1. First export a directory that contains enough files that the 
> listing exceeds acdirmax.  In my testing, the NFS server and client
> were the same system, there were 500,000 files in the directory,
> and I set acdirmin=10,acdirmax=20.
> 
> 2. Mount client1 (writer) and client2 (reader).  Note below I use
> the loopback IP address, and a 'wsize' parameter on the writer mount
> so we get different superblocks:
> mount -o vers=4.1 127.0.0.1:/export/bz1889478 $MNT1
> mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 $MNT2
> 
> 3. Start client1 (writer):
> echo 3 > /proc/sys/vm/drop_caches
> i=500000; while true; do touch $MNT2/file$i.bin; i=$((i + 1)); sleep 1; done > /dev/null 2>&1 &
> 
> 4. Start client2 (reader):
> while true; do time ls -1f $MNT1 | wc -l; done &
> 
> The output from my reproducer script is:
> ./t0_bz1889478.sh 4.1 127.0.0.1 500000
> Mon 02 Nov 2020 07:54:22 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory is idle
> Mon 02 Nov 2020 07:54:22 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:22 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
> Mon 02 Nov 2020 07:54:24 AM EST: idle directory: skipping client2 adding files
> Mon 02 Nov 2020 07:54:24 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:24 AM EST: waiting for listing to complete
> Mon 02 Nov 2020 07:54:46 AM EST: client1 pid took 22 seconds to list the directory of 500000 files
> Mon 02 Nov 2020 07:54:46 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
> Mon 02 Nov 2020 07:54:47 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
> Mon 02 Nov 2020 07:54:47 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory being modified
> Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt2
> Mon 02 Nov 2020 07:54:47 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
> Mon 02 Nov 2020 07:54:50 AM EST: changing directory: client2 start adding 1 file/sec at /mnt/nfs/bz1889478-mnt2
> Mon 02 Nov 2020 07:54:51 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:51 AM EST: waiting for listing to complete
> Mon 02 Nov 2020 07:55:12 AM EST: client1 pid took 21 seconds to list the directory of 500000 files
> Mon 02 Nov 2020 07:55:12 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
> ./t0_bz1889478.sh: line 124:  5973 Killed                  while true; do
>    echo $i; touch $NFS_CLIENT_MOUNTPOINT2/file$i.bin; i=$((i + 1)); sleep 1;
> done > /dev/null 2>&1
> Mon 02 Nov 2020 07:55:20 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
> PASSED TEST ./t0_bz1889478.sh on kernel 5.9.0-nfs-readdir+ with NFS vers=4.1
> 
> For diagnostics and verification, of course a tcpdump can be
> used, or even READDIR ops and time can be compared as in the
> reproducer, but also the included tracepoints can be used.  For
> the tracepoints, before step #2 above use the below trace-cmd
> to trace the listing and see whether the problem occurs or not,
> evidenced by either the presence of nfs_invalidate_mapping*
> trace events or multiple nfs_readdir_enter calls with
> "cookie=0x00000000":
> trace-cmd start -e nfs:nfs_readdir_enter -e nfs4:nfs4_readdir -e nfs:nfs_readdir_exit -e nfs:nfs_invalidate_mapping_*
> 
> 
> Dave Wysochanski (11):
>  NFSv4: Improve nfs4_readdir tracepoint by adding additional fields
>  NFS: Replace dfprintk statements with trace events in nfs_readdir
>  NFS: Move nfs_readdir_descriptor_t into internal header
>  NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
>  NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir
>  NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit
>  NFS: Add tracepoint to entry and exit of nfs_do_filldir
>  NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints
>  NFS: Improve performance of listing directories being modified
>  NFS: Add page_index to nfs_readdir enter and exit tracepoints
>  NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
> 
> fs/nfs/dir.c           | 101 +++++++++---------
> fs/nfs/internal.h      |  18 ++++
> fs/nfs/nfs3xdr.c       |   2 +-
> fs/nfs/nfs4proc.c      |   2 +-
> fs/nfs/nfs4trace.h     |  44 +++++++-
> fs/nfs/nfstrace.h      | 277 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/nfs_fs.h |   1 +
> 7 files changed, 394 insertions(+), 51 deletions(-)
> 
> -- 
> 1.8.3.1
> 

--
Chuck Lever




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories
  2020-11-02 14:27 ` [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Chuck Lever
@ 2020-11-02 15:07   ` David Wysochanski
  2020-11-02 15:13     ` Chuck Lever
  0 siblings, 1 reply; 29+ messages in thread
From: David Wysochanski @ 2020-11-02 15:07 UTC (permalink / raw)
  To: Chuck Lever; +Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List

On Mon, Nov 2, 2020 at 9:29 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>
>
>
> > On Nov 2, 2020, at 8:50 AM, Dave Wysochanski <dwysocha@redhat.com> wrote:
> >
> > Note these patches are also on github at:
> > https://github.com/DaveWysochanskiRH/kernel/tree/5.9-nfs-readdir
> >
> > This patchset is mostly tracepoints but patch 9 fixes a performance
> > problem with NFSv4.x when listing a directory that is being modified.
> > After the NFSv4.x patch, for consistency, patch 11 is included which
> > makes NFSv3 behave the same as NFSv4, and given the inclusion of
> > patch 9 does not re-introduce the regression fixed in previous commit
> > 79f687a3de9e.  As described in patch 11, NFSv3 does not currently have
> > the NFSv4 problem because it does not drop the pagecache when a
> > large directory read is in progress.
>
> Hi Dave-
>
> Replacing our community's deep dependence on wire capture is a good
> thing.
>
Chuck, I agree and thanks for your early response here.

> My initial concern with the tracepoint patches is that there are
> so many new tracepoints. In cases where we might want to enable all
> of them on an intensive workload, that generates an enormous amount
> of tracepoint traffic, which might overwhelm local file storage or
> analysis tools.
>
> Also, much of this infrastructure is useful for developers, but
> how much will be used in the field? The point of adding permanent
> tracepoints should be to capture just enough to enable sustaining
> engineers to diagnose problems remotely. Developers can add very
> specific tracepoints as they need to, but also they can use tools
> like systemtap or eBPF.
>
> Boiled down, I'm wondering if there's a way to prune some of this.
> Are the dprintk call sites you're replacing all that useful, for
> instance?
>
Yes this is a good point.  I struggled with this too, I actually had even
more tracepoints than was included here, and even looked
at the size of nfs.ko before and after (it is about 22k difference).
# size /lib/modules/$(uname -r)/kernel/fs/nfs/nfs.ko
   text    data     bss     dec     hex filename
 250820   67272     408  318500   4dc24
/lib/modules/5.9.0-nfs-readdir+/kernel/fs/nfs/nfs.ko
# size /lib/modules/5.9.0/kernel/fs/nfs/nfs.ko
   text    data     bss     dec     hex filename
 240232   55496     408  296136   484c8 /lib/modules/5.9.0/kernel/fs/nfs/nfs.ko

I can surely take another look at this and see what are the most
important, but off the top of my head, I'm thinking nfs_readdir() entry
and exit, as well as tracing the actual READDIR ops, probably something
searching the NFS readdir cache.  Are there tracepoints that look
unnecessary to you, or are you mainly asking me for more justification
for each one?

> And: Will it be easy to backport the fixes at the end? Maybe those
> should go earlier in the patch set.
>
Yes it could be done that way.  I also thought of separating the tracing
patches into one set, then the fixes another, but decided on one post
maybe due to the intro patch which ties everything together.

FWIW, there is only one tiny conflicts on 5.9 with the two fixes and
it's with patch 9:
863c92f4328a... NFS: Improve performance of listing directories being modified
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@@ -949,7 -941,9 +952,13 @@@ static int nfs_readdir(struct file *fil
  out:
        if (res > 0)
                res = 0;
++<<<<<<< HEAD
 +      dfprintk(FILE, "NFS: readdir(%pD2) returns %d\n", file, res);
++=======
+       dir_ctx->page_index = desc->page_index;
+       trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
+                              NFS_SERVER(inode)->dtsize, my_desc.plus, res);
++>>>>>>> 863c92f4328a... NFS: Improve performance of listing
directories being modified
        return res;
  }


>
> > As background, currently any process that is listing a directory
> > must search starting at cookie 0 at each entry to nfs_readdir(),
> > regardless of the previous cookie returned by the server, regardless of
> > NFS version, and regardless of whether the directory is being modified
> > or cache expired.  This means, for example, if a large directory listing
> > requires many getdents64/nfs_readdir calls, the kernel will search
> > through the pagecache with an increasing penalty as the size of the
> > directory increases and the process attempts to obtain the latter
> > entries of the directory.  When the directory is being modified, and
> > when acdirmax is exceeded, and nfs_attribute_cache_expired() returns
> > true, with NFSv4.x we drop the pagecache for the directory entries,
> > so the process cannot make forward progress.
> >
> > I investigated using a similar approach as was done with NFSv3 in
> > commit 79f687a3de9e, but for NFSv4, READDIR does not return the directory
> > attributes and thus _nfs4_proc_readdir does not call nfs_refresh_inode()
> > unlike nfs3_proc_readdir(). With NFSv3, the calling of nfs_refresh_inode()
> > is what updates nfsi->read_cache_jiffies causing nfs_attribute_cache_expired()
> > to always return false, leaving the pagecache in tact despite
> > NFS_INO_INVALID_DATA being set.  Thus it's not clear whether the NFSv3
> > approach could be done for NFSv4 to achieve the same behavior as with
> > NFSv3.  In addition, the current NFSv3 approach leaves in place the
> > aforementioned penalty, which for large directories can be substantial.
> > So rather than this approach, the approach taken with NFSv4 leaves
> > in place the dropping of the pagecache when acdirmax expires, and
> > relies on the fact that a process can trust the last cookie returned
> > by the server and continue at that point in the pagecache without
> > forcing the current process to start over from cookie 0 upon the
> > next entry to nfs_readdir().  As far as I could tell there is no
> > POSIX requirement to start from 0 again when a directory is being
> > modified during an in-progress directory read, and so the current
> > behavior is only an implementation limitation.
> >
> > The NFSv4 performance problem can be seen by exporting a directory
> > with a larger number of files such that the uncached time to list the
> > directory exceeds acdirmax.  I have an explicit reproducer script
> > I can provide, but a simple reproducer outline is as follows:
> >
> > 1. First export a directory that contains enough files that the
> > listing exceeds acdirmax.  In my testing, the NFS server and client
> > were the same system, there were 500,000 files in the directory,
> > and I set acdirmin=10,acdirmax=20.
> >
> > 2. Mount client1 (writer) and client2 (reader).  Note below I use
> > the loopback IP address, and a 'wsize' parameter on the writer mount
> > so we get different superblocks:
> > mount -o vers=4.1 127.0.0.1:/export/bz1889478 $MNT1
> > mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 $MNT2
> >
> > 3. Start client1 (writer):
> > echo 3 > /proc/sys/vm/drop_caches
> > i=500000; while true; do touch $MNT2/file$i.bin; i=$((i + 1)); sleep 1; done > /dev/null 2>&1 &
> >
> > 4. Start client2 (reader):
> > while true; do time ls -1f $MNT1 | wc -l; done &
> >
> > The output from my reproducer script is:
> > ./t0_bz1889478.sh 4.1 127.0.0.1 500000
> > Mon 02 Nov 2020 07:54:22 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory is idle
> > Mon 02 Nov 2020 07:54:22 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
> > Mon 02 Nov 2020 07:54:22 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
> > Mon 02 Nov 2020 07:54:24 AM EST: idle directory: skipping client2 adding files
> > Mon 02 Nov 2020 07:54:24 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
> > Mon 02 Nov 2020 07:54:24 AM EST: waiting for listing to complete
> > Mon 02 Nov 2020 07:54:46 AM EST: client1 pid took 22 seconds to list the directory of 500000 files
> > Mon 02 Nov 2020 07:54:46 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
> > Mon 02 Nov 2020 07:54:47 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
> > Mon 02 Nov 2020 07:54:47 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory being modified
> > Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
> > Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt2
> > Mon 02 Nov 2020 07:54:47 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
> > Mon 02 Nov 2020 07:54:50 AM EST: changing directory: client2 start adding 1 file/sec at /mnt/nfs/bz1889478-mnt2
> > Mon 02 Nov 2020 07:54:51 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
> > Mon 02 Nov 2020 07:54:51 AM EST: waiting for listing to complete
> > Mon 02 Nov 2020 07:55:12 AM EST: client1 pid took 21 seconds to list the directory of 500000 files
> > Mon 02 Nov 2020 07:55:12 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
> > ./t0_bz1889478.sh: line 124:  5973 Killed                  while true; do
> >    echo $i; touch $NFS_CLIENT_MOUNTPOINT2/file$i.bin; i=$((i + 1)); sleep 1;
> > done > /dev/null 2>&1
> > Mon 02 Nov 2020 07:55:20 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
> > PASSED TEST ./t0_bz1889478.sh on kernel 5.9.0-nfs-readdir+ with NFS vers=4.1
> >
> > For diagnostics and verification, of course a tcpdump can be
> > used, or even READDIR ops and time can be compared as in the
> > reproducer, but also the included tracepoints can be used.  For
> > the tracepoints, before step #2 above use the below trace-cmd
> > to trace the listing and see whether the problem occurs or not,
> > evidenced by either the presence of nfs_invalidate_mapping*
> > trace events or multiple nfs_readdir_enter calls with
> > "cookie=0x00000000":
> > trace-cmd start -e nfs:nfs_readdir_enter -e nfs4:nfs4_readdir -e nfs:nfs_readdir_exit -e nfs:nfs_invalidate_mapping_*
> >
> >
> > Dave Wysochanski (11):
> >  NFSv4: Improve nfs4_readdir tracepoint by adding additional fields
> >  NFS: Replace dfprintk statements with trace events in nfs_readdir
> >  NFS: Move nfs_readdir_descriptor_t into internal header
> >  NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
> >  NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir
> >  NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit
> >  NFS: Add tracepoint to entry and exit of nfs_do_filldir
> >  NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints
> >  NFS: Improve performance of listing directories being modified
> >  NFS: Add page_index to nfs_readdir enter and exit tracepoints
> >  NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
> >
> > fs/nfs/dir.c           | 101 +++++++++---------
> > fs/nfs/internal.h      |  18 ++++
> > fs/nfs/nfs3xdr.c       |   2 +-
> > fs/nfs/nfs4proc.c      |   2 +-
> > fs/nfs/nfs4trace.h     |  44 +++++++-
> > fs/nfs/nfstrace.h      | 277 +++++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/nfs_fs.h |   1 +
> > 7 files changed, 394 insertions(+), 51 deletions(-)
> >
> > --
> > 1.8.3.1
> >
>
> --
> Chuck Lever
>
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories
  2020-11-02 15:07   ` David Wysochanski
@ 2020-11-02 15:13     ` Chuck Lever
  0 siblings, 0 replies; 29+ messages in thread
From: Chuck Lever @ 2020-11-02 15:13 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Trond Myklebust, Anna Schumaker, Linux NFS Mailing List



> On Nov 2, 2020, at 10:07 AM, David Wysochanski <dwysocha@redhat.com> wrote:
> 
> On Mon, Nov 2, 2020 at 9:29 AM Chuck Lever <chuck.lever@oracle.com> wrote:
>> 
>> 
>> 
>>> On Nov 2, 2020, at 8:50 AM, Dave Wysochanski <dwysocha@redhat.com> wrote:
>>> 
>>> Note these patches are also on github at:
>>> https://github.com/DaveWysochanskiRH/kernel/tree/5.9-nfs-readdir
>>> 
>>> This patchset is mostly tracepoints but patch 9 fixes a performance
>>> problem with NFSv4.x when listing a directory that is being modified.
>>> After the NFSv4.x patch, for consistency, patch 11 is included which
>>> makes NFSv3 behave the same as NFSv4, and given the inclusion of
>>> patch 9 does not re-introduce the regression fixed in previous commit
>>> 79f687a3de9e.  As described in patch 11, NFSv3 does not currently have
>>> the NFSv4 problem because it does not drop the pagecache when a
>>> large directory read is in progress.
>> 
>> Hi Dave-
>> 
>> Replacing our community's deep dependence on wire capture is a good
>> thing.
>> 
> Chuck, I agree and thanks for your early response here.
> 
>> My initial concern with the tracepoint patches is that there are
>> so many new tracepoints. In cases where we might want to enable all
>> of them on an intensive workload, that generates an enormous amount
>> of tracepoint traffic, which might overwhelm local file storage or
>> analysis tools.
>> 
>> Also, much of this infrastructure is useful for developers, but
>> how much will be used in the field? The point of adding permanent
>> tracepoints should be to capture just enough to enable sustaining
>> engineers to diagnose problems remotely. Developers can add very
>> specific tracepoints as they need to, but also they can use tools
>> like systemtap or eBPF.
>> 
>> Boiled down, I'm wondering if there's a way to prune some of this.
>> Are the dprintk call sites you're replacing all that useful, for
>> instance?
>> 
> Yes this is a good point.  I struggled with this too, I actually had even
> more tracepoints than was included here, and even looked
> at the size of nfs.ko before and after (it is about 22k difference).
> # size /lib/modules/$(uname -r)/kernel/fs/nfs/nfs.ko
>   text    data     bss     dec     hex filename
> 250820   67272     408  318500   4dc24
> /lib/modules/5.9.0-nfs-readdir+/kernel/fs/nfs/nfs.ko
> # size /lib/modules/5.9.0/kernel/fs/nfs/nfs.ko
>   text    data     bss     dec     hex filename
> 240232   55496     408  296136   484c8 /lib/modules/5.9.0/kernel/fs/nfs/nfs.ko
> 
> I can surely take another look at this and see what are the most
> important, but off the top of my head, I'm thinking nfs_readdir() entry
> and exit, as well as tracing the actual READDIR ops, probably something
> searching the NFS readdir cache.  Are there tracepoints that look
> unnecessary to you, or are you mainly asking me for more justification
> for each one?

Entry and exit tracepoints are usually easily replaced by
"function_graph" or "function" tracing or with eBPF scripts.
So those especially would need some justification for keeping
them in the form of permanent tracepoints.

IMO any place where the tracepoint is simply "I got here" is
a place where I would raise a question of whether it is needed
in common case troubleshooting.


>> And: Will it be easy to backport the fixes at the end? Maybe those
>> should go earlier in the patch set.
>> 
> Yes it could be done that way.  I also thought of separating the tracing
> patches into one set, then the fixes another, but decided on one post
> maybe due to the intro patch which ties everything together.
> 
> FWIW, there is only one tiny conflicts on 5.9 with the two fixes and
> it's with patch 9:
> 863c92f4328a... NFS: Improve performance of listing directories being modified
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@@ -949,7 -941,9 +952,13 @@@ static int nfs_readdir(struct file *fil
>  out:
>        if (res > 0)
>                res = 0;
> ++<<<<<<< HEAD
> +      dfprintk(FILE, "NFS: readdir(%pD2) returns %d\n", file, res);
> ++=======
> +       dir_ctx->page_index = desc->page_index;
> +       trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
> +                              NFS_SERVER(inode)->dtsize, my_desc.plus, res);
> ++>>>>>>> 863c92f4328a... NFS: Improve performance of listing
> directories being modified
>        return res;
>  }
> 
> 
>> 
>>> As background, currently any process that is listing a directory
>>> must search starting at cookie 0 at each entry to nfs_readdir(),
>>> regardless of the previous cookie returned by the server, regardless of
>>> NFS version, and regardless of whether the directory is being modified
>>> or cache expired.  This means, for example, if a large directory listing
>>> requires many getdents64/nfs_readdir calls, the kernel will search
>>> through the pagecache with an increasing penalty as the size of the
>>> directory increases and the process attempts to obtain the latter
>>> entries of the directory.  When the directory is being modified, and
>>> when acdirmax is exceeded, and nfs_attribute_cache_expired() returns
>>> true, with NFSv4.x we drop the pagecache for the directory entries,
>>> so the process cannot make forward progress.
>>> 
>>> I investigated using a similar approach as was done with NFSv3 in
>>> commit 79f687a3de9e, but for NFSv4, READDIR does not return the directory
>>> attributes and thus _nfs4_proc_readdir does not call nfs_refresh_inode()
>>> unlike nfs3_proc_readdir(). With NFSv3, the calling of nfs_refresh_inode()
>>> is what updates nfsi->read_cache_jiffies causing nfs_attribute_cache_expired()
>>> to always return false, leaving the pagecache in tact despite
>>> NFS_INO_INVALID_DATA being set.  Thus it's not clear whether the NFSv3
>>> approach could be done for NFSv4 to achieve the same behavior as with
>>> NFSv3.  In addition, the current NFSv3 approach leaves in place the
>>> aforementioned penalty, which for large directories can be substantial.
>>> So rather than this approach, the approach taken with NFSv4 leaves
>>> in place the dropping of the pagecache when acdirmax expires, and
>>> relies on the fact that a process can trust the last cookie returned
>>> by the server and continue at that point in the pagecache without
>>> forcing the current process to start over from cookie 0 upon the
>>> next entry to nfs_readdir().  As far as I could tell there is no
>>> POSIX requirement to start from 0 again when a directory is being
>>> modified during an in-progress directory read, and so the current
>>> behavior is only an implementation limitation.
>>> 
>>> The NFSv4 performance problem can be seen by exporting a directory
>>> with a larger number of files such that the uncached time to list the
>>> directory exceeds acdirmax.  I have an explicit reproducer script
>>> I can provide, but a simple reproducer outline is as follows:
>>> 
>>> 1. First export a directory that contains enough files that the
>>> listing exceeds acdirmax.  In my testing, the NFS server and client
>>> were the same system, there were 500,000 files in the directory,
>>> and I set acdirmin=10,acdirmax=20.
>>> 
>>> 2. Mount client1 (writer) and client2 (reader).  Note below I use
>>> the loopback IP address, and a 'wsize' parameter on the writer mount
>>> so we get different superblocks:
>>> mount -o vers=4.1 127.0.0.1:/export/bz1889478 $MNT1
>>> mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 $MNT2
>>> 
>>> 3. Start client1 (writer):
>>> echo 3 > /proc/sys/vm/drop_caches
>>> i=500000; while true; do touch $MNT2/file$i.bin; i=$((i + 1)); sleep 1; done > /dev/null 2>&1 &
>>> 
>>> 4. Start client2 (reader):
>>> while true; do time ls -1f $MNT1 | wc -l; done &
>>> 
>>> The output from my reproducer script is:
>>> ./t0_bz1889478.sh 4.1 127.0.0.1 500000
>>> Mon 02 Nov 2020 07:54:22 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory is idle
>>> Mon 02 Nov 2020 07:54:22 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
>>> Mon 02 Nov 2020 07:54:22 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
>>> Mon 02 Nov 2020 07:54:24 AM EST: idle directory: skipping client2 adding files
>>> Mon 02 Nov 2020 07:54:24 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
>>> Mon 02 Nov 2020 07:54:24 AM EST: waiting for listing to complete
>>> Mon 02 Nov 2020 07:54:46 AM EST: client1 pid took 22 seconds to list the directory of 500000 files
>>> Mon 02 Nov 2020 07:54:46 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
>>> Mon 02 Nov 2020 07:54:47 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
>>> Mon 02 Nov 2020 07:54:47 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and files 500000 directory being modified
>>> Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt1
>>> Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt2
>>> Mon 02 Nov 2020 07:54:47 AM EST: dropping caches with: echo 3 > /proc/sys/vm/drop_caches
>>> Mon 02 Nov 2020 07:54:50 AM EST: changing directory: client2 start adding 1 file/sec at /mnt/nfs/bz1889478-mnt2
>>> Mon 02 Nov 2020 07:54:51 AM EST: client1 running command: ls -lf /mnt/nfs/bz1889478-mnt1
>>> Mon 02 Nov 2020 07:54:51 AM EST: waiting for listing to complete
>>> Mon 02 Nov 2020 07:55:12 AM EST: client1 pid took 21 seconds to list the directory of 500000 files
>>> Mon 02 Nov 2020 07:55:12 AM EST: client1 READDIR ops total: 4902 (before = 0 after = 4902)
>>> ./t0_bz1889478.sh: line 124:  5973 Killed                  while true; do
>>>   echo $i; touch $NFS_CLIENT_MOUNTPOINT2/file$i.bin; i=$((i + 1)); sleep 1;
>>> done > /dev/null 2>&1
>>> Mon 02 Nov 2020 07:55:20 AM EST: umounting /mnt/nfs/bz1889478-mnt1 /mnt/nfs/bz1889478-mnt2
>>> PASSED TEST ./t0_bz1889478.sh on kernel 5.9.0-nfs-readdir+ with NFS vers=4.1
>>> 
>>> For diagnostics and verification, of course a tcpdump can be
>>> used, or even READDIR ops and time can be compared as in the
>>> reproducer, but also the included tracepoints can be used.  For
>>> the tracepoints, before step #2 above use the below trace-cmd
>>> to trace the listing and see whether the problem occurs or not,
>>> evidenced by either the presence of nfs_invalidate_mapping*
>>> trace events or multiple nfs_readdir_enter calls with
>>> "cookie=0x00000000":
>>> trace-cmd start -e nfs:nfs_readdir_enter -e nfs4:nfs4_readdir -e nfs:nfs_readdir_exit -e nfs:nfs_invalidate_mapping_*
>>> 
>>> 
>>> Dave Wysochanski (11):
>>> NFSv4: Improve nfs4_readdir tracepoint by adding additional fields
>>> NFS: Replace dfprintk statements with trace events in nfs_readdir
>>> NFS: Move nfs_readdir_descriptor_t into internal header
>>> NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
>>> NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir
>>> NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit
>>> NFS: Add tracepoint to entry and exit of nfs_do_filldir
>>> NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints
>>> NFS: Improve performance of listing directories being modified
>>> NFS: Add page_index to nfs_readdir enter and exit tracepoints
>>> NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
>>> 
>>> fs/nfs/dir.c           | 101 +++++++++---------
>>> fs/nfs/internal.h      |  18 ++++
>>> fs/nfs/nfs3xdr.c       |   2 +-
>>> fs/nfs/nfs4proc.c      |   2 +-
>>> fs/nfs/nfs4trace.h     |  44 +++++++-
>>> fs/nfs/nfstrace.h      | 277 +++++++++++++++++++++++++++++++++++++++++++++++++
>>> include/linux/nfs_fs.h |   1 +
>>> 7 files changed, 394 insertions(+), 51 deletions(-)
>>> 
>>> --
>>> 1.8.3.1
>>> 
>> 
>> --
>> Chuck Lever

--
Chuck Lever




^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
  2020-11-02 13:50 ` [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir() Dave Wysochanski
@ 2020-11-02 15:38   ` Mkrtchyan, Tigran
  2020-11-02 16:16     ` David Wysochanski
  0 siblings, 1 reply; 29+ messages in thread
From: Mkrtchyan, Tigran @ 2020-11-02 15:38 UTC (permalink / raw)
  To: David Wysochanski; +Cc: trondmy, Anna Schumaker, linux-nfs

Hi Dave,

----- Original Message -----
> From: "David Wysochanski" <dwysocha@redhat.com>
> To: "trondmy" <trondmy@hammerspace.com>, "Anna Schumaker" <anna.schumaker@netapp.com>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Monday, 2 November, 2020 14:50:11
> Subject: [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()

> commit 79f687a3de9e ("NFS: Fix a performance regression in readdir")
> removed nfs_dir_mapping_need_revalidate() in an attempt to fix a
> performance regression, but had the effect that with NFSv3 the
> pagecache would never expire while a process was reading a directory.
> An earlier patch addressed this problem by allowing the pagecache
> to expire but the current process to continue using the last cookie
> returned by the server when searching the pagecache, rather than
> always starting at 0.  Thus, bring back nfs_dir_mapping_need_revalidate()
> so that nfsi->cache_validity & NFS_INO_INVALID_DATA will be seen
> and nfs_revalidate_mapping() will be called with NFSv3 as well,
> making it behave the same as NFSv4.
> 
> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> ---
> fs/nfs/dir.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index cbd74cbdbb9f..aeb086fb3d0a 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -872,6 +872,17 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
> 	return status;
> }
> 
> +static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
> +{
> +	struct nfs_inode *nfsi = NFS_I(dir);
> +
> +	if (nfs_attribute_cache_expired(dir))
> +		return true;
> +	if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
> +		return true;
> +	return false;
> +}

Is this the same as:

static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
{
        struct nfs_inode *nfsi = NFS_I(dir);

        return nfs_attribute_cache_expired(dir) || nfsi->cache_validity & NFS_INO_INVALID_DATA);
}

Tigran.

> +
> /* The file offset position represents the dirent entry number.  A
>    last cookie cache takes care of the common case of reading the
>    whole directory.
> @@ -903,7 +914,7 @@ static int nfs_readdir(struct file *file, struct dir_context
> *ctx)
> 	 * to either find the entry with the appropriate number or
> 	 * revalidate the cookie.
> 	 */
> -	if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
> +	if (ctx->pos == 0 || nfs_dir_mapping_need_revalidate(inode))
> 		res = nfs_revalidate_mapping(inode, file->f_mapping);
> 	if (res < 0)
> 		goto out;
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories
  2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
                   ` (11 preceding siblings ...)
  2020-11-02 14:27 ` [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Chuck Lever
@ 2020-11-02 15:58 ` Mkrtchyan, Tigran
  12 siblings, 0 replies; 29+ messages in thread
From: Mkrtchyan, Tigran @ 2020-11-02 15:58 UTC (permalink / raw)
  To: David Wysochanski; +Cc: trondmy, Anna Schumaker, linux-nfs


Hi Dave,

thanks for addressing this issue!

The readdir on large directory that's permanently changes is a pain in a b...
We have solved it by creating a snapshot of a directory on the server and
identifying this snapshot by cookie verifier. This is probably it not posix
compliant, but works for us.

I am happy to run some tests.

Tigran.


----- Original Message -----
> From: "David Wysochanski" <dwysocha@redhat.com>
> To: "trondmy" <trondmy@hammerspace.com>, "Anna Schumaker" <anna.schumaker@netapp.com>
> Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> Sent: Monday, 2 November, 2020 14:50:00
> Subject: [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories

> Note these patches are also on github at:
> https://github.com/DaveWysochanskiRH/kernel/tree/5.9-nfs-readdir
> 
> This patchset is mostly tracepoints but patch 9 fixes a performance
> problem with NFSv4.x when listing a directory that is being modified.
> After the NFSv4.x patch, for consistency, patch 11 is included which
> makes NFSv3 behave the same as NFSv4, and given the inclusion of
> patch 9 does not re-introduce the regression fixed in previous commit
> 79f687a3de9e.  As described in patch 11, NFSv3 does not currently have
> the NFSv4 problem because it does not drop the pagecache when a
> large directory read is in progress.
> 
> As background, currently any process that is listing a directory
> must search starting at cookie 0 at each entry to nfs_readdir(),
> regardless of the previous cookie returned by the server, regardless of
> NFS version, and regardless of whether the directory is being modified
> or cache expired.  This means, for example, if a large directory listing
> requires many getdents64/nfs_readdir calls, the kernel will search
> through the pagecache with an increasing penalty as the size of the
> directory increases and the process attempts to obtain the latter
> entries of the directory.  When the directory is being modified, and
> when acdirmax is exceeded, and nfs_attribute_cache_expired() returns
> true, with NFSv4.x we drop the pagecache for the directory entries,
> so the process cannot make forward progress.
> 
> I investigated using a similar approach as was done with NFSv3 in
> commit 79f687a3de9e, but for NFSv4, READDIR does not return the directory
> attributes and thus _nfs4_proc_readdir does not call nfs_refresh_inode()
> unlike nfs3_proc_readdir(). With NFSv3, the calling of nfs_refresh_inode()
> is what updates nfsi->read_cache_jiffies causing nfs_attribute_cache_expired()
> to always return false, leaving the pagecache in tact despite
> NFS_INO_INVALID_DATA being set.  Thus it's not clear whether the NFSv3
> approach could be done for NFSv4 to achieve the same behavior as with
> NFSv3.  In addition, the current NFSv3 approach leaves in place the
> aforementioned penalty, which for large directories can be substantial.
> So rather than this approach, the approach taken with NFSv4 leaves
> in place the dropping of the pagecache when acdirmax expires, and
> relies on the fact that a process can trust the last cookie returned
> by the server and continue at that point in the pagecache without
> forcing the current process to start over from cookie 0 upon the
> next entry to nfs_readdir().  As far as I could tell there is no
> POSIX requirement to start from 0 again when a directory is being
> modified during an in-progress directory read, and so the current
> behavior is only an implementation limitation.
> 
> The NFSv4 performance problem can be seen by exporting a directory
> with a larger number of files such that the uncached time to list the
> directory exceeds acdirmax.  I have an explicit reproducer script
> I can provide, but a simple reproducer outline is as follows:
> 
> 1. First export a directory that contains enough files that the
> listing exceeds acdirmax.  In my testing, the NFS server and client
> were the same system, there were 500,000 files in the directory,
> and I set acdirmin=10,acdirmax=20.
> 
> 2. Mount client1 (writer) and client2 (reader).  Note below I use
> the loopback IP address, and a 'wsize' parameter on the writer mount
> so we get different superblocks:
> mount -o vers=4.1 127.0.0.1:/export/bz1889478 $MNT1
> mount -o vers=4.1,wsize=65536 127.0.0.1:/export/bz1889478 $MNT2
> 
> 3. Start client1 (writer):
> echo 3 > /proc/sys/vm/drop_caches
> i=500000; while true; do touch $MNT2/file$i.bin; i=$((i + 1)); sleep 1; done >
> /dev/null 2>&1 &
> 
> 4. Start client2 (reader):
> while true; do time ls -1f $MNT1 | wc -l; done &
> 
> The output from my reproducer script is:
> ./t0_bz1889478.sh 4.1 127.0.0.1 500000
> Mon 02 Nov 2020 07:54:22 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and
> files 500000 directory is idle
> Mon 02 Nov 2020 07:54:22 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478
> /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:22 AM EST: dropping caches with: echo 3 >
> /proc/sys/vm/drop_caches
> Mon 02 Nov 2020 07:54:24 AM EST: idle directory: skipping client2 adding files
> Mon 02 Nov 2020 07:54:24 AM EST: client1 running command: ls -lf
> /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:24 AM EST: waiting for listing to complete
> Mon 02 Nov 2020 07:54:46 AM EST: client1 pid took 22 seconds to list the
> directory of 500000 files
> Mon 02 Nov 2020 07:54:46 AM EST: client1 READDIR ops total: 4902 (before = 0
> after = 4902)
> Mon 02 Nov 2020 07:54:47 AM EST: umounting /mnt/nfs/bz1889478-mnt1
> /mnt/nfs/bz1889478-mnt2
> Mon 02 Nov 2020 07:54:47 AM EST: Running with NFS vers=4.1 server 127.0.0.1 and
> files 500000 directory being modified
> Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1 127.0.0.1:/export/bz1889478
> /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:47 AM EST: mount -o vers=4.1,wsize=65536
> 127.0.0.1:/export/bz1889478 /mnt/nfs/bz1889478-mnt2
> Mon 02 Nov 2020 07:54:47 AM EST: dropping caches with: echo 3 >
> /proc/sys/vm/drop_caches
> Mon 02 Nov 2020 07:54:50 AM EST: changing directory: client2 start adding 1
> file/sec at /mnt/nfs/bz1889478-mnt2
> Mon 02 Nov 2020 07:54:51 AM EST: client1 running command: ls -lf
> /mnt/nfs/bz1889478-mnt1
> Mon 02 Nov 2020 07:54:51 AM EST: waiting for listing to complete
> Mon 02 Nov 2020 07:55:12 AM EST: client1 pid took 21 seconds to list the
> directory of 500000 files
> Mon 02 Nov 2020 07:55:12 AM EST: client1 READDIR ops total: 4902 (before = 0
> after = 4902)
> ./t0_bz1889478.sh: line 124:  5973 Killed                  while true; do
>    echo $i; touch $NFS_CLIENT_MOUNTPOINT2/file$i.bin; i=$((i + 1)); sleep 1;
> done > /dev/null 2>&1
> Mon 02 Nov 2020 07:55:20 AM EST: umounting /mnt/nfs/bz1889478-mnt1
> /mnt/nfs/bz1889478-mnt2
> PASSED TEST ./t0_bz1889478.sh on kernel 5.9.0-nfs-readdir+ with NFS vers=4.1
> 
> For diagnostics and verification, of course a tcpdump can be
> used, or even READDIR ops and time can be compared as in the
> reproducer, but also the included tracepoints can be used.  For
> the tracepoints, before step #2 above use the below trace-cmd
> to trace the listing and see whether the problem occurs or not,
> evidenced by either the presence of nfs_invalidate_mapping*
> trace events or multiple nfs_readdir_enter calls with
> "cookie=0x00000000":
> trace-cmd start -e nfs:nfs_readdir_enter -e nfs4:nfs4_readdir -e
> nfs:nfs_readdir_exit -e nfs:nfs_invalidate_mapping_*
> 
> 
> Dave Wysochanski (11):
>  NFSv4: Improve nfs4_readdir tracepoint by adding additional fields
>  NFS: Replace dfprintk statements with trace events in nfs_readdir
>  NFS: Move nfs_readdir_descriptor_t into internal header
>  NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
>  NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir
>  NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit
>  NFS: Add tracepoint to entry and exit of nfs_do_filldir
>  NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints
>  NFS: Improve performance of listing directories being modified
>  NFS: Add page_index to nfs_readdir enter and exit tracepoints
>  NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
> 
> fs/nfs/dir.c           | 101 +++++++++---------
> fs/nfs/internal.h      |  18 ++++
> fs/nfs/nfs3xdr.c       |   2 +-
> fs/nfs/nfs4proc.c      |   2 +-
> fs/nfs/nfs4trace.h     |  44 +++++++-
> fs/nfs/nfstrace.h      | 277 +++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/nfs_fs.h |   1 +
> 7 files changed, 394 insertions(+), 51 deletions(-)
> 
> --
> 1.8.3.1

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
  2020-11-02 15:38   ` Mkrtchyan, Tigran
@ 2020-11-02 16:16     ` David Wysochanski
  0 siblings, 0 replies; 29+ messages in thread
From: David Wysochanski @ 2020-11-02 16:16 UTC (permalink / raw)
  To: Mkrtchyan, Tigran; +Cc: trondmy, Anna Schumaker, linux-nfs

On Mon, Nov 2, 2020 at 10:48 AM Mkrtchyan, Tigran
<tigran.mkrtchyan@desy.de> wrote:
>
> Hi Dave,
>
> ----- Original Message -----
> > From: "David Wysochanski" <dwysocha@redhat.com>
> > To: "trondmy" <trondmy@hammerspace.com>, "Anna Schumaker" <anna.schumaker@netapp.com>
> > Cc: "linux-nfs" <linux-nfs@vger.kernel.org>
> > Sent: Monday, 2 November, 2020 14:50:11
> > Subject: [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir()
>
> > commit 79f687a3de9e ("NFS: Fix a performance regression in readdir")
> > removed nfs_dir_mapping_need_revalidate() in an attempt to fix a
> > performance regression, but had the effect that with NFSv3 the
> > pagecache would never expire while a process was reading a directory.
> > An earlier patch addressed this problem by allowing the pagecache
> > to expire but the current process to continue using the last cookie
> > returned by the server when searching the pagecache, rather than
> > always starting at 0.  Thus, bring back nfs_dir_mapping_need_revalidate()
> > so that nfsi->cache_validity & NFS_INO_INVALID_DATA will be seen
> > and nfs_revalidate_mapping() will be called with NFSv3 as well,
> > making it behave the same as NFSv4.
> >
> > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > ---
> > fs/nfs/dir.c | 13 ++++++++++++-
> > 1 file changed, 12 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index cbd74cbdbb9f..aeb086fb3d0a 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -872,6 +872,17 @@ int uncached_readdir(nfs_readdir_descriptor_t *desc)
> >       return status;
> > }
> >
> > +static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
> > +{
> > +     struct nfs_inode *nfsi = NFS_I(dir);
> > +
> > +     if (nfs_attribute_cache_expired(dir))
> > +             return true;
> > +     if (nfsi->cache_validity & NFS_INO_INVALID_DATA)
> > +             return true;
> > +     return false;
> > +}
>
> Is this the same as:
>
> static bool nfs_dir_mapping_need_revalidate(struct inode *dir)
> {
>         struct nfs_inode *nfsi = NFS_I(dir);
>
>         return nfs_attribute_cache_expired(dir) || nfsi->cache_validity & NFS_INO_INVALID_DATA);
> }
>
Yes that's a nice simplification!

> Tigran.
>
> > +
> > /* The file offset position represents the dirent entry number.  A
> >    last cookie cache takes care of the common case of reading the
> >    whole directory.
> > @@ -903,7 +914,7 @@ static int nfs_readdir(struct file *file, struct dir_context
> > *ctx)
> >        * to either find the entry with the appropriate number or
> >        * revalidate the cookie.
> >        */
> > -     if (ctx->pos == 0 || nfs_attribute_cache_expired(inode))
> > +     if (ctx->pos == 0 || nfs_dir_mapping_need_revalidate(inode))
> >               res = nfs_revalidate_mapping(inode, file->f_mapping);
> >       if (res < 0)
> >               goto out;
> > --
> > 1.8.3.1
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 13:50 ` [PATCH 09/11] NFS: Improve performance of listing directories being modified Dave Wysochanski
@ 2020-11-02 16:21   ` Trond Myklebust
  2020-11-02 16:26     ` David Wysochanski
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2020-11-02 16:21 UTC (permalink / raw)
  To: anna.schumaker, dwysocha; +Cc: linux-nfs

On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> A process can hang forever to 'ls -l' a directory while the directory
> is being modified such as another NFS client adding files to the
> directory.  The problem is seen specifically with larger directories
> (I tested with 1 million) and/or slower NFS server responses to
> READDIR.  If a combination of the NFS directory size, the NFS server
> responses to READDIR is such that the 'ls' process gets partially
> through the listing before the attribute cache expires (time
> exceeds acdirmax), we drop the pagecache and have to re-fill it,
> and as a result, the process may never complete.  One could argue
> for larger directories the acdirmin/acdirmax should be increased,
> but it's not always possible to tune this effectively.
> 
> The root cause of this problem is due to how the NFS readdir cache
> currently works.  The main search function,
> readdir_search_pagecache(),
> always starts searching at page_index and cookie == 0, and for any
> page not in the cache, fills in the page with entries obtained in
> a READDIR NFS call.  If a page already exists, we proceed to
> nfs_readdir_search_for_cookie(), which searches for the cookie
> (pos) of the readdir call.  The search is O(n), where n is the
> directory size before the cookie in question is found, and every
> entry to nfs_readdir() pays this penalty, irrespective of the
> current directory position (dir_context.pos).  The search is
> expensive due to the opaque nature of readdir cookies, and the fact
> that no mapping (hash) exists from cookies to pages.  In the case
> of a directory being modified, the above behavior can become an
> excessive penalty, since the same process is forced to fill pages it
> may be no longer interested in (the entries were passed in a previous
> nfs_readdir call), and this can essentially lead no forward progress.
> 
> To fix this problem, at the end of nfs_readdir(), save the page_index
> corresponding to the directory position (cookie) inside the process's
> nfs_open_dir_context.  Then at the next entry of nfs_readdir(), use
> the saved page_index as the starting search point rather than
> starting
> at page_index == 0.  Not only does this fix the problem of listing
> a directory being modified, it also significantly improves
> performance
> in the unmodified case since no extra search penalty is paid at each
> entry to nfs_readdir().
> 
> In the case of lseek, since there is no hash or other mapping from a
> cookie value to the page->index, just reset
> nfs_open_dir_context.page_index
> to 0, which will reset the search to the old behavior.
> 
> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> ---
>  fs/nfs/dir.c           | 8 +++++++-
>  include/linux/nfs_fs.h | 1 +
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index 52e06c8fc7cd..b266f505b521 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> *alloc_nfs_open_dir_context(struct inode *dir
>                 ctx->attr_gencount = nfsi->attr_gencount;
>                 ctx->dir_cookie = 0;
>                 ctx->dup_cookie = 0;
> +               ctx->page_index = 0;
>                 ctx->cred = get_cred(cred);
>                 spin_lock(&dir->i_lock);
>                 if (list_empty(&nfsi->open_files) &&
> @@ -763,7 +764,7 @@ int
> find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
>         return res;
>  }
>  
> -/* Search for desc->dir_cookie from the beginning of the page cache
> */
> +/* Search for desc->dir_cookie starting at desc->page_index */
>  static inline
>  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
>  {
> @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file, struct
> dir_context *ctx)
>                 .ctx = ctx,
>                 .dir_cookie = &dir_ctx->dir_cookie,
>                 .plus = nfs_use_readdirplus(inode, ctx),
> +               .page_index = dir_ctx->page_index,
> +               .last_cookie = nfs_readdir_use_cookie(file) ? ctx-
> >pos : 0,
>         },
>                         *desc = &my_desc;
>         int res = 0;
> @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file, struct
> dir_context *ctx)
>  out:
>         if (res > 0)
>                 res = 0;
> +       dir_ctx->page_index = desc->page_index;
>         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
>                                NFS_SERVER(inode)->dtsize,
> my_desc.plus, res);
>         return res;
> @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file *filp,
> loff_t offset, int whence)
>                 else
>                         dir_ctx->dir_cookie = 0;
>                 dir_ctx->duped = 0;
> +               /* Force readdir_search_pagecache to start over */
> +               dir_ctx->page_index = 0;
>         }
>         inode_unlock(inode);
>         return offset;
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index a2c6455ea3fa..0e55c0154ccd 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
>         __u64 dir_cookie;
>         __u64 dup_cookie;
>         signed char duped;
> +       unsigned long   page_index;
>  };
>  
>  /*

NACK. It makes no sense to store the page index as a cursor.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 16:21   ` Trond Myklebust
@ 2020-11-02 16:26     ` David Wysochanski
  2020-11-02 17:31       ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: David Wysochanski @ 2020-11-02 16:26 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: anna.schumaker, linux-nfs

On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > A process can hang forever to 'ls -l' a directory while the directory
> > is being modified such as another NFS client adding files to the
> > directory.  The problem is seen specifically with larger directories
> > (I tested with 1 million) and/or slower NFS server responses to
> > READDIR.  If a combination of the NFS directory size, the NFS server
> > responses to READDIR is such that the 'ls' process gets partially
> > through the listing before the attribute cache expires (time
> > exceeds acdirmax), we drop the pagecache and have to re-fill it,
> > and as a result, the process may never complete.  One could argue
> > for larger directories the acdirmin/acdirmax should be increased,
> > but it's not always possible to tune this effectively.
> >
> > The root cause of this problem is due to how the NFS readdir cache
> > currently works.  The main search function,
> > readdir_search_pagecache(),
> > always starts searching at page_index and cookie == 0, and for any
> > page not in the cache, fills in the page with entries obtained in
> > a READDIR NFS call.  If a page already exists, we proceed to
> > nfs_readdir_search_for_cookie(), which searches for the cookie
> > (pos) of the readdir call.  The search is O(n), where n is the
> > directory size before the cookie in question is found, and every
> > entry to nfs_readdir() pays this penalty, irrespective of the
> > current directory position (dir_context.pos).  The search is
> > expensive due to the opaque nature of readdir cookies, and the fact
> > that no mapping (hash) exists from cookies to pages.  In the case
> > of a directory being modified, the above behavior can become an
> > excessive penalty, since the same process is forced to fill pages it
> > may be no longer interested in (the entries were passed in a previous
> > nfs_readdir call), and this can essentially lead no forward progress.
> >
> > To fix this problem, at the end of nfs_readdir(), save the page_index
> > corresponding to the directory position (cookie) inside the process's
> > nfs_open_dir_context.  Then at the next entry of nfs_readdir(), use
> > the saved page_index as the starting search point rather than
> > starting
> > at page_index == 0.  Not only does this fix the problem of listing
> > a directory being modified, it also significantly improves
> > performance
> > in the unmodified case since no extra search penalty is paid at each
> > entry to nfs_readdir().
> >
> > In the case of lseek, since there is no hash or other mapping from a
> > cookie value to the page->index, just reset
> > nfs_open_dir_context.page_index
> > to 0, which will reset the search to the old behavior.
> >
> > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > ---
> >  fs/nfs/dir.c           | 8 +++++++-
> >  include/linux/nfs_fs.h | 1 +
> >  2 files changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index 52e06c8fc7cd..b266f505b521 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > *alloc_nfs_open_dir_context(struct inode *dir
> >                 ctx->attr_gencount = nfsi->attr_gencount;
> >                 ctx->dir_cookie = 0;
> >                 ctx->dup_cookie = 0;
> > +               ctx->page_index = 0;
> >                 ctx->cred = get_cred(cred);
> >                 spin_lock(&dir->i_lock);
> >                 if (list_empty(&nfsi->open_files) &&
> > @@ -763,7 +764,7 @@ int
> > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> >         return res;
> >  }
> >
> > -/* Search for desc->dir_cookie from the beginning of the page cache
> > */
> > +/* Search for desc->dir_cookie starting at desc->page_index */
> >  static inline
> >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> >  {
> > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file, struct
> > dir_context *ctx)
> >                 .ctx = ctx,
> >                 .dir_cookie = &dir_ctx->dir_cookie,
> >                 .plus = nfs_use_readdirplus(inode, ctx),
> > +               .page_index = dir_ctx->page_index,
> > +               .last_cookie = nfs_readdir_use_cookie(file) ? ctx-
> > >pos : 0,
> >         },
> >                         *desc = &my_desc;
> >         int res = 0;
> > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file, struct
> > dir_context *ctx)
> >  out:
> >         if (res > 0)
> >                 res = 0;
> > +       dir_ctx->page_index = desc->page_index;
> >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
> >                                NFS_SERVER(inode)->dtsize,
> > my_desc.plus, res);
> >         return res;
> > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file *filp,
> > loff_t offset, int whence)
> >                 else
> >                         dir_ctx->dir_cookie = 0;
> >                 dir_ctx->duped = 0;
> > +               /* Force readdir_search_pagecache to start over */
> > +               dir_ctx->page_index = 0;
> >         }
> >         inode_unlock(inode);
> >         return offset;
> > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > index a2c6455ea3fa..0e55c0154ccd 100644
> > --- a/include/linux/nfs_fs.h
> > +++ b/include/linux/nfs_fs.h
> > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> >         __u64 dir_cookie;
> >         __u64 dup_cookie;
> >         signed char duped;
> > +       unsigned long   page_index;
> >  };
> >
> >  /*
>
> NACK. It makes no sense to store the page index as a cursor.
>

A similar thing was done recently with:
227823d2074d nfs: optimise readdir cache page invalidation


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 16:26     ` David Wysochanski
@ 2020-11-02 17:31       ` Trond Myklebust
  2020-11-02 19:45         ` David Wysochanski
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2020-11-02 17:31 UTC (permalink / raw)
  To: dwysocha; +Cc: linux-nfs, anna.schumaker

On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > 
> > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > A process can hang forever to 'ls -l' a directory while the
> > > directory
> > > is being modified such as another NFS client adding files to the
> > > directory.  The problem is seen specifically with larger
> > > directories
> > > (I tested with 1 million) and/or slower NFS server responses to
> > > READDIR.  If a combination of the NFS directory size, the NFS
> > > server
> > > responses to READDIR is such that the 'ls' process gets partially
> > > through the listing before the attribute cache expires (time
> > > exceeds acdirmax), we drop the pagecache and have to re-fill it,
> > > and as a result, the process may never complete.  One could argue
> > > for larger directories the acdirmin/acdirmax should be increased,
> > > but it's not always possible to tune this effectively.
> > > 
> > > The root cause of this problem is due to how the NFS readdir
> > > cache
> > > currently works.  The main search function,
> > > readdir_search_pagecache(),
> > > always starts searching at page_index and cookie == 0, and for
> > > any
> > > page not in the cache, fills in the page with entries obtained in
> > > a READDIR NFS call.  If a page already exists, we proceed to
> > > nfs_readdir_search_for_cookie(), which searches for the cookie
> > > (pos) of the readdir call.  The search is O(n), where n is the
> > > directory size before the cookie in question is found, and every
> > > entry to nfs_readdir() pays this penalty, irrespective of the
> > > current directory position (dir_context.pos).  The search is
> > > expensive due to the opaque nature of readdir cookies, and the
> > > fact
> > > that no mapping (hash) exists from cookies to pages.  In the case
> > > of a directory being modified, the above behavior can become an
> > > excessive penalty, since the same process is forced to fill pages
> > > it
> > > may be no longer interested in (the entries were passed in a
> > > previous
> > > nfs_readdir call), and this can essentially lead no forward
> > > progress.
> > > 
> > > To fix this problem, at the end of nfs_readdir(), save the
> > > page_index
> > > corresponding to the directory position (cookie) inside the
> > > process's
> > > nfs_open_dir_context.  Then at the next entry of nfs_readdir(),
> > > use
> > > the saved page_index as the starting search point rather than
> > > starting
> > > at page_index == 0.  Not only does this fix the problem of
> > > listing
> > > a directory being modified, it also significantly improves
> > > performance
> > > in the unmodified case since no extra search penalty is paid at
> > > each
> > > entry to nfs_readdir().
> > > 
> > > In the case of lseek, since there is no hash or other mapping
> > > from a
> > > cookie value to the page->index, just reset
> > > nfs_open_dir_context.page_index
> > > to 0, which will reset the search to the old behavior.
> > > 
> > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > ---
> > >  fs/nfs/dir.c           | 8 +++++++-
> > >  include/linux/nfs_fs.h | 1 +
> > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > index 52e06c8fc7cd..b266f505b521 100644
> > > --- a/fs/nfs/dir.c
> > > +++ b/fs/nfs/dir.c
> > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > *alloc_nfs_open_dir_context(struct inode *dir
> > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > >                 ctx->dir_cookie = 0;
> > >                 ctx->dup_cookie = 0;
> > > +               ctx->page_index = 0;
> > >                 ctx->cred = get_cred(cred);
> > >                 spin_lock(&dir->i_lock);
> > >                 if (list_empty(&nfsi->open_files) &&
> > > @@ -763,7 +764,7 @@ int
> > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > >         return res;
> > >  }
> > > 
> > > -/* Search for desc->dir_cookie from the beginning of the page
> > > cache
> > > */
> > > +/* Search for desc->dir_cookie starting at desc->page_index */
> > >  static inline
> > >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> > >  {
> > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file,
> > > struct
> > > dir_context *ctx)
> > >                 .ctx = ctx,
> > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > +               .page_index = dir_ctx->page_index,
> > > +               .last_cookie = nfs_readdir_use_cookie(file) ?
> > > ctx-
> > > > pos : 0,
> > >         },
> > >                         *desc = &my_desc;
> > >         int res = 0;
> > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file,
> > > struct
> > > dir_context *ctx)
> > >  out:
> > >         if (res > 0)
> > >                 res = 0;
> > > +       dir_ctx->page_index = desc->page_index;
> > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > >dir_cookie,
> > >                                NFS_SERVER(inode)->dtsize,
> > > my_desc.plus, res);
> > >         return res;
> > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file
> > > *filp,
> > > loff_t offset, int whence)
> > >                 else
> > >                         dir_ctx->dir_cookie = 0;
> > >                 dir_ctx->duped = 0;
> > > +               /* Force readdir_search_pagecache to start over
> > > */
> > > +               dir_ctx->page_index = 0;
> > >         }
> > >         inode_unlock(inode);
> > >         return offset;
> > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > --- a/include/linux/nfs_fs.h
> > > +++ b/include/linux/nfs_fs.h
> > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > >         __u64 dir_cookie;
> > >         __u64 dup_cookie;
> > >         signed char duped;
> > > +       unsigned long   page_index;
> > >  };
> > > 
> > >  /*
> > 
> > NACK. It makes no sense to store the page index as a cursor.
> > 
> 
> A similar thing was done recently with:
> 227823d2074d nfs: optimise readdir cache page invalidation
> 

That's a very different thing. It is about discarding page data in
order to force a re-read of the contents into cache.

What you're doing is basically trying to guess where the data is
located. which might work in some cases where the directory is
completely static, but if it shrinks (e.g. due to a few unlink() or
rename() calls) so that you overshoot the cookie, then you can end up
reading all the way to the end of the directory before doing an
uncached readdir.

IOW: This will have a detrimental effect for some workloads, which
needs to be weighed up against the benefits. I saw that you've tested
with large directories, but what workloads were you testing on those
directories?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t
  2020-11-02 13:50 ` [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t Dave Wysochanski
@ 2020-11-02 17:32   ` kernel test robot
  0 siblings, 0 replies; 29+ messages in thread
From: kernel test robot @ 2020-11-02 17:32 UTC (permalink / raw)
  To: Dave Wysochanski, Trond Myklebust, Anna Schumaker; +Cc: kbuild-all, linux-nfs

[-- Attachment #1: Type: text/plain, Size: 55930 bytes --]

Hi Dave,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on nfs/linux-next]
[also build test ERROR on v5.10-rc2 next-20201102]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Dave-Wysochanski/Add-NFS-readdir-tracepoints-and-improve-performance-of-reading-directories/20201102-215316
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
config: arm-randconfig-r036-20201102 (attached as .config)
compiler: arm-linux-gnueabi-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/a9947d23a4cc07f24cdb02954b19e650d7bf7a85
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Dave-Wysochanski/Add-NFS-readdir-tracepoints-and-improve-performance-of-reading-directories/20201102-215316
        git checkout a9947d23a4cc07f24cdb02954b19e650d7bf7a85
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from fs/nfs/nfstrace.h:11,
                    from fs/nfs/nfs2xdr.c:25:
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:235:32: note: in definition of macro '__DECLARE_TRACE'
     235 |  extern int __traceiter_##name(data_proto);   \
         |                                ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:238:34: note: in definition of macro '__DECLARE_TRACE'
     238 |  static inline void trace_##name(proto)    \
         |                                  ^~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:210:44: note: in definition of macro '__DECLARE_TRACE_RCU'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                                            ^~~~~
   include/linux/tracepoint.h:251:28: note: in expansion of macro 'PARAMS'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |                            ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:254:38: note: in definition of macro '__DECLARE_TRACE'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |                                      ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:260:43: note: in definition of macro '__DECLARE_TRACE'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |                                           ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:267:40: note: in definition of macro '__DECLARE_TRACE'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |                                        ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:273:46: note: in definition of macro '__DECLARE_TRACE'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |                                              ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:850:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     850 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_uncached_readdir_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:235:32: note: in definition of macro '__DECLARE_TRACE'
     235 |  extern int __traceiter_##name(data_proto);   \
         |                                ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:238:34: note: in definition of macro '__DECLARE_TRACE'
     238 |  static inline void trace_##name(proto)    \
         |                                  ^~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:210:44: note: in definition of macro '__DECLARE_TRACE_RCU'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                                            ^~~~~
   include/linux/tracepoint.h:251:28: note: in expansion of macro 'PARAMS'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |                            ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:254:38: note: in definition of macro '__DECLARE_TRACE'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |                                      ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:260:43: note: in definition of macro '__DECLARE_TRACE'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |                                           ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:267:40: note: in definition of macro '__DECLARE_TRACE'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |                                        ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:273:46: note: in definition of macro '__DECLARE_TRACE'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |                                              ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:851:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     851 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_uncached_readdir_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:235:32: note: in definition of macro '__DECLARE_TRACE'
     235 |  extern int __traceiter_##name(data_proto);   \
         |                                ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:238:34: note: in definition of macro '__DECLARE_TRACE'
     238 |  static inline void trace_##name(proto)    \
         |                                  ^~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:210:44: note: in definition of macro '__DECLARE_TRACE_RCU'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                                            ^~~~~
   include/linux/tracepoint.h:251:28: note: in expansion of macro 'PARAMS'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |                            ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:254:38: note: in definition of macro '__DECLARE_TRACE'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |                                      ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:260:43: note: in definition of macro '__DECLARE_TRACE'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |                                           ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:267:40: note: in definition of macro '__DECLARE_TRACE'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |                                        ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:273:46: note: in definition of macro '__DECLARE_TRACE'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |                                              ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:852:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     852 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_pagecache_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:235:32: note: in definition of macro '__DECLARE_TRACE'
     235 |  extern int __traceiter_##name(data_proto);   \
         |                                ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:238:34: note: in definition of macro '__DECLARE_TRACE'
     238 |  static inline void trace_##name(proto)    \
         |                                  ^~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:210:44: note: in definition of macro '__DECLARE_TRACE_RCU'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                                            ^~~~~
   include/linux/tracepoint.h:251:28: note: in expansion of macro 'PARAMS'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |                            ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:254:38: note: in definition of macro '__DECLARE_TRACE'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |                                      ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:260:43: note: in definition of macro '__DECLARE_TRACE'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |                                           ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:267:40: note: in definition of macro '__DECLARE_TRACE'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |                                        ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:273:46: note: in definition of macro '__DECLARE_TRACE'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |                                              ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:853:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     853 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_pagecache_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:235:32: note: in definition of macro '__DECLARE_TRACE'
     235 |  extern int __traceiter_##name(data_proto);   \
         |                                ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:238:34: note: in definition of macro '__DECLARE_TRACE'
     238 |  static inline void trace_##name(proto)    \
         |                                  ^~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:210:44: note: in definition of macro '__DECLARE_TRACE_RCU'
     210 |  static inline void trace_##name##_rcuidle(proto)  \
         |                                            ^~~~~
   include/linux/tracepoint.h:251:28: note: in expansion of macro 'PARAMS'
     251 |  __DECLARE_TRACE_RCU(name, PARAMS(proto), PARAMS(args),  \
         |                            ^~~~~~
   include/linux/tracepoint.h:411:2: note: in expansion of macro '__DECLARE_TRACE'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |  ^~~~~~~~~~~~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:254:38: note: in definition of macro '__DECLARE_TRACE'
     254 |  register_trace_##name(void (*probe)(data_proto), void *data) \
         |                                      ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:260:43: note: in definition of macro '__DECLARE_TRACE'
     260 |  register_trace_prio_##name(void (*probe)(data_proto), void *data,\
         |                                           ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:267:40: note: in definition of macro '__DECLARE_TRACE'
     267 |  unregister_trace_##name(void (*probe)(data_proto), void *data) \
         |                                        ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:796:11: error: unknown type name 'nfs_readdir_descriptor_t'
     796 |     const nfs_readdir_descriptor_t *desc \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:273:46: note: in definition of macro '__DECLARE_TRACE'
     273 |  check_trace_callback_type_##name(void (*cb)(data_proto)) \
         |                                              ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:794:2: note: in expansion of macro 'DEFINE_EVENT'
     794 |  DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:795:4: note: in expansion of macro 'TP_PROTO'
     795 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:854:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT'
     854 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(nfs_readdir_search_array_enter);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:235:32: note: in definition of macro '__DECLARE_TRACE'
     235 |  extern int __traceiter_##name(data_proto);   \
         |                                ^~~~~~~~~~
   include/linux/tracepoint.h:413:4: note: in expansion of macro 'PARAMS'
     413 |    PARAMS(void *__data, proto),   \
         |    ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:855:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     855 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_array_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:238:34: note: in definition of macro '__DECLARE_TRACE'
     238 |  static inline void trace_##name(proto)    \
         |                                  ^~~~~
   include/linux/tracepoint.h:411:24: note: in expansion of macro 'PARAMS'
     411 |  __DECLARE_TRACE(name, PARAMS(proto), PARAMS(args),  \
         |                        ^~~~~~
   include/linux/tracepoint.h:536:2: note: in expansion of macro 'DECLARE_TRACE'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |  ^~~~~~~~~~~~~
   include/linux/tracepoint.h:536:22: note: in expansion of macro 'PARAMS'
     536 |  DECLARE_TRACE(name, PARAMS(proto), PARAMS(args))
         |                      ^~~~~~
   fs/nfs/nfstrace.h:843:2: note: in expansion of macro 'DEFINE_EVENT'
     843 |  DEFINE_EVENT(nfs_readdir_descriptor_event_exit, name, \
         |  ^~~~~~~~~~~~
   fs/nfs/nfstrace.h:844:4: note: in expansion of macro 'TP_PROTO'
     844 |    TP_PROTO( \
         |    ^~~~~~~~
   fs/nfs/nfstrace.h:855:1: note: in expansion of macro 'DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT'
     855 | DEFINE_NFS_READDIR_DESCRIPTOR_EVENT_EXIT(nfs_readdir_search_array_exit);
         | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   fs/nfs/nfstrace.h:845:11: error: unknown type name 'nfs_readdir_descriptor_t'
     845 |     const nfs_readdir_descriptor_t *desc, \
         |           ^~~~~~~~~~~~~~~~~~~~~~~~
   include/linux/tracepoint.h:210:44: note: in definition of macro '__DECLARE_TRACE_RCU'
     210 |  static inline void trace_##name##_rcuidle(proto)  \

vim +/nfs_readdir_descriptor_t +796 fs/nfs/nfstrace.h

   792	
   793	#define DEFINE_NFS_READDIR_DESCRIPTOR_EVENT(name) \
   794		DEFINE_EVENT(nfs_readdir_descriptor_event_enter, name, \
   795				TP_PROTO( \
 > 796					const nfs_readdir_descriptor_t *desc \
   797				), \
   798				TP_ARGS(desc))
   799	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31366 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 17:31       ` Trond Myklebust
@ 2020-11-02 19:45         ` David Wysochanski
  2020-11-02 21:30           ` Trond Myklebust
  2020-11-03  0:09           ` Frank van der Linden
  0 siblings, 2 replies; 29+ messages in thread
From: David Wysochanski @ 2020-11-02 19:45 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > >
> > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > A process can hang forever to 'ls -l' a directory while the
> > > > directory
> > > > is being modified such as another NFS client adding files to the
> > > > directory.  The problem is seen specifically with larger
> > > > directories
> > > > (I tested with 1 million) and/or slower NFS server responses to
> > > > READDIR.  If a combination of the NFS directory size, the NFS
> > > > server
> > > > responses to READDIR is such that the 'ls' process gets partially
> > > > through the listing before the attribute cache expires (time
> > > > exceeds acdirmax), we drop the pagecache and have to re-fill it,
> > > > and as a result, the process may never complete.  One could argue
> > > > for larger directories the acdirmin/acdirmax should be increased,
> > > > but it's not always possible to tune this effectively.
> > > >
> > > > The root cause of this problem is due to how the NFS readdir
> > > > cache
> > > > currently works.  The main search function,
> > > > readdir_search_pagecache(),
> > > > always starts searching at page_index and cookie == 0, and for
> > > > any
> > > > page not in the cache, fills in the page with entries obtained in
> > > > a READDIR NFS call.  If a page already exists, we proceed to
> > > > nfs_readdir_search_for_cookie(), which searches for the cookie
> > > > (pos) of the readdir call.  The search is O(n), where n is the
> > > > directory size before the cookie in question is found, and every
> > > > entry to nfs_readdir() pays this penalty, irrespective of the
> > > > current directory position (dir_context.pos).  The search is
> > > > expensive due to the opaque nature of readdir cookies, and the
> > > > fact
> > > > that no mapping (hash) exists from cookies to pages.  In the case
> > > > of a directory being modified, the above behavior can become an
> > > > excessive penalty, since the same process is forced to fill pages
> > > > it
> > > > may be no longer interested in (the entries were passed in a
> > > > previous
> > > > nfs_readdir call), and this can essentially lead no forward
> > > > progress.
> > > >
> > > > To fix this problem, at the end of nfs_readdir(), save the
> > > > page_index
> > > > corresponding to the directory position (cookie) inside the
> > > > process's
> > > > nfs_open_dir_context.  Then at the next entry of nfs_readdir(),
> > > > use
> > > > the saved page_index as the starting search point rather than
> > > > starting
> > > > at page_index == 0.  Not only does this fix the problem of
> > > > listing
> > > > a directory being modified, it also significantly improves
> > > > performance
> > > > in the unmodified case since no extra search penalty is paid at
> > > > each
> > > > entry to nfs_readdir().
> > > >
> > > > In the case of lseek, since there is no hash or other mapping
> > > > from a
> > > > cookie value to the page->index, just reset
> > > > nfs_open_dir_context.page_index
> > > > to 0, which will reset the search to the old behavior.
> > > >
> > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > ---
> > > >  fs/nfs/dir.c           | 8 +++++++-
> > > >  include/linux/nfs_fs.h | 1 +
> > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > --- a/fs/nfs/dir.c
> > > > +++ b/fs/nfs/dir.c
> > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > >                 ctx->dir_cookie = 0;
> > > >                 ctx->dup_cookie = 0;
> > > > +               ctx->page_index = 0;
> > > >                 ctx->cred = get_cred(cred);
> > > >                 spin_lock(&dir->i_lock);
> > > >                 if (list_empty(&nfsi->open_files) &&
> > > > @@ -763,7 +764,7 @@ int
> > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > >         return res;
> > > >  }
> > > >
> > > > -/* Search for desc->dir_cookie from the beginning of the page
> > > > cache
> > > > */
> > > > +/* Search for desc->dir_cookie starting at desc->page_index */
> > > >  static inline
> > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> > > >  {
> > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file,
> > > > struct
> > > > dir_context *ctx)
> > > >                 .ctx = ctx,
> > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > +               .page_index = dir_ctx->page_index,
> > > > +               .last_cookie = nfs_readdir_use_cookie(file) ?
> > > > ctx-
> > > > > pos : 0,
> > > >         },
> > > >                         *desc = &my_desc;
> > > >         int res = 0;
> > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file,
> > > > struct
> > > > dir_context *ctx)
> > > >  out:
> > > >         if (res > 0)
> > > >                 res = 0;
> > > > +       dir_ctx->page_index = desc->page_index;
> > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > >dir_cookie,
> > > >                                NFS_SERVER(inode)->dtsize,
> > > > my_desc.plus, res);
> > > >         return res;
> > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file
> > > > *filp,
> > > > loff_t offset, int whence)
> > > >                 else
> > > >                         dir_ctx->dir_cookie = 0;
> > > >                 dir_ctx->duped = 0;
> > > > +               /* Force readdir_search_pagecache to start over
> > > > */
> > > > +               dir_ctx->page_index = 0;
> > > >         }
> > > >         inode_unlock(inode);
> > > >         return offset;
> > > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > --- a/include/linux/nfs_fs.h
> > > > +++ b/include/linux/nfs_fs.h
> > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > >         __u64 dir_cookie;
> > > >         __u64 dup_cookie;
> > > >         signed char duped;
> > > > +       unsigned long   page_index;
> > > >  };
> > > >
> > > >  /*
> > >
> > > NACK. It makes no sense to store the page index as a cursor.
> > >
> >
> > A similar thing was done recently with:
> > 227823d2074d nfs: optimise readdir cache page invalidation
> >
>
> That's a very different thing. It is about discarding page data in
> order to force a re-read of the contents into cache.
>
Right - I only pointed it out because it is in effect a cursor about
the last access into the cache but it's on a global basis, not
process context.

> What you're doing is basically trying to guess where the data is
> located. which might work in some cases where the directory is
> completely static, but if it shrinks (e.g. due to a few unlink() or
> rename() calls) so that you overshoot the cookie, then you can end up
> reading all the way to the end of the directory before doing an
> uncached readdir.
>
First, consider the unmodified (idle directory) scenario.  Today the
performance is bad the larger the directory goes - do you see why?
I tried to explain in the cover letter and header but maybe it's not clear?

Second, the modified scenario today the performance is very bad
because of the same problem - the cookie is reset and the process
needs to start over at cookie 0, repeating READDIRs.  But maybe
there's a specific scenario I'm not thinking about.

The way I thought about this is that if you're in a heavily modified
scenario with a large directory and you're past the 'acdirmax' time,
you have to make the choice of either:
a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and even
though you know the cache expired you keep going as though it
did not (at least until a different process starts a listing)
b) honoring 'acdirmax' (drop the pagecache), but keep going the
best you can based on the previous information and don't try to
rebuild the cache before continuing.

> IOW: This will have a detrimental effect for some workloads, which
> needs to be weighed up against the benefits. I saw that you've tested
> with large directories, but what workloads were you testing on those
> directories?
>
I can definitely do further testing and any scenario you want to try to
break it or find a pathological scenario. So far I've tested the
reader ("ls -lf") in parallel with one of the two writers:
1) random add a file every 0.1s:
while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; touch
$MNT2/file$i.bin; builtin sleep 0.1; done > /dev/null 2>&1 &
2) random delete a file every 0.1 s:
while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; rm -f
$MNT2/file$i; builtin sleep 0.1; done > /dev/null 2>&1 &

In no case did I see it take a longer time or ops vs vanilla 5.9, the idle
and modified performance is better (measured in seconds and ops)
with this patch.  Below is a short summary.  Note that the first time and
ops is with an idle directory, and the second one is the modified.

5.9 (vanilla): random delete a file every 0.1 s:
Ops increased from 4734 to 8834
Time increased from 23 to 44

5.9 (this patch): random delete a file every 0.1 s:
Ops increased from 4697 to 4696
Time increased from 20 to 30


5.9 (vanilla): random add a file every 0.1s:
Ops increased from 4734 to 9168
Time increased from 23 to 43

5.9 (this patch): random add a file every 0.1s:
Ops increased from 4697 to 4702
Time increased from 21 to 32


> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 19:45         ` David Wysochanski
@ 2020-11-02 21:30           ` Trond Myklebust
  2020-11-02 22:05             ` David Wysochanski
  2020-11-03  0:09           ` Frank van der Linden
  1 sibling, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2020-11-02 21:30 UTC (permalink / raw)
  To: dwysocha; +Cc: linux-nfs, anna.schumaker

On Mon, 2020-11-02 at 14:45 -0500, David Wysochanski wrote:
> On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > 
> > On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > > 
> > > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > > A process can hang forever to 'ls -l' a directory while the
> > > > > directory
> > > > > is being modified such as another NFS client adding files to
> > > > > the
> > > > > directory.  The problem is seen specifically with larger
> > > > > directories
> > > > > (I tested with 1 million) and/or slower NFS server responses
> > > > > to
> > > > > READDIR.  If a combination of the NFS directory size, the NFS
> > > > > server
> > > > > responses to READDIR is such that the 'ls' process gets
> > > > > partially
> > > > > through the listing before the attribute cache expires (time
> > > > > exceeds acdirmax), we drop the pagecache and have to re-fill
> > > > > it,
> > > > > and as a result, the process may never complete.  One could
> > > > > argue
> > > > > for larger directories the acdirmin/acdirmax should be
> > > > > increased,
> > > > > but it's not always possible to tune this effectively.
> > > > > 
> > > > > The root cause of this problem is due to how the NFS readdir
> > > > > cache
> > > > > currently works.  The main search function,
> > > > > readdir_search_pagecache(),
> > > > > always starts searching at page_index and cookie == 0, and
> > > > > for
> > > > > any
> > > > > page not in the cache, fills in the page with entries
> > > > > obtained in
> > > > > a READDIR NFS call.  If a page already exists, we proceed to
> > > > > nfs_readdir_search_for_cookie(), which searches for the
> > > > > cookie
> > > > > (pos) of the readdir call.  The search is O(n), where n is
> > > > > the
> > > > > directory size before the cookie in question is found, and
> > > > > every
> > > > > entry to nfs_readdir() pays this penalty, irrespective of the
> > > > > current directory position (dir_context.pos).  The search is
> > > > > expensive due to the opaque nature of readdir cookies, and
> > > > > the
> > > > > fact
> > > > > that no mapping (hash) exists from cookies to pages.  In the
> > > > > case
> > > > > of a directory being modified, the above behavior can become
> > > > > an
> > > > > excessive penalty, since the same process is forced to fill
> > > > > pages
> > > > > it
> > > > > may be no longer interested in (the entries were passed in a
> > > > > previous
> > > > > nfs_readdir call), and this can essentially lead no forward
> > > > > progress.
> > > > > 
> > > > > To fix this problem, at the end of nfs_readdir(), save the
> > > > > page_index
> > > > > corresponding to the directory position (cookie) inside the
> > > > > process's
> > > > > nfs_open_dir_context.  Then at the next entry of
> > > > > nfs_readdir(),
> > > > > use
> > > > > the saved page_index as the starting search point rather than
> > > > > starting
> > > > > at page_index == 0.  Not only does this fix the problem of
> > > > > listing
> > > > > a directory being modified, it also significantly improves
> > > > > performance
> > > > > in the unmodified case since no extra search penalty is paid
> > > > > at
> > > > > each
> > > > > entry to nfs_readdir().
> > > > > 
> > > > > In the case of lseek, since there is no hash or other mapping
> > > > > from a
> > > > > cookie value to the page->index, just reset
> > > > > nfs_open_dir_context.page_index
> > > > > to 0, which will reset the search to the old behavior.
> > > > > 
> > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > ---
> > > > >  fs/nfs/dir.c           | 8 +++++++-
> > > > >  include/linux/nfs_fs.h | 1 +
> > > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > > --- a/fs/nfs/dir.c
> > > > > +++ b/fs/nfs/dir.c
> > > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > > >                 ctx->dir_cookie = 0;
> > > > >                 ctx->dup_cookie = 0;
> > > > > +               ctx->page_index = 0;
> > > > >                 ctx->cred = get_cred(cred);
> > > > >                 spin_lock(&dir->i_lock);
> > > > >                 if (list_empty(&nfsi->open_files) &&
> > > > > @@ -763,7 +764,7 @@ int
> > > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > > >         return res;
> > > > >  }
> > > > > 
> > > > > -/* Search for desc->dir_cookie from the beginning of the
> > > > > page
> > > > > cache
> > > > > */
> > > > > +/* Search for desc->dir_cookie starting at desc->page_index
> > > > > */
> > > > >  static inline
> > > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> > > > >  {
> > > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file,
> > > > > struct
> > > > > dir_context *ctx)
> > > > >                 .ctx = ctx,
> > > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > > +               .page_index = dir_ctx->page_index,
> > > > > +               .last_cookie = nfs_readdir_use_cookie(file) ?
> > > > > ctx-
> > > > > > pos : 0,
> > > > >         },
> > > > >                         *desc = &my_desc;
> > > > >         int res = 0;
> > > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file,
> > > > > struct
> > > > > dir_context *ctx)
> > > > >  out:
> > > > >         if (res > 0)
> > > > >                 res = 0;
> > > > > +       dir_ctx->page_index = desc->page_index;
> > > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > > > dir_cookie,
> > > > >                                NFS_SERVER(inode)->dtsize,
> > > > > my_desc.plus, res);
> > > > >         return res;
> > > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file
> > > > > *filp,
> > > > > loff_t offset, int whence)
> > > > >                 else
> > > > >                         dir_ctx->dir_cookie = 0;
> > > > >                 dir_ctx->duped = 0;
> > > > > +               /* Force readdir_search_pagecache to start
> > > > > over
> > > > > */
> > > > > +               dir_ctx->page_index = 0;
> > > > >         }
> > > > >         inode_unlock(inode);
> > > > >         return offset;
> > > > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > > --- a/include/linux/nfs_fs.h
> > > > > +++ b/include/linux/nfs_fs.h
> > > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > > >         __u64 dir_cookie;
> > > > >         __u64 dup_cookie;
> > > > >         signed char duped;
> > > > > +       unsigned long   page_index;
> > > > >  };
> > > > > 
> > > > >  /*
> > > > 
> > > > NACK. It makes no sense to store the page index as a cursor.
> > > > 
> > > 
> > > A similar thing was done recently with:
> > > 227823d2074d nfs: optimise readdir cache page invalidation
> > > 
> > 
> > That's a very different thing. It is about discarding page data in
> > order to force a re-read of the contents into cache.
> > 
> Right - I only pointed it out because it is in effect a cursor about
> the last access into the cache but it's on a global basis, not
> process context.
> 
> > What you're doing is basically trying to guess where the data is
> > located. which might work in some cases where the directory is
> > completely static, but if it shrinks (e.g. due to a few unlink() or
> > rename() calls) so that you overshoot the cookie, then you can end
> > up
> > reading all the way to the end of the directory before doing an
> > uncached readdir.
> > 
> First, consider the unmodified (idle directory) scenario.  Today the
> performance is bad the larger the directory goes - do you see why?
> I tried to explain in the cover letter and header but maybe it's not
> clear?
> 
> Second, the modified scenario today the performance is very bad
> because of the same problem - the cookie is reset and the process
> needs to start over at cookie 0, repeating READDIRs.  But maybe
> there's a specific scenario I'm not thinking about.
> 
> The way I thought about this is that if you're in a heavily modified
> scenario with a large directory and you're past the 'acdirmax' time,
> you have to make the choice of either:
> a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and even
> though you know the cache expired you keep going as though it
> did not (at least until a different process starts a listing)
> b) honoring 'acdirmax' (drop the pagecache), but keep going the
> best you can based on the previous information and don't try to
> rebuild the cache before continuing.
> 
> > IOW: This will have a detrimental effect for some workloads, which
> > needs to be weighed up against the benefits. I saw that you've
> > tested
> > with large directories, but what workloads were you testing on
> > those
> > directories?
> > 
> I can definitely do further testing and any scenario you want to try
> to
> break it or find a pathological scenario. So far I've tested the
> reader ("ls -lf") in parallel with one of the two writers:
> 1) random add a file every 0.1s:
> while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; touch
> $MNT2/file$i.bin; builtin sleep 0.1; done > /dev/null 2>&1 &
> 2) random delete a file every 0.1 s:
> while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; rm -f
> $MNT2/file$i; builtin sleep 0.1; done > /dev/null 2>&1 &
> 
> In no case did I see it take a longer time or ops vs vanilla 5.9, the
> idle
> and modified performance is better (measured in seconds and ops)
> with this patch.  Below is a short summary.  Note that the first time
> and
> ops is with an idle directory, and the second one is the modified.
> 
> 5.9 (vanilla): random delete a file every 0.1 s:
> Ops increased from 4734 to 8834
> Time increased from 23 to 44
> 
> 5.9 (this patch): random delete a file every 0.1 s:
> Ops increased from 4697 to 4696
> Time increased from 20 to 30
> 
> 
> 5.9 (vanilla): random add a file every 0.1s:
> Ops increased from 4734 to 9168
> Time increased from 23 to 43
> 
> 5.9 (this patch): random add a file every 0.1s:
> Ops increased from 4697 to 4702
> Time increased from 21 to 32
> 

If you're not seeing any change in number of ops then those numbers are
basically telling you that you're not seeing any cache invalidation.
You should be seeing cache invalidation when you are creating and
deleting files and are doing simultaneous readdirs.

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 21:30           ` Trond Myklebust
@ 2020-11-02 22:05             ` David Wysochanski
  2020-11-03  3:38               ` Trond Myklebust
  0 siblings, 1 reply; 29+ messages in thread
From: David Wysochanski @ 2020-11-02 22:05 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Mon, Nov 2, 2020 at 4:30 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2020-11-02 at 14:45 -0500, David Wysochanski wrote:
> > On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > >
> > > On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > > > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > > > A process can hang forever to 'ls -l' a directory while the
> > > > > > directory
> > > > > > is being modified such as another NFS client adding files to
> > > > > > the
> > > > > > directory.  The problem is seen specifically with larger
> > > > > > directories
> > > > > > (I tested with 1 million) and/or slower NFS server responses
> > > > > > to
> > > > > > READDIR.  If a combination of the NFS directory size, the NFS
> > > > > > server
> > > > > > responses to READDIR is such that the 'ls' process gets
> > > > > > partially
> > > > > > through the listing before the attribute cache expires (time
> > > > > > exceeds acdirmax), we drop the pagecache and have to re-fill
> > > > > > it,
> > > > > > and as a result, the process may never complete.  One could
> > > > > > argue
> > > > > > for larger directories the acdirmin/acdirmax should be
> > > > > > increased,
> > > > > > but it's not always possible to tune this effectively.
> > > > > >
> > > > > > The root cause of this problem is due to how the NFS readdir
> > > > > > cache
> > > > > > currently works.  The main search function,
> > > > > > readdir_search_pagecache(),
> > > > > > always starts searching at page_index and cookie == 0, and
> > > > > > for
> > > > > > any
> > > > > > page not in the cache, fills in the page with entries
> > > > > > obtained in
> > > > > > a READDIR NFS call.  If a page already exists, we proceed to
> > > > > > nfs_readdir_search_for_cookie(), which searches for the
> > > > > > cookie
> > > > > > (pos) of the readdir call.  The search is O(n), where n is
> > > > > > the
> > > > > > directory size before the cookie in question is found, and
> > > > > > every
> > > > > > entry to nfs_readdir() pays this penalty, irrespective of the
> > > > > > current directory position (dir_context.pos).  The search is
> > > > > > expensive due to the opaque nature of readdir cookies, and
> > > > > > the
> > > > > > fact
> > > > > > that no mapping (hash) exists from cookies to pages.  In the
> > > > > > case
> > > > > > of a directory being modified, the above behavior can become
> > > > > > an
> > > > > > excessive penalty, since the same process is forced to fill
> > > > > > pages
> > > > > > it
> > > > > > may be no longer interested in (the entries were passed in a
> > > > > > previous
> > > > > > nfs_readdir call), and this can essentially lead no forward
> > > > > > progress.
> > > > > >
> > > > > > To fix this problem, at the end of nfs_readdir(), save the
> > > > > > page_index
> > > > > > corresponding to the directory position (cookie) inside the
> > > > > > process's
> > > > > > nfs_open_dir_context.  Then at the next entry of
> > > > > > nfs_readdir(),
> > > > > > use
> > > > > > the saved page_index as the starting search point rather than
> > > > > > starting
> > > > > > at page_index == 0.  Not only does this fix the problem of
> > > > > > listing
> > > > > > a directory being modified, it also significantly improves
> > > > > > performance
> > > > > > in the unmodified case since no extra search penalty is paid
> > > > > > at
> > > > > > each
> > > > > > entry to nfs_readdir().
> > > > > >
> > > > > > In the case of lseek, since there is no hash or other mapping
> > > > > > from a
> > > > > > cookie value to the page->index, just reset
> > > > > > nfs_open_dir_context.page_index
> > > > > > to 0, which will reset the search to the old behavior.
> > > > > >
> > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > > ---
> > > > > >  fs/nfs/dir.c           | 8 +++++++-
> > > > > >  include/linux/nfs_fs.h | 1 +
> > > > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > > > --- a/fs/nfs/dir.c
> > > > > > +++ b/fs/nfs/dir.c
> > > > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > > > >                 ctx->dir_cookie = 0;
> > > > > >                 ctx->dup_cookie = 0;
> > > > > > +               ctx->page_index = 0;
> > > > > >                 ctx->cred = get_cred(cred);
> > > > > >                 spin_lock(&dir->i_lock);
> > > > > >                 if (list_empty(&nfsi->open_files) &&
> > > > > > @@ -763,7 +764,7 @@ int
> > > > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > > > >         return res;
> > > > > >  }
> > > > > >
> > > > > > -/* Search for desc->dir_cookie from the beginning of the
> > > > > > page
> > > > > > cache
> > > > > > */
> > > > > > +/* Search for desc->dir_cookie starting at desc->page_index
> > > > > > */
> > > > > >  static inline
> > > > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> > > > > >  {
> > > > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file,
> > > > > > struct
> > > > > > dir_context *ctx)
> > > > > >                 .ctx = ctx,
> > > > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > > > +               .page_index = dir_ctx->page_index,
> > > > > > +               .last_cookie = nfs_readdir_use_cookie(file) ?
> > > > > > ctx-
> > > > > > > pos : 0,
> > > > > >         },
> > > > > >                         *desc = &my_desc;
> > > > > >         int res = 0;
> > > > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file,
> > > > > > struct
> > > > > > dir_context *ctx)
> > > > > >  out:
> > > > > >         if (res > 0)
> > > > > >                 res = 0;
> > > > > > +       dir_ctx->page_index = desc->page_index;
> > > > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > > > > dir_cookie,
> > > > > >                                NFS_SERVER(inode)->dtsize,
> > > > > > my_desc.plus, res);
> > > > > >         return res;
> > > > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file
> > > > > > *filp,
> > > > > > loff_t offset, int whence)
> > > > > >                 else
> > > > > >                         dir_ctx->dir_cookie = 0;
> > > > > >                 dir_ctx->duped = 0;
> > > > > > +               /* Force readdir_search_pagecache to start
> > > > > > over
> > > > > > */
> > > > > > +               dir_ctx->page_index = 0;
> > > > > >         }
> > > > > >         inode_unlock(inode);
> > > > > >         return offset;
> > > > > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > > > --- a/include/linux/nfs_fs.h
> > > > > > +++ b/include/linux/nfs_fs.h
> > > > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > > > >         __u64 dir_cookie;
> > > > > >         __u64 dup_cookie;
> > > > > >         signed char duped;
> > > > > > +       unsigned long   page_index;
> > > > > >  };
> > > > > >
> > > > > >  /*
> > > > >
> > > > > NACK. It makes no sense to store the page index as a cursor.
> > > > >
> > > >
> > > > A similar thing was done recently with:
> > > > 227823d2074d nfs: optimise readdir cache page invalidation
> > > >
> > >
> > > That's a very different thing. It is about discarding page data in
> > > order to force a re-read of the contents into cache.
> > >
> > Right - I only pointed it out because it is in effect a cursor about
> > the last access into the cache but it's on a global basis, not
> > process context.
> >
> > > What you're doing is basically trying to guess where the data is
> > > located. which might work in some cases where the directory is
> > > completely static, but if it shrinks (e.g. due to a few unlink() or
> > > rename() calls) so that you overshoot the cookie, then you can end
> > > up
> > > reading all the way to the end of the directory before doing an
> > > uncached readdir.
> > >
> > First, consider the unmodified (idle directory) scenario.  Today the
> > performance is bad the larger the directory goes - do you see why?
> > I tried to explain in the cover letter and header but maybe it's not
> > clear?
> >
> > Second, the modified scenario today the performance is very bad
> > because of the same problem - the cookie is reset and the process
> > needs to start over at cookie 0, repeating READDIRs.  But maybe
> > there's a specific scenario I'm not thinking about.
> >
> > The way I thought about this is that if you're in a heavily modified
> > scenario with a large directory and you're past the 'acdirmax' time,
> > you have to make the choice of either:
> > a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and even
> > though you know the cache expired you keep going as though it
> > did not (at least until a different process starts a listing)
> > b) honoring 'acdirmax' (drop the pagecache), but keep going the
> > best you can based on the previous information and don't try to
> > rebuild the cache before continuing.
> >
> > > IOW: This will have a detrimental effect for some workloads, which
> > > needs to be weighed up against the benefits. I saw that you've
> > > tested
> > > with large directories, but what workloads were you testing on
> > > those
> > > directories?
> > >
> > I can definitely do further testing and any scenario you want to try
> > to
> > break it or find a pathological scenario. So far I've tested the
> > reader ("ls -lf") in parallel with one of the two writers:
> > 1) random add a file every 0.1s:
> > while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; touch
> > $MNT2/file$i.bin; builtin sleep 0.1; done > /dev/null 2>&1 &
> > 2) random delete a file every 0.1 s:
> > while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; rm -f
> > $MNT2/file$i; builtin sleep 0.1; done > /dev/null 2>&1 &
> >
> > In no case did I see it take a longer time or ops vs vanilla 5.9, the
> > idle
> > and modified performance is better (measured in seconds and ops)
> > with this patch.  Below is a short summary.  Note that the first time
> > and
> > ops is with an idle directory, and the second one is the modified.
> >
> > 5.9 (vanilla): random delete a file every 0.1 s:
> > Ops increased from 4734 to 8834
> > Time increased from 23 to 44
> >
> > 5.9 (this patch): random delete a file every 0.1 s:
> > Ops increased from 4697 to 4696
> > Time increased from 20 to 30
> >
> >
> > 5.9 (vanilla): random add a file every 0.1s:
> > Ops increased from 4734 to 9168
> > Time increased from 23 to 43
> >
> > 5.9 (this patch): random add a file every 0.1s:
> > Ops increased from 4697 to 4702
> > Time increased from 21 to 32
> >
>
> If you're not seeing any change in number of ops then those numbers are
> basically telling you that you're not seeing any cache invalidation.
> You should be seeing cache invalidation when you are creating and
> deleting files and are doing simultaneous readdirs.
>

No I think you're misunderstanding or we're not on the same page.
The difference is, with 5.9 vanilla when the invalidation occurs, the
reader process has to go back to cookie 0 and pays the penalty
to refill all the cache pages up to cookie 'N'.  With the patch this
is not the case - it continues on with cookie 'N' and does not pay the
penalty - the next reader does.

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 19:45         ` David Wysochanski
  2020-11-02 21:30           ` Trond Myklebust
@ 2020-11-03  0:09           ` Frank van der Linden
  2020-11-03 17:49             ` David Wysochanski
  1 sibling, 1 reply; 29+ messages in thread
From: Frank van der Linden @ 2020-11-03  0:09 UTC (permalink / raw)
  To: David Wysochanski; +Cc: Trond Myklebust, linux-nfs, anna.schumaker

Hi Dave,

Thanks for looking at this.

On Mon, Nov 02, 2020 at 02:45:09PM -0500, David Wysochanski wrote:
> On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
> >
> > On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > >
> > > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > > A process can hang forever to 'ls -l' a directory while the
> > > > > directory
> > > > > is being modified such as another NFS client adding files to the
> > > > > directory.  The problem is seen specifically with larger
> > > > > directories
> > > > > (I tested with 1 million) and/or slower NFS server responses to
> > > > > READDIR.  If a combination of the NFS directory size, the NFS
> > > > > server
> > > > > responses to READDIR is such that the 'ls' process gets partially
> > > > > through the listing before the attribute cache expires (time
> > > > > exceeds acdirmax), we drop the pagecache and have to re-fill it,
> > > > > and as a result, the process may never complete.  One could argue
> > > > > for larger directories the acdirmin/acdirmax should be increased,
> > > > > but it's not always possible to tune this effectively.
> > > > >
> > > > > The root cause of this problem is due to how the NFS readdir
> > > > > cache
> > > > > currently works.  The main search function,
> > > > > readdir_search_pagecache(),
> > > > > always starts searching at page_index and cookie == 0, and for
> > > > > any
> > > > > page not in the cache, fills in the page with entries obtained in
> > > > > a READDIR NFS call.  If a page already exists, we proceed to
> > > > > nfs_readdir_search_for_cookie(), which searches for the cookie
> > > > > (pos) of the readdir call.  The search is O(n), where n is the
> > > > > directory size before the cookie in question is found, and every
> > > > > entry to nfs_readdir() pays this penalty, irrespective of the
> > > > > current directory position (dir_context.pos).  The search is
> > > > > expensive due to the opaque nature of readdir cookies, and the
> > > > > fact
> > > > > that no mapping (hash) exists from cookies to pages.  In the case
> > > > > of a directory being modified, the above behavior can become an
> > > > > excessive penalty, since the same process is forced to fill pages
> > > > > it
> > > > > may be no longer interested in (the entries were passed in a
> > > > > previous
> > > > > nfs_readdir call), and this can essentially lead no forward
> > > > > progress.
> > > > >
> > > > > To fix this problem, at the end of nfs_readdir(), save the
> > > > > page_index
> > > > > corresponding to the directory position (cookie) inside the
> > > > > process's
> > > > > nfs_open_dir_context.  Then at the next entry of nfs_readdir(),
> > > > > use
> > > > > the saved page_index as the starting search point rather than
> > > > > starting
> > > > > at page_index == 0.  Not only does this fix the problem of
> > > > > listing
> > > > > a directory being modified, it also significantly improves
> > > > > performance
> > > > > in the unmodified case since no extra search penalty is paid at
> > > > > each
> > > > > entry to nfs_readdir().
> > > > >
> > > > > In the case of lseek, since there is no hash or other mapping
> > > > > from a
> > > > > cookie value to the page->index, just reset
> > > > > nfs_open_dir_context.page_index
> > > > > to 0, which will reset the search to the old behavior.
> > > > >
> > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > ---
> > > > >  fs/nfs/dir.c           | 8 +++++++-
> > > > >  include/linux/nfs_fs.h | 1 +
> > > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > > --- a/fs/nfs/dir.c
> > > > > +++ b/fs/nfs/dir.c
> > > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > > >                 ctx->dir_cookie = 0;
> > > > >                 ctx->dup_cookie = 0;
> > > > > +               ctx->page_index = 0;
> > > > >                 ctx->cred = get_cred(cred);
> > > > >                 spin_lock(&dir->i_lock);
> > > > >                 if (list_empty(&nfsi->open_files) &&
> > > > > @@ -763,7 +764,7 @@ int
> > > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > > >         return res;
> > > > >  }
> > > > >
> > > > > -/* Search for desc->dir_cookie from the beginning of the page
> > > > > cache
> > > > > */
> > > > > +/* Search for desc->dir_cookie starting at desc->page_index */
> > > > >  static inline
> > > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> > > > >  {
> > > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file,
> > > > > struct
> > > > > dir_context *ctx)
> > > > >                 .ctx = ctx,
> > > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > > +               .page_index = dir_ctx->page_index,
> > > > > +               .last_cookie = nfs_readdir_use_cookie(file) ?
> > > > > ctx-
> > > > > > pos : 0,
> > > > >         },
> > > > >                         *desc = &my_desc;
> > > > >         int res = 0;
> > > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file,
> > > > > struct
> > > > > dir_context *ctx)
> > > > >  out:
> > > > >         if (res > 0)
> > > > >                 res = 0;
> > > > > +       dir_ctx->page_index = desc->page_index;
> > > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > > >dir_cookie,
> > > > >                                NFS_SERVER(inode)->dtsize,
> > > > > my_desc.plus, res);
> > > > >         return res;
> > > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file
> > > > > *filp,
> > > > > loff_t offset, int whence)
> > > > >                 else
> > > > >                         dir_ctx->dir_cookie = 0;
> > > > >                 dir_ctx->duped = 0;
> > > > > +               /* Force readdir_search_pagecache to start over
> > > > > */
> > > > > +               dir_ctx->page_index = 0;
> > > > >         }
> > > > >         inode_unlock(inode);
> > > > >         return offset;
> > > > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > > --- a/include/linux/nfs_fs.h
> > > > > +++ b/include/linux/nfs_fs.h
> > > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > > >         __u64 dir_cookie;
> > > > >         __u64 dup_cookie;
> > > > >         signed char duped;
> > > > > +       unsigned long   page_index;
> > > > >  };
> > > > >
> > > > >  /*
> > > >
> > > > NACK. It makes no sense to store the page index as a cursor.
> > > >
> > >
> > > A similar thing was done recently with:
> > > 227823d2074d nfs: optimise readdir cache page invalidation
> > >
> >
> > That's a very different thing. It is about discarding page data in
> > order to force a re-read of the contents into cache.
> >
> Right - I only pointed it out because it is in effect a cursor about
> the last access into the cache but it's on a global basis, not
> process context.
> 
> > What you're doing is basically trying to guess where the data is
> > located. which might work in some cases where the directory is
> > completely static, but if it shrinks (e.g. due to a few unlink() or
> > rename() calls) so that you overshoot the cookie, then you can end up
> > reading all the way to the end of the directory before doing an
> > uncached readdir.
> >
> First, consider the unmodified (idle directory) scenario.  Today the
> performance is bad the larger the directory goes - do you see why?
> I tried to explain in the cover letter and header but maybe it's not clear?

I agree with you that the patch is good for that case. It seems
like a bug that, if the cache is not invalidated in the meantime,
nfs_readdir would start at offset 0 of the cache while searching.

But, as Trond said, page_index isn't a good cursor once the
page cache was invalidated, you don't know what data you are
leaving out inadvertently.

> 
> Second, the modified scenario today the performance is very bad
> because of the same problem - the cookie is reset and the process
> needs to start over at cookie 0, repeating READDIRs.  But maybe
> there's a specific scenario I'm not thinking about.
> 
> The way I thought about this is that if you're in a heavily modified
> scenario with a large directory and you're past the 'acdirmax' time,
> you have to make the choice of either:
> a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and even
> though you know the cache expired you keep going as though it
> did not (at least until a different process starts a listing)
> b) honoring 'acdirmax' (drop the pagecache), but keep going the
> best you can based on the previous information and don't try to
> rebuild the cache before continuing.

Yeah, it seems like those are your options. The only other way
of doing this that I can think of would be to add logic that
says something like:

	if (pos != 0 and the cache is invalid)
		uncached_readdir()

..which is, by the looks of it, easily done by having
readdir_search_pagecache() return EBADCOOKIE if
there's no cache, and the offset is not 0. E.g. it
shouldn't try to fill the cache in that case, but
just let uncached reads take over.

The idea here is that the only information that is possibly
still valid is the cookie - and that the server will know
how to use the cookie to keep going where you left off.

Of course, this mean that, in the case of another client
modifying the directory, you'll end up doing mostly / all
uncached readdirs after an invalidate. Same for a scenario
where a client did: open llseek(<nonzero_offset), getdents().
Until another process starts at offset 0 and re-fills the
cache, of course.

I'm not sure if that's all that bad, though, it depends
on common usage patterns.

- Frank

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-02 22:05             ` David Wysochanski
@ 2020-11-03  3:38               ` Trond Myklebust
  2020-11-03 13:29                 ` David Wysochanski
  0 siblings, 1 reply; 29+ messages in thread
From: Trond Myklebust @ 2020-11-03  3:38 UTC (permalink / raw)
  To: dwysocha; +Cc: linux-nfs, anna.schumaker

On Mon, 2020-11-02 at 17:05 -0500, David Wysochanski wrote:
> On Mon, Nov 2, 2020 at 4:30 PM Trond Myklebust <
> trondmy@hammerspace.com> wrote:
> > 
> > On Mon, 2020-11-02 at 14:45 -0500, David Wysochanski wrote:
> > > On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <
> > > trondmy@hammerspace.com> wrote:
> > > > 
> > > > On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > > > > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > > > > trondmy@hammerspace.com> wrote:
> > > > > > 
> > > > > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > > > > A process can hang forever to 'ls -l' a directory while
> > > > > > > the
> > > > > > > directory
> > > > > > > is being modified such as another NFS client adding files
> > > > > > > to
> > > > > > > the
> > > > > > > directory.  The problem is seen specifically with larger
> > > > > > > directories
> > > > > > > (I tested with 1 million) and/or slower NFS server
> > > > > > > responses
> > > > > > > to
> > > > > > > READDIR.  If a combination of the NFS directory size, the
> > > > > > > NFS
> > > > > > > server
> > > > > > > responses to READDIR is such that the 'ls' process gets
> > > > > > > partially
> > > > > > > through the listing before the attribute cache expires
> > > > > > > (time
> > > > > > > exceeds acdirmax), we drop the pagecache and have to re-
> > > > > > > fill
> > > > > > > it,
> > > > > > > and as a result, the process may never complete.  One
> > > > > > > could
> > > > > > > argue
> > > > > > > for larger directories the acdirmin/acdirmax should be
> > > > > > > increased,
> > > > > > > but it's not always possible to tune this effectively.
> > > > > > > 
> > > > > > > The root cause of this problem is due to how the NFS
> > > > > > > readdir
> > > > > > > cache
> > > > > > > currently works.  The main search function,
> > > > > > > readdir_search_pagecache(),
> > > > > > > always starts searching at page_index and cookie == 0,
> > > > > > > and
> > > > > > > for
> > > > > > > any
> > > > > > > page not in the cache, fills in the page with entries
> > > > > > > obtained in
> > > > > > > a READDIR NFS call.  If a page already exists, we proceed
> > > > > > > to
> > > > > > > nfs_readdir_search_for_cookie(), which searches for the
> > > > > > > cookie
> > > > > > > (pos) of the readdir call.  The search is O(n), where n
> > > > > > > is
> > > > > > > the
> > > > > > > directory size before the cookie in question is found,
> > > > > > > and
> > > > > > > every
> > > > > > > entry to nfs_readdir() pays this penalty, irrespective of
> > > > > > > the
> > > > > > > current directory position (dir_context.pos).  The search
> > > > > > > is
> > > > > > > expensive due to the opaque nature of readdir cookies,
> > > > > > > and
> > > > > > > the
> > > > > > > fact
> > > > > > > that no mapping (hash) exists from cookies to pages.  In
> > > > > > > the
> > > > > > > case
> > > > > > > of a directory being modified, the above behavior can
> > > > > > > become
> > > > > > > an
> > > > > > > excessive penalty, since the same process is forced to
> > > > > > > fill
> > > > > > > pages
> > > > > > > it
> > > > > > > may be no longer interested in (the entries were passed
> > > > > > > in a
> > > > > > > previous
> > > > > > > nfs_readdir call), and this can essentially lead no
> > > > > > > forward
> > > > > > > progress.
> > > > > > > 
> > > > > > > To fix this problem, at the end of nfs_readdir(), save
> > > > > > > the
> > > > > > > page_index
> > > > > > > corresponding to the directory position (cookie) inside
> > > > > > > the
> > > > > > > process's
> > > > > > > nfs_open_dir_context.  Then at the next entry of
> > > > > > > nfs_readdir(),
> > > > > > > use
> > > > > > > the saved page_index as the starting search point rather
> > > > > > > than
> > > > > > > starting
> > > > > > > at page_index == 0.  Not only does this fix the problem
> > > > > > > of
> > > > > > > listing
> > > > > > > a directory being modified, it also significantly
> > > > > > > improves
> > > > > > > performance
> > > > > > > in the unmodified case since no extra search penalty is
> > > > > > > paid
> > > > > > > at
> > > > > > > each
> > > > > > > entry to nfs_readdir().
> > > > > > > 
> > > > > > > In the case of lseek, since there is no hash or other
> > > > > > > mapping
> > > > > > > from a
> > > > > > > cookie value to the page->index, just reset
> > > > > > > nfs_open_dir_context.page_index
> > > > > > > to 0, which will reset the search to the old behavior.
> > > > > > > 
> > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > > > ---
> > > > > > >  fs/nfs/dir.c           | 8 +++++++-
> > > > > > >  include/linux/nfs_fs.h | 1 +
> > > > > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > > > > 
> > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > > > > --- a/fs/nfs/dir.c
> > > > > > > +++ b/fs/nfs/dir.c
> > > > > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > > > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > > > > >                 ctx->dir_cookie = 0;
> > > > > > >                 ctx->dup_cookie = 0;
> > > > > > > +               ctx->page_index = 0;
> > > > > > >                 ctx->cred = get_cred(cred);
> > > > > > >                 spin_lock(&dir->i_lock);
> > > > > > >                 if (list_empty(&nfsi->open_files) &&
> > > > > > > @@ -763,7 +764,7 @@ int
> > > > > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > > > > >         return res;
> > > > > > >  }
> > > > > > > 
> > > > > > > -/* Search for desc->dir_cookie from the beginning of the
> > > > > > > page
> > > > > > > cache
> > > > > > > */
> > > > > > > +/* Search for desc->dir_cookie starting at desc-
> > > > > > > >page_index
> > > > > > > */
> > > > > > >  static inline
> > > > > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t
> > > > > > > *desc)
> > > > > > >  {
> > > > > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file
> > > > > > > *file,
> > > > > > > struct
> > > > > > > dir_context *ctx)
> > > > > > >                 .ctx = ctx,
> > > > > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > > > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > > > > +               .page_index = dir_ctx->page_index,
> > > > > > > +               .last_cookie =
> > > > > > > nfs_readdir_use_cookie(file) ?
> > > > > > > ctx-
> > > > > > > > pos : 0,
> > > > > > >         },
> > > > > > >                         *desc = &my_desc;
> > > > > > >         int res = 0;
> > > > > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file
> > > > > > > *file,
> > > > > > > struct
> > > > > > > dir_context *ctx)
> > > > > > >  out:
> > > > > > >         if (res > 0)
> > > > > > >                 res = 0;
> > > > > > > +       dir_ctx->page_index = desc->page_index;
> > > > > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > > > > > dir_cookie,
> > > > > > >                                NFS_SERVER(inode)->dtsize,
> > > > > > > my_desc.plus, res);
> > > > > > >         return res;
> > > > > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct
> > > > > > > file
> > > > > > > *filp,
> > > > > > > loff_t offset, int whence)
> > > > > > >                 else
> > > > > > >                         dir_ctx->dir_cookie = 0;
> > > > > > >                 dir_ctx->duped = 0;
> > > > > > > +               /* Force readdir_search_pagecache to
> > > > > > > start
> > > > > > > over
> > > > > > > */
> > > > > > > +               dir_ctx->page_index = 0;
> > > > > > >         }
> > > > > > >         inode_unlock(inode);
> > > > > > >         return offset;
> > > > > > > diff --git a/include/linux/nfs_fs.h
> > > > > > > b/include/linux/nfs_fs.h
> > > > > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > > > > --- a/include/linux/nfs_fs.h
> > > > > > > +++ b/include/linux/nfs_fs.h
> > > > > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > > > > >         __u64 dir_cookie;
> > > > > > >         __u64 dup_cookie;
> > > > > > >         signed char duped;
> > > > > > > +       unsigned long   page_index;
> > > > > > >  };
> > > > > > > 
> > > > > > >  /*
> > > > > > 
> > > > > > NACK. It makes no sense to store the page index as a
> > > > > > cursor.
> > > > > > 
> > > > > 
> > > > > A similar thing was done recently with:
> > > > > 227823d2074d nfs: optimise readdir cache page invalidation
> > > > > 
> > > > 
> > > > That's a very different thing. It is about discarding page data
> > > > in
> > > > order to force a re-read of the contents into cache.
> > > > 
> > > Right - I only pointed it out because it is in effect a cursor
> > > about
> > > the last access into the cache but it's on a global basis, not
> > > process context.
> > > 
> > > > What you're doing is basically trying to guess where the data
> > > > is
> > > > located. which might work in some cases where the directory is
> > > > completely static, but if it shrinks (e.g. due to a few
> > > > unlink() or
> > > > rename() calls) so that you overshoot the cookie, then you can
> > > > end
> > > > up
> > > > reading all the way to the end of the directory before doing an
> > > > uncached readdir.
> > > > 
> > > First, consider the unmodified (idle directory) scenario.  Today
> > > the
> > > performance is bad the larger the directory goes - do you see
> > > why?
> > > I tried to explain in the cover letter and header but maybe it's
> > > not
> > > clear?
> > > 
> > > Second, the modified scenario today the performance is very bad
> > > because of the same problem - the cookie is reset and the process
> > > needs to start over at cookie 0, repeating READDIRs.  But maybe
> > > there's a specific scenario I'm not thinking about.
> > > 
> > > The way I thought about this is that if you're in a heavily
> > > modified
> > > scenario with a large directory and you're past the 'acdirmax'
> > > time,
> > > you have to make the choice of either:
> > > a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and
> > > even
> > > though you know the cache expired you keep going as though it
> > > did not (at least until a different process starts a listing)
> > > b) honoring 'acdirmax' (drop the pagecache), but keep going the
> > > best you can based on the previous information and don't try to
> > > rebuild the cache before continuing.
> > > 
> > > > IOW: This will have a detrimental effect for some workloads,
> > > > which
> > > > needs to be weighed up against the benefits. I saw that you've
> > > > tested
> > > > with large directories, but what workloads were you testing on
> > > > those
> > > > directories?
> > > > 
> > > I can definitely do further testing and any scenario you want to
> > > try
> > > to
> > > break it or find a pathological scenario. So far I've tested the
> > > reader ("ls -lf") in parallel with one of the two writers:
> > > 1) random add a file every 0.1s:
> > > while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; touch
> > > $MNT2/file$i.bin; builtin sleep 0.1; done > /dev/null 2>&1 &
> > > 2) random delete a file every 0.1 s:
> > > while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; rm -f
> > > $MNT2/file$i; builtin sleep 0.1; done > /dev/null 2>&1 &
> > > 
> > > In no case did I see it take a longer time or ops vs vanilla 5.9,
> > > the
> > > idle
> > > and modified performance is better (measured in seconds and ops)
> > > with this patch.  Below is a short summary.  Note that the first
> > > time
> > > and
> > > ops is with an idle directory, and the second one is the
> > > modified.
> > > 
> > > 5.9 (vanilla): random delete a file every 0.1 s:
> > > Ops increased from 4734 to 8834
> > > Time increased from 23 to 44
> > > 
> > > 5.9 (this patch): random delete a file every 0.1 s:
> > > Ops increased from 4697 to 4696
> > > Time increased from 20 to 30
> > > 
> > > 
> > > 5.9 (vanilla): random add a file every 0.1s:
> > > Ops increased from 4734 to 9168
> > > Time increased from 23 to 43
> > > 
> > > 5.9 (this patch): random add a file every 0.1s:
> > > Ops increased from 4697 to 4702
> > > Time increased from 21 to 32
> > > 
> > 
> > If you're not seeing any change in number of ops then those numbers
> > are
> > basically telling you that you're not seeing any cache
> > invalidation.
> > You should be seeing cache invalidation when you are creating and
> > deleting files and are doing simultaneous readdirs.
> > 
> 
> No I think you're misunderstanding or we're not on the same page.
> The difference is, with 5.9 vanilla when the invalidation occurs, the
> reader process has to go back to cookie 0 and pays the penalty
> to refill all the cache pages up to cookie 'N'.  With the patch this
> is not the case - it continues on with cookie 'N' and does not pay
> the
> penalty - the next reader does
> 

Then I still don't see how you can avoid corrupting the page cache when
you're in effect just starting filling in cookies at a random page
cache index + offset that bears no relation to where the cookie would
end up if the reader started from 0.

However leaving aside that issue, what's the point of using the page
cache if the next reader has to clear out the data and start afresh
anyway? Why not just let all the threads avoid the page cache and just
do uncached readdir?

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-03  3:38               ` Trond Myklebust
@ 2020-11-03 13:29                 ` David Wysochanski
  0 siblings, 0 replies; 29+ messages in thread
From: David Wysochanski @ 2020-11-03 13:29 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs, anna.schumaker

On Mon, Nov 2, 2020 at 10:39 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Mon, 2020-11-02 at 17:05 -0500, David Wysochanski wrote:
> > On Mon, Nov 2, 2020 at 4:30 PM Trond Myklebust <
> > trondmy@hammerspace.com> wrote:
> > >
> > > On Mon, 2020-11-02 at 14:45 -0500, David Wysochanski wrote:
> > > > On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > > > > > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > > > > > trondmy@hammerspace.com> wrote:
> > > > > > >
> > > > > > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > > > > > A process can hang forever to 'ls -l' a directory while
> > > > > > > > the
> > > > > > > > directory
> > > > > > > > is being modified such as another NFS client adding files
> > > > > > > > to
> > > > > > > > the
> > > > > > > > directory.  The problem is seen specifically with larger
> > > > > > > > directories
> > > > > > > > (I tested with 1 million) and/or slower NFS server
> > > > > > > > responses
> > > > > > > > to
> > > > > > > > READDIR.  If a combination of the NFS directory size, the
> > > > > > > > NFS
> > > > > > > > server
> > > > > > > > responses to READDIR is such that the 'ls' process gets
> > > > > > > > partially
> > > > > > > > through the listing before the attribute cache expires
> > > > > > > > (time
> > > > > > > > exceeds acdirmax), we drop the pagecache and have to re-
> > > > > > > > fill
> > > > > > > > it,
> > > > > > > > and as a result, the process may never complete.  One
> > > > > > > > could
> > > > > > > > argue
> > > > > > > > for larger directories the acdirmin/acdirmax should be
> > > > > > > > increased,
> > > > > > > > but it's not always possible to tune this effectively.
> > > > > > > >
> > > > > > > > The root cause of this problem is due to how the NFS
> > > > > > > > readdir
> > > > > > > > cache
> > > > > > > > currently works.  The main search function,
> > > > > > > > readdir_search_pagecache(),
> > > > > > > > always starts searching at page_index and cookie == 0,
> > > > > > > > and
> > > > > > > > for
> > > > > > > > any
> > > > > > > > page not in the cache, fills in the page with entries
> > > > > > > > obtained in
> > > > > > > > a READDIR NFS call.  If a page already exists, we proceed
> > > > > > > > to
> > > > > > > > nfs_readdir_search_for_cookie(), which searches for the
> > > > > > > > cookie
> > > > > > > > (pos) of the readdir call.  The search is O(n), where n
> > > > > > > > is
> > > > > > > > the
> > > > > > > > directory size before the cookie in question is found,
> > > > > > > > and
> > > > > > > > every
> > > > > > > > entry to nfs_readdir() pays this penalty, irrespective of
> > > > > > > > the
> > > > > > > > current directory position (dir_context.pos).  The search
> > > > > > > > is
> > > > > > > > expensive due to the opaque nature of readdir cookies,
> > > > > > > > and
> > > > > > > > the
> > > > > > > > fact
> > > > > > > > that no mapping (hash) exists from cookies to pages.  In
> > > > > > > > the
> > > > > > > > case
> > > > > > > > of a directory being modified, the above behavior can
> > > > > > > > become
> > > > > > > > an
> > > > > > > > excessive penalty, since the same process is forced to
> > > > > > > > fill
> > > > > > > > pages
> > > > > > > > it
> > > > > > > > may be no longer interested in (the entries were passed
> > > > > > > > in a
> > > > > > > > previous
> > > > > > > > nfs_readdir call), and this can essentially lead no
> > > > > > > > forward
> > > > > > > > progress.
> > > > > > > >
> > > > > > > > To fix this problem, at the end of nfs_readdir(), save
> > > > > > > > the
> > > > > > > > page_index
> > > > > > > > corresponding to the directory position (cookie) inside
> > > > > > > > the
> > > > > > > > process's
> > > > > > > > nfs_open_dir_context.  Then at the next entry of
> > > > > > > > nfs_readdir(),
> > > > > > > > use
> > > > > > > > the saved page_index as the starting search point rather
> > > > > > > > than
> > > > > > > > starting
> > > > > > > > at page_index == 0.  Not only does this fix the problem
> > > > > > > > of
> > > > > > > > listing
> > > > > > > > a directory being modified, it also significantly
> > > > > > > > improves
> > > > > > > > performance
> > > > > > > > in the unmodified case since no extra search penalty is
> > > > > > > > paid
> > > > > > > > at
> > > > > > > > each
> > > > > > > > entry to nfs_readdir().
> > > > > > > >
> > > > > > > > In the case of lseek, since there is no hash or other
> > > > > > > > mapping
> > > > > > > > from a
> > > > > > > > cookie value to the page->index, just reset
> > > > > > > > nfs_open_dir_context.page_index
> > > > > > > > to 0, which will reset the search to the old behavior.
> > > > > > > >
> > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > > > > ---
> > > > > > > >  fs/nfs/dir.c           | 8 +++++++-
> > > > > > > >  include/linux/nfs_fs.h | 1 +
> > > > > > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > > > > > --- a/fs/nfs/dir.c
> > > > > > > > +++ b/fs/nfs/dir.c
> > > > > > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > > > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > > > > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > > > > > >                 ctx->dir_cookie = 0;
> > > > > > > >                 ctx->dup_cookie = 0;
> > > > > > > > +               ctx->page_index = 0;
> > > > > > > >                 ctx->cred = get_cred(cred);
> > > > > > > >                 spin_lock(&dir->i_lock);
> > > > > > > >                 if (list_empty(&nfsi->open_files) &&
> > > > > > > > @@ -763,7 +764,7 @@ int
> > > > > > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > > > > > >         return res;
> > > > > > > >  }
> > > > > > > >
> > > > > > > > -/* Search for desc->dir_cookie from the beginning of the
> > > > > > > > page
> > > > > > > > cache
> > > > > > > > */
> > > > > > > > +/* Search for desc->dir_cookie starting at desc-
> > > > > > > > >page_index
> > > > > > > > */
> > > > > > > >  static inline
> > > > > > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t
> > > > > > > > *desc)
> > > > > > > >  {
> > > > > > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file
> > > > > > > > *file,
> > > > > > > > struct
> > > > > > > > dir_context *ctx)
> > > > > > > >                 .ctx = ctx,
> > > > > > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > > > > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > > > > > +               .page_index = dir_ctx->page_index,
> > > > > > > > +               .last_cookie =
> > > > > > > > nfs_readdir_use_cookie(file) ?
> > > > > > > > ctx-
> > > > > > > > > pos : 0,
> > > > > > > >         },
> > > > > > > >                         *desc = &my_desc;
> > > > > > > >         int res = 0;
> > > > > > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file
> > > > > > > > *file,
> > > > > > > > struct
> > > > > > > > dir_context *ctx)
> > > > > > > >  out:
> > > > > > > >         if (res > 0)
> > > > > > > >                 res = 0;
> > > > > > > > +       dir_ctx->page_index = desc->page_index;
> > > > > > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > > > > > > dir_cookie,
> > > > > > > >                                NFS_SERVER(inode)->dtsize,
> > > > > > > > my_desc.plus, res);
> > > > > > > >         return res;
> > > > > > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct
> > > > > > > > file
> > > > > > > > *filp,
> > > > > > > > loff_t offset, int whence)
> > > > > > > >                 else
> > > > > > > >                         dir_ctx->dir_cookie = 0;
> > > > > > > >                 dir_ctx->duped = 0;
> > > > > > > > +               /* Force readdir_search_pagecache to
> > > > > > > > start
> > > > > > > > over
> > > > > > > > */
> > > > > > > > +               dir_ctx->page_index = 0;
> > > > > > > >         }
> > > > > > > >         inode_unlock(inode);
> > > > > > > >         return offset;
> > > > > > > > diff --git a/include/linux/nfs_fs.h
> > > > > > > > b/include/linux/nfs_fs.h
> > > > > > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > > > > > --- a/include/linux/nfs_fs.h
> > > > > > > > +++ b/include/linux/nfs_fs.h
> > > > > > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > > > > > >         __u64 dir_cookie;
> > > > > > > >         __u64 dup_cookie;
> > > > > > > >         signed char duped;
> > > > > > > > +       unsigned long   page_index;
> > > > > > > >  };
> > > > > > > >
> > > > > > > >  /*
> > > > > > >
> > > > > > > NACK. It makes no sense to store the page index as a
> > > > > > > cursor.
> > > > > > >
> > > > > >
> > > > > > A similar thing was done recently with:
> > > > > > 227823d2074d nfs: optimise readdir cache page invalidation
> > > > > >
> > > > >
> > > > > That's a very different thing. It is about discarding page data
> > > > > in
> > > > > order to force a re-read of the contents into cache.
> > > > >
> > > > Right - I only pointed it out because it is in effect a cursor
> > > > about
> > > > the last access into the cache but it's on a global basis, not
> > > > process context.
> > > >
> > > > > What you're doing is basically trying to guess where the data
> > > > > is
> > > > > located. which might work in some cases where the directory is
> > > > > completely static, but if it shrinks (e.g. due to a few
> > > > > unlink() or
> > > > > rename() calls) so that you overshoot the cookie, then you can
> > > > > end
> > > > > up
> > > > > reading all the way to the end of the directory before doing an
> > > > > uncached readdir.
> > > > >
> > > > First, consider the unmodified (idle directory) scenario.  Today
> > > > the
> > > > performance is bad the larger the directory goes - do you see
> > > > why?
> > > > I tried to explain in the cover letter and header but maybe it's
> > > > not
> > > > clear?
> > > >
> > > > Second, the modified scenario today the performance is very bad
> > > > because of the same problem - the cookie is reset and the process
> > > > needs to start over at cookie 0, repeating READDIRs.  But maybe
> > > > there's a specific scenario I'm not thinking about.
> > > >
> > > > The way I thought about this is that if you're in a heavily
> > > > modified
> > > > scenario with a large directory and you're past the 'acdirmax'
> > > > time,
> > > > you have to make the choice of either:
> > > > a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and
> > > > even
> > > > though you know the cache expired you keep going as though it
> > > > did not (at least until a different process starts a listing)
> > > > b) honoring 'acdirmax' (drop the pagecache), but keep going the
> > > > best you can based on the previous information and don't try to
> > > > rebuild the cache before continuing.
> > > >
> > > > > IOW: This will have a detrimental effect for some workloads,
> > > > > which
> > > > > needs to be weighed up against the benefits. I saw that you've
> > > > > tested
> > > > > with large directories, but what workloads were you testing on
> > > > > those
> > > > > directories?
> > > > >
> > > > I can definitely do further testing and any scenario you want to
> > > > try
> > > > to
> > > > break it or find a pathological scenario. So far I've tested the
> > > > reader ("ls -lf") in parallel with one of the two writers:
> > > > 1) random add a file every 0.1s:
> > > > while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; touch
> > > > $MNT2/file$i.bin; builtin sleep 0.1; done > /dev/null 2>&1 &
> > > > 2) random delete a file every 0.1 s:
> > > > while true; do i=$((1 + RANDOM % $NUM_FILES)); echo $i; rm -f
> > > > $MNT2/file$i; builtin sleep 0.1; done > /dev/null 2>&1 &
> > > >
> > > > In no case did I see it take a longer time or ops vs vanilla 5.9,
> > > > the
> > > > idle
> > > > and modified performance is better (measured in seconds and ops)
> > > > with this patch.  Below is a short summary.  Note that the first
> > > > time
> > > > and
> > > > ops is with an idle directory, and the second one is the
> > > > modified.
> > > >
> > > > 5.9 (vanilla): random delete a file every 0.1 s:
> > > > Ops increased from 4734 to 8834
> > > > Time increased from 23 to 44
> > > >
> > > > 5.9 (this patch): random delete a file every 0.1 s:
> > > > Ops increased from 4697 to 4696
> > > > Time increased from 20 to 30
> > > >
> > > >
> > > > 5.9 (vanilla): random add a file every 0.1s:
> > > > Ops increased from 4734 to 9168
> > > > Time increased from 23 to 43
> > > >
> > > > 5.9 (this patch): random add a file every 0.1s:
> > > > Ops increased from 4697 to 4702
> > > > Time increased from 21 to 32
> > > >
> > >
> > > If you're not seeing any change in number of ops then those numbers
> > > are
> > > basically telling you that you're not seeing any cache
> > > invalidation.
> > > You should be seeing cache invalidation when you are creating and
> > > deleting files and are doing simultaneous readdirs.
> > >
> >
> > No I think you're misunderstanding or we're not on the same page.
> > The difference is, with 5.9 vanilla when the invalidation occurs, the
> > reader process has to go back to cookie 0 and pays the penalty
> > to refill all the cache pages up to cookie 'N'.  With the patch this
> > is not the case - it continues on with cookie 'N' and does not pay
> > the
> > penalty - the next reader does
> >
>
> Then I still don't see how you can avoid corrupting the page cache when
> you're in effect just starting filling in cookies at a random page
> cache index + offset that bears no relation to where the cookie would
> end up if the reader started from 0.
>
I was concerned about this too.

I don't have absolute proof of the pagecache state, but I did do
some tests in live crash listing the pages on the directory inode
while I had deleted a portion of the files in the directory and a
listing was going to see if the pagecache got messed up.  I
think we may end up with pages that get in effect "temporarily orphaned"
by the one reader that runs into this condition, but the next reader
will start at 0 and so those pages will only be temporary and not
findable by anyone else.  If the directory is constantly changing,
then the pagecache will get dumped again soon.  But as you point
out it probably is not a good approach.

> However leaving aside that issue, what's the point of using the page
> cache if the next reader has to clear out the data and start afresh
> anyway? Why not just let all the threads avoid the page cache and just
> do uncached readdir?
>
Yes I agree.  The problem I'm working on finding a solution for is
such that the cache is not valuable anymore - basically we have
timers (acdirmin/acdirmax) that govern how long the cache is valid
and we blow past those when doing one pass through the directory.
In that instance we mainly want to ensure forward progress, so
it would make sense to shift to uncached operation.



> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> trond.myklebust@hammerspace.com
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 09/11] NFS: Improve performance of listing directories being modified
  2020-11-03  0:09           ` Frank van der Linden
@ 2020-11-03 17:49             ` David Wysochanski
  0 siblings, 0 replies; 29+ messages in thread
From: David Wysochanski @ 2020-11-03 17:49 UTC (permalink / raw)
  To: Frank van der Linden; +Cc: Trond Myklebust, linux-nfs, anna.schumaker

On Mon, Nov 2, 2020 at 7:09 PM Frank van der Linden <fllinden@amazon.com> wrote:
>
> Hi Dave,
>
> Thanks for looking at this.
>
> On Mon, Nov 02, 2020 at 02:45:09PM -0500, David Wysochanski wrote:
> > On Mon, Nov 2, 2020 at 12:31 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
> > >
> > > On Mon, 2020-11-02 at 11:26 -0500, David Wysochanski wrote:
> > > > On Mon, Nov 2, 2020 at 11:22 AM Trond Myklebust <
> > > > trondmy@hammerspace.com> wrote:
> > > > >
> > > > > On Mon, 2020-11-02 at 08:50 -0500, Dave Wysochanski wrote:
> > > > > > A process can hang forever to 'ls -l' a directory while the
> > > > > > directory
> > > > > > is being modified such as another NFS client adding files to the
> > > > > > directory.  The problem is seen specifically with larger
> > > > > > directories
> > > > > > (I tested with 1 million) and/or slower NFS server responses to
> > > > > > READDIR.  If a combination of the NFS directory size, the NFS
> > > > > > server
> > > > > > responses to READDIR is such that the 'ls' process gets partially
> > > > > > through the listing before the attribute cache expires (time
> > > > > > exceeds acdirmax), we drop the pagecache and have to re-fill it,
> > > > > > and as a result, the process may never complete.  One could argue
> > > > > > for larger directories the acdirmin/acdirmax should be increased,
> > > > > > but it's not always possible to tune this effectively.
> > > > > >
> > > > > > The root cause of this problem is due to how the NFS readdir
> > > > > > cache
> > > > > > currently works.  The main search function,
> > > > > > readdir_search_pagecache(),
> > > > > > always starts searching at page_index and cookie == 0, and for
> > > > > > any
> > > > > > page not in the cache, fills in the page with entries obtained in
> > > > > > a READDIR NFS call.  If a page already exists, we proceed to
> > > > > > nfs_readdir_search_for_cookie(), which searches for the cookie
> > > > > > (pos) of the readdir call.  The search is O(n), where n is the
> > > > > > directory size before the cookie in question is found, and every
> > > > > > entry to nfs_readdir() pays this penalty, irrespective of the
> > > > > > current directory position (dir_context.pos).  The search is
> > > > > > expensive due to the opaque nature of readdir cookies, and the
> > > > > > fact
> > > > > > that no mapping (hash) exists from cookies to pages.  In the case
> > > > > > of a directory being modified, the above behavior can become an
> > > > > > excessive penalty, since the same process is forced to fill pages
> > > > > > it
> > > > > > may be no longer interested in (the entries were passed in a
> > > > > > previous
> > > > > > nfs_readdir call), and this can essentially lead no forward
> > > > > > progress.
> > > > > >
> > > > > > To fix this problem, at the end of nfs_readdir(), save the
> > > > > > page_index
> > > > > > corresponding to the directory position (cookie) inside the
> > > > > > process's
> > > > > > nfs_open_dir_context.  Then at the next entry of nfs_readdir(),
> > > > > > use
> > > > > > the saved page_index as the starting search point rather than
> > > > > > starting
> > > > > > at page_index == 0.  Not only does this fix the problem of
> > > > > > listing
> > > > > > a directory being modified, it also significantly improves
> > > > > > performance
> > > > > > in the unmodified case since no extra search penalty is paid at
> > > > > > each
> > > > > > entry to nfs_readdir().
> > > > > >
> > > > > > In the case of lseek, since there is no hash or other mapping
> > > > > > from a
> > > > > > cookie value to the page->index, just reset
> > > > > > nfs_open_dir_context.page_index
> > > > > > to 0, which will reset the search to the old behavior.
> > > > > >
> > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
> > > > > > ---
> > > > > >  fs/nfs/dir.c           | 8 +++++++-
> > > > > >  include/linux/nfs_fs.h | 1 +
> > > > > >  2 files changed, 8 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > > > > > index 52e06c8fc7cd..b266f505b521 100644
> > > > > > --- a/fs/nfs/dir.c
> > > > > > +++ b/fs/nfs/dir.c
> > > > > > @@ -78,6 +78,7 @@ static struct nfs_open_dir_context
> > > > > > *alloc_nfs_open_dir_context(struct inode *dir
> > > > > >                 ctx->attr_gencount = nfsi->attr_gencount;
> > > > > >                 ctx->dir_cookie = 0;
> > > > > >                 ctx->dup_cookie = 0;
> > > > > > +               ctx->page_index = 0;
> > > > > >                 ctx->cred = get_cred(cred);
> > > > > >                 spin_lock(&dir->i_lock);
> > > > > >                 if (list_empty(&nfsi->open_files) &&
> > > > > > @@ -763,7 +764,7 @@ int
> > > > > > find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
> > > > > >         return res;
> > > > > >  }
> > > > > >
> > > > > > -/* Search for desc->dir_cookie from the beginning of the page
> > > > > > cache
> > > > > > */
> > > > > > +/* Search for desc->dir_cookie starting at desc->page_index */
> > > > > >  static inline
> > > > > >  int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
> > > > > >  {
> > > > > > @@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file,
> > > > > > struct
> > > > > > dir_context *ctx)
> > > > > >                 .ctx = ctx,
> > > > > >                 .dir_cookie = &dir_ctx->dir_cookie,
> > > > > >                 .plus = nfs_use_readdirplus(inode, ctx),
> > > > > > +               .page_index = dir_ctx->page_index,
> > > > > > +               .last_cookie = nfs_readdir_use_cookie(file) ?
> > > > > > ctx-
> > > > > > > pos : 0,
> > > > > >         },
> > > > > >                         *desc = &my_desc;
> > > > > >         int res = 0;
> > > > > > @@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file,
> > > > > > struct
> > > > > > dir_context *ctx)
> > > > > >  out:
> > > > > >         if (res > 0)
> > > > > >                 res = 0;
> > > > > > +       dir_ctx->page_index = desc->page_index;
> > > > > >         trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx-
> > > > > > >dir_cookie,
> > > > > >                                NFS_SERVER(inode)->dtsize,
> > > > > > my_desc.plus, res);
> > > > > >         return res;
> > > > > > @@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file
> > > > > > *filp,
> > > > > > loff_t offset, int whence)
> > > > > >                 else
> > > > > >                         dir_ctx->dir_cookie = 0;
> > > > > >                 dir_ctx->duped = 0;
> > > > > > +               /* Force readdir_search_pagecache to start over
> > > > > > */
> > > > > > +               dir_ctx->page_index = 0;
> > > > > >         }
> > > > > >         inode_unlock(inode);
> > > > > >         return offset;
> > > > > > diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> > > > > > index a2c6455ea3fa..0e55c0154ccd 100644
> > > > > > --- a/include/linux/nfs_fs.h
> > > > > > +++ b/include/linux/nfs_fs.h
> > > > > > @@ -93,6 +93,7 @@ struct nfs_open_dir_context {
> > > > > >         __u64 dir_cookie;
> > > > > >         __u64 dup_cookie;
> > > > > >         signed char duped;
> > > > > > +       unsigned long   page_index;
> > > > > >  };
> > > > > >
> > > > > >  /*
> > > > >
> > > > > NACK. It makes no sense to store the page index as a cursor.
> > > > >
> > > >
> > > > A similar thing was done recently with:
> > > > 227823d2074d nfs: optimise readdir cache page invalidation
> > > >
> > >
> > > That's a very different thing. It is about discarding page data in
> > > order to force a re-read of the contents into cache.
> > >
> > Right - I only pointed it out because it is in effect a cursor about
> > the last access into the cache but it's on a global basis, not
> > process context.
> >
> > > What you're doing is basically trying to guess where the data is
> > > located. which might work in some cases where the directory is
> > > completely static, but if it shrinks (e.g. due to a few unlink() or
> > > rename() calls) so that you overshoot the cookie, then you can end up
> > > reading all the way to the end of the directory before doing an
> > > uncached readdir.
> > >
> > First, consider the unmodified (idle directory) scenario.  Today the
> > performance is bad the larger the directory goes - do you see why?
> > I tried to explain in the cover letter and header but maybe it's not clear?
>
> I agree with you that the patch is good for that case. It seems
> like a bug that, if the cache is not invalidated in the meantime,
> nfs_readdir would start at offset 0 of the cache while searching.
>
> But, as Trond said, page_index isn't a good cursor once the
> page cache was invalidated, you don't know what data you are
> leaving out inadvertently.
>

Hey Frank - thanks for looking at this too.  Yes I see it this is probably
not a good approach especially since you and Trond seem to come
to the same conclusion.


> >
> > Second, the modified scenario today the performance is very bad
> > because of the same problem - the cookie is reset and the process
> > needs to start over at cookie 0, repeating READDIRs.  But maybe
> > there's a specific scenario I'm not thinking about.
> >
> > The way I thought about this is that if you're in a heavily modified
> > scenario with a large directory and you're past the 'acdirmax' time,
> > you have to make the choice of either:
> > a) ignoring 'acdirmax' (this is what the NFSv3 patch did) and even
> > though you know the cache expired you keep going as though it
> > did not (at least until a different process starts a listing)
> > b) honoring 'acdirmax' (drop the pagecache), but keep going the
> > best you can based on the previous information and don't try to
> > rebuild the cache before continuing.
>
> Yeah, it seems like those are your options. The only other way
> of doing this that I can think of would be to add logic that
> says something like:
>
>         if (pos != 0 and the cache is invalid)
>                 uncached_readdir()
>
> ..which is, by the looks of it, easily done by having
> readdir_search_pagecache() return EBADCOOKIE if
> there's no cache, and the offset is not 0. E.g. it
> shouldn't try to fill the cache in that case, but
> just let uncached reads take over.
>
I thought about other logic, but this seems simple
enough - just check for 0 and probably the number
of pages in the cache to see if it's invalid.

> The idea here is that the only information that is possibly
> still valid is the cookie - and that the server will know
> how to use the cookie to keep going where you left off.
>
Makes sense.

> Of course, this mean that, in the case of another client
> modifying the directory, you'll end up doing mostly / all
> uncached readdirs after an invalidate. Same for a scenario
> where a client did: open llseek(<nonzero_offset), getdents().
> Until another process starts at offset 0 and re-fills the
> cache, of course.
>
> I'm not sure if that's all that bad, though, it depends
> on common usage patterns.
>

Yeah I was wondering about this too - what side effects would
happen when going uncached and how far should we go
to try to detect we're in this pattern.

I guess I was hoping to avoid going uncached because that
means we have to ask the server again for the same info,
which could very well be a lot.  But then again if we're blowing
past acdirmax then we're making the best of a bad situation
and just need to make forward progress.

I also considered whether it would be possible to improve the readdir
cache data structure and search to handle this, especially now that it
looks like Trond's latest patches have given us 16 bytes of free space in
each page.  I was thinking for example of adding a 'next_page_index'
into nfs_cache_array but then I think we get into a mess of overlap of
entries, etc. So far I don't see a clean way forward on that, and the
added complexity does not seem worthwhile.  So trying uncached
like you and Trond suggest may be the best approach.


> - Frank
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-11-03 17:50 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
2020-11-02 13:50 ` [PATCH 01/11] NFSv4: Improve nfs4_readdir tracepoint by adding additional fields Dave Wysochanski
2020-11-02 13:50 ` [PATCH 02/11] NFS: Replace dfprintk statements with trace events in nfs_readdir Dave Wysochanski
2020-11-02 13:50 ` [PATCH 03/11] NFS: Move nfs_readdir_descriptor_t into internal header Dave Wysochanski
2020-11-02 13:50 ` [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t Dave Wysochanski
2020-11-02 17:32   ` kernel test robot
2020-11-02 13:50 ` [PATCH 05/11] NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir Dave Wysochanski
2020-11-02 13:50 ` [PATCH 06/11] NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit Dave Wysochanski
2020-11-02 13:50 ` [PATCH 07/11] NFS: Add tracepoint to entry and exit of nfs_do_filldir Dave Wysochanski
2020-11-02 13:50 ` [PATCH 08/11] NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints Dave Wysochanski
2020-11-02 13:50 ` [PATCH 09/11] NFS: Improve performance of listing directories being modified Dave Wysochanski
2020-11-02 16:21   ` Trond Myklebust
2020-11-02 16:26     ` David Wysochanski
2020-11-02 17:31       ` Trond Myklebust
2020-11-02 19:45         ` David Wysochanski
2020-11-02 21:30           ` Trond Myklebust
2020-11-02 22:05             ` David Wysochanski
2020-11-03  3:38               ` Trond Myklebust
2020-11-03 13:29                 ` David Wysochanski
2020-11-03  0:09           ` Frank van der Linden
2020-11-03 17:49             ` David Wysochanski
2020-11-02 13:50 ` [PATCH 10/11] NFS: Add page_index to nfs_readdir enter and exit tracepoints Dave Wysochanski
2020-11-02 13:50 ` [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir() Dave Wysochanski
2020-11-02 15:38   ` Mkrtchyan, Tigran
2020-11-02 16:16     ` David Wysochanski
2020-11-02 14:27 ` [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Chuck Lever
2020-11-02 15:07   ` David Wysochanski
2020-11-02 15:13     ` Chuck Lever
2020-11-02 15:58 ` Mkrtchyan, Tigran

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).