All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Wysochanski <dwysocha@redhat.com>
To: Trond Myklebust <trondmy@hammerspace.com>,
	Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Subject: [PATCH 09/11] NFS: Improve performance of listing directories being modified
Date: Mon,  2 Nov 2020 08:50:09 -0500	[thread overview]
Message-ID: <1604325011-29427-10-git-send-email-dwysocha@redhat.com> (raw)
In-Reply-To: <1604325011-29427-1-git-send-email-dwysocha@redhat.com>

A process can hang forever to 'ls -l' a directory while the directory
is being modified such as another NFS client adding files to the
directory.  The problem is seen specifically with larger directories
(I tested with 1 million) and/or slower NFS server responses to
READDIR.  If a combination of the NFS directory size, the NFS server
responses to READDIR is such that the 'ls' process gets partially
through the listing before the attribute cache expires (time
exceeds acdirmax), we drop the pagecache and have to re-fill it,
and as a result, the process may never complete.  One could argue
for larger directories the acdirmin/acdirmax should be increased,
but it's not always possible to tune this effectively.

The root cause of this problem is due to how the NFS readdir cache
currently works.  The main search function, readdir_search_pagecache(),
always starts searching at page_index and cookie == 0, and for any
page not in the cache, fills in the page with entries obtained in
a READDIR NFS call.  If a page already exists, we proceed to
nfs_readdir_search_for_cookie(), which searches for the cookie
(pos) of the readdir call.  The search is O(n), where n is the
directory size before the cookie in question is found, and every
entry to nfs_readdir() pays this penalty, irrespective of the
current directory position (dir_context.pos).  The search is
expensive due to the opaque nature of readdir cookies, and the fact
that no mapping (hash) exists from cookies to pages.  In the case
of a directory being modified, the above behavior can become an
excessive penalty, since the same process is forced to fill pages it
may be no longer interested in (the entries were passed in a previous
nfs_readdir call), and this can essentially lead no forward progress.

To fix this problem, at the end of nfs_readdir(), save the page_index
corresponding to the directory position (cookie) inside the process's
nfs_open_dir_context.  Then at the next entry of nfs_readdir(), use
the saved page_index as the starting search point rather than starting
at page_index == 0.  Not only does this fix the problem of listing
a directory being modified, it also significantly improves performance
in the unmodified case since no extra search penalty is paid at each
entry to nfs_readdir().

In the case of lseek, since there is no hash or other mapping from a
cookie value to the page->index, just reset nfs_open_dir_context.page_index
to 0, which will reset the search to the old behavior.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
---
 fs/nfs/dir.c           | 8 +++++++-
 include/linux/nfs_fs.h | 1 +
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 52e06c8fc7cd..b266f505b521 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -78,6 +78,7 @@ static struct nfs_open_dir_context *alloc_nfs_open_dir_context(struct inode *dir
 		ctx->attr_gencount = nfsi->attr_gencount;
 		ctx->dir_cookie = 0;
 		ctx->dup_cookie = 0;
+		ctx->page_index = 0;
 		ctx->cred = get_cred(cred);
 		spin_lock(&dir->i_lock);
 		if (list_empty(&nfsi->open_files) &&
@@ -763,7 +764,7 @@ int find_and_lock_cache_page(nfs_readdir_descriptor_t *desc)
 	return res;
 }
 
-/* Search for desc->dir_cookie from the beginning of the page cache */
+/* Search for desc->dir_cookie starting at desc->page_index */
 static inline
 int readdir_search_pagecache(nfs_readdir_descriptor_t *desc)
 {
@@ -885,6 +886,8 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 		.ctx = ctx,
 		.dir_cookie = &dir_ctx->dir_cookie,
 		.plus = nfs_use_readdirplus(inode, ctx),
+		.page_index = dir_ctx->page_index,
+		.last_cookie = nfs_readdir_use_cookie(file) ? ctx->pos : 0,
 	},
 			*desc = &my_desc;
 	int res = 0;
@@ -938,6 +941,7 @@ static int nfs_readdir(struct file *file, struct dir_context *ctx)
 out:
 	if (res > 0)
 		res = 0;
+	dir_ctx->page_index = desc->page_index;
 	trace_nfs_readdir_exit(inode, ctx->pos, dir_ctx->dir_cookie,
 			       NFS_SERVER(inode)->dtsize, my_desc.plus, res);
 	return res;
@@ -975,6 +979,8 @@ static loff_t nfs_llseek_dir(struct file *filp, loff_t offset, int whence)
 		else
 			dir_ctx->dir_cookie = 0;
 		dir_ctx->duped = 0;
+		/* Force readdir_search_pagecache to start over */
+		dir_ctx->page_index = 0;
 	}
 	inode_unlock(inode);
 	return offset;
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index a2c6455ea3fa..0e55c0154ccd 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -93,6 +93,7 @@ struct nfs_open_dir_context {
 	__u64 dir_cookie;
 	__u64 dup_cookie;
 	signed char duped;
+	unsigned long	page_index;
 };
 
 /*
-- 
1.8.3.1


  parent reply	other threads:[~2020-11-02 13:51 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-11-02 13:50 [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Dave Wysochanski
2020-11-02 13:50 ` [PATCH 01/11] NFSv4: Improve nfs4_readdir tracepoint by adding additional fields Dave Wysochanski
2020-11-02 13:50 ` [PATCH 02/11] NFS: Replace dfprintk statements with trace events in nfs_readdir Dave Wysochanski
2020-11-02 13:50 ` [PATCH 03/11] NFS: Move nfs_readdir_descriptor_t into internal header Dave Wysochanski
2020-11-02 13:50 ` [PATCH 04/11] NFS: Add tracepoints for functions involving nfs_readdir_descriptor_t Dave Wysochanski
2020-11-02 17:32   ` kernel test robot
2020-11-02 17:32     ` kernel test robot
2020-11-02 13:50 ` [PATCH 05/11] NFS: Add tracepoints for opendir, closedir, fsync_dir and llseek_dir Dave Wysochanski
2020-11-02 13:50 ` [PATCH 06/11] NFS: Add tracepoints for nfs_readdir_xdr_filler enter and exit Dave Wysochanski
2020-11-02 13:50 ` [PATCH 07/11] NFS: Add tracepoint to entry and exit of nfs_do_filldir Dave Wysochanski
2020-11-02 13:50 ` [PATCH 08/11] NFS: Replace LOOKUPCACHE dfprintk statements with tracepoints Dave Wysochanski
2020-11-02 13:50 ` Dave Wysochanski [this message]
2020-11-02 16:21   ` [PATCH 09/11] NFS: Improve performance of listing directories being modified Trond Myklebust
2020-11-02 16:26     ` David Wysochanski
2020-11-02 17:31       ` Trond Myklebust
2020-11-02 19:45         ` David Wysochanski
2020-11-02 21:30           ` Trond Myklebust
2020-11-02 22:05             ` David Wysochanski
2020-11-03  3:38               ` Trond Myklebust
2020-11-03 13:29                 ` David Wysochanski
2020-11-03  0:09           ` Frank van der Linden
2020-11-03 17:49             ` David Wysochanski
2020-11-02 13:50 ` [PATCH 10/11] NFS: Add page_index to nfs_readdir enter and exit tracepoints Dave Wysochanski
2020-11-02 13:50 ` [PATCH 11/11] NFS: Bring back nfs_dir_mapping_need_revalidate() in nfs_readdir() Dave Wysochanski
2020-11-02 15:38   ` Mkrtchyan, Tigran
2020-11-02 16:16     ` David Wysochanski
2020-11-02 14:27 ` [PATCH 00/11] Add NFS readdir tracepoints and improve performance of reading directories Chuck Lever
2020-11-02 15:07   ` David Wysochanski
2020-11-02 15:13     ` Chuck Lever
2020-11-02 15:58 ` Mkrtchyan, Tigran

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1604325011-29427-10-git-send-email-dwysocha@redhat.com \
    --to=dwysocha@redhat.com \
    --cc=anna.schumaker@netapp.com \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.