From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>,
    Andrew Morton <akpm@linux-foundation.org>
Cc: Chuck Lever <cel@kernel.org>,
    "hughd@google.com" <hughd@google.com>,
    "linux-mm@kvack.org" <linux-mm@kvack.org>,
    "linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v1] shmem: stable directory cookies
Date: Thu, 04 May 2023 13:21:41 -0400
Message-ID: <cbd955c08432a82014cc21f36e42afc67962a718.camel@kernel.org>
In-Reply-To: <30E5A657-4005-4126-A962-A8E6D90240AB@oracle.com>

On Wed, 2023-05-03 at 00:43 +0000, Chuck Lever III wrote:
>
> > On May 2, 2023, at 8:12 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > On Mon, 17 Apr 2023 15:23:10 -0400 Chuck Lever <cel@kernel.org> wrote:
> >
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > >
> > > The current cursor-based directory cookie mechanism doesn't work
> > > when a tmpfs filesystem is exported via NFS. This is because NFS
> > > clients do not open directories: each READDIR operation has to open
> > > the directory on the server, read it, then close it. The cursor
> > > state for that directory, being associated strictly with the opened
> > > struct file, is then discarded.
> > >
> > > Directory cookies are cached not only by NFS clients, but also by
> > > user space libraries on those clients. Essentially there is no way
> > > to invalidate those caches when directory offsets have changed on
> > > an NFS server after the offset-to-dentry mapping changes.
> > >
> > > The solution we've come up with is to make the directory cookie for
> > > each file in a tmpfs filesystem stable for the life of the directory
> > > entry it represents.
> > >
> > > Add a per-directory xarray. shmem_readdir() uses this to map each
> > > directory offset (an loff_t integer) to the memory address of a
> > > struct dentry.
> > >
> >
> > How have people survived for this long with this problem?
>
> It's less of a problem without NFS in the picture; local
> applications can hold the directory open, and that preserves
> the seek cursor. But you can still trigger it.
>
> Also, a plurality of applications are well-behaved in this
> regard. It's just the more complex and more useful ones
> (like git) that seem to trigger issues.
>
> It became less bearable for NFS because of a recent change
> on the Linux NFS client to optimize directory read behavior:
>
> 85aa8ddc3818 ("NFS: Trigger the "ls -l" readdir heuristic sooner")
>
> Trond argued that tmpfs directory cookie behavior has always
> been problematic (eg broken) therefore this commit does not
> count as a regression. However, it does make tmpfs exports
> less usable, breaking some tests that have always worked.
>
>
> > It's a lot of new code -
>
> I don't feel that this is a lot of new code:
>
> include/linux/shmem_fs.h | 2
> mm/shmem.c | 213 +++++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 201 insertions(+), 14 deletions(-)
>
> But I agree it might look a little daunting on first review.
> I am happy to try to break this single patch up or consider
> other approaches.
>
I wonder whether you really need an xarray here?

dcache_readdir walks the d_subdirs list. We add things to d_subdirs at
d_alloc time (and in d_move). If you were to assign its dirindex when
the dentry gets added to d_subdirs (maybe in ->d_init?) then you'd have
a list already ordered by index, and could deal with missing indexes
easily.

It's not as efficient as the xarray if you have to seek through a big
dir, but if keeping the changes tiny is a goal then that might be
another way to do this.

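A rough userspace sketch of that idea, not kernel code: every name below (struct entry, struct dir, dir_add, dir_remove, dir_seek) is hypothetical, and it assumes new entries are appended so the list stays in ascending index order. It only illustrates how a seek can tolerate missing indexes by stopping at the first entry whose index is at or past the cookie.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a dentry sitting on its parent's child list. */
struct entry {
	unsigned long dirindex;	/* stable cookie, assigned once at insert */
	struct entry *next;
};

/* Hypothetical stand-in for the parent directory. */
struct dir {
	struct entry *head, *tail;
	unsigned long next_index;	/* only ever grows, never reused */
};

/* Append a new entry; appending keeps the list sorted by dirindex. */
static struct entry *dir_add(struct dir *d)
{
	struct entry *e = calloc(1, sizeof(*e));

	if (!e)
		return NULL;
	e->dirindex = d->next_index++;
	if (d->tail)
		d->tail->next = e;
	else
		d->head = e;
	d->tail = e;
	return e;
}

/* Drop the entry with the given index, leaving a hole in the sequence. */
static void dir_remove(struct dir *d, unsigned long index)
{
	struct entry *prev = NULL, *e = d->head;

	while (e && e->dirindex != index) {
		prev = e;
		e = e->next;
	}
	if (!e)
		return;
	if (prev)
		prev->next = e->next;
	else
		d->head = e->next;
	if (d->tail == e)
		d->tail = prev;
	free(e);
}

/* Seek: first entry whose index is >= cookie. This is an O(n) list walk. */
static struct entry *dir_seek(const struct dir *d, unsigned long cookie)
{
	struct entry *e;

	assert(d != NULL);
	for (e = d->head; e && e->dirindex < cookie; e = e->next)
		;
	return e;
}
```

Seeking to the cookie of a removed entry lands on its successor, which is the "deal with missing indexes" case. The linear walk in dir_seek() is the cost relative to an xarray lookup.
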
> We could, for instance, tuck a little more of this into
> lib/fs. Copying the readdir and directory seeking
> implementation from simplefs to tmpfs is one reason
> the insertion count is worrisome.
>
>
> > can we get away with simply disallowing
> > exports of tmpfs?
>
> I think the bottom line is that you /can/ trigger this
> behavior without NFS, just not as quickly. The threshold
> is high enough that most use cases aren't bothered by
> this right now.
>
> We'd rather not disallow exporting tmpfs. It's a very
> good testing platform for us, and disallowing it would
> be a noticeable regression for some folks.
>
>
Yeah, I'd not be in favor of that either. We've had an exportable tmpfs
for a long time. It's a good way to do testing of the entire NFS server
stack, without having to deal with underlying storage.

> > How can we maintain this? Is it possible to come up with a test
> > harness for inclusion in kernel selftests?
>
> There is very little directory cookie testing that I know of
> in the obvious place: fstests. That would be where this stuff
> should be unit tested, IMO.
>
I'd like to see this too. It's easy for programs to get this wrong. In
this case, could we emulate the NFS behavior by doing this in a loop
over a large directory?

    opendir
    seekdir (to result of last telldir)
    readdir
    unlink
    telldir
    closedir

At the end of it, check whether there are any entries left over.

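The loop above could be sketched in userspace C roughly as follows. This is an illustration under stated assumptions, not an fstests case: the helper names (make_test_dir, count_entries, unlink_via_cookies) are made up, and it assumes a 64-bit libc in which telldir() hands back the raw directory offset. POSIX only promises that a telldir() value is valid for the stream that produced it, so reusing it after closedir() is deliberate here: that is the part that mimics an NFS client.

```c
#define _XOPEN_SOURCE 700	/* seekdir(), telldir(), dirfd(), unlinkat() */
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static int is_dot(const char *name)
{
	return strcmp(name, ".") == 0 || strcmp(name, "..") == 0;
}

/* Hypothetical helper: create directory `path` holding `n` empty files. */
static int make_test_dir(const char *path, int n)
{
	char name[4096];

	if (mkdir(path, 0700) != 0)
		return -1;
	for (int i = 0; i < n; i++) {
		snprintf(name, sizeof(name), "%s/f%05d", path, i);
		int fd = open(name, O_CREAT | O_WRONLY, 0600);
		if (fd < 0)
			return -1;
		close(fd);
	}
	return 0;
}

/* Count entries other than "." and ".." using a fresh directory stream. */
static int count_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	int n = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL)
		if (!is_dot(de->d_name))
			n++;
	closedir(dir);
	return n;
}

/*
 * Run one opendir/seekdir/readdir/unlink/telldir/closedir cycle per entry
 * until readdir() reports end of directory. Returns the number of entries
 * left behind (or -1 on error); *removed is set to the number unlinked.
 */
static int unlink_via_cookies(const char *path, int *removed)
{
	long cookie = 0;
	int have_cookie = 0;

	assert(path != NULL && removed != NULL);
	*removed = 0;
	for (;;) {
		DIR *dir = opendir(path);
		struct dirent *de;

		if (!dir)
			return -1;
		if (have_cookie)
			seekdir(dir, cookie);	/* resume at the saved cookie */
		do {
			de = readdir(dir);
		} while (de && is_dot(de->d_name));
		if (!de) {			/* nothing at or past the cookie */
			closedir(dir);
			break;
		}
		if (unlinkat(dirfd(dir), de->d_name, 0) != 0) {
			closedir(dir);
			return -1;
		}
		(*removed)++;
		cookie = telldir(dir);		/* remember where we stopped */
		have_cookie = 1;
		closedir(dir);
	}
	return count_entries(path);
}
```

A caller would populate a directory on the filesystem under test with make_test_dir(), run unlink_via_cookies(), and treat a nonzero return value as entries that were skipped because the saved cookie no longer pointed at the right place.
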
--
Jeff Layton <jlayton@kernel.org>