From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Chuck Lever <cel@kernel.org>,
	"hughd@google.com" <hughd@google.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH v1] shmem: stable directory cookies
Date: Thu, 04 May 2023 13:21:41 -0400
Message-ID: <cbd955c08432a82014cc21f36e42afc67962a718.camel@kernel.org>
In-Reply-To: <30E5A657-4005-4126-A962-A8E6D90240AB@oracle.com>

On Wed, 2023-05-03 at 00:43 +0000, Chuck Lever III wrote:
> 
> > On May 2, 2023, at 8:12 PM, Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > On Mon, 17 Apr 2023 15:23:10 -0400 Chuck Lever <cel@kernel.org> wrote:
> > 
> > > From: Chuck Lever <chuck.lever@oracle.com>
> > > 
> > > The current cursor-based directory cookie mechanism doesn't work
> > > when a tmpfs filesystem is exported via NFS. This is because NFS
> > > clients do not open directories: each READDIR operation has to open
> > > the directory on the server, read it, then close it. The cursor
> > > state for that directory, being associated strictly with the opened
> > > struct file, is then discarded.
> > > 
> > > Directory cookies are cached not only by NFS clients, but also by
> > > user space libraries on those clients. Essentially there is no way
> > > to invalidate those caches when directory offsets have changed on
> > > an NFS server after the offset-to-dentry mapping changes.
> > > 
> > > The solution we've come up with is to make the directory cookie for
> > > each file in a tmpfs filesystem stable for the life of the directory
> > > entry it represents.
> > > 
> > > Add a per-directory xarray. shmem_readdir() uses this to map each
> > > directory offset (an loff_t integer) to the memory address of a
> > > struct dentry.
> > > 
> > 
> > How have people survived for this long with this problem?
> 
> It's less of a problem without NFS in the picture; local
> applications can hold the directory open, and that preserves
> the seek cursor. But you can still trigger it.
> 
> Also, a plurality of applications are well-behaved in this
> regard. It's just the more complex and more useful ones
> (like git) that seem to trigger issues.
> 
> It became less bearable for NFS because of a recent change
> on the Linux NFS client to optimize directory read behavior:
> 
> 85aa8ddc3818 ("NFS: Trigger the "ls -l" readdir heuristic sooner")
> 
> Trond argued that tmpfs directory cookie behavior has always
> been problematic (e.g., broken), therefore this commit does not
> count as a regression. However, it does make tmpfs exports
> less usable, breaking some tests that have always worked.
> 
> 
> > It's a lot of new code -
> 
> I don't feel that this is a lot of new code:
> 
> include/linux/shmem_fs.h |    2 
> mm/shmem.c               |  213 +++++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 201 insertions(+), 14 deletions(-)
> 
> But I agree it might look a little daunting on first review.
> I am happy to try to break this single patch up or consider
> other approaches.
> 

I wonder whether you really need an xarray here?

dcache_readdir walks the d_subdirs list. We add things to d_subdirs at
d_alloc time (and in d_move). If you were to assign each dentry's
dirindex when it gets added to d_subdirs (maybe in ->d_init?) then you'd
have a list already ordered by index, and could deal with missing
indexes easily.

It's not as efficient as the xarray if you have to seek through a big
dir, but if keeping the changes tiny is a goal then that might be
another way to do this.
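
To make the ordered-list idea concrete, here is a minimal user-space
sketch, not kernel code. The names (struct child, struct dir, dir_add,
dir_remove, dir_seek) are hypothetical. It assumes new entries are
appended at the tail and that indexes start at 2, leaving 0 and 1 for
"." and ".."; neither assumption is claimed to match how d_subdirs or
the patch actually behaves.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a child dentry. */
struct child {
	unsigned long index;	/* assigned once when linked, never reused */
	struct child *next;	/* list order is ascending index order */
	const char *name;
};

/* Hypothetical stand-in for the parent directory. */
struct dir {
	struct child *head, *tail;
	unsigned long next_index;	/* assumption: starts at 2 */
};

/* Link a child at the tail and hand it the next unused index. */
static void dir_add(struct dir *d, struct child *c, const char *name)
{
	assert(d != NULL && c != NULL);
	c->name = name;
	c->index = d->next_index++;
	c->next = NULL;
	if (d->tail)
		d->tail->next = c;
	else
		d->head = c;
	d->tail = c;
}

/* Unlink a child. Its index is simply left as a gap. */
static void dir_remove(struct dir *d, struct child *c)
{
	struct child **pp = &d->head, *prev = NULL;

	while (*pp && *pp != c) {
		prev = *pp;
		pp = &(*pp)->next;
	}
	if (!*pp)
		return;
	*pp = c->next;
	if (d->tail == c)
		d->tail = prev;
}

/*
 * Resume from a cookie: return the first child whose index is >= cookie,
 * or NULL at end of directory. A removed entry's cookie lands on its
 * successor. This is the linear walk that costs more than an xarray
 * lookup on a big directory.
 */
static struct child *dir_seek(const struct dir *d, unsigned long cookie)
{
	struct child *c = d->head;

	while (c && c->index < cookie)
		c = c->next;
	return c;
}
```

Because an index is never reassigned, a cookie saved before an unlink
still names the same position afterwards; the seek just skips the gap.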

> We could, for instance, tuck a little more of this into
> lib/fs. Copying the readdir and directory seeking
> implementation from simplefs to tmpfs is one reason
> the insertion count is worrisome.
> 
> 
> > can we get away with simply disallowing
> > exports of tmpfs?
> 
> I think the bottom line is that you /can/ trigger this
> behavior without NFS, just not as quickly. The threshold
> is high enough that most use cases aren't bothered by
> this right now.
> 
> We'd rather not disallow exporting tmpfs. It's a very
> good testing platform for us, and disallowing it would
> be a noticeable regression for some folks.
> 
> 

Yeah, I'd not be in favor of that either. We've had an exportable tmpfs
for a long time. It's a good way to do testing of the entire NFS server
stack, without having to deal with underlying storage.

> > How can we maintain this?  Is it possible to come up with a test
> > harness for inclusion in kernel selftests?
> 
> There is very little directory cookie testing that I know of
> in the obvious place: fstests. That would be where this stuff
> should be unit tested, IMO.
> 

I'd like to see this too. It's easy for programs to get this wrong. In
this case, could we emulate the NFS behavior by doing this in a loop
over a large directory?

opendir
seekdir (to result of last telldir)
readdir
unlink
telldir
closedir

At the end of it, check whether there are any entries left over.
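
The loop above could be sketched in C roughly as follows. This is only
an illustration: the helper names drain_by_cookie and count_entries are
made up, and reusing a telldir() value on a freshly opened directory
stream is not something POSIX guarantees. It is done here on purpose, to
mimic a client that presents a saved cookie without holding the
directory open.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: count entries other than "." and "..". -1 on error. */
static long count_entries(const char *path)
{
	DIR *dir = opendir(path);
	struct dirent *de;
	long n = 0;

	if (!dir)
		return -1;
	while ((de = readdir(dir)) != NULL) {
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, ".."))
			n++;
	}
	closedir(dir);
	return n;
}

/*
 * Hypothetical helper: on every step, reopen the directory, seek to the
 * cookie saved by the previous step, read one entry, unlink it, save the
 * new cookie and close. Stops at end of directory or after max_steps.
 * Returns the number of entries unlinked, or -1 on error.
 */
static long drain_by_cookie(const char *path, long max_steps)
{
	long cookie = 0, removed = 0;
	int have_cookie = 0;

	assert(path != NULL);
	for (long step = 0; step < max_steps; step++) {
		DIR *dir = opendir(path);
		struct dirent *de;

		if (!dir)
			return -1;
		if (have_cookie)
			seekdir(dir, cookie);	/* result of last telldir */

		errno = 0;
		de = readdir(dir);
		if (!de) {
			int err = errno;

			closedir(dir);
			return err ? -1 : removed;
		}
		/* "." and ".." are skipped, not unlinked. */
		if (strcmp(de->d_name, ".") && strcmp(de->d_name, "..")) {
			if (unlinkat(dirfd(dir), de->d_name, 0) != 0) {
				closedir(dir);
				return -1;
			}
			removed++;
		}
		cookie = telldir(dir);
		have_cookie = 1;
		closedir(dir);
	}
	return removed;
}
```

A test would populate a large directory, call drain_by_cookie() on it,
and then treat a nonzero count_entries() result as a failure.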
-- 
Jeff Layton <jlayton@kernel.org>
