Re: [RFC PATCH v3 0/9] Suppress negative dentry

From: Ian Kent <raven@themaw.net>
To: Amir Goldstein <amir73il@gmail.com>
Cc: Chengguang Xu <cgxu519@mykernel.net>,
	Miklos Szeredi <miklos@szeredi.hu>,
	Al Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	overlayfs <linux-unionfs@vger.kernel.org>
Subject: Re: [RFC PATCH v3 0/9] Suppress negative dentry
Date: Mon, 18 May 2020 18:26:48 +0800	[thread overview]
Message-ID: <cfc4e94f6667eabee664b63cc23051e9d816456c.camel@themaw.net> (raw)
In-Reply-To: <CAOQ4uxjT8DouPmf1mk1x24X8FcN5peYAqwdr362P4gcW+x15dw@mail.gmail.com>

On Mon, 2020-05-18 at 08:27 +0300, Amir Goldstein wrote:
> On Mon, May 18, 2020 at 3:53 AM Ian Kent <raven@themaw.net> wrote:
> > On Fri, 2020-05-15 at 15:20 +0800, Chengguang Xu wrote:
> > > This series adds a new lookup flag LOOKUP_DONTCACHE_NEGATIVE
> > > to indicate to drop negative dentry in slow path of lookup.
> > > 
> > > In overlayfs, negative dentries in upper/lower layers are useless
> > > after construction of overlayfs' own dentry, so in order to
> > > effectively reclaim those dentries, specify
> > > LOOKUP_DONTCACHE_NEGATIVE
> > > flag when doing lookup in upper/lower layers.
> > 
> > I've looked at this a couple of times now.
> > 
> > I'm not at all sure of the wisdom of adding a flag to a VFS
> > function
> > that allows circumventing what a file system chooses to do.
> 
> But it is not really a conscious choice is it?

I thought that too until recently when I had the need to ask the
question "how do these negative hashed dentries get into the dcache.

> How exactly does a filesystem express its desire to cache a negative
> dentry? The documentation of lookup() in vfs.rst makes it clear that
> it is not up to the filesystem to make that decision.

That's the question I had too and, somewhat embarrassingly, got
a response that started with "Are you serious ..." for Al when
I made a claim that ext4 doesn't create negative hashed dentries.

What was so bad about that claim is it's really obvious that ext4
does do this in ext4/namei.c:ext4_lookup():

...
inode = NULL;
if (bh) {
...
}
...
return d_splice_alias(inode, dentry);

and inode can be negative here.

Now d_splice_alias() is pretty complicated but, if passed a NULL
dentry it amounts to calling __d_add(dentry, NULL) which produces
a negative hashed dentry, a decision made by ext4 ->lookup() (and
I must say typical of the very few ways such dentries get into
the dcache).

Now on final dput of the walk these dentries should end up with a
reference count of zero which triggers dput() to add them to the
lru list so they can be considered as prune targets and can be
found in subsequent lookups (they are hashed).

This is how using negative hashed detries helps to avoid the
expensive alloc/free (at least) that occurs when looking up paths
that don't exist.

Negative "unhashed" dentries are discarded on final dput() so
don't really come into the picture except that dropping a negative
hashed dentry will cause it to be discarded on final dput().

But nothing is ever quite as simple as a description of how it
appears to (is meant to) work so, by all means, take what I say
with a grain of salt, ;)

> The VFS needs to cache the negative dentry on lookup(), so
> it can turn it positive on create().
> Low level kernel modules that call the VFS lookup() might know
> that caching the negative dentry is counter productive.
> 
> > I also do really see the need for it because only hashed negative
> > dentrys will be retained by the VFS so, if you see a hashed
> > negative
> > dentry then you can cause it to be discarded on release of the last
> > reference by dropping it.
> > 
> > So what's different here, why is adding an argument to do that drop
> > in the VFS itself needed instead of just doing it in overlayfs?
> 
> That was v1 patch. It was dealing with the possible race of
> returned negative dentry becoming positive before dropping it
> in an intrusive manner.

Right, very much something that overlay fs must care about, file
systems not so much.

It might be possible to assume that if the underlying file system
->lookup() call has produced a negative hashed dentry that it will
stay negative. That assumption definitely can't be made for negative
unhashed dentries, of course, since they may still be in the process
of being filled in.

But, unfortunately, I see your point, it's a risky assumption.

> 
> In retrospect, I think this race doesn't matter and there is no
> harm in dropping a positive dentry in a race obviously caused by
> accessing the underlying layer, which as documented results in
> "undefined behavior".

Umm ... I thought we were talking about negative dentries ...

Ian