linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Ian Kent <raven@themaw.net>
To: "Eric W. Biederman" <ebiederm@xmission.com>,
	Miklos Szeredi <miklos@szeredi.hu>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Tejun Heo <tj@kernel.org>, Eric Sandeen <sandeen@sandeen.net>,
	Fox Chen <foxhlchen@gmail.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Rick Lindsley <ricklind@linux.vnet.ibm.com>,
	David Howells <dhowells@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching
Date: Fri, 04 Jun 2021 11:14:47 +0800	[thread overview]
Message-ID: <6ae8e5f843855c2c14b58227340e2a0070ef1b6c.camel@themaw.net> (raw)
In-Reply-To: <87y2bqli8b.fsf@disp2133>

On Thu, 2021-06-03 at 17:02 -0500, Eric W. Biederman wrote:
> Miklos Szeredi <miklos@szeredi.hu> writes:
> 
> > On Thu, 3 Jun 2021 at 19:26, Eric W. Biederman < 
> > ebiederm@xmission.com> wrote:
> > > 
> > > Ian Kent <raven@themaw.net> writes:
> > > 
> > > > If there are many lookups for non-existent paths these negative
> > > > lookups
> > > > can lead to a lot of overhead during path walks.
> > > > 
> > > > The VFS allows dentries to be created as negative and hashed,
> > > > and caches
> > > > them so they can be used to reduce the fairly high overhead
> > > > alloc/free
> > > > cycle that occurs during these lookups.
> > > > 
> > > > Signed-off-by: Ian Kent <raven@themaw.net>
> > > > ---
> > > >  fs/kernfs/dir.c |   55 +++++++++++++++++++++++++++++++++------
> > > > ----------------
> > > >  1 file changed, 33 insertions(+), 22 deletions(-)
> > > > 
> > > > diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> > > > index 4c69e2af82dac..5151c712f06f5 100644
> > > > --- a/fs/kernfs/dir.c
> > > > +++ b/fs/kernfs/dir.c
> > > > @@ -1037,12 +1037,33 @@ static int kernfs_dop_revalidate(struct
> > > > dentry *dentry, unsigned int flags)
> > > >       if (flags & LOOKUP_RCU)
> > > >               return -ECHILD;
> > > > 
> > > > -     /* Always perform fresh lookup for negatives */
> > > > -     if (d_really_is_negative(dentry))
> > > > -             goto out_bad_unlocked;
> > > > +     mutex_lock(&kernfs_mutex);
> > > > 
> > > >       kn = kernfs_dentry_node(dentry);
> > > > -     mutex_lock(&kernfs_mutex);
> > > 
> > > Why bring kernfs_dentry_node inside the mutex?
> > > 
> > > The inode lock of the parent should protect negative to positive
> > > transitions not the kernfs_mutex.  So moving the code inside
> > > the mutex looks unnecessary and confusing.
> > 
> > Except that d_revalidate() may or may not be called with parent
> > lock
> > held.

Bringing the kernfs_dentry_node() inside taking the mutex is probably
wasteful, as you say, oddly the reason I did it that conceptually it
makes sense to me since the kernfs node is being grabbed. But it
probably isn't possible for a concurrent unlink so is not necessary.

Since you feel strongly about I can change it.

> 
> I grant that this works because kernfs_io_lookup today holds
> kernfs_mutex over d_splice_alias.

Changing that will require some thought but your points about
maintainability are well taken.

> 
> The problem is that the kernfs_mutex only should be protecting the
> kernfs data structures not the vfs data structures.
> 
> Reading through the code history that looks like a hold over from
> when
> sysfs lived in the dcache before it was reimplemented as a
> distributed
> file system.  So it was probably a complete over sight and something
> that did not matter.
> 
> The big problem is that if the code starts depending upon the
> kernfs_mutex (or the kernfs_rwsem) to provide semantics the rest of
> the
> filesystems does not the code will diverge from the rest of the
> filesystems and maintenance will become much more difficult.
> 
> Diverging from other filesystems and becoming a maintenance pain has
> already been seen once in the life of sysfs and I don't think we want
> to
> go back there.
> 
> Further extending the scope of lock, when the problem is that the
> locking is causing problems seems like the opposite of the direction
> we
> want the code to grow.
> 
> I really suspect all we want kernfs_dop_revalidate doing for negative
> dentries is something as simple as comparing the timestamp of the
> negative dentry to the timestamp of the parent dentry, and if the
> timestamp has changed perform the lookup.  That is roughly what
> nfs does today with negative dentries.
> 
> The dentry cache will always lag the kernfs_node data structures, and
> that is fundamental.  We should take advantage of that to make the
> code
> as simple and as fast as we can not to perform lots of work that
> creates
> overhead.
> 
> Plus the kernfs data structures should not change much so I expect
> there will be effectively 0 penalty in always performing the lookup
> of a
> negative dentry when the directory itself has changed.

This sounds good to me.

In fact this approach should be able to be used to resolve the
potential race Miklos pointed out in a much simpler way, not to
mention the revalidate simplification itself.

But isn't knowing whether the directory has changed harder to
do than checking a time stamp?

Look at kernfs_refresh_inode() and it's callers for example.

I suspect that would require bringing back the series patch to use
a generation number to identify directory changes (and also getting
rid of the search in revalidate).

Ian


  reply	other threads:[~2021-06-04  3:15 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28  6:33 [REPOST PATCH v4 0/5] kernfs: proposed locking and concurrency improvement Ian Kent
2021-05-28  6:33 ` [REPOST PATCH v4 1/5] kernfs: move revalidate to be near lookup Ian Kent
2021-06-03 14:50   ` Eric W. Biederman
2021-06-04  2:29     ` Ian Kent
2021-05-28  6:34 ` [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching Ian Kent
2021-06-01 12:41   ` Miklos Szeredi
2021-06-02  3:44     ` Ian Kent
2021-06-02  8:58       ` Miklos Szeredi
2021-06-02 10:57         ` Ian Kent
2021-06-03  2:15           ` Ian Kent
2021-06-03 23:57             ` Ian Kent
2021-06-04  1:07               ` Ian Kent
2021-06-03 17:26   ` Eric W. Biederman
2021-06-03 18:06     ` Miklos Szeredi
2021-06-03 22:02       ` Eric W. Biederman
2021-06-04  3:14         ` Ian Kent [this message]
2021-06-04 14:28           ` Eric W. Biederman
2021-06-05  3:19             ` Ian Kent
2021-06-05 20:52               ` Eric W. Biederman
2021-05-28  6:34 ` [REPOST PATCH v4 3/5] kernfs: switch kernfs to use an rwsem Ian Kent
2021-06-01 13:11   ` Miklos Szeredi
2021-06-03 16:59   ` Eric W. Biederman
2021-05-28  6:34 ` [REPOST PATCH v4 4/5] kernfs: use i_lock to protect concurrent inode updates Ian Kent
2021-05-31 14:53   ` [kernfs] 9a658329cd: stress-ng.get.ops_per_sec 191.4% improvement kernel test robot
2021-06-01 13:18   ` [REPOST PATCH v4 4/5] kernfs: use i_lock to protect concurrent inode updates Miklos Szeredi
2021-06-02  5:41     ` Ian Kent
2021-05-28  6:34 ` [REPOST PATCH v4 5/5] kernfs: add kernfs_need_inode_refresh() Ian Kent
2021-05-28  8:56 ` [REPOST PATCH v4 0/5] kernfs: proposed locking and concurrency improvement Greg Kroah-Hartman
2021-05-28 11:56   ` Fox Chen
2021-05-30  4:44   ` Fox Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ae8e5f843855c2c14b58227340e2a0070ef1b6c.camel@themaw.net \
    --to=raven@themaw.net \
    --cc=brice.goglin@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=ebiederm@xmission.com \
    --cc=foxhlchen@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=miklos@szeredi.hu \
    --cc=mtosatti@redhat.com \
    --cc=ricklind@linux.vnet.ibm.com \
    --cc=sandeen@sandeen.net \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).