All of lore.kernel.org
 help / color / mirror / Atom feed
From: Miklos Szeredi <miklos@szeredi.hu>
To: Ian Kent <raven@themaw.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Tejun Heo <tj@kernel.org>, Eric Sandeen <sandeen@sandeen.net>,
	Fox Chen <foxhlchen@gmail.com>,
	Brice Goglin <brice.goglin@gmail.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Rick Lindsley <ricklind@linux.vnet.ibm.com>,
	David Howells <dhowells@redhat.com>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching
Date: Tue, 1 Jun 2021 14:41:27 +0200	[thread overview]
Message-ID: <CAJfpeguUj5WKtKZsn_tZZNpiL17ggAPcPBXdpA03aAnjaexWug@mail.gmail.com> (raw)
In-Reply-To: <162218364554.34379.636306635794792903.stgit@web.messagingengine.com>

On Fri, 28 May 2021 at 08:34, Ian Kent <raven@themaw.net> wrote:
>
> If there are many lookups for non-existent paths these negative lookups
> can lead to a lot of overhead during path walks.
>
> The VFS allows dentries to be created as negative and hashed, and caches
> them so they can be used to reduce the fairly high overhead alloc/free
> cycle that occurs during these lookups.

Obviously there's a cost associated with negative caching too.  For
normal filesystems it's trivially worth that cost, but in case of
kernfs, not sure...

Can "fairly high" be somewhat substantiated with a microbenchmark for
negative lookups?

More comments inline.

>
> Signed-off-by: Ian Kent <raven@themaw.net>
> ---
>  fs/kernfs/dir.c |   55 +++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 4c69e2af82dac..5151c712f06f5 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1037,12 +1037,33 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>         if (flags & LOOKUP_RCU)
>                 return -ECHILD;
>
> -       /* Always perform fresh lookup for negatives */
> -       if (d_really_is_negative(dentry))
> -               goto out_bad_unlocked;
> +       mutex_lock(&kernfs_mutex);
>
>         kn = kernfs_dentry_node(dentry);
> -       mutex_lock(&kernfs_mutex);
> +
> +       /* Negative hashed dentry? */
> +       if (!kn) {
> +               struct kernfs_node *parent;
> +
> +               /* If the kernfs node can be found this is a stale negative
> +                * hashed dentry so it must be discarded and the lookup redone.
> +                */
> +               parent = kernfs_dentry_node(dentry->d_parent);

This doesn't look safe WRT a racing sys_rename().  In this case
d_move() is called only with parent inode locked, but not with
kernfs_mutex while ->d_revalidate() may not have parent inode locked.
After d_move() the old parent dentry can be freed, resulting in use
after free.  Easily fixed by dget_parent().

> +               if (parent) {
> +                       const void *ns = NULL;
> +
> +                       if (kernfs_ns_enabled(parent))
> +                               ns = kernfs_info(dentry->d_sb)->ns;
> +                       kn = kernfs_find_ns(parent, dentry->d_name.name, ns);

Same thing with d_name.  There's
take_dentry_name_snapshot()/release_dentry_name_snapshot() to properly
take care of that.


> +                       if (kn)
> +                               goto out_bad;
> +               }
> +
> +               /* The kernfs node doesn't exist, leave the dentry negative
> +                * and return success.
> +                */
> +               goto out;
> +       }
>
>         /* The kernfs node has been deactivated */
>         if (!kernfs_active_read(kn))
> @@ -1060,12 +1081,11 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>         if (kn->parent && kernfs_ns_enabled(kn->parent) &&
>             kernfs_info(dentry->d_sb)->ns != kn->ns)
>                 goto out_bad;
> -
> +out:
>         mutex_unlock(&kernfs_mutex);
>         return 1;
>  out_bad:
>         mutex_unlock(&kernfs_mutex);
> -out_bad_unlocked:
>         return 0;
>  }
>
> @@ -1080,33 +1100,24 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
>         struct dentry *ret;
>         struct kernfs_node *parent = dir->i_private;
>         struct kernfs_node *kn;
> -       struct inode *inode;
> +       struct inode *inode = NULL;
>         const void *ns = NULL;
>
>         mutex_lock(&kernfs_mutex);
> -
>         if (kernfs_ns_enabled(parent))
>                 ns = kernfs_info(dir->i_sb)->ns;
>
>         kn = kernfs_find_ns(parent, dentry->d_name.name, ns);
> -
> -       /* no such entry */
> -       if (!kn || !kernfs_active(kn)) {
> -               ret = NULL;
> -               goto out_unlock;
> -       }
> -
>         /* attach dentry and inode */
> -       inode = kernfs_get_inode(dir->i_sb, kn);
> -       if (!inode) {
> -               ret = ERR_PTR(-ENOMEM);
> -               goto out_unlock;
> +       if (kn && kernfs_active(kn)) {
> +               inode = kernfs_get_inode(dir->i_sb, kn);
> +               if (!inode)
> +                       inode = ERR_PTR(-ENOMEM);
>         }
> -
> -       /* instantiate and hash dentry */
> +       /* instantiate and hash (possibly negative) dentry */
>         ret = d_splice_alias(inode, dentry);
> - out_unlock:
>         mutex_unlock(&kernfs_mutex);
> +
>         return ret;
>  }
>
>
>

  reply	other threads:[~2021-06-01 12:41 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-28  6:33 [REPOST PATCH v4 0/5] kernfs: proposed locking and concurrency improvement Ian Kent
2021-05-28  6:33 ` [REPOST PATCH v4 1/5] kernfs: move revalidate to be near lookup Ian Kent
2021-06-03 14:50   ` Eric W. Biederman
2021-06-04  2:29     ` Ian Kent
2021-05-28  6:34 ` [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching Ian Kent
2021-06-01 12:41   ` Miklos Szeredi [this message]
2021-06-02  3:44     ` Ian Kent
2021-06-02  8:58       ` Miklos Szeredi
2021-06-02 10:57         ` Ian Kent
2021-06-03  2:15           ` Ian Kent
2021-06-03 23:57             ` Ian Kent
2021-06-04  1:07               ` Ian Kent
2021-06-03 17:26   ` Eric W. Biederman
2021-06-03 18:06     ` Miklos Szeredi
2021-06-03 22:02       ` Eric W. Biederman
2021-06-04  3:14         ` Ian Kent
2021-06-04 14:28           ` Eric W. Biederman
2021-06-05  3:19             ` Ian Kent
2021-06-05 20:52               ` Eric W. Biederman
2021-05-28  6:34 ` [REPOST PATCH v4 3/5] kernfs: switch kernfs to use an rwsem Ian Kent
2021-06-01 13:11   ` Miklos Szeredi
2021-06-03 16:59   ` Eric W. Biederman
2021-05-28  6:34 ` [REPOST PATCH v4 4/5] kernfs: use i_lock to protect concurrent inode updates Ian Kent
2021-05-31 14:53   ` [kernfs] 9a658329cd: stress-ng.get.ops_per_sec 191.4% improvement kernel test robot
2021-05-31 14:53     ` kernel test robot
2021-06-01 13:18   ` [REPOST PATCH v4 4/5] kernfs: use i_lock to protect concurrent inode updates Miklos Szeredi
2021-06-02  5:41     ` Ian Kent
2021-05-28  6:34 ` [REPOST PATCH v4 5/5] kernfs: add kernfs_need_inode_refresh() Ian Kent
2021-05-28  8:56 ` [REPOST PATCH v4 0/5] kernfs: proposed locking and concurrency improvement Greg Kroah-Hartman
2021-05-28 11:56   ` Fox Chen
2021-05-30  4:44   ` Fox Chen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAJfpeguUj5WKtKZsn_tZZNpiL17ggAPcPBXdpA03aAnjaexWug@mail.gmail.com \
    --to=miklos@szeredi.hu \
    --cc=brice.goglin@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=foxhlchen@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mtosatti@redhat.com \
    --cc=raven@themaw.net \
    --cc=ricklind@linux.vnet.ibm.com \
    --cc=sandeen@sandeen.net \
    --cc=tj@kernel.org \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.