From: Miklos Szeredi <miklos@szeredi.hu>
To: Ian Kent <raven@themaw.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
Tejun Heo <tj@kernel.org>, Eric Sandeen <sandeen@sandeen.net>,
Fox Chen <foxhlchen@gmail.com>,
Brice Goglin <brice.goglin@gmail.com>,
Al Viro <viro@zeniv.linux.org.uk>,
Rick Lindsley <ricklind@linux.vnet.ibm.com>,
David Howells <dhowells@redhat.com>,
Marcelo Tosatti <mtosatti@redhat.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [REPOST PATCH v4 2/5] kernfs: use VFS negative dentry caching
Date: Tue, 1 Jun 2021 14:41:27 +0200
Message-ID: <CAJfpeguUj5WKtKZsn_tZZNpiL17ggAPcPBXdpA03aAnjaexWug@mail.gmail.com>
In-Reply-To: <162218364554.34379.636306635794792903.stgit@web.messagingengine.com>
On Fri, 28 May 2021 at 08:34, Ian Kent <raven@themaw.net> wrote:
>
> If there are many lookups for non-existent paths these negative lookups
> can lead to a lot of overhead during path walks.
>
> The VFS allows dentries to be created as negative and hashed, and caches
> them so they can be used to reduce the fairly high overhead alloc/free
> cycle that occurs during these lookups.

Obviously there's a cost associated with negative caching too. For
normal filesystems it's trivially worth that cost, but in the case of
kernfs, I'm not sure...

Can "fairly high" be somewhat substantiated with a microbenchmark for
negative lookups?

More comments inline.

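
A minimal userspace sketch of the kind of microbenchmark meant here, assuming that timing repeated stat() calls on a missing name is an acceptable proxy. The function name and the test path are hypothetical, and the number it produces covers the whole syscall round trip, not only the dentry alloc/free cycle:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Time `iters` stat() calls on a path that is expected not to exist.
 * Returns the average nanoseconds per lookup, or -1.0 if the path
 * unexpectedly exists or fails with something other than ENOENT.
 * (Hypothetical helper, not part of the patch set.)
 */
static double time_negative_lookups(const char *path, long iters)
{
	struct timespec start, end;
	struct stat st;
	double ns;

	assert(iters > 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iters; i++) {
		/* Every iteration must be a genuine negative lookup. */
		if (stat(path, &st) == 0 || errno != ENOENT)
			return -1.0;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
	     (double)(end.tv_nsec - start.tv_nsec);
	return ns / (double)iters;
}
```

Calling this with a missing name under a kernfs mount (some non-existent entry below /sys, say) on kernels with and without the patch, and with a few million iterations, would give numbers to compare. A single-threaded loop like this says nothing about contention on kernfs_mutex, so it would need to be run from several threads to cover that case.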
>
> Signed-off-by: Ian Kent <raven@themaw.net>
> ---
> fs/kernfs/dir.c | 55 +++++++++++++++++++++++++++++++++----------------------
> 1 file changed, 33 insertions(+), 22 deletions(-)
>
> diff --git a/fs/kernfs/dir.c b/fs/kernfs/dir.c
> index 4c69e2af82dac..5151c712f06f5 100644
> --- a/fs/kernfs/dir.c
> +++ b/fs/kernfs/dir.c
> @@ -1037,12 +1037,33 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
>
> -	/* Always perform fresh lookup for negatives */
> -	if (d_really_is_negative(dentry))
> -		goto out_bad_unlocked;
> +	mutex_lock(&kernfs_mutex);
>
>  	kn = kernfs_dentry_node(dentry);
> -	mutex_lock(&kernfs_mutex);
> +
> +	/* Negative hashed dentry? */
> +	if (!kn) {
> +		struct kernfs_node *parent;
> +
> +		/* If the kernfs node can be found this is a stale negative
> +		 * hashed dentry so it must be discarded and the lookup redone.
> +		 */
> +		parent = kernfs_dentry_node(dentry->d_parent);

This doesn't look safe WRT a racing sys_rename(). In that case
d_move() is called with only the parent inode locked, not with
kernfs_mutex held, while ->d_revalidate() may run without the parent
inode locked. After d_move() the old parent dentry can be freed,
resulting in a use-after-free. Easily fixed by using dget_parent().

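
To illustrate the pinning idea, here is a toy, single-threaded userspace model. It is not kernel code: struct toy_dentry and the toy_*() helpers are hypothetical stand-ins, and the model only shows why holding a reference keeps the old parent alive across a move, not how the kernel makes the lookup of the parent itself race-free:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; NOT the kernel structure. */
struct toy_dentry {
	int refcount;
	bool freed;			/* set instead of free() so callers can inspect it */
	struct toy_dentry *parent;	/* a child holds one reference on its parent */
};

static struct toy_dentry *toy_dget(struct toy_dentry *d)
{
	d->refcount++;
	return d;
}

static void toy_dput(struct toy_dentry *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0)
		d->freed = true;	/* last reference gone: object is dead */
}

/* Toy analogue of dget_parent(): return the parent with a reference held. */
static struct toy_dentry *toy_dget_parent(struct toy_dentry *d)
{
	return toy_dget(d->parent);
}

/* Toy analogue of d_move(): reparent and drop the reference on the old parent. */
static void toy_move(struct toy_dentry *child, struct toy_dentry *new_parent)
{
	struct toy_dentry *old = child->parent;

	child->parent = toy_dget(new_parent);
	toy_dput(old);
}
```

If a caller does toy_dget_parent() before toy_move() runs, the old parent stays alive until the matching toy_dput(); without the extra reference, the move alone drops the count to zero. In the patch, the corresponding shape would presumably be a dget_parent(dentry) before kernfs_dentry_node() is applied to the parent, with a dput() once the lookup is done.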
> +		if (parent) {
> +			const void *ns = NULL;
> +
> +			if (kernfs_ns_enabled(parent))
> +				ns = kernfs_info(dentry->d_sb)->ns;
> +			kn = kernfs_find_ns(parent, dentry->d_name.name, ns);

The same applies to d_name, which a racing rename can change.
take_dentry_name_snapshot()/release_dentry_name_snapshot() exist to
take care of that properly.

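
The snapshot idea can be sketched with another toy userspace model. Everything here (struct toy_named, struct toy_name_snapshot, the toy_*() helpers) is hypothetical and single-threaded; the real kernel helpers are more refined, and this only shows why working from a private copy of the name is safe when the original can be replaced underneath the reader:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy object whose name can be replaced by a rename; NOT a kernel dentry. */
struct toy_named {
	char *name;		/* heap string, freed and replaced on rename */
};

/* Toy analogue of a name snapshot: a private copy owned by the caller. */
struct toy_name_snapshot {
	char *name;
};

static char *toy_copy_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Take a stable copy of the current name. Returns 0 on success, -1 on OOM. */
static int toy_take_name_snapshot(struct toy_name_snapshot *snap,
				  const struct toy_named *d)
{
	snap->name = toy_copy_string(d->name);
	return snap->name ? 0 : -1;
}

static void toy_release_name_snapshot(struct toy_name_snapshot *snap)
{
	free(snap->name);
	snap->name = NULL;
}

/* Replace the name; any raw pointer to the old string is dangling afterwards. */
static int toy_rename(struct toy_named *d, const char *new_name)
{
	char *copy = toy_copy_string(new_name);

	if (!copy)
		return -1;
	free(d->name);
	d->name = copy;
	return 0;
}
```

A reader that kept the raw d->name pointer across toy_rename() would be left with freed memory, whereas the snapshot stays valid until it is released. In the patch, the name handed to kernfs_find_ns() would presumably come from such a snapshot rather than from dentry->d_name.name directly.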
> +			if (kn)
> +				goto out_bad;
> +		}
> +
> +		/* The kernfs node doesn't exist, leave the dentry negative
> +		 * and return success.
> +		 */
> +		goto out;
> +	}
>
>  	/* The kernfs node has been deactivated */
>  	if (!kernfs_active_read(kn))
> @@ -1060,12 +1081,11 @@ static int kernfs_dop_revalidate(struct dentry *dentry, unsigned int flags)
>  	if (kn->parent && kernfs_ns_enabled(kn->parent) &&
>  	    kernfs_info(dentry->d_sb)->ns != kn->ns)
>  		goto out_bad;
> -
> +out:
>  	mutex_unlock(&kernfs_mutex);
>  	return 1;
>  out_bad:
>  	mutex_unlock(&kernfs_mutex);
> -out_bad_unlocked:
>  	return 0;
>  }
>
> @@ -1080,33 +1100,24 @@ static struct dentry *kernfs_iop_lookup(struct inode *dir,
>  	struct dentry *ret;
>  	struct kernfs_node *parent = dir->i_private;
>  	struct kernfs_node *kn;
> -	struct inode *inode;
> +	struct inode *inode = NULL;
>  	const void *ns = NULL;
>
>  	mutex_lock(&kernfs_mutex);
> -
>  	if (kernfs_ns_enabled(parent))
>  		ns = kernfs_info(dir->i_sb)->ns;
>
>  	kn = kernfs_find_ns(parent, dentry->d_name.name, ns);
> -
> -	/* no such entry */
> -	if (!kn || !kernfs_active(kn)) {
> -		ret = NULL;
> -		goto out_unlock;
> -	}
> -
>  	/* attach dentry and inode */
> -	inode = kernfs_get_inode(dir->i_sb, kn);
> -	if (!inode) {
> -		ret = ERR_PTR(-ENOMEM);
> -		goto out_unlock;
> +	if (kn && kernfs_active(kn)) {
> +		inode = kernfs_get_inode(dir->i_sb, kn);
> +		if (!inode)
> +			inode = ERR_PTR(-ENOMEM);
>  	}
> -
> -	/* instantiate and hash dentry */
> +	/* instantiate and hash (possibly negative) dentry */
>  	ret = d_splice_alias(inode, dentry);
> - out_unlock:
>  	mutex_unlock(&kernfs_mutex);
> +
>  	return ret;
>  }
>
>
>