From: Al Viro <viro@zeniv.linux.org.uk>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg KH <gregkh@linuxfoundation.org>,
yu kuai <yukuai3@huawei.com>,
rafael@kernel.org, oleg@redhat.com, mchehab+samsung@kernel.org,
corbet@lwn.net, tytso@mit.edu, jmorris@namei.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
zhengbin13@huawei.com, yi.zhang@huawei.com,
chenxiang66@hisilicon.com, xiexiuqi@huawei.com
Subject: Re: [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class'
Date: Fri, 15 Nov 2019 17:54:23 +0000 [thread overview]
Message-ID: <20191115175423.GS26530@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20191115141754.GR26530@ZenIV.linux.org.uk>
On Fri, Nov 15, 2019 at 02:17:54PM +0000, Al Viro wrote:
> On Fri, Nov 15, 2019 at 08:58:05AM -0500, Steven Rostedt wrote:
> > On Fri, 15 Nov 2019 13:48:23 +0000
> > Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the
> > > > lack of attempts to create new entries inside the subtree it's trying
> > > > to kill?"
> > >
> > > Is it possible for something to call e.g. debugfs_create_dir() (or any
> > > similar primitive) with parent inside the subtree that has been
> > > passed to debugfs_remove_recursive() call that is still in progress?
> > >
> > > If debugfs needs to cope with that, debugfs_remove_recursive() needs
> > > considerably heavier locking, to start with.
> >
> > I don't know about debugfs, but at least tracefs (which cut and pasted
> > from debugfs) does not allow that. At least in theory it doesn't allow
> > that (and if it does, it's a bug in the locking at the higher levels).
> >
> > And perhaps debugfs shouldn't allow that either. As it is only suppose
> > to be a light weight way to interact with the kernel, hence the name
> > "debugfs".
> >
> > Yu, do you have a test case for the "infinite loop" case?
>
> Infinite loop, AFAICS, is reasonably easy to trigger - just open
> a non-empty subdirectory and lseek to e.g. next-to-last element
> in it. Again, list_empty() use in there is quite wrong - it can
> give false negatives just on the cursors. No arguments about
> that part...
FWIW, given that debugfs_remove_recursive() has no way to report an error
*and* that we have a single chokepoint for all entry creations (start_creating())
we could make sure nothing gets added pretty easily - just mark the victim
dentry as "don't allow any creations here" when we first reach it and
have start_creating check that, using e.g. inode_lock() for serialization.
Marking could be done e.g. by setting ->d_fsdata to
(void *)DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, so that ->d_release() doesn't
need any changes.
BTW, this
if (!ret)
d_delete(dentry);
if (d_is_reg(dentry))
__debugfs_file_removed(dentry);
dput(dentry);
in __debugfs_remove() is both subtle and bogus. If we get here
with refcount > 1, d_delete() is just a d_drop() - dentry can't
be made negative, so it gets unhashed instead. If we *do* get
here with refcount 1, it will be made negative, all right.
Only to be killed off by immediately following dput(), since
debugfs doesn't retain unpinned dentries.
Why immediate? Because d_is_reg() is obviously false on
negative dentries, so __debugfs_file_removed() is not called.
That's where the subtle part begins: there should've been
nobody for __debugfs_file_removed() to wait for, since they
would've had to hold additional references to dentry and
refcount wouldn't have been 1. So that's actually not
a bug. However, it's too subtle to introduce without
having bothered to even comment the damn thing.
As for the "bogus" part - that d_delete() is bollocks.
First of all, it is fully equivalent to d_drop(). Always
had been. What's more, that sucker should've been
d_invalidate() instead.
Anyway, AFAICS removal could be done this way:
// parent is held exclusive, after is NULL or a child of parent
find_next_child(parent, prev)
child = NULL
node = prev ? &prev->d_child : &parent->d_subdirs;
grab parent->d_lock
for each entry in the list starting at node->next
d = container_of(entry, struct dentry, d_child)
grab d->d_lock
if simple_positive(d)
bump d->d_count
child = d
drop d->d_lock
if child
break
drop parent->d_lock
dput(prev);
return child
kill_it(victim, parent)
if simple_positive(victim)
d_invalidate(victim); // needed to avoid lost mounts
if victim is a directory
fsnotify_rmdir
else
fsnotify_unlink
if victim is regular
__debugfs_file_removed(victim)
dput(victim) // unpin it
recursive_removal(dentry)
this = dentry;
while (true) {
victim = NULL;
inode = this->d_inode;
inode_lock(inode);
if (d_is_dir(this))
mark this doomed
while ((child = find_next_child(this, victim)) == NULL) {
// no children (left); kill and ascend
// update metadata while it's still locked
inode->i_ctime = current_time(inode);
clear_nlink(inode);
inode_unlock(inode);
victim = this;
this = this->d_parent;
inode = this->d_inode;
inode_lock(inode);
kill_it(victim, this);
if (victim == dentry) {
inode->i_ctime = inode->i_mtime = current_time(inode);
if (d_is_dir(dentry))
drop_nlink(inode);
inode_unlock(inode);
dput(dentry);
return;
}
}
inode_unlock(inode);
this = child;
}
next prev parent reply other threads:[~2019-11-15 17:54 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-11-15 3:27 [PATCH 0/3] fix potential infinite loop in debugfs_remove_recursive yu kuai
2019-11-15 3:27 ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yu kuai
2019-11-15 3:27 ` Greg KH
2019-11-15 4:12 ` Al Viro
2019-11-15 7:20 ` Greg KH
2019-11-15 10:08 ` yukuai (C)
2019-11-15 13:16 ` Al Viro
2019-11-15 13:38 ` Steven Rostedt
2019-11-15 13:39 ` Steven Rostedt
2019-11-15 13:48 ` Al Viro
2019-11-15 13:58 ` Steven Rostedt
2019-11-15 14:17 ` Al Viro
2019-11-15 17:54 ` Al Viro [this message]
2019-11-15 18:42 ` [RFC] simple_recursive_removal() Al Viro
2019-11-15 19:41 ` Al Viro
2019-11-15 21:18 ` Al Viro
2019-11-15 21:26 ` Steven Rostedt
2019-11-15 22:10 ` Al Viro
2019-11-16 12:04 ` Greg KH
2019-11-17 22:24 ` Al Viro
2019-11-18 6:37 ` Greg KH
2019-11-15 10:02 ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yukuai (C)
2019-11-15 3:27 ` [PATCH 2/3] fs/libfs.c: use 'spin_lock_nested' when taking 'd_lock' for dentry in simple_empty yu kuai
2019-11-15 3:27 ` [PATCH 3/3] debugfs: fix potential infinite loop in debugfs_remove_recursive yu kuai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191115175423.GS26530@ZenIV.linux.org.uk \
--to=viro@zeniv.linux.org.uk \
--cc=chenxiang66@hisilicon.com \
--cc=corbet@lwn.net \
--cc=gregkh@linuxfoundation.org \
--cc=jmorris@namei.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mchehab+samsung@kernel.org \
--cc=oleg@redhat.com \
--cc=rafael@kernel.org \
--cc=rostedt@goodmis.org \
--cc=tytso@mit.edu \
--cc=xiexiuqi@huawei.com \
--cc=yi.zhang@huawei.com \
--cc=yukuai3@huawei.com \
--cc=zhengbin13@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).