linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: Greg KH <gregkh@linuxfoundation.org>,
	yu kuai <yukuai3@huawei.com>,
	rafael@kernel.org, oleg@redhat.com, mchehab+samsung@kernel.org,
	corbet@lwn.net, tytso@mit.edu, jmorris@namei.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	zhengbin13@huawei.com, yi.zhang@huawei.com,
	chenxiang66@hisilicon.com, xiexiuqi@huawei.com
Subject: Re: [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class'
Date: Fri, 15 Nov 2019 17:54:23 +0000	[thread overview]
Message-ID: <20191115175423.GS26530@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20191115141754.GR26530@ZenIV.linux.org.uk>

On Fri, Nov 15, 2019 at 02:17:54PM +0000, Al Viro wrote:
> On Fri, Nov 15, 2019 at 08:58:05AM -0500, Steven Rostedt wrote:
> > On Fri, 15 Nov 2019 13:48:23 +0000
> > Al Viro <viro@zeniv.linux.org.uk> wrote:
> > 
> > > > BTW, what do you mean by "can debugfs_remove_recursive() rely upon the
> > > > lack of attempts to create new entries inside the subtree it's trying
> > > > to kill?"  
> > > 
> > > Is it possible for something to call e.g. debugfs_create_dir() (or any
> > > similar primitive) with parent inside the subtree that has been
> > > passed to debugfs_remove_recursive() call that is still in progress?
> > > 
> > > If debugfs needs to cope with that, debugfs_remove_recursive() needs
> > > considerably heavier locking, to start with.
> > 
> > I don't know about debugfs, but at least tracefs (which cut and pasted
> > from debugfs) does not allow that. At least in theory it doesn't allow
> > that (and if it does, it's a bug in the locking at the higher levels).
> > 
> > And perhaps debugfs shouldn't allow that either. As it is only suppose
> > to be a light weight way to interact with the kernel, hence the name
> > "debugfs".
> > 
> > Yu, do you have a test case for the "infinite loop" case?
> 
> Infinite loop, AFAICS, is reasonably easy to trigger - just open
> a non-empty subdirectory and lseek to e.g. next-to-last element
> in it.  Again, list_empty() use in there is quite wrong - it can
> give false negatives just on the cursors.  No arguments about
> that part...

FWIW, given that debugfs_remove_recursive() has no way to report an error
*and* that we have a single chokepoint for all entry creations (start_creating())
we could make sure nothing gets added pretty easily - just mark the victim
dentry as "don't allow any creations here" when we first reach it and
have start_creating check that, using e.g. inode_lock() for serialization.
Marking could be done e.g. by setting ->d_fsdata to
(void *)DEBUGFS_FSDATA_IS_REAL_FOPS_BIT, so that ->d_release() doesn't
need any changes.

BTW, this
                if (!ret)
                        d_delete(dentry);
                if (d_is_reg(dentry))
                        __debugfs_file_removed(dentry);
                dput(dentry);
in __debugfs_remove() is both subtle and bogus.  If we get here
with refcount > 1, d_delete() is just a d_drop() - dentry can't
be made negative, so it gets unhashed instead.  If we *do* get
here with refcount 1, it will be made negative, all right.
Only to be killed off by immediately following dput(), since
debugfs doesn't retain unpinned dentries.

Why immediate?  Because d_is_reg() is obviously false on
negative dentries, so __debugfs_file_removed() is not called.
That's where the subtle part begins: there should've been
nobody for __debugfs_file_removed() to wait for, since they
would've had to hold additional references to dentry and
refcount wouldn't have been 1.  So that's actually not
a bug.  However, it's too subtle to introduce without
having bothered to even comment the damn thing.

As for the "bogus" part - that d_delete() is bollocks.
First of all, it is fully equivalent to d_drop().  Always
had been.  What's more, that sucker should've been
d_invalidate() instead.

Anyway, AFAICS removal could be done this way:

// parent is held exclusive, after is NULL or a child of parent
find_next_child(parent, prev)
	child = NULL
	node = prev ? &prev->d_child : &parent->d_subdirs;
	grab parent->d_lock
	for each entry in the list starting at node->next
		d = container_of(entry, struct dentry, d_child)
		grab d->d_lock
		if simple_positive(d)
			bump d->d_count
			child = d
		drop d->d_lock
		if child
			break
	drop parent->d_lock
	dput(prev);
	return child

kill_it(victim, parent)
	if simple_positive(victim)
		d_invalidate(victim);	// needed to avoid lost mounts
		if victim is a directory
			fsnotify_rmdir
		else
			fsnotify_unlink
		if victim is regular
			__debugfs_file_removed(victim)
		dput(victim)		// unpin it

recursive_removal(dentry)
	this = dentry;
	while (true) {
		victim = NULL;
		inode = this->d_inode;
		inode_lock(inode);
		if (d_is_dir(this))
			mark this doomed
		while ((child = find_next_child(this, victim)) == NULL) {
			// no children (left); kill and ascend
			// update metadata while it's still locked
			inode->i_ctime = current_time(inode);
			clear_nlink(inode);
			inode_unlock(inode);
			victim = this;
			this = this->d_parent;
			inode = this->d_inode;
			inode_lock(inode);
			kill_it(victim, this);
			if (victim == dentry) {
				inode->i_ctime = inode->i_mtime = current_time(inode);
				if (d_is_dir(dentry))
					drop_nlink(inode);
				inode_unlock(inode);
				dput(dentry);
				return;
			}
		}
		inode_unlock(inode);
		this = child;
	}

  reply	other threads:[~2019-11-15 17:54 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-15  3:27 [PATCH 0/3] fix potential infinite loop in debugfs_remove_recursive yu kuai
2019-11-15  3:27 ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yu kuai
2019-11-15  3:27   ` Greg KH
2019-11-15  4:12     ` Al Viro
2019-11-15  7:20       ` Greg KH
2019-11-15 10:08         ` yukuai (C)
2019-11-15 13:16         ` Al Viro
2019-11-15 13:38           ` Steven Rostedt
2019-11-15 13:39             ` Steven Rostedt
2019-11-15 13:48             ` Al Viro
2019-11-15 13:58               ` Steven Rostedt
2019-11-15 14:17                 ` Al Viro
2019-11-15 17:54                   ` Al Viro [this message]
2019-11-15 18:42                     ` [RFC] simple_recursive_removal() Al Viro
2019-11-15 19:41                       ` Al Viro
2019-11-15 21:18                         ` Al Viro
2019-11-15 21:26                           ` Steven Rostedt
2019-11-15 22:10                             ` Al Viro
2019-11-16 12:04                               ` Greg KH
2019-11-17 22:24                               ` Al Viro
2019-11-18  6:37                                 ` Greg KH
2019-11-15 10:02     ` [PATCH 1/3] dcache: add a new enum type for 'dentry_d_lock_class' yukuai (C)
2019-11-15  3:27 ` [PATCH 2/3] fs/libfs.c: use 'spin_lock_nested' when taking 'd_lock' for dentry in simple_empty yu kuai
2019-11-15  3:27 ` [PATCH 3/3] debugfs: fix potential infinite loop in debugfs_remove_recursive yu kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191115175423.GS26530@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=chenxiang66@hisilicon.com \
    --cc=corbet@lwn.net \
    --cc=gregkh@linuxfoundation.org \
    --cc=jmorris@namei.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mchehab+samsung@kernel.org \
    --cc=oleg@redhat.com \
    --cc=rafael@kernel.org \
    --cc=rostedt@goodmis.org \
    --cc=tytso@mit.edu \
    --cc=xiexiuqi@huawei.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    --cc=zhengbin13@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).