From: Al Viro <viro@zeniv.linux.org.uk>
To: Gautham Ananthakrishna <gautham.ananthakrishna@oracle.com>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-mm@kvack.org, matthew.wilcox@oracle.com,
	khlebnikov@yandex-team.ru
Subject: Re: [PATCH RFC 1/6] dcache: sweep cached negative dentries to the end of list of siblings
Date: Wed, 14 Apr 2021 03:41:10 +0000
Message-ID: <YHZkVlhchiNB9o18@zeniv-ca.linux.org.uk>
In-Reply-To: <1611235185-1685-2-git-send-email-gautham.ananthakrishna@oracle.com>

On Thu, Jan 21, 2021 at 06:49:40PM +0530, Gautham Ananthakrishna wrote:

> +static void sweep_negative(struct dentry *dentry)
> +{
> +	struct dentry *parent;
> +
> +	if (!d_is_tail_negative(dentry)) {
> +		parent = lock_parent(dentry);
> +		if (!parent)
> +			return;

Wait a minute.  It's not a good environment for calling lock_parent().
Who said that dentry won't get freed right under it?

Right now callers of __lock_parent() either hold a reference to dentry
*or* are called for a positive dentry, with inode->i_lock held.
You are introducing something very different:

>  		if (likely(retain_dentry(dentry))) {
> +			if (d_is_negative(dentry))
> +				sweep_negative(dentry);
>  			spin_unlock(&dentry->d_lock);

Here we can be called for a negative dentry with refcount already *NOT*
held by us.  Look:

static inline struct dentry *lock_parent(struct dentry *dentry)
{
	struct dentry *parent = dentry->d_parent;
	if (IS_ROOT(dentry))
		return NULL;
isn't a root

	if (likely(spin_trylock(&parent->d_lock)))
		return parent;

no such luck - someone's already holding parent's ->d_lock

	return __lock_parent(dentry);
and here we have
static struct dentry *__lock_parent(struct dentry *dentry)
{
	struct dentry *parent;
	rcu_read_lock();  

OK, anything we see in its ->d_parent is guaranteed to stay
allocated until we get to matching rcu_read_unlock()

	spin_unlock(&dentry->d_lock);
dropped the spinlock, now it's fair game for d_move(), d_drop(), etc.

again:
	parent = READ_ONCE(dentry->d_parent);
dentry couldn't have been reused, so it's the last value stored there.
Points to still allocated struct dentry instance, so we can...

	spin_lock(&parent->d_lock);
grab its ->d_lock.

	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * Any changes of dentry->d_parent must have
	 * been done with parent->d_lock held, so
	 * spin_lock() above is enough of a barrier
	 * for checking if it's still our child.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
Never mind, it's still equal to our ->d_parent.  So we have
the last valid parent's ->d_lock held.

	rcu_read_unlock();
What's to hold dentry allocated now?  IF we held its refcount - no
problem, it can't go away.  If we held its ->d_inode->i_lock - ditto
(it wouldn't get to __dentry_kill() until we drop that, since all
callers do acquire that lock and it couldn't get scheduled for
freeing until it gets through most of __dentry_kill()).

IOW, we are free to grab dentry->d_lock again.
	if (parent != dentry)
		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
	else
		parent = NULL;
	return parent;
}
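
For readers less familiar with the pattern, the trylock-then-retry
structure walked through above can be modeled in userspace.  This is
only a sketch under stated assumptions: struct node, node_init() and
lock_parent_model() are hypothetical names, pthread mutexes stand in
for ->d_lock, and there is no RCU and no freeing at all, so the
lifetime question raised here is simply assumed away.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct dentry; not the kernel type. */
struct node {
	pthread_mutex_t lock;
	struct node *_Atomic parent;	/* a root points to itself */
};

static void node_init(struct node *n, struct node *parent)
{
	pthread_mutex_init(&n->lock, NULL);
	atomic_init(&n->parent, parent);
}

/*
 * Caller holds node->lock.  Returns the parent with both locks held,
 * or NULL for a root (node->lock is still held in that case).
 * Lock order is parent before child, so on contention the child's
 * lock has to be dropped before the parent's can be taken.
 * ASSUMPTION: nothing is ever freed in this model, so neither node
 * nor parent can go away while its lock is not held.
 */
static struct node *lock_parent_model(struct node *node)
{
	struct node *parent = atomic_load(&node->parent);

	if (parent == node)
		return NULL;
	if (pthread_mutex_trylock(&parent->lock) == 0)
		return parent;			/* fast path */

	/* Slow path: from here on node may be moved by somebody else. */
	pthread_mutex_unlock(&node->lock);
	for (;;) {
		parent = atomic_load(&node->parent);
		assert(parent != NULL);
		pthread_mutex_lock(&parent->lock);
		if (parent == atomic_load(&node->parent))
			break;			/* still our parent */
		pthread_mutex_unlock(&parent->lock);
	}
	if (parent == node)
		return NULL;	/* became a root; its own lock is held */
	pthread_mutex_lock(&node->lock);
	return parent;
}
```

The window between dropping node->lock and retaking it is exactly
where the real code needs something else (a reference, or ->i_lock on
a positive dentry) to keep the dentry from being freed.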

With your patch, though, you've got a call site where neither condition
is guaranteed.  Current kernel is fine - we are holding ->d_lock there,
and we don't touch dentry after it gets dropped.  Again, it can't get
scheduled for freeing until after we drop ->d_lock, so we are safe.
With that change, however, you've got a hard-to-hit memory corruptor
there...
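
The precondition described above can be written down as a small
predicate.  This is an illustrative sketch, not kernel code: struct
caller_state and may_call_lock_parent() are hypothetical names, and
the three flags are assumed to be known by the caller rather than
derived from a real struct dentry.

```c
#include <stdbool.h>

/* Hypothetical summary of what a caller holds; not a kernel structure. */
struct caller_state {
	bool holds_reference;	/* caller owns a refcount on the dentry */
	bool dentry_positive;	/* dentry has an inode attached */
	bool holds_i_lock;	/* caller holds ->d_inode->i_lock */
};

/*
 * lock_parent() may drop ->d_lock on its slow path.  That is only
 * safe if something else keeps the dentry allocated meanwhile:
 * either a reference, or ->i_lock of a positive dentry.
 */
static bool may_call_lock_parent(const struct caller_state *s)
{
	return s->holds_reference ||
	       (s->dentry_positive && s->holds_i_lock);
}
```

The call site added by the patch corresponds to all three flags being
false (negative dentry, no reference held), for which the predicate
returns false.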
