linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Al Viro <viro@zeniv.linux.org.uk>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "Tobin C. Harding" <me@tobin.cc>,
	linux-fsdevel@vger.kernel.org,
	"Paul E. McKenney" <paulmck@linux.ibm.com>
Subject: Re: dcache locking question
Date: Fri, 15 Mar 2019 18:54:55 +0000	[thread overview]
Message-ID: <20190315185455.GA2217@ZenIV.linux.org.uk> (raw)
In-Reply-To: <20190315173819.GB77949@gmail.com>

On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> > 
> > If it fails, we call __lock_parent().  Which
> > 	* grabs RCU lock
> > 	* drops ->d_lock (now we are not holding ->d_lock
> > on anything).
> > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > it's *NOT* stable (no ->d_lock held).  We can't expect
> > that ->d_parent won't change or that the reference it used
> > to contribute to parent's refcount is there anymore; as
> > the matter of fact, the only thing that prevents outright
> > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > and RCU delay between dropping the last reference and
> > actual freeing of the sucker.  rcu_read_lock() is there,
> > though, which makes it safe to grab ->d_lock on 'parent'.
> > 
> > That 'parent' might very well have nothing to do with our
> > dentry by now.  We can check if it's equal to its
> > ->d_parent, though.  dentry->d_parent is *NOT* stable
> > at that point.  It might be changing right now.
> > 
> > However, the first store to dentry->d_parent making it
> > not equal to parent would have been done under parent->d_lock.
> > And since we are holding parent->d_lock, we won't miss that
> > store.  We might miss subsequent ones, but if we observe
> > dentry->d_parent == parent, we know that it's stable.  And
> > if we see dentry->d_parent != parent, we know that dentry
> > has moved around and we need to retry anyway.
> 
> Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
> 
> 	if (unlikely(parent != dentry->d_parent)) {
> 
> Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> concurrently changed to 0xBBBBBBBB.
> 
> d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> appearing that d_parent == 0xAAAABBBB == parent.
> 
> Yes it won't really be compiled as that in practice, but I thought the point of
> READ_ONCE() is to *guarantee* it's really done right...

READ_ONCE does not add any extra warranties of atomicity.  Fetches and stores
of pointers are atomic, period; if that ever breaks, we are in a very deep
trouble all over the place.

What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
operation.  So that second fetch of ->d_parent will happen after we grab
parent->d_lock, from everyone's POV.  Critical areas for the same spinlock
are ordered wrt each other.  So we have observed
	FETCH dentry->d_parent => parent
	LOCK parent->d_lock
	FETCH dentry->d_parent => parent
All stores to dentry->d_parent are done with ->d_lock held on dentry,
old value of dentry->d_parent *and* new value of dentry->d_parent.  So
the second fetch is ordered wrt all stores making dentry->d_parent change
to parent and all stores making it change *from* parent.  We might miss
some stores changing it from one value other than parent to another such,
but the predicate itself is fine and will stay fine until we drop
parent->d_lock.

Paul, could you comment on that one?  The function in question is
this:
static struct dentry *__lock_parent(struct dentry *dentry)
{
        struct dentry *parent;
        rcu_read_lock();
        spin_unlock(&dentry->d_lock);
again:
        parent = READ_ONCE(dentry->d_parent);
        spin_lock(&parent->d_lock);
        /*
         * We can't blindly lock dentry until we are sure
         * that we won't violate the locking order.
         * Any changes of dentry->d_parent must have
         * been done with parent->d_lock held, so
         * spin_lock() above is enough of a barrier
         * for checking if it's still our child.
         */
        if (unlikely(parent != dentry->d_parent)) {
                spin_unlock(&parent->d_lock);
                goto again;
        }
        rcu_read_unlock();
        if (parent != dentry)
                spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
        else
                parent = NULL;
        return parent;
}
(in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
under ->d_lock on dentry itself and ->d_lock on both old and new values
of ->d_parent.

  reply	other threads:[~2019-03-15 18:55 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-03-14 22:56 dcache locking question Tobin C. Harding
2019-03-14 23:09 ` Matthew Wilcox
2019-03-15  1:38   ` Tobin C. Harding
2019-03-14 23:19 ` Tobin C. Harding
2019-03-15  1:50   ` Al Viro
2019-03-15 17:38     ` Eric Biggers
2019-03-15 18:54       ` Al Viro [this message]
2019-03-16 22:31         ` Paul E. McKenney
2019-03-17  0:18           ` Al Viro
2019-03-17  0:50             ` Paul E. McKenney
2019-03-17  2:20               ` James Bottomley
2019-03-17  3:06                 ` Al Viro
2019-03-17  4:23                   ` James Bottomley
2019-03-18  0:35                     ` Paul E. McKenney
2019-03-18 16:26                       ` James Bottomley
2019-03-18 17:11                         ` Paul E. McKenney
2019-03-19 15:45                           ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190315185455.GA2217@ZenIV.linux.org.uk \
    --to=viro@zeniv.linux.org.uk \
    --cc=ebiggers@kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=me@tobin.cc \
    --cc=paulmck@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).