From: Al Viro <viro@zeniv.linux.org.uk>
To: Eric Biggers <ebiggers@kernel.org>
Cc: "Tobin C. Harding" <me@tobin.cc>,
linux-fsdevel@vger.kernel.org,
"Paul E. McKenney" <paulmck@linux.ibm.com>
Subject: Re: dcache locking question
Date: Fri, 15 Mar 2019 18:54:55 +0000
Message-ID: <20190315185455.GA2217@ZenIV.linux.org.uk>
In-Reply-To: <20190315173819.GB77949@gmail.com>
On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> >
> > If it fails, we call __lock_parent().  Which
> > 	* grabs RCU lock
> > 	* drops ->d_lock (now we are not holding ->d_lock
> > 	  on anything).
> > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > 	  it's *NOT* stable (no ->d_lock held).  We can't expect
> > 	  that ->d_parent won't change or that the reference it used
> > 	  to contribute to parent's refcount is there anymore; as
> > 	  the matter of fact, the only thing that prevents outright
> > 	  _freeing_ of the object 'parent' points to is rcu_read_lock()
> > 	  and RCU delay between dropping the last reference and
> > 	  actual freeing of the sucker.  rcu_read_lock() is there,
> > 	  though, which makes it safe to grab ->d_lock on 'parent'.
> >
> > That 'parent' might very well have nothing to do with our
> > dentry by now. We can check if it's equal to its
> > ->d_parent, though. dentry->d_parent is *NOT* stable
> > at that point. It might be changing right now.
> >
> > However, the first store to dentry->d_parent making it
> > not equal to parent would have been done under parent->d_lock.
> > And since we are holding parent->d_lock, we won't miss that
> > store. We might miss subsequent ones, but if we observe
> > dentry->d_parent == parent, we know that it's stable. And
> > if we see dentry->d_parent != parent, we know that dentry
> > has moved around and we need to retry anyway.
>
> Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
>
> 	if (unlikely(parent != dentry->d_parent)) {
>
> Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> concurrently changed to 0xBBBBBBBB.
>
> d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> appearing that d_parent == 0xAAAABBBB == parent.
>
> Yes it won't really be compiled as that in practice, but I thought the point of
> READ_ONCE() is to *guarantee* it's really done right...

READ_ONCE does not add any extra guarantees of atomicity.  Fetches and stores
of pointers are atomic, period; if that ever breaks, we are in very deep
trouble all over the place.
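
To illustrate, here is a simplified userspace sketch of what READ_ONCE amounts to for a pointer-sized object.  It is not the kernel's actual macro; MY_READ_ONCE, struct obj and parent_is() are made-up names, and __typeof__ assumes GCC or Clang.  The volatile access only tells the compiler to emit exactly one load; whatever atomicity that load has comes from the hardware handling an aligned pointer-sized access in one go, not from the macro.

```c
#include <assert.h>

/*
 * Hypothetical, simplified stand-in for READ_ONCE(): a volatile load.
 * It stops the compiler from refetching or fusing the access; it does
 * not make the load any more atomic than a plain aligned pointer load.
 */
#define MY_READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical object with a parent pointer, loosely like a dentry. */
struct obj {
	struct obj *parent;
};

/* Fetch o->parent exactly once and compare it with the expected value. */
static int parent_is(struct obj *o, struct obj *expected)
{
	return MY_READ_ONCE(o->parent) == expected;
}
```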

What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
operation.  So that second fetch of ->d_parent will happen after we grab
parent->d_lock, from everyone's POV.  Critical sections for the same spinlock
are ordered wrt each other.  So we have observed

	FETCH dentry->d_parent => parent
	LOCK parent->d_lock
	FETCH dentry->d_parent => parent

All stores to dentry->d_parent are done with ->d_lock held on dentry,
old value of dentry->d_parent *and* new value of dentry->d_parent. So
the second fetch is ordered wrt all stores making dentry->d_parent change
to parent and all stores making it change *from* parent. We might miss
some stores changing it from one value other than parent to another such,
but the predicate itself is fine and will stay fine until we drop
parent->d_lock.
Paul, could you comment on that one? The function in question is
this:

static struct dentry *__lock_parent(struct dentry *dentry)
{
	struct dentry *parent;
	rcu_read_lock();
	spin_unlock(&dentry->d_lock);
again:
	parent = READ_ONCE(dentry->d_parent);
	spin_lock(&parent->d_lock);
	/*
	 * We can't blindly lock dentry until we are sure
	 * that we won't violate the locking order.
	 * Any changes of dentry->d_parent must have
	 * been done with parent->d_lock held, so
	 * spin_lock() above is enough of a barrier
	 * for checking if it's still our child.
	 */
	if (unlikely(parent != dentry->d_parent)) {
		spin_unlock(&parent->d_lock);
		goto again;
	}
	rcu_read_unlock();
	if (parent != dentry)
		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
	else
		parent = NULL;
	return parent;
}

(in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
under ->d_lock on dentry itself and ->d_lock on both old and new values
of ->d_parent.
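
The pairing of that store rule with the lock-then-recheck in __lock_parent() can be modelled in userspace.  What follows is only a sketch under loud assumptions, not the fs/dcache.c code: struct node, node_init(), node_move() and node_lock_parent() are made-up names, pthread mutexes stand in for ->d_lock, nodes are never freed (so there is no RCU part), moves are serialized by the caller, and the two parents involved in a move are never ancestor and descendant of each other.  The parent pointer is a relaxed C11 atomic only because plain racy accesses are undefined in userspace C; all the ordering comes from the mutexes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dentry; a root is its own parent. */
struct node {
	pthread_mutex_t lock;
	_Atomic(struct node *) parent;
};

static void node_init(struct node *n, struct node *parent)
{
	pthread_mutex_init(&n->lock, NULL);
	atomic_init(&n->parent, parent ? parent : n);
}

/*
 * Writer side: the store to n->parent is done with n, the old parent
 * and the new parent all locked.  Assumes the caller serializes moves
 * and holds no node locks.  Locking the two parents in address order
 * is a simplification of the real ordering rules.
 */
static void node_move(struct node *n, struct node *new_parent)
{
	struct node *old = atomic_load_explicit(&n->parent,
						memory_order_relaxed);
	struct node *first = old, *second = new_parent;

	assert(old != n);	/* moving a root is not modelled */
	if ((uintptr_t)first > (uintptr_t)second) {
		first = new_parent;
		second = old;
	}
	pthread_mutex_lock(&first->lock);
	if (second != first)
		pthread_mutex_lock(&second->lock);
	pthread_mutex_lock(&n->lock);
	atomic_store_explicit(&n->parent, new_parent, memory_order_relaxed);
	pthread_mutex_unlock(&n->lock);
	if (second != first)
		pthread_mutex_unlock(&second->lock);
	pthread_mutex_unlock(&first->lock);
}

/*
 * Reader side, shaped like __lock_parent().  Called with n->lock held.
 * Returns the parent with both parent->lock and n->lock held, or NULL
 * for a root, in which case only n->lock is held.
 */
static struct node *node_lock_parent(struct node *n)
{
	struct node *parent;

	pthread_mutex_unlock(&n->lock);
	for (;;) {
		/* Unstable fetch: nothing is held at this point. */
		parent = atomic_load_explicit(&n->parent,
					      memory_order_relaxed);
		pthread_mutex_lock(&parent->lock);
		/*
		 * Any store changing n->parent to or from 'parent' is
		 * done under parent->lock, so this recheck is reliable.
		 */
		if (parent == atomic_load_explicit(&n->parent,
						   memory_order_relaxed))
			break;
		pthread_mutex_unlock(&parent->lock);
	}
	if (parent == n)
		return NULL;	/* parent->lock is n->lock here */
	pthread_mutex_lock(&n->lock);
	return parent;
}
```

Once node_lock_parent() returns non-NULL, n->parent cannot change until the caller drops the parent's lock, since any writer would need that same lock before storing.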