Re: dcache locking question

From: "Paul E. McKenney" <paulmck@linux.ibm.com>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Biggers <ebiggers@kernel.org>,
	"Tobin C. Harding" <me@tobin.cc>,
	linux-fsdevel@vger.kernel.org
Subject: Re: dcache locking question
Date: Sat, 16 Mar 2019 15:31:28 -0700	[thread overview]
Message-ID: <20190316223128.GV4102@linux.ibm.com> (raw)
In-Reply-To: <20190315185455.GA2217@ZenIV.linux.org.uk>

On Fri, Mar 15, 2019 at 06:54:55PM +0000, Al Viro wrote:
> On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> > On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> > > 
> > > If it fails, we call __lock_parent().  Which
> > > 	* grabs RCU lock
> > > 	* drops ->d_lock (now we are not holding ->d_lock
> > > on anything).
> > > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > > it's *NOT* stable (no ->d_lock held).  We can't expect
> > > that ->d_parent won't change or that the reference it used
> > > to contribute to parent's refcount is there anymore; as
> > > the matter of fact, the only thing that prevents outright
> > > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > > and RCU delay between dropping the last reference and
> > > actual freeing of the sucker.  rcu_read_lock() is there,
> > > though, which makes it safe to grab ->d_lock on 'parent'.
> > > 
> > > That 'parent' might very well have nothing to do with our
> > > dentry by now.  We can check if it's equal to its
> > > ->d_parent, though.  dentry->d_parent is *NOT* stable
> > > at that point.  It might be changing right now.
> > > 
> > > However, the first store to dentry->d_parent making it
> > > not equal to parent would have been done under parent->d_lock.
> > > And since we are holding parent->d_lock, we won't miss that
> > > store.  We might miss subsequent ones, but if we observe
> > > dentry->d_parent == parent, we know that it's stable.  And
> > > if we see dentry->d_parent != parent, we know that dentry
> > > has moved around and we need to retry anyway.
> > 
> > Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
> > 
> > 	if (unlikely(parent != dentry->d_parent)) {
> > 
> > Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> > concurrently changed to 0xBBBBBBBB.
> > 
> > d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> > appearing that d_parent == 0xAAAABBBB == parent.
> > 
> > Yes it won't really be compiled as that in practice, but I thought the point of
> > READ_ONCE() is to *guarantee* it's really done right...
> 
> READ_ONCE does not add any extra warranties of atomicity.  Fetches and stores
> of pointers are atomic, period; if that ever breaks, we are in a very deep
> trouble all over the place.

Stepping back from the specifics of this particular function for a moment,
it would be unusual for a compiler to tear a pointer load or store,
with non-tearing of pointer stores assuming values of pointers aren't
known at compile time.  Compilers can and do fuse both loads and stores,
and they can and do repeat loads.

Having gotten that off my chest...  ;-)

> What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
> operation.  So that second fetch of ->d_parent will happen after we grab
> parent->d_lock, from everyone's POV.  Critical areas for the same spinlock
> are ordered wrt each other.  So we have observed
> 	FETCH dentry->d_parent => parent
> 	LOCK parent->d_lock
> 	FETCH dentry->d_parent => parent
> All stores to dentry->d_parent are done with ->d_lock held on dentry,
> old value of dentry->d_parent *and* new value of dentry->d_parent.  So
> the second fetch is ordered wrt all stores making dentry->d_parent change
> to parent and all stores making it change *from* parent.  We might miss
> some stores changing it from one value other than parent to another such,
> but the predicate itself is fine and will stay fine until we drop
> parent->d_lock.
> 
> Paul, could you comment on that one?  The function in question is
> this:
> static struct dentry *__lock_parent(struct dentry *dentry)
> {
>         struct dentry *parent;
>         rcu_read_lock();
>         spin_unlock(&dentry->d_lock);
> again:
>         parent = READ_ONCE(dentry->d_parent);
>         spin_lock(&parent->d_lock);
>         /*
>          * We can't blindly lock dentry until we are sure
>          * that we won't violate the locking order.
>          * Any changes of dentry->d_parent must have
>          * been done with parent->d_lock held, so
>          * spin_lock() above is enough of a barrier
>          * for checking if it's still our child.
>          */
>         if (unlikely(parent != dentry->d_parent)) {

We are talking about the above access to dentry->d_parent, correct?

Given that dentry->d_parent isn't updated except when holding
dentry->d_lock, and given that locks do what they are supposed to,
both in terms of memory ordering and mutual exclusion, the value of
dentry->d_parent must be constant throughout the current critical section.
It therefore doesn't matter what strange code the compiler generates,
it will get the same value regardless.

So, no, there is absolutely no point in doing this:

	if (unlikely(parent != READ_ONCE(dentry->d_parent))) {

Or did I miss the point of this discussion?

							Thanx, Paul

>                 spin_unlock(&parent->d_lock);
>                 goto again;
>         }
>         rcu_read_unlock();
>         if (parent != dentry)
>                 spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>         else
>                 parent = NULL;
>         return parent;
> }
> (in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
> under ->d_lock on dentry itself and ->d_lock on both old and new values
> of ->d_parent.
>