linux-fsdevel.vger.kernel.org archive mirror
* dcache locking question
@ 2019-03-14 22:56 Tobin C. Harding
  2019-03-14 23:09 ` Matthew Wilcox
  2019-03-14 23:19 ` Tobin C. Harding
  0 siblings, 2 replies; 17+ messages in thread
From: Tobin C. Harding @ 2019-03-14 22:56 UTC (permalink / raw)
  To: linux-fsdevel

Hi,

I'm not able to understand the locking order in dcache.  dcache has been
around for a while so clearly it's right and I'm wrong.

Could someone please explain to me how the locking order commented at
the top of the file is not violated in the following:

From top of fs/dcache.c

 * If there is an ancestor relationship:
 * dentry->d_parent->...->d_parent->d_lock
 *   ...
 *     dentry->d_parent->d_lock
 *       dentry->d_lock


dentry_kill() appears to require the caller to hold dentry->d_lock, yet
it locks the parent with spin_trylock(&parent->d_lock); if this
fails it calls __lock_parent(), which releases dentry->d_lock before
locking the parent and re-acquiring [1] dentry->d_lock.  Is this not
locking in two different orders?

The same logic exists in lock_parent().

thanks,
Tobin.


[1] I do not fully understand the spin_lock_nested() macro.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-14 22:56 dcache locking question Tobin C. Harding
@ 2019-03-14 23:09 ` Matthew Wilcox
  2019-03-15  1:38   ` Tobin C. Harding
  2019-03-14 23:19 ` Tobin C. Harding
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Wilcox @ 2019-03-14 23:09 UTC (permalink / raw)
  To: Tobin C. Harding; +Cc: linux-fsdevel

On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> Could someone please explain to me how the locking order commented at
> the top of the file is not violated in the following:
> 
> From top of fs/dcache.c
> 
>  * If there is an ancestor relationship:
>  * dentry->d_parent->...->d_parent->d_lock
>  *   ...
>  *     dentry->d_parent->d_lock
>  *       dentry->d_lock
> 
> 
> dentry_kill() appears to require the caller to hold dentry->d_lock, yet
> it locks the parent with spin_trylock(&parent->d_lock); if this
> fails it calls __lock_parent(), which releases dentry->d_lock before
> locking the parent and re-acquiring [1] dentry->d_lock.  Is this not
> locking in two different orders?

What you're describing here is how we work around having to lock in the
wrong order.  There are "a few" places in the kernel where we do this.
Calling spin_trylock() won't deadlock -- it'll just return failure.
At that point, we drop the child, spin waiting for the parent, then lock
the child.

> [1] I do not fully understand the spin_lock_nested() macro.

It describes to lockdep that, while it looks like we're acquiring a lock
that we already own, there's actually a hierarchy of locks that are in
the same lock class; we're attempting to acquire a lock in "subclass N".
N is allowed to be between 0 and 7, inclusive.


* Re: dcache locking question
  2019-03-14 22:56 dcache locking question Tobin C. Harding
  2019-03-14 23:09 ` Matthew Wilcox
@ 2019-03-14 23:19 ` Tobin C. Harding
  2019-03-15  1:50   ` Al Viro
  1 sibling, 1 reply; 17+ messages in thread
From: Tobin C. Harding @ 2019-03-14 23:19 UTC (permalink / raw)
  To: linux-fsdevel

On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> Hi,
> 
> I'm not able to understand the locking order in dcache.  dcache has been
> around for a while so clearly it's right and I'm wrong.
> 
> Could someone please explain to me how the locking order commented at
> the top of the file is not violated in the following:

> 
> From top of fs/dcache.c
> 
>  * If there is an ancestor relationship:
>  * dentry->d_parent->...->d_parent->d_lock
>  *   ...
>  *     dentry->d_parent->d_lock
>  *       dentry->d_lock
> 

Here is a simpler example:

static struct dentry *dentry_kill(struct dentry *dentry)
	__releases(dentry->d_lock)
{
        ...

slow_positive:
	spin_unlock(&dentry->d_lock);
	spin_lock(&inode->i_lock);
	spin_lock(&dentry->d_lock);
	parent = lock_parent(dentry);

        ...

thanks,
Tobin.



* Re: dcache locking question
  2019-03-14 23:09 ` Matthew Wilcox
@ 2019-03-15  1:38   ` Tobin C. Harding
  0 siblings, 0 replies; 17+ messages in thread
From: Tobin C. Harding @ 2019-03-15  1:38 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-fsdevel

On Thu, Mar 14, 2019 at 04:09:05PM -0700, Matthew Wilcox wrote:
> On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> > Could someone please explain to me how the locking order commented at
> > the top of the file is not violated in the following:
> > 
> > From top of fs/dcache.c
> > 
> >  * If there is an ancestor relationship:
> >  * dentry->d_parent->...->d_parent->d_lock
> >  *   ...
> >  *     dentry->d_parent->d_lock
> >  *       dentry->d_lock
> > 
> > 
> > dentry_kill() appears to require the caller to hold dentry->d_lock, yet
> > it locks the parent with spin_trylock(&parent->d_lock); if this
> > fails it calls __lock_parent(), which releases dentry->d_lock before
> > locking the parent and re-acquiring [1] dentry->d_lock.  Is this not
> > locking in two different orders?
> 
> What you're describing here is how we work around having to lock in the
> wrong order.  There are "a few" places in the kernel where we do this.
> Calling spin_trylock() won't deadlock -- it'll just return failure.
> At that point, we drop the child, spin waiting for the parent, then lock
> the child.
> 
> > [1] I do not fully understand the spin_lock_nested() macro.
> 
> It describes to lockdep that, while it looks like we're acquiring a lock
> that we already own, there's actually a hierarchy of locks that are in
> the same lock class; we're attempting to acquire a lock in "subclass N".
> N is allowed to be between 0 and 7, inclusive.

ok, cool.  Thanks for taking the time to explain this.

	Tobin
	


* Re: dcache locking question
  2019-03-14 23:19 ` Tobin C. Harding
@ 2019-03-15  1:50   ` Al Viro
  2019-03-15 17:38     ` Eric Biggers
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2019-03-15  1:50 UTC (permalink / raw)
  To: Tobin C. Harding; +Cc: linux-fsdevel

On Fri, Mar 15, 2019 at 10:19:39AM +1100, Tobin C. Harding wrote:
> On Fri, Mar 15, 2019 at 09:56:32AM +1100, Tobin C. Harding wrote:
> > Hi,
> > 
> > I'm not able to understand the locking order in dcache.  dcache has been
> > around for a while so clearly it's right and I'm wrong.
> > 
> > Could someone please explain to me how the locking order commented at
> > the top of the file is not violated in the following:
> 
> > 
> > From top of fs/dcache.c
> > 
> >  * If there is an ancestor relationship:
> >  * dentry->d_parent->...->d_parent->d_lock
> >  *   ...
> >  *     dentry->d_parent->d_lock
> >  *       dentry->d_lock
> > 
> 
> Here is a simpler example:
> 
> static struct dentry *dentry_kill(struct dentry *dentry)
> 	__releases(dentry->d_lock)
> {
>         ...
> 
> slow_positive:
> 	spin_unlock(&dentry->d_lock);
> 	spin_lock(&inode->i_lock);
> 	spin_lock(&dentry->d_lock);
> 	parent = lock_parent(dentry);

Yes?  Why is that a problem?  We start with ->d_lock held.
We drop it.  At that point we are holding a reference to
dentry and no locks whatsoever.  We know that inode is not
going away (we are holding a reference to positive dentry,
so its ->d_inode can't change under us, and it contributes
to inode refcount).  So we can safely grab its ->i_lock.
Then we grab (in normal order) ->d_lock.  Then, in
lock_parent() we check if ->d_parent points to ourselves.
->d_parent is stabilized by ->d_lock, so that check
is fine.  If it does *not* point to ourselves, we do
trylock on its ->d_parent. Also fine - ->d_parent isn't
going anywhere and trylock can't lead to deadlocks;
that's what "try" in the name is about.  If it succeeds,
we have
	* inode->i_lock held
	* dentry->d_lock held
	* parent->d_lock held
	* inode == dentry->d_inode
	* parent == dentry->d_parent
and we are fine, except that the reference we are holding
might not be the only existing one anymore (it used to be,
but we'd dropped ->d_lock, so additional ones might've
appeared, and we need to repeat the retain_dentry()-related
checks).

If it fails, we call __lock_parent().  Which
	* grabs RCU lock
	* drops ->d_lock (now we are not holding ->d_lock
on anything).
	* fetches ->d_parent.  Note the READ_ONCE() there -
it's *NOT* stable (no ->d_lock held).  We can't expect
that ->d_parent won't change or that the reference it used
to contribute to parent's refcount is there anymore; as
a matter of fact, the only thing that prevents outright
_freeing_ of the object 'parent' points to is rcu_read_lock()
and RCU delay between dropping the last reference and
actual freeing of the sucker.  rcu_read_lock() is there,
though, which makes it safe to grab ->d_lock on 'parent'.

That 'parent' might very well have nothing to do with our
dentry by now.  We can check if it's equal to its
->d_parent, though.  dentry->d_parent is *NOT* stable
at that point.  It might be changing right now.

However, the first store to dentry->d_parent making it
not equal to parent would have been done under parent->d_lock.
And since we are holding parent->d_lock, we won't miss that
store.  We might miss subsequent ones, but if we observe
dentry->d_parent == parent, we know that it's stable.  And
if we see dentry->d_parent != parent, we know that dentry
has moved around and we need to retry anyway.

If we do *not* need to retry, we have
	* inode->i_lock held
	* parent->d_lock held
	* dentry->d_parent == parent
	* dentry->d_inode == inode
At that point we can drop rcu_read_lock() - parent can't be
freed under us without dentry->d_parent switched from
parent to something else, and anyone trying to do that will
spin on parent->d_lock.

And now we can grab dentry->d_lock - ordering is guaranteed.
The only minor twist is that once upon a time __lock_parent()
used to have to cope with the possibility of dentry becoming
detached while we were retrying; in that case we obviously
don't want to grab dentry->d_lock - we are holding it already
(as parent->d_lock).

These days (since "make non-exchanging __d_move() copy ->d_parent
rather than swap them") if IS_ROOT(dentry) is false at some
point, it *never* will become true, so this check could've
been dropped, along with the checks for __lock_parent() return
value possibly being NULL.

I'm not saying it's not subtle and nasty - any time you see
READ_ONCE() (or rcu_read_lock(), for that matter), expect
headache.  But trylock is an absolutely innocuous part here; we
are not even retrying it once.


* Re: dcache locking question
  2019-03-15  1:50   ` Al Viro
@ 2019-03-15 17:38     ` Eric Biggers
  2019-03-15 18:54       ` Al Viro
  0 siblings, 1 reply; 17+ messages in thread
From: Eric Biggers @ 2019-03-15 17:38 UTC (permalink / raw)
  To: Al Viro; +Cc: Tobin C. Harding, linux-fsdevel

On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> 
> If it fails, we call __lock_parent().  Which
> 	* grabs RCU lock
> 	* drops ->d_lock (now we are not holding ->d_lock
> on anything).
> 	* fetches ->d_parent.  Note the READ_ONCE() there -
> it's *NOT* stable (no ->d_lock held).  We can't expect
> that ->d_parent won't change or that the reference it used
> to contribute to parent's refcount is there anymore; as
> a matter of fact, the only thing that prevents outright
> _freeing_ of the object 'parent' points to is rcu_read_lock()
> and RCU delay between dropping the last reference and
> actual freeing of the sucker.  rcu_read_lock() is there,
> though, which makes it safe to grab ->d_lock on 'parent'.
> 
> That 'parent' might very well have nothing to do with our
> dentry by now.  We can check if it's equal to its
> ->d_parent, though.  dentry->d_parent is *NOT* stable
> at that point.  It might be changing right now.
> 
> However, the first store to dentry->d_parent making it
> not equal to parent would have been done under parent->d_lock.
> And since we are holding parent->d_lock, we won't miss that
> store.  We might miss subsequent ones, but if we observe
> dentry->d_parent == parent, we know that it's stable.  And
> if we see dentry->d_parent != parent, we know that dentry
> has moved around and we need to retry anyway.

Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?

	if (unlikely(parent != dentry->d_parent)) {

Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
concurrently changed to 0xBBBBBBBB.

d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
appearing that d_parent == 0xAAAABBBB == parent.

Yes, it won't really be compiled like that in practice, but I thought the point of
READ_ONCE() is to *guarantee* it's really done right...
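For reference, the scalar case of the kernel's READ_ONCE() boils down to a volatile cast, roughly as sketched below (the name is illustrative; the real macro adds type-checking machinery):

```c
/* Simplified sketch of the kernel's READ_ONCE() for scalar types.
 * The volatile cast forces the compiler to emit exactly one access
 * and forbids it from fusing, repeating, or (in practice) splitting
 * the load.  Freedom from tearing on a naturally aligned pointer or
 * word is then an architecture property, not a C-standard guarantee. */
#define READ_ONCE_SKETCH(x) (*(const volatile __typeof__(x) *)&(x))

static int shared = 42;

static int read_shared_once(void)
{
	/* One real load from memory, never split or re-issued. */
	return READ_ONCE_SKETCH(shared);
}
```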

- Eric


* Re: dcache locking question
  2019-03-15 17:38     ` Eric Biggers
@ 2019-03-15 18:54       ` Al Viro
  2019-03-16 22:31         ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2019-03-15 18:54 UTC (permalink / raw)
  To: Eric Biggers; +Cc: Tobin C. Harding, linux-fsdevel, Paul E. McKenney

On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> > 
> > If it fails, we call __lock_parent().  Which
> > 	* grabs RCU lock
> > 	* drops ->d_lock (now we are not holding ->d_lock
> > on anything).
> > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > it's *NOT* stable (no ->d_lock held).  We can't expect
> > that ->d_parent won't change or that the reference it used
> > to contribute to parent's refcount is there anymore; as
> > a matter of fact, the only thing that prevents outright
> > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > and RCU delay between dropping the last reference and
> > actual freeing of the sucker.  rcu_read_lock() is there,
> > though, which makes it safe to grab ->d_lock on 'parent'.
> > 
> > That 'parent' might very well have nothing to do with our
> > dentry by now.  We can check if it's equal to its
> > ->d_parent, though.  dentry->d_parent is *NOT* stable
> > at that point.  It might be changing right now.
> > 
> > However, the first store to dentry->d_parent making it
> > not equal to parent would have been done under parent->d_lock.
> > And since we are holding parent->d_lock, we won't miss that
> > store.  We might miss subsequent ones, but if we observe
> > dentry->d_parent == parent, we know that it's stable.  And
> > if we see dentry->d_parent != parent, we know that dentry
> > has moved around and we need to retry anyway.
> 
> Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
> 
> 	if (unlikely(parent != dentry->d_parent)) {
> 
> Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> concurrently changed to 0xBBBBBBBB.
> 
> d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> appearing that d_parent == 0xAAAABBBB == parent.
> 
> Yes it won't really be compiled as that in practice, but I thought the point of
> READ_ONCE() is to *guarantee* it's really done right...

READ_ONCE() does not add any extra guarantees of atomicity.  Fetches and stores
of pointers are atomic, period; if that ever breaks, we are in very deep
trouble all over the place.

What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
operation.  So that second fetch of ->d_parent will happen after we grab
parent->d_lock, from everyone's POV.  Critical areas for the same spinlock
are ordered wrt each other.  So we have observed
	FETCH dentry->d_parent => parent
	LOCK parent->d_lock
	FETCH dentry->d_parent => parent
All stores to dentry->d_parent are done with ->d_lock held on dentry,
old value of dentry->d_parent *and* new value of dentry->d_parent.  So
the second fetch is ordered wrt all stores making dentry->d_parent change
to parent and all stores making it change *from* parent.  We might miss
some stores changing it from one value other than parent to another such,
but the predicate itself is fine and will stay fine until we drop
parent->d_lock.

Paul, could you comment on that one?  The function in question is
this:
static struct dentry *__lock_parent(struct dentry *dentry)
{
        struct dentry *parent;
        rcu_read_lock();
        spin_unlock(&dentry->d_lock);
again:
        parent = READ_ONCE(dentry->d_parent);
        spin_lock(&parent->d_lock);
        /*
         * We can't blindly lock dentry until we are sure
         * that we won't violate the locking order.
         * Any changes of dentry->d_parent must have
         * been done with parent->d_lock held, so
         * spin_lock() above is enough of a barrier
         * for checking if it's still our child.
         */
        if (unlikely(parent != dentry->d_parent)) {
                spin_unlock(&parent->d_lock);
                goto again;
        }
        rcu_read_unlock();
        if (parent != dentry)
                spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
        else
                parent = NULL;
        return parent;
}
(in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
under ->d_lock on dentry itself and ->d_lock on both old and new values
of ->d_parent.


* Re: dcache locking question
  2019-03-15 18:54       ` Al Viro
@ 2019-03-16 22:31         ` Paul E. McKenney
  2019-03-17  0:18           ` Al Viro
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2019-03-16 22:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Eric Biggers, Tobin C. Harding, linux-fsdevel

On Fri, Mar 15, 2019 at 06:54:55PM +0000, Al Viro wrote:
> On Fri, Mar 15, 2019 at 10:38:23AM -0700, Eric Biggers wrote:
> > On Fri, Mar 15, 2019 at 01:50:21AM +0000, Al Viro wrote:
> > > 
> > > If it fails, we call __lock_parent().  Which
> > > 	* grabs RCU lock
> > > 	* drops ->d_lock (now we are not holding ->d_lock
> > > on anything).
> > > 	* fetches ->d_parent.  Note the READ_ONCE() there -
> > > it's *NOT* stable (no ->d_lock held).  We can't expect
> > > that ->d_parent won't change or that the reference it used
> > > to contribute to parent's refcount is there anymore; as
> > > a matter of fact, the only thing that prevents outright
> > > _freeing_ of the object 'parent' points to is rcu_read_lock()
> > > and RCU delay between dropping the last reference and
> > > actual freeing of the sucker.  rcu_read_lock() is there,
> > > though, which makes it safe to grab ->d_lock on 'parent'.
> > > 
> > > That 'parent' might very well have nothing to do with our
> > > dentry by now.  We can check if it's equal to its
> > > ->d_parent, though.  dentry->d_parent is *NOT* stable
> > > at that point.  It might be changing right now.
> > > 
> > > However, the first store to dentry->d_parent making it
> > > not equal to parent would have been done under parent->d_lock.
> > > And since we are holding parent->d_lock, we won't miss that
> > > store.  We might miss subsequent ones, but if we observe
> > > dentry->d_parent == parent, we know that it's stable.  And
> > > if we see dentry->d_parent != parent, we know that dentry
> > > has moved around and we need to retry anyway.
> > 
> > Why isn't it necessary to use READ_ONCE(dentry->d_parent) here?
> > 
> > 	if (unlikely(parent != dentry->d_parent)) {
> > 
> > Suppose 'parent' is 0xAAAABBBB, and 'dentry->d_parent' is 0xAAAAAAAA and is
> > concurrently changed to 0xBBBBBBBB.
> > 
> > d_parent could be read in two parts, 0xAAAA then 0xBBBB, resulting in it
> > appearing that d_parent == 0xAAAABBBB == parent.
> > 
> > Yes it won't really be compiled as that in practice, but I thought the point of
> > READ_ONCE() is to *guarantee* it's really done right...
> 
> READ_ONCE() does not add any extra guarantees of atomicity.  Fetches and stores
> of pointers are atomic, period; if that ever breaks, we are in very deep
> trouble all over the place.

Stepping back from the specifics of this particular function for a moment,
it would be unusual for a compiler to tear a pointer load or store,
with non-tearing of pointer stores assuming values of pointers aren't
known at compile time.  Compilers can and do fuse both loads and stores,
and they can and do repeat loads.

Having gotten that off my chest...  ;-)

> What's more, spin_lock acts as a compiler barrier and, on SMP, is an ACQUIRE
> operation.  So that second fetch of ->d_parent will happen after we grab
> parent->d_lock, from everyone's POV.  Critical areas for the same spinlock
> are ordered wrt each other.  So we have observed
> 	FETCH dentry->d_parent => parent
> 	LOCK parent->d_lock
> 	FETCH dentry->d_parent => parent
> All stores to dentry->d_parent are done with ->d_lock held on dentry,
> old value of dentry->d_parent *and* new value of dentry->d_parent.  So
> the second fetch is ordered wrt all stores making dentry->d_parent change
> to parent and all stores making it change *from* parent.  We might miss
> some stores changing it from one value other than parent to another such,
> but the predicate itself is fine and will stay fine until we drop
> parent->d_lock.
> 
> Paul, could you comment on that one?  The function in question is
> this:
> static struct dentry *__lock_parent(struct dentry *dentry)
> {
>         struct dentry *parent;
>         rcu_read_lock();
>         spin_unlock(&dentry->d_lock);
> again:
>         parent = READ_ONCE(dentry->d_parent);
>         spin_lock(&parent->d_lock);
>         /*
>          * We can't blindly lock dentry until we are sure
>          * that we won't violate the locking order.
>          * Any changes of dentry->d_parent must have
>          * been done with parent->d_lock held, so
>          * spin_lock() above is enough of a barrier
>          * for checking if it's still our child.
>          */
>         if (unlikely(parent != dentry->d_parent)) {

We are talking about the above access to dentry->d_parent, correct?

Given that dentry->d_parent isn't updated except when holding
dentry->d_lock, and given that locks do what they are supposed to,
both in terms of memory ordering and mutual exclusion, the value of
dentry->d_parent must be constant throughout the current critical section.
It therefore doesn't matter what strange code the compiler generates,
it will get the same value regardless.

So, no, there is absolutely no point in doing this:

	if (unlikely(parent != READ_ONCE(dentry->d_parent))) {

Or did I miss the point of this discussion?

							Thanx, Paul

>                 spin_unlock(&parent->d_lock);
>                 goto again;
>         }
>         rcu_read_unlock();
>         if (parent != dentry)
>                 spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
>         else
>                 parent = NULL;
>         return parent;
> }
> (in fs/dcache.c) and all stores to ->d_parent are guaranteed to be done
> under ->d_lock on dentry itself and ->d_lock on both old and new values
> of ->d_parent.
> 



* Re: dcache locking question
  2019-03-16 22:31         ` Paul E. McKenney
@ 2019-03-17  0:18           ` Al Viro
  2019-03-17  0:50             ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2019-03-17  0:18 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sat, Mar 16, 2019 at 03:31:28PM -0700, Paul E. McKenney wrote:
> > Paul, could you comment on that one?  The function in question is
> > this:
> > static struct dentry *__lock_parent(struct dentry *dentry)
> > {
> >         struct dentry *parent;
> >         rcu_read_lock();
> >         spin_unlock(&dentry->d_lock);
> > again:
> >         parent = READ_ONCE(dentry->d_parent);
> >         spin_lock(&parent->d_lock);
> >         /*
> >          * We can't blindly lock dentry until we are sure
> >          * that we won't violate the locking order.
> >          * Any changes of dentry->d_parent must have
> >          * been done with parent->d_lock held, so
> >          * spin_lock() above is enough of a barrier
> >          * for checking if it's still our child.
> >          */
> >         if (unlikely(parent != dentry->d_parent)) {
> 
> We are talking about the above access to dentry->d_parent, correct?
> 
> Given that dentry->d_parent isn't updated except when holding
> dentry->d_lock, and given that locks do what they are supposed to,
> both in terms of memory ordering and mutual exclusion, the value of
> dentry->d_parent must be constant throughout the current critical section.
> It therefore doesn't matter what strange code the compiler generates,
> it will get the same value regardless.
> 
> So, no, there is absolutely no point in doing this:
> 
> 	if (unlikely(parent != READ_ONCE(dentry->d_parent))) {
> 
> Or did I miss the point of this discussion?

dentry->d_parent is *NOT* guaranteed to be stable here - we are not
holding dentry->d_lock.  So it can change; what it can't do is to
change to or from the pointer we have in 'parent'.

IOW, the value of dentry->d_parent isn't stable; the result of
comparison, OTOH, is.

I also think that READ_ONCE would be useless here - reordering
can't happen due to compiler barrier in spin_lock() and memory
barriers should be provided by ACQUIRE/RELEASE on parent->d_lock.

We are observing a result of some store.  If we fetch the value
equal to 'parent', that store had been under parent->d_lock *AND*
any subsequent store changing the value of dentry->d_parent away
from 'parent' would also have been under parent->d_lock.  So
either it hasn't happened yet, or we would've observed its
results, since we are holding parent->d_lock ourselves.

If we fetch the value not equal to 'parent', any subsequent store
making dentry->d_parent equal to parent would've been done under
parent->d_lock, so again, it either has not happened yet, or we
would've observed its results.

FWIW, the whole point here is that we can't grab the lock sufficient
to stabilize dentry->d_parent here - not until we'd verified that
dentry still has the same parent.  We could play with trylock, but
AFAICS the trick above is sufficient (and gets fewer retries).

What I'm not certain about is whether we need READ_ONCE() on the
other fetch in there.  Suppose that gets replaced with plain
assignment; what problems could that cause?  It obviously can't
get reordered past the spin_lock() - the address of spinlock
is derived from the value we fetch.  And in case of retry...
wouldn't the spin_unlock(&parent->d_lock) we do there supply a compiler
barrier?

I realize that READ_ONCE() gives smp_read_barrier_depends(), but
do we need it here?


* Re: dcache locking question
  2019-03-17  0:18           ` Al Viro
@ 2019-03-17  0:50             ` Paul E. McKenney
  2019-03-17  2:20               ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2019-03-17  0:50 UTC (permalink / raw)
  To: Al Viro; +Cc: Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sun, Mar 17, 2019 at 12:18:40AM +0000, Al Viro wrote:
> On Sat, Mar 16, 2019 at 03:31:28PM -0700, Paul E. McKenney wrote:
> > > Paul, could you comment on that one?  The function in question is
> > > this:
> > > static struct dentry *__lock_parent(struct dentry *dentry)
> > > {
> > >         struct dentry *parent;
> > >         rcu_read_lock();
> > >         spin_unlock(&dentry->d_lock);
> > > again:
> > >         parent = READ_ONCE(dentry->d_parent);
> > >         spin_lock(&parent->d_lock);
> > >         /*
> > >          * We can't blindly lock dentry until we are sure
> > >          * that we won't violate the locking order.
> > >          * Any changes of dentry->d_parent must have
> > >          * been done with parent->d_lock held, so
> > >          * spin_lock() above is enough of a barrier
> > >          * for checking if it's still our child.
> > >          */
> > >         if (unlikely(parent != dentry->d_parent)) {
> > 
> > We are talking about the above access to dentry->d_parent, correct?
> > 
> > Given that dentry->d_parent isn't updated except when holding
> > dentry->d_lock, and given that locks do what they are supposed to,
> > both in terms of memory ordering and mutual exclusion, the value of
> > dentry->d_parent must be constant throughout the current critical section.
> > It therefore doesn't matter what strange code the compiler generates,
> > it will get the same value regardless.
> > 
> > So, no, there is absolutely no point in doing this:
> > 
> > 	if (unlikely(parent != READ_ONCE(dentry->d_parent))) {
> > 
> > Or did I miss the point of this discussion?
> 
> dentry->d_parent is *NOT* guaranteed to be stable here - we are not
> holding dentry->d_lock.  So it can change; what it can't do is to
> change to or from the pointer we have in 'parent'.

Color me blind to "dentry" vs. "parent".  :-/

> IOW, the value of dentry->d_parent isn't stable; the result of
> comparison, OTOH, is.
> 
> I also think that READ_ONCE would be useless here - reordering
> can't happen due to compiler barrier in spin_lock() and memory
> barriers should be provided by ACQUIRE/RELEASE on parent->d_lock.
> 
> We are observing a result of some store.  If we fetch the value
> equal to 'parent', that store had been under parent->d_lock *AND*
> any subsequent store changing the value of dentry->d_parent away
> from 'parent' would also have been under parent->d_lock.  So
> either it hasn't happened yet, or we would've observed its
> results, since we are holding parent->d_lock ourselves.
> 
> If we fetch the value not equal to 'parent', any subsequent store
> making dentry->d_parent equal to parent would've been done under
> parent->d_lock, so again, it either has not happened yet, or we
> would've observed its results.
> 
> FWIW, the whole point here is that we can't grab the lock sufficient
> to stabilize dentry->d_parent here - not until we'd verified that
> dentry still has the same parent.  We could play with trylock, but
> AFAICS the trick above is sufficient (and gets fewer retries).

I can come up with sequences of values that could mimic the value of
'parent', but that assumes load tearing.  I -have- seen stores of constant
values be torn, but not stores of runtime-variable values and not loads.
Still, such tearing is permitted, and including the READ_ONCE() is making
it easier for things like thread sanitizers.  In addition, the READ_ONCE()
makes it clear that the value being loaded is unstable, which can be
useful documentation.

And yes, I am a bit paranoid about this sort of thing.  Something about
talking to a lot of compiler writers and standards committee members...

But at the end of the day, you are the maintainer of this code.

> What I'm not certain about is whether we need READ_ONCE() on the
> other fetch in there.  Suppose that gets replaced with plain
> assignment; what problems could that cause?  It obviously can't
> get reordered past the spin_lock() - the address of spinlock
> is derived from the value we fetch.  And in case of retry...
> wouldn't spin_unlock(parent->d_lock) we do there supply a compiler
> barrier?

Plus, all callers do a spin_trylock() before invoking this function,
which does prevent the compiler from repeating the read (no reason to
in this case) or fusing with some other read.  Of course, I still like
the documentation and sanitizer-tooling benefits.

> I realize that READ_ONCE() gives smp_read_barrier_depends(), but
> do we need it here?

We need smp_read_barrier_depends() if we dereference the loaded pointer
using some operation that did not imply memory ordering, which does
happen, for example, in ecryptfs_symlink().  But before the caller has
a chance to do that, there is that spin_lock_nested(), which provides
the needed ordering.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-17  0:50             ` Paul E. McKenney
@ 2019-03-17  2:20               ` James Bottomley
  2019-03-17  3:06                 ` Al Viro
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2019-03-17  2:20 UTC (permalink / raw)
  To: paulmck, Al Viro; +Cc: Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
[...]
>  I -have- seen stores of constant values be torn, but not stores of
> runtime-variable values and not loads.  Still, such tearing is
> permitted, and including the READ_ONCE() is making it easier for
> things like thread sanitizers.  In addition, the READ_ONCE() makes it
> clear that the value being loaded is unstable, which can be
> useful documentation.

Um, just so I'm clear, because this assumption permeates all our code: 
load or store tearing can never occur if we're doing load or store of a
32 bit value which is naturally aligned.  Where naturally aligned is
within the gift of the CPU to determine but which the compiler or
kernel will always ensure for us unless we pack the structure or
deliberately misalign the allocation.

James


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-17  2:20               ` James Bottomley
@ 2019-03-17  3:06                 ` Al Viro
  2019-03-17  4:23                   ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Al Viro @ 2019-03-17  3:06 UTC (permalink / raw)
  To: James Bottomley; +Cc: paulmck, Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> [...]
> >  I -have- seen stores of constant values be torn, but not stores of
> > runtime-variable values and not loads.  Still, such tearing is
> > permitted, and including the READ_ONCE() is making it easier for
> > things like thread sanitizers.  In addition, the READ_ONCE() makes it
> > clear that the value being loaded is unstable, which can be
> > useful documentation.
> 
> Um, just so I'm clear, because this assumption permeates all our code: 
> load or store tearing can never occur if we're doing load or store of a
> 32 bit value which is naturally aligned.  Where naturally aligned is
> within the gift of the CPU to determine but which the compiler or
> kernel will always ensure for us unless we pack the structure or
> deliberately misalign the allocation.

Wait a sec; are there any 64bit architectures where the same is not
guaranteed for dereferencing properly aligned void **?

If that's the case, I can think of quite a few places that are rather
dubious, and I don't see how READ_ONCE() could help in those - e.g.
if an architecture only has 32bit loads, rcu list traversals are
not going to be doable without one hell of an extra headache.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-17  3:06                 ` Al Viro
@ 2019-03-17  4:23                   ` James Bottomley
  2019-03-18  0:35                     ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2019-03-17  4:23 UTC (permalink / raw)
  To: Al Viro; +Cc: paulmck, Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sun, 2019-03-17 at 03:06 +0000, Al Viro wrote:
> On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> > On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> > [...]
> > >  I -have- seen stores of constant values be torn, but not stores
> > > of runtime-variable values and not loads.  Still, such tearing is
> > > permitted, and including the READ_ONCE() is making it easier for
> > > things like thread sanitizers.  In addition, the READ_ONCE()
> > > makes it clear that the value being loaded is unstable, which can
> > > be useful documentation.
> > 
> > Um, just so I'm clear, because this assumption permeates all our
> > code: load or store tearing can never occur if we're doing load or
> > store of a 32 bit value which is naturally aligned.  Where
> > naturally aligned is within the gift of the CPU to determine but
> > which the compiler or kernel will always ensure for us unless we
> > pack the structure or deliberately misalign the allocation.
> 
> Wait a sec; are there any 64bit architectures where the same is not
> guaranteed for dereferencing properly aligned void **?

Yes, naturally aligned void * dereference shouldn't tear either.  I
was just using 32 bit as my example because 64 bit accesses will tear
on 32 bit architectures but 64 bit naturally aligned accesses shouldn't
tear on 64 bit architectures.  However, since we can't guarantee the 64
bitness of the architecture, 32 bit or void * is our gold standard for
not tearing.

James


> If that's the case, I can think of quite a few places that are rather
> dubious, and I don't see how READ_ONCE() could help in those - e.g.
> if an architecture only has 32bit loads, rcu list traversals are
> not going to be doable without one hell of an extra headache.
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-17  4:23                   ` James Bottomley
@ 2019-03-18  0:35                     ` Paul E. McKenney
  2019-03-18 16:26                       ` James Bottomley
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2019-03-18  0:35 UTC (permalink / raw)
  To: James Bottomley; +Cc: Al Viro, Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sat, Mar 16, 2019 at 09:23:16PM -0700, James Bottomley wrote:
> On Sun, 2019-03-17 at 03:06 +0000, Al Viro wrote:
> > On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> > > On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> > > [...]
> > > >  I -have- seen stores of constant values be torn, but not stores
> > > > of runtime-variable values and not loads.  Still, such tearing is
> > > > permitted, and including the READ_ONCE() is making it easier for
> > > > things like thread sanitizers.  In addition, the READ_ONCE()
> > > > makes it clear that the value being loaded is unstable, which can
> > > > be useful documentation.
> > > 
> > > Um, just so I'm clear, because this assumption permeates all our
> > > code: load or store tearing can never occur if we're doing load or
> > > store of a 32 bit value which is naturally aligned.  Where
> > > naturally aligned is within the gift of the CPU to determine but
> > > which the compiler or kernel will always ensure for us unless we
> > > pack the structure or deliberately misalign the allocation.

A non-volatile store of certain 32-bit constants can and does tear
on some architectures.  These architectures would be the ones with a
store-immediate instruction with a small immediate field, and where the
32-bit constant is such that a pair of 16-bit immediate store instructions
can store that value.

There was a bug in an old version of GCC where even volatile 32-bit stores
of these constants would tear.  They did fix the bug, but it took some
time to find a GCC person who understood that this was in fact a bug.

Hence my preference for READ_ONCE() and WRITE_ONCE() for data-racing
loads and stores.

> > Wait a sec; are there any 64bit architectures where the same is not
> > guaranteed for dereferencing properly aligned void **?
> 
> Yes, naturally aligned void * dereference shouldn't tear either.  I
> was just using 32 bit as my example because 64 bit accesses will tear
> on 32 bit architectures but 64 bit naturally aligned accesses shouldn't
> tear on 64 bit architectures.  However, since we can't guarantee the 64
> bitness of the architecture 32 bit or void * is our gold standard for
> not tearing.

For stores of quantities not known at compile time, agreed.  But that
same store-immediate situation could happen on 64-bit systems.

> James
> 
> 
> > If that's the case, I can think of quite a few places that are rather
> > dubious, and I don't see how READ_ONCE() could help in those - e.g.
> > if an architecture only has 32bit loads, rcu list traversals are
> > not going to be doable without one hell of an extra headache.

All the 64-bit systems that run the Linux kernel do have 64-bit load
instructions and rcu_dereference() uses READ_ONCE() internally, so we
should be fine with RCU list traversals.

							Thanx, Paul


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-18  0:35                     ` Paul E. McKenney
@ 2019-03-18 16:26                       ` James Bottomley
  2019-03-18 17:11                         ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: James Bottomley @ 2019-03-18 16:26 UTC (permalink / raw)
  To: paulmck; +Cc: Al Viro, Eric Biggers, Tobin C. Harding, linux-fsdevel

On Sun, 2019-03-17 at 17:35 -0700, Paul E. McKenney wrote:
> On Sat, Mar 16, 2019 at 09:23:16PM -0700, James Bottomley wrote:
> > On Sun, 2019-03-17 at 03:06 +0000, Al Viro wrote:
> > > On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> > > > On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> > > > [...]
> > > > >  I -have- seen stores of constant values be torn, but not
> > > > > stores of runtime-variable values and not loads.  Still, such
> > > > > tearing is permitted, and including the READ_ONCE() is making
> > > > > it easier for things like thread sanitizers.  In addition,
> > > > > the READ_ONCE() makes it clear that the value being loaded is
> > > > > unstable, which can be useful documentation.
> > > > 
> > > > Um, just so I'm clear, because this assumption permeates all
> > > > our code: load or store tearing can never occur if we're doing
> > > > load or store of a 32 bit value which is naturally
> > > > aligned.  Where naturally aligned is within the gift of the CPU
> > > > to determine but which the compiler or kernel will always
> > > > ensure for us unless we pack the structure or deliberately
> > > > misalign the allocation.
> 
> A non-volatile store of certain 32-bit constants can and does tear
> on some architectures.  These architectures would be the ones with a
> store-immediate instruction with a small immediate field, and where
> the 32-bit constant is such that a pair of 16-bit immediate store
> instructions can store that value.

Understood: PA-RISC is one such architecture: our ldil (load immediate
long) can only take 21 bits of immediate data and you have to use a
second instruction (ldo) to get the remaining 11 bits. However, the
compiler guarantees no tearing in memory visibility for PA by doing the
ldil/ldo sequence on a register and then writing the register to memory
which I believe is an architectural guarantee.

> There was a bug in an old version of GCC where even volatile 32-bit
> stores of these constants would tear.  They did fix the bug, but it
> took some time to find a GCC person who understood that this was in
> fact a bug.
> 
> Hence my preference for READ_ONCE() and WRITE_ONCE() for data-racing
> loads and stores.

OK, but didn't everyone eventually agree this was a compiler bug?

> > > Wait a sec; are there any 64bit architectures where the same is
> > > not guaranteed for dereferencing properly aligned void **?
> > 
> > Yes, naturally aligned void * dereference shouldn't tear
> > either.  I was just using 32 bit as my example because 64 bit
> > accesses will tear on 32 bit architectures but 64 bit naturally
> > aligned accesses shouldn't tear on 64 bit architectures.  However,
> > since we can't guarantee the 64 bitness of the architecture 32 bit
> > or void * is our gold standard for not tearing.
> 
> For stores of quantities not known at compile time, agreed.  But
> that same store-immediate situation could happen on 64-bit systems.
> 
> > James
> > 
> > 
> > > If that's the case, I can think of quite a few places that are
> > > rather dubious, and I don't see how READ_ONCE() could help in
> > > those - e.g. if an architecture only has 32bit loads, rcu list
> > > traversals are not going to be doable without one hell of an
> > > extra headache.
> 
> All the 64-bit systems that run the Linux kernel do have 64-bit load
> instructions and rcu_dereference() uses READ_ONCE() internally, so we
> should be fine with RCU list traverals.

I really don't think it's possible to get the same immediate constant
tearing bug on 64 bit.  If you look at PA, we have no 64 bit
equivalent of the ldil/ldo pair so all 64 bit immediate stores come
straight from the global data table via a register, so no tearing.  I
bet every 64 bit architecture has a similar approach because 64 bit
immediate data just requires too many bits to stuff into an instruction
pair.

James


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-18 16:26                       ` James Bottomley
@ 2019-03-18 17:11                         ` Paul E. McKenney
  2019-03-19 15:45                           ` Paul E. McKenney
  0 siblings, 1 reply; 17+ messages in thread
From: Paul E. McKenney @ 2019-03-18 17:11 UTC (permalink / raw)
  To: James Bottomley; +Cc: Al Viro, Eric Biggers, Tobin C. Harding, linux-fsdevel

On Mon, Mar 18, 2019 at 09:26:18AM -0700, James Bottomley wrote:
> On Sun, 2019-03-17 at 17:35 -0700, Paul E. McKenney wrote:
> > On Sat, Mar 16, 2019 at 09:23:16PM -0700, James Bottomley wrote:
> > > On Sun, 2019-03-17 at 03:06 +0000, Al Viro wrote:
> > > > On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> > > > > On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> > > > > [...]
> > > > > >  I -have- seen stores of constant values be torn, but not
> > > > > > stores of runtime-variable values and not loads.  Still, such
> > > > > > tearing is permitted, and including the READ_ONCE() is making
> > > > > > it easier for things like thread sanitizers.  In addition,
> > > > > > the READ_ONCE() makes it clear that the value being loaded is
> > > > > > unstable, which can be useful documentation.
> > > > > 
> > > > > Um, just so I'm clear, because this assumption permeates all
> > > > > our code: load or store tearing can never occur if we're doing
> > > > > load or store of a 32 bit value which is naturally
> > > > > aligned.  Where naturally aligned is within the gift of the CPU
> > > > > to determine but which the compiler or kernel will always
> > > > > ensure for us unless we pack the structure or deliberately
> > > > > misalign the allocation.
> > 
> > A non-volatile store of certain 32-bit constants can and does tear
> > on some architectures.  These architectures would be the ones with a
> > store-immediate instruction with a small immediate field, and where
> > the 32-bit constant is such that a pair of 16-bit immediate store
> > instructions can store that value.
> 
> Understood: PA-RISC is one such architecture: our ldil (load immediate
> long) can only take 21 bits of immediate data and you have to use a
> second instruction (ldo) to get the remaining 11 bits. However, the
> compiler guarantees no tearing in memory visibility for PA by doing the
> ldil/ldo sequence on a register and then writing the register to memory
> which I believe is an architectural guarantee.

Good to know, thank you!

> > There was a bug in an old version of GCC where even volatile 32-bit
> > stores of these constants would tear.  They did fix the bug, but it
> > took some time to find a GCC person who understood that this was in
> > fact a bug.
> > 
> > Hence my preference for READ_ONCE() and WRITE_ONCE() for data-racing
> > loads and stores.
> 
> OK, but didn't everyone eventually agree this was a compiler bug?

They did agree, but only in the case where the store was volatile,
as in WRITE_ONCE(), and -not- in the case of a plain store.

At least the kernel doesn't make general use of vector instructions.
If it did, I would not be surprised to see compilers use three 32-bit
vector stores to store to a 32-bit int adjacent to a 64-bit pointer.  :-/

							Thanx, Paul

> > > > Wait a sec; are there any 64bit architectures where the same is
> > > > not guaranteed for dereferencing properly aligned void **?
> > > 
> > > Yes, naturally aligned void * dereference shouldn't tear
> > > either.  I was just using 32 bit as my example because 64 bit
> > > accesses will tear on 32 bit architectures but 64 bit naturally
> > > aligned accesses shouldn't tear on 64 bit architectures.  However,
> > > since we can't guarantee the 64 bitness of the architecture 32 bit
> > > or void * is our gold standard for not tearing.
> > 
> > For stores of quantities not known at compile time, agreed.  But
> > that same store-immediate situation could happen on 64-bit systems.
> > 
> > > James
> > > 
> > > 
> > > > If that's the case, I can think of quite a few places that are
> > > > rather dubious, and I don't see how READ_ONCE() could help in
> > > > those - e.g. if an architecture only has 32bit loads, rcu list
> > > > traversals are not going to be doable without one hell of an
> > > > extra headache.
> > 
> > All the 64-bit systems that run the Linux kernel do have 64-bit load
> > instructions and rcu_dereference() uses READ_ONCE() internally, so we
> > should be fine with RCU list traversals.
> 
> I really don't think it's possible to get the same immediate constant
> tearing bug on 64 bit.  If you look at PA, we have no 64 bit
> equivalent of the ldil/ldo pair so all 64 bit immediate stores come
> straight from the global data table via a register, so no tearing.  I
> bet every 64 bit architecture has a similar approach because 64 bit
> immediate data just requires too many bits to stuff into an instruction
> pair.
> 
> James
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: dcache locking question
  2019-03-18 17:11                         ` Paul E. McKenney
@ 2019-03-19 15:45                           ` Paul E. McKenney
  0 siblings, 0 replies; 17+ messages in thread
From: Paul E. McKenney @ 2019-03-19 15:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: Al Viro, Eric Biggers, Tobin C. Harding, linux-fsdevel

On Mon, Mar 18, 2019 at 10:11:06AM -0700, Paul E. McKenney wrote:
> On Mon, Mar 18, 2019 at 09:26:18AM -0700, James Bottomley wrote:
> > On Sun, 2019-03-17 at 17:35 -0700, Paul E. McKenney wrote:
> > > On Sat, Mar 16, 2019 at 09:23:16PM -0700, James Bottomley wrote:
> > > > On Sun, 2019-03-17 at 03:06 +0000, Al Viro wrote:
> > > > > On Sat, Mar 16, 2019 at 07:20:20PM -0700, James Bottomley wrote:
> > > > > > On Sat, 2019-03-16 at 17:50 -0700, Paul E. McKenney wrote:
> > > > > > [...]
> > > > > > >  I -have- seen stores of constant values be torn, but not
> > > > > > > stores of runtime-variable values and not loads.  Still, such
> > > > > > > tearing is permitted, and including the READ_ONCE() is making
> > > > > > > it easier for things like thread sanitizers.  In addition,
> > > > > > > the READ_ONCE() makes it clear that the value being loaded is
> > > > > > > unstable, which can be useful documentation.
> > > > > > 
> > > > > > Um, just so I'm clear, because this assumption permeates all
> > > > > > our code: load or store tearing can never occur if we're doing
> > > > > > load or store of a 32 bit value which is naturally
> > > > > > aligned.  Where naturally aligned is within the gift of the CPU
> > > > > > to determine but which the compiler or kernel will always
> > > > > > ensure for us unless we pack the structure or deliberately
> > > > > > misalign the allocation.
> > > 
> > > A non-volatile store of certain 32-bit constants can and does tear
> > > on some architectures.  These architectures would be the ones with a
> > > store-immediate instruction with a small immediate field, and where
> > > the 32-bit constant is such that a pair of 16-bit immediate store
> > > instructions can store that value.
> > 
> > Understood: PA-RISC is one such architecture: our ldil (load immediate
> > long) can only take 21 bits of immediate data and you have to use a
> > second instruction (ldo) to get the remaining 11 bits. However, the
> > compiler guarantees no tearing in memory visibility for PA by doing the
> > ldil/ldo sequence on a register and then writing the register to memory
> > which I believe is an architectural guarantee.
> 
> Good to know, thank you!
> 
> > > There was a bug in an old version of GCC where even volatile 32-bit
> > > stores of these constants would tear.  They did fix the bug, but it
> > > took some time to find a GCC person who understood that this was in
> > > fact a bug.
> > > 
> > > Hence my preference for READ_ONCE() and WRITE_ONCE() for data-racing
> > > loads and stores.
> > 
> > OK, but didn't everyone eventually agree this was a compiler bug?
> 
> They did agree, but only in the case where the store was volatile,
> as in WRITE_ONCE(), and -not- in the case of a plain store.
> 
> At least the kernel doesn't make general use of vector instructions.
> If it did, I would not be surprised to see compilers use three 32-bit
> vector stores to store to a 32-bit int adjacent to a 64-bit pointer.  :-/

And it turns out that the CPU architecture in question was x86-64, for
whatever that is worth.  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55981

(There is also a later bug report dealing strictly with volatile, but
my search-engine skills are failing me this morning.)

							Thanx, Paul

> > > > > Wait a sec; are there any 64bit architectures where the same is
> > > > > not guaranteed for dereferencing properly aligned void **?
> > > > 
> > > > Yes, naturally aligned void * dereference shouldn't tear
> > > > either.  I was just using 32 bit as my example because 64 bit
> > > > accesses will tear on 32 bit architectures but 64 bit naturally
> > > > aligned accesses shouldn't tear on 64 bit architectures.  However,
> > > > since we can't guarantee the 64 bitness of the architecture 32 bit
> > > > or void * is our gold standard for not tearing.
> > > 
> > > For stores of quantities not known at compile time, agreed.  But
> > > that same store-immediate situation could happen on 64-bit systems.
> > > 
> > > > James
> > > > 
> > > > 
> > > > > If that's the case, I can think of quite a few places that are
> > > > > rather dubious, and I don't see how READ_ONCE() could help in
> > > > > those - e.g. if an architecture only has 32bit loads, rcu list
> > > > > traversals are not going to be doable without one hell of an
> > > > > extra headache.
> > > 
> > > All the 64-bit systems that run the Linux kernel do have 64-bit load
> > > instructions and rcu_dereference() uses READ_ONCE() internally, so we
> > > should be fine with RCU list traversals.
> > 
> > I really don't think it's possible to get the same immediate constant
> > tearing bug on 64 bit.  If you look at PA, we have no 64 bit
> > equivalent of the ldil/ldo pair so all 64 bit immediate stores come
> > straight from the global data table via a register, so no tearing.  I
> > bet every 64 bit architecture has a similar approach because 64 bit
> > immediate data just requires too many bits to stuff into an instruction
> > pair.
> > 
> > James
> > 


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2019-03-19 15:45 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-14 22:56 dcache locking question Tobin C. Harding
2019-03-14 23:09 ` Matthew Wilcox
2019-03-15  1:38   ` Tobin C. Harding
2019-03-14 23:19 ` Tobin C. Harding
2019-03-15  1:50   ` Al Viro
2019-03-15 17:38     ` Eric Biggers
2019-03-15 18:54       ` Al Viro
2019-03-16 22:31         ` Paul E. McKenney
2019-03-17  0:18           ` Al Viro
2019-03-17  0:50             ` Paul E. McKenney
2019-03-17  2:20               ` James Bottomley
2019-03-17  3:06                 ` Al Viro
2019-03-17  4:23                   ` James Bottomley
2019-03-18  0:35                     ` Paul E. McKenney
2019-03-18 16:26                       ` James Bottomley
2019-03-18 17:11                         ` Paul E. McKenney
2019-03-19 15:45                           ` Paul E. McKenney

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).