All of lore.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Boqun Feng <boqun.feng@gmail.com>,
	Oleg Nesterov <oleg@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Michal Hocko <mhocko@kernel.org>,
	David Howells <dhowells@redhat.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Paul Mackerras <paulus@samba.org>
Subject: Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
Date: Thu, 12 Nov 2015 22:09:13 +0000	[thread overview]
Message-ID: <20151112220913.GF23979@arm.com> (raw)
In-Reply-To: <CA+55aFxr4JX8XEF12qCOjbj1r92w8yuHZ0vbUm9jgvY167Svsw@mail.gmail.com>

On Thu, Nov 12, 2015 at 10:21:39AM -0800, Linus Torvalds wrote:
> On Wed, Nov 11, 2015 at 11:14 PM, Boqun Feng <boqun.feng@gmail.com> wrote:
> >
> > Hmm.. probably incorrect.. because the ACQUIRE semantics of spin_lock()
> > only guarantees that the memory operations following spin_lock() can't
> > be reorder before the *LOAD* part of spin_lock() not the *STORE* part,
> > i.e. the case below can happen(assuming the spin_lock() is implemented
> > as ll/sc loop)
> >
> >         spin_lock(&lock):
> >           r1 = *lock; // LL, r1 == 0
> >         o = READ_ONCE(object); // could be reordered here.
> >           *lock = 1; // SC
> 
> It may be worth noting that at least in theory, not only reads may
> pass the store. If the spin-lock is done as
> 
>         r1 = *lock   // read-acquire, r1 == 0
>         *lock = 1    // SC
> 
> then even *writes* inside the locked region might pass up through the
> "*lock = 1".

Right, but only if we didn't have the control dependency to branch back
in the case that the SC failed. In that case, the lock would be broken
because you'd dive into the critical section even if you failed to take
the lock.

> So other CPU's - that haven't taken the spinlock - could see the
> modifications inside the critical region before they actually see the
> lock itself change.
> 
> Now, the point of spin_unlock_wait() (and "spin_is_locked()") should
> generally be that you have some external ordering guarantee that
> guarantees that the lock has been taken. For example, for the IPC
> semaphores, we do either one of:
> 
>  (a) get large lock, then - once you hold that lock - wait for each small lock
> 
> or
> 
>  (b) get small lock, then - once you hold that lock - check that the
> largo lock is unlocked
> 
> and that's the case we should really worry about.  The other uses of
> spin_unlock_wait() should have similar "I have other reasons to know
> I've seen that the lock was taken, or will never be taken after this
> because XYZ".
> 
> This is why powerpc has a memory barrier in "arch_spin_is_locked()".
> Exactly so that the "check that the other lock is unlocked" is
> guaranteed to be ordered wrt the store that gets the first lock.
> 
> It looks like ARM64 gets this wrong and is fundamentally buggy wrt
> "spin_is_locked()" (and, as a result, "spin_unlock_wait()").

I don't see how a memory barrier would help us here, and Boqun's example
had smp_mb() either sides of the spin_unlock_wait(). What we actually need
is to make spin_unlock_wait more like a LOCK operation, so that it forces
parallel lockers to replay their LL/SC sequences.

I'll write a patch once I've heard more back from Paul about
smp_mb__after_unlock_lock().

> BUT! And this is a bug BUT:
> 
> It should be noted that that is purely an ARM64 bug. Not a bug in our
> users. If you have a spinlock where the "get lock write" part of the
> lock can be delayed, then you have to have a "arch_spin_is_locked()"
> that has the proper memory barriers.
> 
> Of course, ARM still hides their architecture manuals in odd places,
> so I can't double-check. But afaik, ARM64 store-conditional is "store
> exclusive with release", and it has only release semantics, and ARM64
> really does have the above bug.

The store-conditional in our spin_lock routine has no barrier semantics;
they are enforced by the load-exclusive-acquire that pairs with it.

> On that note: can anybody point me to the latest ARM64 8.1
> architecture manual in pdf form, without the "you have to register"
> crap? I thought ARM released it, but all my googling just points to
> the idiotic ARM service center that wants me to sign away something
> just to see the docs. Which I don't do.

We haven't yet released a document for the 8.1 instructions, but I can
certainly send you a copy when it's made available.

Will

  reply	other threads:[~2015-11-12 22:09 UTC|newest]

Thread overview: 78+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-02 13:29 [PATCH 0/4] scheduler ordering bits Peter Zijlstra
2015-11-02 13:29 ` [PATCH 1/4] sched: Better document the try_to_wake_up() barriers Peter Zijlstra
2015-12-04  0:09   ` Byungchul Park
2015-12-04  0:58   ` Byungchul Park
2015-11-02 13:29 ` [PATCH 2/4] sched: Document Program-Order guarantees Peter Zijlstra
2015-11-02 20:27   ` Paul Turner
2015-11-02 20:34     ` Peter Zijlstra
2015-11-02 22:09       ` Paul Turner
2015-11-02 22:12         ` Peter Zijlstra
2015-11-20 10:02     ` Peter Zijlstra
2015-11-20 14:08       ` Boqun Feng
2015-11-20 14:18         ` Peter Zijlstra
2015-11-20 14:21           ` Boqun Feng
2015-11-20 19:41             ` Peter Zijlstra
2015-11-02 13:29 ` [PATCH 3/4] sched: Fix a race in try_to_wake_up() vs schedule() Peter Zijlstra
2015-11-02 13:29 ` [PATCH 4/4] locking: Introduce smp_cond_acquire() Peter Zijlstra
2015-11-02 13:57   ` Peter Zijlstra
2015-11-02 17:43     ` Will Deacon
2015-11-03  1:14       ` Paul E. McKenney
2015-11-03  1:25         ` Linus Torvalds
2015-11-02 17:42   ` Will Deacon
2015-11-02 18:08   ` Linus Torvalds
2015-11-02 18:37     ` Will Deacon
2015-11-02 19:17       ` Linus Torvalds
2015-11-02 19:57         ` Will Deacon
2015-11-02 20:23           ` Peter Zijlstra
2015-11-02 21:56         ` Peter Zijlstra
2015-11-03  1:57         ` Paul E. McKenney
2015-11-03 19:40           ` Linus Torvalds
2015-11-04  3:57             ` Paul E. McKenney
2015-11-04  4:43               ` Linus Torvalds
2015-11-04 12:54                 ` Paul E. McKenney
2015-11-02 20:36   ` David Howells
2015-11-02 20:40     ` Peter Zijlstra
2015-11-02 21:11     ` Linus Torvalds
2015-11-03 17:59   ` Oleg Nesterov
2015-11-03 18:23     ` Peter Zijlstra
2015-11-11  9:39     ` Boqun Feng
2015-11-11 10:34       ` Boqun Feng
2015-11-11 19:53         ` Oleg Nesterov
2015-11-12 13:50         ` Paul E. McKenney
2015-11-11 12:12       ` Peter Zijlstra
2015-11-11 19:39         ` Oleg Nesterov
2015-11-11 21:23           ` Linus Torvalds
2015-11-12  7:14           ` Boqun Feng
2015-11-12 10:28             ` Peter Zijlstra
2015-11-12 15:00             ` Oleg Nesterov
2015-11-12 14:40               ` Paul E. McKenney
2015-11-12 14:49                 ` Boqun Feng
2015-11-12 15:02                   ` Paul E. McKenney
2015-11-12 21:53                     ` Will Deacon
2015-11-12 14:50                 ` Peter Zijlstra
2015-11-12 15:01                   ` Paul E. McKenney
2015-11-12 15:08                     ` Peter Zijlstra
2015-11-12 15:20                       ` Paul E. McKenney
2015-11-12 21:25                         ` Will Deacon
2015-11-12 15:18               ` Boqun Feng
2015-11-12 18:38                 ` Oleg Nesterov
2015-11-12 18:02                   ` Peter Zijlstra
2015-11-12 19:33                     ` Oleg Nesterov
2015-11-12 18:59                       ` Paul E. McKenney
2015-11-12 21:33                         ` Will Deacon
2015-11-12 23:43                           ` Paul E. McKenney
2015-11-16 13:58                             ` Will Deacon
2015-11-12 18:21             ` Linus Torvalds
2015-11-12 22:09               ` Will Deacon [this message]
2015-11-16 15:56               ` Peter Zijlstra
2015-11-16 16:04                 ` Peter Zijlstra
2015-11-16 16:24                   ` Will Deacon
2015-11-16 16:44                     ` Paul E. McKenney
2015-11-16 16:46                       ` Will Deacon
2015-11-16 17:15                         ` Paul E. McKenney
2015-11-16 21:58                     ` Linus Torvalds
2015-11-17 11:51                       ` Will Deacon
2015-11-17 21:01                         ` Paul E. McKenney
2015-11-18 11:25                           ` Will Deacon
2015-11-19 18:01                             ` Will Deacon
2015-11-20 10:09                               ` Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151112220913.GF23979@arm.com \
    --to=will.deacon@arm.com \
    --cc=benh@kernel.crashing.org \
    --cc=boqun.feng@gmail.com \
    --cc=corbet@lwn.net \
    --cc=dhowells@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=oleg@redhat.com \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=peterz@infradead.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.