* [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks
@ 2016-06-08 16:25 Will Deacon
  2016-06-08 16:25 ` [PATCH v2 2/3] arm64: spinlock: fix spin_unlock_wait for LSE atomics Will Deacon
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Will Deacon @ 2016-06-08 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

spin_is_locked has grown two very different use-cases:

(1) [The sane case] API functions may require a certain lock to be held
    by the caller and can therefore use spin_is_locked as part of an
    assert statement in order to verify that the lock is indeed held.
    For example, usage of assert_spin_locked.

(2) [The insane case] There are two locks, where a CPU takes one of the
    locks and then checks whether or not the other one is held before
    accessing some shared state. For example, the "optimized locking" in
    ipc/sem.c.

In the latter case, the sequence looks like:

  spin_lock(&sem->lock);
  if (!spin_is_locked(&sma->sem_perm.lock))
    /* Access shared state */

and requires that the spin_is_locked check is ordered after taking the
sem->lock. Unfortunately, since our spinlocks are implemented using a
LDAXR/STXR sequence, the read of &sma->sem_perm.lock can be speculated
before the STXR and consequently return a stale value.
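
As a rough illustration (simplified pseudo-assembly with arbitrary
register allocation, not the actual ticket-lock implementation), the
hazard on the locking CPU is:

  1:  ldaxr   w0, [x1]      // load-acquire of sem->lock: orders later
                            // accesses against this *load* only
      /* ... compute the updated lock value in w2 ... */
      stxr    w3, w2, [x1]  // store-exclusive: not a barrier
      cbnz    w3, 1b
      ...
      ldr     w4, [x5]      // read of sma->sem_perm.lock: can be
                            // satisfied before the stxr completes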

Whilst this hasn't been seen to cause issues in practice, PowerPC fixed
the same issue in 51d7d5205d33 ("powerpc: Add smp_mb() to
arch_spin_is_locked()"). Although we did something similar for
spin_unlock_wait in d86b8da04dfa ("arm64: spinlock: serialise
spin_unlock_wait against concurrent lockers"), that doesn't actually take
care of ordering against local acquisition of a different lock.

This patch adds an smp_mb() to the start of our arch_spin_is_locked and
arch_spin_unlock_wait routines to ensure that the lock value is always
loaded after any other locks have been taken by the current CPU.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/spinlock.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index fc9682bfe002..aac64d55cb22 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -31,6 +31,12 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 	unsigned int tmp;
 	arch_spinlock_t lockval;
 
+	/*
+	 * Ensure prior spin_lock operations to other locks have completed
+	 * on this CPU before we test whether "lock" is locked.
+	 */
+	smp_mb();
+
 	asm volatile(
 "	sevl\n"
 "1:	wfe\n"
@@ -148,6 +154,7 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 
 static inline int arch_spin_is_locked(arch_spinlock_t *lock)
 {
+	smp_mb(); /* See arch_spin_unlock_wait */
 	return !arch_spin_value_unlocked(READ_ONCE(*lock));
 }
 
-- 
2.1.4

* [PATCH v2 2/3] arm64: spinlock: fix spin_unlock_wait for LSE atomics
  2016-06-08 16:25 [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks Will Deacon
@ 2016-06-08 16:25 ` Will Deacon
  2016-06-08 16:25 ` [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait Will Deacon
  2016-06-10 13:36 ` [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks Mark Rutland
  2 siblings, 0 replies; 7+ messages in thread
From: Will Deacon @ 2016-06-08 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

Commit d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against
concurrent lockers") fixed spin_unlock_wait for LL/SC-based atomics under
the premise that the LSE atomics (in particular, the LDADDA instruction)
are indivisible.

Unfortunately, these instructions are only indivisible when used with the
-AL (full ordering) suffix and, consequently, the same issue can
theoretically be observed with LSE atomics, where a later (in program
order) load can be speculated before the write portion of the atomic
operation.
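
As an illustration (arbitrary registers, not taken from the kernel
sources), the acquire-only form orders later accesses against its load
but not against its store:

  ldadda  w1, w0, [x2]  // acquire applies to the load half only;
                        // the store half of the atomic is unordered
  ldr     w3, [x4]      // can therefore be observed before that store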

This patch fixes the issue by performing a CAS of the lock once we've
established that it's unlocked, in much the same way as the LL/SC code.
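
In rough C-level pseudocode (a sketch only; LOAD_ACQUIRE and CAS_RELAXED
are stand-ins for the LDAXR and CAS instructions rather than real kernel
APIs), the wait loop becomes:

  for (;;) {
          lockval = LOAD_ACQUIRE(lock);   /* LDAXR */
          if (lockval.owner != lockval.next)
                  continue;               /* still held: wait and reload */
          /*
           * Observed free: CAS the value back over itself so that we
           * serialise against any locker whose atomic raced with our
           * read; if the lock changed under us, reload and retry.
           */
          if (CAS_RELAXED(lock, lockval, lockval) == lockval)
                  return;
  }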

Fixes: d86b8da04dfa ("arm64: spinlock: serialise spin_unlock_wait against concurrent lockers")
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/spinlock.h | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index aac64d55cb22..d5c894253e73 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -43,13 +43,17 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 "2:	ldaxr	%w0, %2\n"
 "	eor	%w1, %w0, %w0, ror #16\n"
 "	cbnz	%w1, 1b\n"
+	/* Serialise against any concurrent lockers */
 	ARM64_LSE_ATOMIC_INSN(
 	/* LL/SC */
 "	stxr	%w1, %w0, %2\n"
-"	cbnz	%w1, 2b\n", /* Serialise against any concurrent lockers */
-	/* LSE atomics */
 "	nop\n"
-"	nop\n")
+"	nop\n",
+	/* LSE atomics */
+"	mov	%w1, %w0\n"
+"	cas	%w0, %w0, %2\n"
+"	eor	%w1, %w1, %w0\n")
+"	cbnz	%w1, 2b\n"
 	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
 	:
 	: "memory");
-- 
2.1.4

* [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait
  2016-06-08 16:25 [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks Will Deacon
  2016-06-08 16:25 ` [PATCH v2 2/3] arm64: spinlock: fix spin_unlock_wait for LSE atomics Will Deacon
@ 2016-06-08 16:25 ` Will Deacon
  2016-06-10 12:25   ` Peter Zijlstra
  2016-06-10 13:36 ` [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks Mark Rutland
  2 siblings, 1 reply; 7+ messages in thread
From: Will Deacon @ 2016-06-08 16:25 UTC (permalink / raw)
  To: linux-arm-kernel

Rather than wait until we observe the lock being free, we can also
return from spin_unlock_wait if we observe that the lock is now held
by somebody else, which implies that it was unlocked but we just missed
seeing it in that state.

Furthermore, in such a scenario there is no longer a need to write back
the value that we loaded, since we know that there has been a lock
hand-off, which is sufficient to publish any stores prior to the
unlock_wait. The litmus test is something like:

AArch64
{
0:X1=x; 0:X3=y;
1:X1=y;
2:X1=y; 2:X3=x;
}
 P0          | P1           | P2           ;
 MOV W0,#1   | MOV W0,#1    | LDAR W0,[X1] ;
 STR W0,[X1] | STLR W0,[X1] | LDR W2,[X3]  ;
 DMB SY      |              |              ;
 LDR W2,[X3] |              |              ;
exists
(0:X2=0 /\ 2:X0=1 /\ 2:X2=0)

where P0 is doing spin_unlock_wait, P1 is doing spin_unlock and P2 is
doing spin_lock.
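
In rough C-level pseudocode (a sketch only; LOAD_ACQUIRE is a stand-in
for the LDAXR and the field names follow arch_spinlock_t), the new wait
loop is:

  owner = READ_ONCE(lock->owner);
  for (;;) {
          lockval = LOAD_ACQUIRE(lock);   /* LDAXR */
          if (lockval.owner == lockval.next)
                  break;          /* free: fall through to the writeback/CAS */
          if (lockval.owner != owner)
                  return;         /* unlock->lock hand-off observed: the
                                     release/acquire pair has already
                                     published our prior stores */
          /* still held by the original owner: wait and reload */
  }
  /* write back (LL/SC) or CAS (LSE) the unlocked value, as before */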

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/spinlock.h | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
index d5c894253e73..e875a5a551d7 100644
--- a/arch/arm64/include/asm/spinlock.h
+++ b/arch/arm64/include/asm/spinlock.h
@@ -30,20 +30,39 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 {
 	unsigned int tmp;
 	arch_spinlock_t lockval;
+	u32 owner;
 
 	/*
 	 * Ensure prior spin_lock operations to other locks have completed
 	 * on this CPU before we test whether "lock" is locked.
 	 */
 	smp_mb();
+	owner = READ_ONCE(lock->owner) << 16;
 
 	asm volatile(
 "	sevl\n"
 "1:	wfe\n"
 "2:	ldaxr	%w0, %2\n"
+	/* Is the lock free? */
 "	eor	%w1, %w0, %w0, ror #16\n"
-"	cbnz	%w1, 1b\n"
-	/* Serialise against any concurrent lockers */
+"	cbz	%w1, 3f\n"
+	/* Lock taken -- has there been a subsequent unlock->lock transition? */
+"	eor	%w1, %w3, %w0, lsl #16\n"
+"	cbz	%w1, 1b\n"
+	/*
+	 * The owner has been updated, so there was an unlock->lock
+	 * transition that we missed. That means we can rely on the
+	 * store-release of the unlock operation paired with the
+	 * load-acquire of the lock operation to publish any of our
+	 * previous stores to the new lock owner and therefore don't
+	 * need to bother with the writeback below.
+	 */
+"	b	4f\n"
+"3:\n"
+	/*
+	 * Serialise against any concurrent lockers by writing back the
+	 * unlocked lock value
+	 */
 	ARM64_LSE_ATOMIC_INSN(
 	/* LL/SC */
 "	stxr	%w1, %w0, %2\n"
@@ -53,9 +72,11 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
 "	mov	%w1, %w0\n"
 "	cas	%w0, %w0, %2\n"
 "	eor	%w1, %w1, %w0\n")
+	/* Somebody else wrote to the lock, GOTO 10 and reload the value */
 "	cbnz	%w1, 2b\n"
+"4:"
 	: "=&r" (lockval), "=&r" (tmp), "+Q" (*lock)
-	:
+	: "r" (owner)
 	: "memory");
 }
 
-- 
2.1.4

* [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait
  2016-06-08 16:25 ` [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait Will Deacon
@ 2016-06-10 12:25   ` Peter Zijlstra
  2016-06-10 12:46     ` Will Deacon
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2016-06-10 12:25 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 08, 2016 at 05:25:39PM +0100, Will Deacon wrote:
> Rather than wait until we observe the lock being free, we can also
> return from spin_unlock_wait if we observe that the lock is now held
> by somebody else, which implies that it was unlocked but we just missed
> seeing it in that state.
> 
> Furthermore, in such a scenario there is no longer a need to write back
> the value that we loaded, since we know that there has been a lock
> hand-off, which is sufficient to publish any stores prior to the
> unlock_wait.

You might want a few words on _why_ here. It took me a little while to
figure that out.

Also; human readable arguments to support the thing below go a long way
into validating the test is indeed correct. Because as you've shown,
even the validators cannot be trusted ;-)

> The litmus test is something like:
> 
> AArch64
> {
> 0:X1=x; 0:X3=y;
> 1:X1=y;
> 2:X1=y; 2:X3=x;
> }
>  P0          | P1           | P2           ;
>  MOV W0,#1   | MOV W0,#1    | LDAR W0,[X1] ;
>  STR W0,[X1] | STLR W0,[X1] | LDR W2,[X3]  ;
>  DMB SY      |              |              ;
>  LDR W2,[X3] |              |              ;
> exists
> (0:X2=0 /\ 2:X0=1 /\ 2:X2=0)
> 
> where P0 is doing spin_unlock_wait, P1 is doing spin_unlock and P2 is
> doing spin_lock.

I still have a hard time deciphering these things..

* [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait
  2016-06-10 12:25   ` Peter Zijlstra
@ 2016-06-10 12:46     ` Will Deacon
  2016-06-10 13:13       ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Will Deacon @ 2016-06-10 12:46 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 10, 2016 at 02:25:20PM +0200, Peter Zijlstra wrote:
> On Wed, Jun 08, 2016 at 05:25:39PM +0100, Will Deacon wrote:
> > Rather than wait until we observe the lock being free, we can also
> > return from spin_unlock_wait if we observe that the lock is now held
> > by somebody else, which implies that it was unlocked but we just missed
> > seeing it in that state.
> > 
> > Furthermore, in such a scenario there is no longer a need to write back
> > the value that we loaded, since we know that there has been a lock
> > hand-off, which is sufficient to publish any stores prior to the
> > unlock_wait.
> 
> You might want a few words on _why_ here. It took me a little while to
> figure that out.

How about "... because the ARM architecture ensures that a Store-Release
is multi-copy-atomic when observed by a Load-Acquire instruction"?

> Also; human readable arguments to support the thing below go a long way
> into validating the test is indeed correct. Because as you've shown,
> even the validators cannot be trusted ;-)

Well, I didn't actually provide the output of a model here. I'm just
capturing the rationale in a non-ambiguous form.

> > The litmus test is something like:
> > 
> > AArch64
> > {
> > 0:X1=x; 0:X3=y;
> > 1:X1=y;
> > 2:X1=y; 2:X3=x;
> > }
> >  P0          | P1           | P2           ;
> >  MOV W0,#1   | MOV W0,#1    | LDAR W0,[X1] ;
> >  STR W0,[X1] | STLR W0,[X1] | LDR W2,[X3]  ;
> >  DMB SY      |              |              ;
> >  LDR W2,[X3] |              |              ;
> > exists
> > (0:X2=0 /\ 2:X0=1 /\ 2:X2=0)
> > 
> > where P0 is doing spin_unlock_wait, P1 is doing spin_unlock and P2 is
> > doing spin_lock.
> 
> I still have a hard time deciphering these things..

I'll nail you down at LPC and share the kool-aid :)

Will

* [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait
  2016-06-10 12:46     ` Will Deacon
@ 2016-06-10 13:13       ` Peter Zijlstra
  0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2016-06-10 13:13 UTC (permalink / raw)
  To: linux-arm-kernel

On Fri, Jun 10, 2016 at 01:46:23PM +0100, Will Deacon wrote:
> On Fri, Jun 10, 2016 at 02:25:20PM +0200, Peter Zijlstra wrote:
> > On Wed, Jun 08, 2016 at 05:25:39PM +0100, Will Deacon wrote:
> > > Rather than wait until we observe the lock being free, we can also
> > > return from spin_unlock_wait if we observe that the lock is now held
> > > by somebody else, which implies that it was unlocked but we just missed
> > > seeing it in that state.
> > > 
> > > Furthermore, in such a scenario there is no longer a need to write back
> > > the value that we loaded, since we know that there has been a lock
> > > hand-off, which is sufficient to publish any stores prior to the
> > > unlock_wait.
> > 
> > You might want a few words on _why_ here. It took me a little while to
> > figure that out.
> 
> How about "... because the ARM architecture ensures that a Store-Release
> is multi-copy-atomic when observed by a Load-Acquire instruction"?

Yep, that works.

> > Also; human readable arguments to support the thing below go a long way
> > into validating the test is indeed correct. Because as you've shown,
> > even the validators cannot be trusted ;-)
> 
> Well, I didn't actually provide the output of a model here. I'm just
> capturing the rationale in a non-ambiguous form.

The litmus test captures the problem statement, not the rationale for
the outcome.

> > > The litmus test is something like:
> > > 
> > > AArch64
> > > {
> > > 0:X1=x; 0:X3=y;
> > > 1:X1=y;
> > > 2:X1=y; 2:X3=x;
> > > }
> > >  P0          | P1           | P2           ;
> > >  MOV W0,#1   | MOV W0,#1    | LDAR W0,[X1] ;
> > >  STR W0,[X1] | STLR W0,[X1] | LDR W2,[X3]  ;
> > >  DMB SY      |              |              ;
> > >  LDR W2,[X3] |              |              ;
> > > exists
> > > (0:X2=0 /\ 2:X0=1 /\ 2:X2=0)
> > > 
> > > where P0 is doing spin_unlock_wait, P1 is doing spin_unlock and P2 is
> > > doing spin_lock.
> > 
> > I still have a hard time deciphering these things..
> 
> I'll nail you down at LPC and share the kool-aid :)

hehe; so I can more or less parse them, it's just that it doesn't come
naturally to me, and I keep forgetting ARM asm, which doesn't help.

* [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks
  2016-06-08 16:25 [PATCH v2 1/3] arm64: spinlock: order spin_{is_locked, unlock_wait} against local locks Will Deacon
  2016-06-08 16:25 ` [PATCH v2 2/3] arm64: spinlock: fix spin_unlock_wait for LSE atomics Will Deacon
  2016-06-08 16:25 ` [PATCH v2 3/3] arm64: spinlock: use lock->owner to optimise spin_unlock_wait Will Deacon
@ 2016-06-10 13:36 ` Mark Rutland
  2 siblings, 0 replies; 7+ messages in thread
From: Mark Rutland @ 2016-06-10 13:36 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, Jun 08, 2016 at 05:25:37PM +0100, Will Deacon wrote:
> spin_is_locked has grown two very different use-cases:
> 
> (1) [The sane case] API functions may require a certain lock to be held
>     by the caller and can therefore use spin_is_locked as part of an
>     assert statement in order to verify that the lock is indeed held.
>     For example, usage of assert_spin_locked.
> 
> (2) [The insane case] There are two locks, where a CPU takes one of the
>     locks and then checks whether or not the other one is held before
>     accessing some shared state. For example, the "optimized locking" in
>     ipc/sem.c.
> 
> In the latter case, the sequence looks like:
> 
>   spin_lock(&sem->lock);
>   if (!spin_is_locked(&sma->sem_perm.lock))
>     /* Access shared state */
> 
> and requires that the spin_is_locked check is ordered after taking the
> sem->lock. Unfortunately, since our spinlocks are implemented using a
> LDAXR/STXR sequence, the read of &sma->sem_perm.lock can be speculated
> before the STXR and consequently return a stale value.
> 
> Whilst this hasn't been seen to cause issues in practice, PowerPC fixed
> the same issue in 51d7d5205d33 ("powerpc: Add smp_mb() to
> arch_spin_is_locked()"). Although we did something similar for
> spin_unlock_wait in d86b8da04dfa ("arm64: spinlock: serialise
> spin_unlock_wait against concurrent lockers"), that doesn't actually take
> care of ordering against local acquisition of a different lock.
> 
> This patch adds an smp_mb() to the start of our arch_spin_is_locked and
> arch_spin_unlock_wait routines to ensure that the lock value is always
> loaded after any other locks have been taken by the current CPU.
> 
> Reported-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Will Deacon <will.deacon@arm.com>

I've taken a look at the series, and the asm looks sane to me. From
discussions at a white-board, the meat of the changes seems right.

So FWIW, for the series:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Thanks,
Mark.

> ---
>  arch/arm64/include/asm/spinlock.h | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/spinlock.h b/arch/arm64/include/asm/spinlock.h
> index fc9682bfe002..aac64d55cb22 100644
> --- a/arch/arm64/include/asm/spinlock.h
> +++ b/arch/arm64/include/asm/spinlock.h
> @@ -31,6 +31,12 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock)
>  	unsigned int tmp;
>  	arch_spinlock_t lockval;
>  
> +	/*
> +	 * Ensure prior spin_lock operations to other locks have completed
> +	 * on this CPU before we test whether "lock" is locked.
> +	 */
> +	smp_mb();
> +
>  	asm volatile(
>  "	sevl\n"
>  "1:	wfe\n"
> @@ -148,6 +154,7 @@ static inline int arch_spin_value_unlocked(arch_spinlock_t lock)
>  
>  static inline int arch_spin_is_locked(arch_spinlock_t *lock)
>  {
> +	smp_mb(); /* See arch_spin_unlock_wait */
>  	return !arch_spin_value_unlocked(READ_ONCE(*lock));
>  }
>  
> -- 
> 2.1.4
> 
