All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Documentation/atomic_t: Document forward progress expectations
@ 2021-07-29 14:40 Peter Zijlstra
  2021-07-29 16:24 ` hev
  2021-08-05  9:40 ` [tip: locking/core] " tip-bot2 for Peter Zijlstra
  0 siblings, 2 replies; 4+ messages in thread
From: Peter Zijlstra @ 2021-07-29 14:40 UTC (permalink / raw)
  To: will, peterz, boqun.feng
  Cc: linux-kernel, stern, parri.andrea, npiggin, dhowells, j.alglave,
	luc.maranget, paulmck, akiyks, dlustig, joel, chenhuacai, guoren,
	geert, chenhuacai, mingo, arnd, wangrui, lixuefeng, jiaxun.yang


Add a few words on forward progress; there's been quite a bit of
confusion on the subject.

Specifically, more complex locking primitives (ticket/qspinlock) require
forward progress from their consituent operations in order to provide
better/more guarantees than TaS locks.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
---
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -312,3 +312,56 @@ Both provide the same functionality, but
 
 NB. try_cmpxchg() also generates better code on some platforms (notably x86)
 where the function more closely matches the hardware instruction.
+
+
+FORWARD PROGRESS
+----------------
+
+In general strong forward progress is expected of all unconditional atomic
+operations -- those in the Arithmetic and Bitwise classes and xchg(). However
+a fair amount of code also requires forward progress from the conditional
+atomic operations.
+
+Specifically 'simple' cmpxchg() loops are expected to not starve one another
+indefinitely. However, this is not evident on LL/SC architectures, because
+while an LL/SC architecure 'can/should/must' provide forward progress
+guarantees between competing LL/SC sections, such a guarantee does not
+transfer to cmpxchg() implemented using LL/SC. Consider:
+
+  old = atomic_read(&v);
+  do {
+    new = func(old);
+  } while (!atomic_try_cmpxchg(&v, &old, new));
+
+which on LL/SC becomes something like:
+
+  old = atomic_read(&v);
+  do {
+    new = func(old);
+  } while (!({
+    volatile asm ("1: LL  %[oldval], %[v]\n"
+                  "   CMP %[oldval], %[old]\n"
+                  "   BNE 2f\n"
+                  "   SC  %[new], %[v]\n"
+                  "   BNE 1b\n"
+                  "2:\n"
+                  : [oldval] "=&r" (oldval), [v] "m" (v)
+		  : [old] "r" (old), [new] "r" (new)
+                  : "memory");
+    success = (oldval == old);
+    if (!success)
+      old = oldval;
+    success; }));
+
+However, even the forward branch from the failed compare can cause the LL/SC
+to fail on some architectures, let alone whatever the compiler makes of the C
+loop body. As a result there is no guarantee what so ever the cacheline
+containing @v will stay on the local CPU and progress is made.
+
+Even native CAS architectures can fail to provide forward progress for their
+primitive (See Sparc64 for an example).
+
+Such implementations are strongly encouraged to add exponential backoff loops
+to a failed CAS in order to ensure some progress. Affected architectures are
+also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
+their locking primitives.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Documentation/atomic_t: Document forward progress expectations
  2021-07-29 14:40 [PATCH] Documentation/atomic_t: Document forward progress expectations Peter Zijlstra
@ 2021-07-29 16:24 ` hev
  2021-07-29 20:03   ` Peter Zijlstra
  2021-08-05  9:40 ` [tip: locking/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 1 reply; 4+ messages in thread
From: hev @ 2021-07-29 16:24 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, linux-kernel, stern, parri.andrea,
	npiggin, dhowells, j.alglave, luc.maranget, paulmck, akiyks,
	dlustig, joel, huacai chen, Guo Ren, geert, Huacai Chen,
	Ingo Molnar, Arnd Bergmann, wangrui, lixuefeng, Jiaxun Yang

Hi, Peter,

On Thu, Jul 29, 2021 at 10:40 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
>
> Add a few words on forward progress; there's been quite a bit of
> confusion on the subject.
>
> Specifically, more complex locking primitives (ticket/qspinlock) require
> forward progress from their consituent operations in order to provide
> better/more guarantees than TaS locks.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Acked-by: Will Deacon <will@kernel.org>
> Acked-by: Boqun Feng <boqun.feng@gmail.com>
> ---
> --- a/Documentation/atomic_t.txt
> +++ b/Documentation/atomic_t.txt
> @@ -312,3 +312,56 @@ Both provide the same functionality, but
>
>  NB. try_cmpxchg() also generates better code on some platforms (notably x86)
>  where the function more closely matches the hardware instruction.
> +
> +
> +FORWARD PROGRESS
> +----------------
> +
> +In general strong forward progress is expected of all unconditional atomic
> +operations -- those in the Arithmetic and Bitwise classes and xchg(). However
> +a fair amount of code also requires forward progress from the conditional
> +atomic operations.
> +
> +Specifically 'simple' cmpxchg() loops are expected to not starve one another
> +indefinitely. However, this is not evident on LL/SC architectures, because
> +while an LL/SC architecure 'can/should/must' provide forward progress
> +guarantees between competing LL/SC sections, such a guarantee does not
> +transfer to cmpxchg() implemented using LL/SC. Consider:

Thanks for your explanation.

> +
> +  old = atomic_read(&v);
> +  do {
> +    new = func(old);
> +  } while (!atomic_try_cmpxchg(&v, &old, new));

We may need new APIs to help LL/SC to implement atomic operations, but
this is obviously incompatible with native CAS. and many and many
common functions are CAS friendly. Let's more functions that implement
atomic semantics can be overridden by architecture may be a way. ;-)

In the above example, the correct implementation on LL/SC may be like:

do {
    old = LL(&v);
    new = func(old, &skip);
    if (skip) {
        break;
    }
} while (!SC(&v, new);

However, the success of SC may be affected by the inconstant
complexity of func. :-(

Regards,
Rui

> +
> +which on LL/SC becomes something like:
> +
> +  old = atomic_read(&v);
> +  do {
> +    new = func(old);
> +  } while (!({
> +    volatile asm ("1: LL  %[oldval], %[v]\n"
> +                  "   CMP %[oldval], %[old]\n"
> +                  "   BNE 2f\n"
> +                  "   SC  %[new], %[v]\n"
> +                  "   BNE 1b\n"
> +                  "2:\n"
> +                  : [oldval] "=&r" (oldval), [v] "m" (v)
> +                 : [old] "r" (old), [new] "r" (new)
> +                  : "memory");
> +    success = (oldval == old);
> +    if (!success)
> +      old = oldval;
> +    success; }));
> +
> +However, even the forward branch from the failed compare can cause the LL/SC
> +to fail on some architectures, let alone whatever the compiler makes of the C
> +loop body. As a result there is no guarantee what so ever the cacheline
> +containing @v will stay on the local CPU and progress is made.
> +
> +Even native CAS architectures can fail to provide forward progress for their
> +primitive (See Sparc64 for an example).
> +
> +Such implementations are strongly encouraged to add exponential backoff loops
> +to a failed CAS in order to ensure some progress. Affected architectures are
> +also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
> +their locking primitives.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] Documentation/atomic_t: Document forward progress expectations
  2021-07-29 16:24 ` hev
@ 2021-07-29 20:03   ` Peter Zijlstra
  0 siblings, 0 replies; 4+ messages in thread
From: Peter Zijlstra @ 2021-07-29 20:03 UTC (permalink / raw)
  To: hev
  Cc: Will Deacon, Boqun Feng, linux-kernel, stern, parri.andrea,
	npiggin, dhowells, j.alglave, luc.maranget, paulmck, akiyks,
	dlustig, joel, huacai chen, Guo Ren, geert, Huacai Chen,
	Ingo Molnar, Arnd Bergmann, wangrui, lixuefeng, Jiaxun Yang

On Fri, Jul 30, 2021 at 12:24:14AM +0800, hev wrote:
> We may need new APIs to help LL/SC to implement atomic operations, but
> this is obviously incompatible with native CAS. and many and many
> common functions are CAS friendly. Let's more functions that implement
> atomic semantics can be overridden by architecture may be a way. ;-)
> 
> In the above example, the correct implementation on LL/SC may be like:
> 
> do {
>     old = LL(&v);
>     new = func(old, &skip);
>     if (skip) {
>         break;
>     }
> } while (!SC(&v, new);
> 
> However, the success of SC may be affected by the inconstant
> complexity of func. :-(

Right, so you can't really do that because the architecture constraints
on what is allowed between LL and SC vary. Also, you couldn't compile
that code on a CAS architecture because you simply cannot implement the
LL/SC semantics using CAS.

One thing that can be done is having the compiler transform a CAS loop
into a LL/SC loop, and clang actually tries that, but GCC is absolutely
failing there:

  https://godbolt.org/z/1MK6ceq46

(note; clang only does this for arm64, and the code it does generate is
pretty horrific)

And this is another thing where C11 is utter crap; because as far as
it's concerned this is equivalent, while obviously it is not, per the
parent argument.

Also, ideally there would be a variant where you'd mandate the
forward progress or a compiler error when not possible.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [tip: locking/core] Documentation/atomic_t: Document forward progress expectations
  2021-07-29 14:40 [PATCH] Documentation/atomic_t: Document forward progress expectations Peter Zijlstra
  2021-07-29 16:24 ` hev
@ 2021-08-05  9:40 ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 4+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-08-05  9:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Peter Zijlstra (Intel), Will Deacon, Boqun Feng, x86, linux-kernel

The following commit has been merged into the locking/core branch of tip:

Commit-ID:     55bccf1f93e4bf1b3209cc8648ab53f10f4601a5
Gitweb:        https://git.kernel.org/tip/55bccf1f93e4bf1b3209cc8648ab53f10f4601a5
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Thu, 29 Jul 2021 16:17:20 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 04 Aug 2021 15:16:47 +02:00

Documentation/atomic_t: Document forward progress expectations

Add a few words on forward progress; there's been quite a bit of
confusion on the subject.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lkml.kernel.org/r/YQK9ziyogxTH0m9H@hirez.programming.kicks-ass.net
---
 Documentation/atomic_t.txt | 53 +++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+)

diff --git a/Documentation/atomic_t.txt b/Documentation/atomic_t.txt
index a9c1e2b..0f1ffa0 100644
--- a/Documentation/atomic_t.txt
+++ b/Documentation/atomic_t.txt
@@ -312,3 +312,56 @@ Usage:
 
 NB. try_cmpxchg() also generates better code on some platforms (notably x86)
 where the function more closely matches the hardware instruction.
+
+
+FORWARD PROGRESS
+----------------
+
+In general strong forward progress is expected of all unconditional atomic
+operations -- those in the Arithmetic and Bitwise classes and xchg(). However
+a fair amount of code also requires forward progress from the conditional
+atomic operations.
+
+Specifically 'simple' cmpxchg() loops are expected to not starve one another
+indefinitely. However, this is not evident on LL/SC architectures, because
+while an LL/SC architecure 'can/should/must' provide forward progress
+guarantees between competing LL/SC sections, such a guarantee does not
+transfer to cmpxchg() implemented using LL/SC. Consider:
+
+  old = atomic_read(&v);
+  do {
+    new = func(old);
+  } while (!atomic_try_cmpxchg(&v, &old, new));
+
+which on LL/SC becomes something like:
+
+  old = atomic_read(&v);
+  do {
+    new = func(old);
+  } while (!({
+    volatile asm ("1: LL  %[oldval], %[v]\n"
+                  "   CMP %[oldval], %[old]\n"
+                  "   BNE 2f\n"
+                  "   SC  %[new], %[v]\n"
+                  "   BNE 1b\n"
+                  "2:\n"
+                  : [oldval] "=&r" (oldval), [v] "m" (v)
+		  : [old] "r" (old), [new] "r" (new)
+                  : "memory");
+    success = (oldval == old);
+    if (!success)
+      old = oldval;
+    success; }));
+
+However, even the forward branch from the failed compare can cause the LL/SC
+to fail on some architectures, let alone whatever the compiler makes of the C
+loop body. As a result there is no guarantee what so ever the cacheline
+containing @v will stay on the local CPU and progress is made.
+
+Even native CAS architectures can fail to provide forward progress for their
+primitive (See Sparc64 for an example).
+
+Such implementations are strongly encouraged to add exponential backoff loops
+to a failed CAS in order to ensure some progress. Affected architectures are
+also strongly encouraged to inspect/audit the atomic fallbacks, refcount_t and
+their locking primitives.

^ permalink raw reply related	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-05  9:40 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-29 14:40 [PATCH] Documentation/atomic_t: Document forward progress expectations Peter Zijlstra
2021-07-29 16:24 ` hev
2021-07-29 20:03   ` Peter Zijlstra
2021-08-05  9:40 ` [tip: locking/core] " tip-bot2 for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.