* [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
@ 2017-06-09  9:24 Peter Zijlstra
  2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-06-09  9:24 UTC (permalink / raw)
  To: Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner


Since we've vastly expanded the atomic_t interface in recent years the
existing documentation is woefully out of date and people seem to get
confused a bit.

Start a new document to hopefully better explain the current state of
affairs.

The old atomic_ops.txt also covers bitmaps and a few more details so
this is not a full replacement and we'll therefore keep that document
around until such a time that we've managed to write more text to cover
its entirety.

Also please, ReST people, go away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---

--- /dev/null	2017-05-05 13:16:22.636212333 +0200
+++ b/Documentation/atomic_t.txt	2017-06-09 11:05:31.501599153 +0200
@@ -0,0 +1,147 @@
+
+On atomic types (atomic_t atomic64_t and atomic_long_t).
+
+The atomic type provides an interface to the architecture's means of atomic
+RmW operations between CPUs (it specifically does not order/work/etc. on
+IO).
+
+The 'full' API consists of:
+
+Non RmW ops:
+
+  atomic_read(), atomic_set()
+  atomic_read_acquire(), atomic_set_release()
+
+
+RmW atomic operations:
+
+Arithmetic:
+
+  atomic_{add,sub,inc,dec}()
+  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
+  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
+
+
+Bitwise:
+
+  atomic_{and,or,xor,notand}()
+  atomic_fetch_{and,or,xor,notand}{,_relaxed,_acquire,_release}()
+
+
+Swap:
+
+  atomic_xchg{,_relaxed,_acquire,_release}()
+  atomic_cmpxchg{,_relaxed,_acquire,_release}()
+  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
+
+
+Reference count (but please see refcount_t):
+
+  atomic_add_unless(), atomic_inc_not_zero()
+  atomic_sub_and_test(), atomic_dec_and_test()
+
+
+Misc:
+
+  atomic_inc_and_test(), atomic_add_negative()
+  atomic_dec_unless_positive(), atomic_inc_unless_negative()
+
+
+Barriers:
+
+  smp_mb__{before,after}_atomic()
+
+
+
+Non RmW ops:
+
+The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
+implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
+smp_store_release() respectively.
+
+The one detail to this is that atomic_set() should be observable to the RmW
+ops. That is:
+
+  CPU0						CPU1
+
+  val = atomic_read(&X)
+  do {
+						atomic_set(&X, 0)
+	new = val + 1;
+  } while (!atomic_try_cmpxchg(&X, &val, new));
+
+Should cause the cmpxchg to *FAIL* (when @val != 0). This is typically true;
+on 'normal' platforms; a regular competing STORE will invalidate a LL/SC.
+
+The obvious case where this is not so is where we need to implement atomic ops
+with a spinlock hashtable; the typical solution is to then implement
+atomic_set() with atomic_xchg().
+
+
+RmW ops:
+
+These come in various forms:
+
+ - plain operations without return value: atomic_{}()
+
+ - operations which return the modified value: atomic_{}_return()
+
+   these are limited to the arithmetic operations because those are
+   reversible. Bitops are irreversible and therefore the modified value
+   is of dubious utility.
+
+ - operations which return the original value: atomic_fetch_{}()
+
+ - swap operations: xchg(), cmpxchg() and try_cmpxchg()
+
+ - misc; the special purpose operations that are commonly used and would,
+   given the interface, normally be implemented using (try_)cmpxchg loops but
+   are time critical and can, (typically) on LL/SC architectures, be more
+   efficiently implemented.
+
+
+All these operations are SMP atomic; that is, the operations (for a single
+atomic variable) can be fully ordered and no intermediate state is lost or
+visible.
+
+
+Ordering:  (go read memory-barriers.txt first)
+
+The rule of thumb:
+
+ - non-RmW operations are unordered;
+
+ - RmW operations that have no return value are unordered;
+
+ - RmW operations that have a return value are Sequentially Consistent;
+
+ - RmW operations that are conditional are unordered on FAILURE, otherwise the
+   above rules apply.
+
+Except of course when an operation has an explicit ordering like:
+
+ {}_relaxed: unordered
+ {}_acquire: the R of the RmW is an ACQUIRE
+ {}_release: the W of the RmW is a  RELEASE
+
+NOTE: our ACQUIRE/RELEASE are RCpc
+
+
+The barriers:
+
+  smp_mb__{before,after}_atomic()
+
+only apply to the RmW ops and can be used to augment/upgrade the ordering
+inherit to the used atomic op. These barriers provide a full smp_mb().
+
+These helper barriers exist because architectures have varying implicit
+ordering on their SMP atomic primitives. For example our TSO architectures
+provide SC atomics and these barriers are no-ops.
+
+So while something like:
+
+	smp_mb__before_atomic();
+	val = atomic_dec_return_relaxed(&X);
+
+is a 'typical' RELEASE pattern (please use atomic_dec_return_release()), the
+barrier is strictly stronger than a RELEASE.
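
A rough sketch of the spinlock-hashtable case mentioned above (the
atomic_ops_lock()/atomic_ops_unlock() helpers are borrowed from the
lock-based ARC implementation and stand in for taking/releasing the
hashed lock):

	static inline int atomic_add_return(int i, atomic_t *v)
	{
		unsigned long flags;
		int ret;

		atomic_ops_lock(flags);
		ret = (v->counter += i);
		atomic_ops_unlock(flags);

		return ret;
	}

A plain-store atomic_set(), i.e. WRITE_ONCE(v->counter, i), can land
between the load and the store inside such a locked RmW and simply be
overwritten, so it is not observable to the RmW ops in the sense above;
implementing atomic_set() with atomic_xchg() (or with the same hashed
lock) serialises it against the RmW ops.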

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09  9:24 [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Peter Zijlstra
@ 2017-06-09 11:05 ` Peter Zijlstra
  2017-06-09 11:13   ` Peter Zijlstra
                     ` (2 more replies)
  2017-06-09 15:44 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Will Deacon
  2017-06-09 18:15 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Randy Dunlap
  2 siblings, 3 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-06-09 11:05 UTC (permalink / raw)
  To: Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, vgupta, rkuo,
	james.hogan, jejb, davem, cmetcalf

On Fri, Jun 09, 2017 at 11:24:50AM +0200, Peter Zijlstra wrote:
> +Non RmW ops:
> +
> +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> +smp_store_release() respectively.
> +
> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:
> +
> +  CPU0						CPU1
> +
> +  val = atomic_read(&X)
> +  do {
> +						atomic_set(&X, 0)
> +	new = val + 1;
> +  } while (!atomic_try_cmpxchg(&X, &val, new));
> +
> +Should cause the cmpxchg to *FAIL* (when @val != 0). This is typically true;
> +on 'normal' platforms; a regular competing STORE will invalidate a LL/SC.
> +
> +The obvious case where this is not so is where we need to implement atomic ops
> +with a spinlock hashtable; the typical solution is to then implement
> +atomic_set() with atomic_xchg().

---
Subject: atomic: Fix atomic_set_release() for 'funny' architectures

Those architectures that have a special atomic_set implementation also
need a special atomic_set_release(), because for the very same reason
WRITE_ONCE() is broken for them, smp_store_release() is too.

The vast majority are architectures that have a spinlock hash based atomic
implementation; the exception is hexagon, which seems to have a hardware 'feature'.

The spinlock based atomics should be SC, that is, none of them appear to
place extra barriers in atomic_cmpxchg() or any of the other SC atomic
primitives and therefore seem to rely on their spinlock implementation
being SC (I did not fully validate all that).

Therefore, the normal atomic_set() is SC and can be used as
atomic_set_release().

Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: James Hogan <james.hogan@imgtec.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Chris Metcalf <cmetcalf@mellanox.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
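
For context, the generic fallback these architectures would otherwise pick
up from include/linux/atomic.h is (roughly):

	#ifndef atomic_set_release
	#define  atomic_set_release(v, i)	smp_store_release(&(v)->counter, (i))
	#endif

which is just a plain store plus ordering; on these architectures a plain
store is exactly what is broken (the same reason WRITE_ONCE() is
insufficient for them), so the overrides below simply reuse the special
atomic_set().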
 arch/arc/include/asm/atomic.h         | 2 ++
 arch/hexagon/include/asm/atomic.h     | 2 ++
 arch/metag/include/asm/atomic_lock1.h | 2 ++
 arch/parisc/include/asm/atomic.h      | 2 ++
 arch/sparc/include/asm/atomic_32.h    | 2 ++
 arch/tile/include/asm/atomic_32.h     | 2 ++
 include/asm-generic/atomic64.h        | 2 ++
 7 files changed, 14 insertions(+)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 54b54da6384c..11859287c52a 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -123,6 +123,8 @@ static inline void atomic_set(atomic_t *v, int i)
 	atomic_ops_unlock(flags);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 #endif
 
 /*
diff --git a/arch/hexagon/include/asm/atomic.h b/arch/hexagon/include/asm/atomic.h
index a62ba368b27d..fb3dfb2a667e 100644
--- a/arch/hexagon/include/asm/atomic.h
+++ b/arch/hexagon/include/asm/atomic.h
@@ -42,6 +42,8 @@ static inline void atomic_set(atomic_t *v, int new)
 	);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 /**
  * atomic_read - reads a word, atomically
  * @v: pointer to atomic value
diff --git a/arch/metag/include/asm/atomic_lock1.h b/arch/metag/include/asm/atomic_lock1.h
index 6c1380a8a0d4..eee779f26cc4 100644
--- a/arch/metag/include/asm/atomic_lock1.h
+++ b/arch/metag/include/asm/atomic_lock1.h
@@ -37,6 +37,8 @@ static inline int atomic_set(atomic_t *v, int i)
 	return i;
 }
 
+#define atomic_set_release(v, i) atomic_set((v), (i))
+
 #define ATOMIC_OP(op, c_op)						\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 5394b9c5f914..17b98a87e5e2 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -65,6 +65,8 @@ static __inline__ void atomic_set(atomic_t *v, int i)
 	_atomic_spin_unlock_irqrestore(v, flags);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 static __inline__ int atomic_read(const atomic_t *v)
 {
 	return READ_ONCE((v)->counter);
diff --git a/arch/sparc/include/asm/atomic_32.h b/arch/sparc/include/asm/atomic_32.h
index ee3f11c43cda..7643e979e333 100644
--- a/arch/sparc/include/asm/atomic_32.h
+++ b/arch/sparc/include/asm/atomic_32.h
@@ -29,6 +29,8 @@ int atomic_xchg(atomic_t *, int);
 int __atomic_add_unless(atomic_t *, int, int);
 void atomic_set(atomic_t *, int);
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 #define atomic_read(v)          ACCESS_ONCE((v)->counter)
 
 #define atomic_add(i, v)	((void)atomic_add_return( (int)(i), (v)))
diff --git a/arch/tile/include/asm/atomic_32.h b/arch/tile/include/asm/atomic_32.h
index a93774255136..53a423e7cb92 100644
--- a/arch/tile/include/asm/atomic_32.h
+++ b/arch/tile/include/asm/atomic_32.h
@@ -101,6 +101,8 @@ static inline void atomic_set(atomic_t *v, int n)
 	_atomic_xchg(&v->counter, n);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 /* A 64bit atomic type */
 
 typedef struct {
diff --git a/include/asm-generic/atomic64.h b/include/asm-generic/atomic64.h
index dad68bf46c77..8d28eb010d0d 100644
--- a/include/asm-generic/atomic64.h
+++ b/include/asm-generic/atomic64.h
@@ -21,6 +21,8 @@ typedef struct {
 extern long long atomic64_read(const atomic64_t *v);
 extern void	 atomic64_set(atomic64_t *v, long long i);
 
+#define atomic64_set_release(v, i)	atomic64_set((v), (i))
+
 #define ATOMIC64_OP(op)							\
 extern void	 atomic64_##op(long long a, atomic64_t *v);
 

^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
@ 2017-06-09 11:13   ` Peter Zijlstra
  2017-06-09 17:28       ` Vineet Gupta
  2017-06-09 18:58     ` James Bottomley
  2017-06-09 14:03   ` Chris Metcalf
  2017-08-10 12:10   ` [tip:locking/core] locking/atomic: " tip-bot for Peter Zijlstra
  2 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-06-09 11:13 UTC (permalink / raw)
  To: Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, vgupta, rkuo,
	james.hogan, jejb, davem, cmetcalf

On Fri, Jun 09, 2017 at 01:05:06PM +0200, Peter Zijlstra wrote:

> The spinlock based atomics should be SC, that is, none of them appear to
> place extra barriers in atomic_cmpxchg() or any of the other SC atomic
> primitives and therefore seem to rely on their spinlock implementation
> being SC (I did not fully validate all that).

So I did see that ARC and PARISC have 'superfluous' smp_mb() calls
around their spinlock implementation.

That is, for spinlock semantics you only need one _after_ lock and one
_before_ unlock. But the atomic stuff relies on being SC and thus would
need one before and after both lock and unlock.

Now, afaict PARISC doesn't even have memory barriers (it uses
asm-generic/barrier.h) so that's a bit of a puzzle.

But ARC could probably optimize (if they still care about that hardware)
by pulling out those barriers and putting it in the atomic
implementation.
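
Concretely, the split would look something like this (a sketch only; the
atomic_ops_lock() helpers stand in for the arch's hashed/atomic lock). The
spinlock keeps plain ACQUIRE/RELEASE semantics and the lock-based atomic
supplies the remaining ordering itself:

	static inline int atomic_add_return(int i, atomic_t *v)
	{
		unsigned long flags;
		int ret;

		smp_mb();			/* order prior accesses before the RmW */
		atomic_ops_lock(flags);		/* the lock only needs ACQUIRE */
		ret = (v->counter += i);
		atomic_ops_unlock(flags);	/* the unlock only needs RELEASE */
		smp_mb();			/* order the RmW before later accesses */

		return ret;
	}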

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
  2017-06-09 11:13   ` Peter Zijlstra
@ 2017-06-09 14:03   ` Chris Metcalf
  2017-08-10 12:10   ` [tip:locking/core] locking/atomic: " tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 45+ messages in thread
From: Chris Metcalf @ 2017-06-09 14:03 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, vgupta, rkuo,
	james.hogan, jejb, davem

On 6/9/2017 7:05 AM, Peter Zijlstra wrote:
> Subject: atomic: Fix atomic_set_release() for 'funny' architectures
>
> Those architectures that have a special atomic_set implementation also
> need a special atomic_set_release(), because for the very same reason
> WRITE_ONCE() is broken for them, smp_store_release() is too.
>
> The vast majority are architectures that have a spinlock hash based atomic
> implementation; the exception is hexagon, which seems to have a hardware 'feature'.
>
> The spinlock based atomics should be SC, that is, none of them appear to
> place extra barriers in atomic_cmpxchg() or any of the other SC atomic
> primitives and therefore seem to rely on their spinlock implementation
> being SC (I did not fully validate all that).
>
> Therefore, the normal atomic_set() is SC and can be used as
> atomic_set_release().

Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-09  9:24 [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Peter Zijlstra
  2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
@ 2017-06-09 15:44 ` Will Deacon
  2017-06-09 19:36   ` Peter Zijlstra
  2017-06-09 18:15 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Randy Dunlap
  2 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-06-09 15:44 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul McKenney, Boqun Feng, linux-kernel, Ingo Molnar, Thomas Gleixner

Hi Peter,

On Fri, Jun 09, 2017 at 11:24:50AM +0200, Peter Zijlstra wrote:
> 
> Since we've vastly expanded the atomic_t interface in recent years the
> existing documentation is woefully out of date and people seem to get
> confused a bit.
> 
> Start a new document to hopefully better explain the current state of
> affairs.
>
> The old atomic_ops.txt also covers bitmaps and a few more details so
> this is not a full replacement and we'll therefore keep that document
> around until such a time that we've managed to write more text to cover
> its entirety.

Yeah, we should aim at killing (replacing) most of atomic_ops.txt in the
medium term, but this is a good start.

> Also please, ReST people, go away.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> 
> --- /dev/null	2017-05-05 13:16:22.636212333 +0200
> +++ b/Documentation/atomic_t.txt	2017-06-09 11:05:31.501599153 +0200
> @@ -0,0 +1,147 @@
> +
> +On atomic types (atomic_t atomic64_t and atomic_long_t).
> +
> +The atomic type provides an interface to the architecture's means of atomic
> +RmW operations between CPUs (it specifically does not order/work/etc. on
> +IO).

We should be stronger here: atomics to IO could lead to kernel panics (i.e.
raise a fatal abort), whereas this sounds like they just lose some ordering
or atomicity guarantees.

> +The 'full' API consists of:

Need to mention the 64-bit and the long variants?

> +Non RmW ops:
> +
> +  atomic_read(), atomic_set()
> +  atomic_read_acquire(), atomic_set_release()
> +
> +
> +RmW atomic operations:
> +
> +Arithmetic:
> +
> +  atomic_{add,sub,inc,dec}()
> +  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
> +  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
> +
> +
> +Bitwise:
> +
> +  atomic_{and,or,xor,notand}()
> +  atomic_fetch_{and,or,xor,notand}{,_relaxed,_acquire,_release}()

s/notand/andnot/

> +
> +Swap:
> +
> +  atomic_xchg{,_relaxed,_acquire,_release}()
> +  atomic_cmpxchg{,_relaxed,_acquire,_release}()
> +  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
> +
> +
> +Reference count (but please see refcount_t):
> +
> +  atomic_add_unless(), atomic_inc_not_zero()
> +  atomic_sub_and_test(), atomic_dec_and_test()
> +
> +
> +Misc:
> +
> +  atomic_inc_and_test(), atomic_add_negative()
> +  atomic_dec_unless_positive(), atomic_inc_unless_negative()

I *think* you have all of them here.

> +
> +Barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +
> +
> +Non RmW ops:
> +
> +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> +smp_store_release() respectively.
> +
> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:
> +
> +  CPU0						CPU1
> +
> +  val = atomic_read(&X)
> +  do {
> +						atomic_set(&X, 0)
> +	new = val + 1;
> +  } while (!atomic_try_cmpxchg(&X, &val, new));
> +
> +Should cause the cmpxchg to *FAIL* (when @val != 0). This is typically true;
> +on 'normal' platforms; a regular competing STORE will invalidate a LL/SC.

I see what you're getting at here, but the example is a bit weird because
CPU1 might hold the store to X in a local store-buffer and not shoot down
the cmpxchg immediately. I think you need something to show how the write
to X has at least partially propagated.

> +The obvious case where this is not so is where we need to implement atomic ops
> +with a spinlock hashtable; the typical solution is to then implement
> +atomic_set() with atomic_xchg().

Looking at sparc32, atomic_set takes the hashed lock, so I can't see what
goes wrong here: atomic_try_cmpxchg will get called with val !=0, but the
comparison will fail because the value in memory will be 0. What am I
missing?

> +
> +
> +RmW ops:
> +
> +These come in various forms:
> +
> + - plain operations without return value: atomic_{}()

Maybe just list the API here, instead of having the separate section
previously?

> + - operations which return the modified value: atomic_{}_return()
> +
> +   these are limited to the arithmetic operations because those are
> +   reversible. Bitops are irreversible and therefore the modified value
> +   is of dubious utility.
> +
> + - operations which return the original value: atomic_fetch_{}()
> +
> + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> +
> + - misc; the special purpose operations that are commonly used and would,
> +   given the interface, normally be implemented using (try_)cmpxchg loops but
> +   are time critical and can, (typically) on LL/SC architectures, be more
> +   efficiently implemented.
> +
> +
> +All these operations are SMP atomic; that is, the operations (for a single
> +atomic variable) can be fully ordered and no intermediate state is lost or
> +visible.
> +
> +
> +Ordering:  (go read memory-barriers.txt first)
> +
> +The rule of thumb:
> +
> + - non-RmW operations are unordered;
> +
> + - RmW operations that have no return value are unordered;
> +
> + - RmW operations that have a return value are Sequentially Consistent;

I think it's stronger than that, because they also order non-RmW operations,
whereas this makes it sound like there's just a total order over all RmW
operations.
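
For instance (an illustrative sketch, not part of the proposed text):

  { a == 0, X == 0 }

  CPU0					CPU1

  WRITE_ONCE(a, 1);			r1 = atomic_read(&X);
  atomic_inc_return(&X);		smp_rmb();
					r2 = READ_ONCE(a);

Because atomic_inc_return() is fully ordered it also orders the plain store
to @a before the RmW, so r1 == 1 implies r2 == 1; with a plain atomic_inc()
(no return value, hence unordered) the outcome r1 == 1 && r2 == 0 remains
allowed.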

> + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> +   above rules apply.

We should make it clear that "unordered" here refers to accesses to other
memory locations.

> +
> +Except of course when an operation has an explicit ordering like:
> +
> + {}_relaxed: unordered
> + {}_acquire: the R of the RmW is an ACQUIRE
> + {}_release: the W of the RmW is a  RELEASE
> +
> +NOTE: our ACQUIRE/RELEASE are RCpc

The NOTE belongs in memory-barriers.txt

> +The barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +only apply to the RmW ops and can be used to augment/upgrade the ordering
> +inherit to the used atomic op. These barriers provide a full smp_mb().

inherit?

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09 11:13   ` Peter Zijlstra
@ 2017-06-09 17:28       ` Vineet Gupta
  2017-06-09 18:58     ` James Bottomley
  1 sibling, 0 replies; 45+ messages in thread
From: Vineet Gupta @ 2017-06-09 17:28 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, rkuo, james.hogan,
	jejb, davem, cmetcalf, arcml

On 06/09/2017 04:13 AM, Peter Zijlstra wrote:
> On Fri, Jun 09, 2017 at 01:05:06PM +0200, Peter Zijlstra wrote:
> 
>> The spinlock based atomics should be SC, that is, none of them appear to
>> place extra barriers in atomic_cmpxchg() or any of the other SC atomic
>> primitives and therefore seem to rely on their spinlock implementation
>> being SC (I did not fully validate all that).
> 
> So I did see that ARC and PARISC have 'superfluous' smp_mb() calls
> around their spinlock implementation.
> 
> That is, for spinlock semantics you only need one _after_ lock and one
> _before_ unlock. But the atomic stuff relies on being SC and thus would
> need one before and after both lock and unlock.

Right we discussed this a while back: https://lkml.org/lkml/2015/6/11/276

At the time when I tried removing these extra barriers, hackbench regressed. I'm 
about to get a new quad core 1GHz chip (vs. the FPGA before) and will 
re-experiment. Likely we don't need it otherwise I will add a comment of this 
"feature"

> But ARC could probably optimize (if they still care about that hardware)
> by pulling out those barriers and putting it in the atomic
> implementation.

A bit confused here. Reading the lkml posting for this thread, you posted 2 
patches, and they had to do with atomic_set() for EZChip platform which is really 
special (no ll/sc). The extra smp_mb() is related to ll/sc variants. Just trying 
to make sure that we are talking 2 different things here :-)

-Vineet

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-09  9:24 [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Peter Zijlstra
  2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
  2017-06-09 15:44 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Will Deacon
@ 2017-06-09 18:15 ` Randy Dunlap
  2 siblings, 0 replies; 45+ messages in thread
From: Randy Dunlap @ 2017-06-09 18:15 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner

On 06/09/17 02:24, Peter Zijlstra wrote:
> 
> --- /dev/null	2017-05-05 13:16:22.636212333 +0200
> +++ b/Documentation/atomic_t.txt	2017-06-09 11:05:31.501599153 +0200
> @@ -0,0 +1,147 @@
> +
> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:
> +
> +  CPU0						CPU1
> +
> +  val = atomic_read(&X)
> +  do {
> +						atomic_set(&X, 0)
> +	new = val + 1;
> +  } while (!atomic_try_cmpxchg(&X, &val, new));
> +
> +Should cause the cmpxchg to *FAIL* (when @val != 0). This is typically true;

   should

> +on 'normal' platforms; a regular competing STORE will invalidate a LL/SC.

too many semi-colons above.

> +
> +The obvious case where this is not so is where we need to implement atomic ops
> +with a spinlock hashtable; the typical solution is to then implement
> +atomic_set() with atomic_xchg().
> +
> +
> +RmW ops:
> +
> +These come in various forms:
> +
> + - plain operations without return value: atomic_{}()
> +
> + - operations which return the modified value: atomic_{}_return()
> +
> +   these are limited to the arithmetic operations because those are
> +   reversible. Bitops are irreversible and therefore the modified value
> +   is of dubious utility.
> +
> + - operations which return the original value: atomic_fetch_{}()
> +
> + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> +
> + - misc; the special purpose operations that are commonly used and would,
> +   given the interface, normally be implemented using (try_)cmpxchg loops but
> +   are time critical and can, (typically) on LL/SC architectures, be more
> +   efficiently implemented.
> +
> +
> +All these operations are SMP atomic; that is, the operations (for a single
> +atomic variable) can be fully ordered and no intermediate state is lost or
> +visible.
> +
> +
> +Ordering:  (go read memory-barriers.txt first)
> +
> +The rule of thumb:
> +
> + - non-RmW operations are unordered;
> +
> + - RmW operations that have no return value are unordered;
> +
> + - RmW operations that have a return value are Sequentially Consistent;
> +
> + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> +   above rules apply.
> +
> +Except of course when an operation has an explicit ordering like:
> +
> + {}_relaxed: unordered
> + {}_acquire: the R of the RmW is an ACQUIRE
> + {}_release: the W of the RmW is a  RELEASE
> +
> +NOTE: our ACQUIRE/RELEASE are RCpc
> +
> +
> +The barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +only apply to the RmW ops and can be used to augment/upgrade the ordering
> +inherit to the used atomic op. These barriers provide a full smp_mb().

   inherent ?

> +
> +These helper barriers exist because architectures have varying implicit
> +ordering on their SMP atomic primitives. For example our TSO architectures
> +provide SC atomics and these barriers are no-ops.
> +
> +So while something like:
> +
> +	smp_mb__before_atomic();
> +	val = atomic_dec_return_relaxed(&X);
> +
> +is a 'typical' RELEASE pattern (please use atomic_dec_return_release()), the
> +barrier is strictly stronger than a RELEASE.
> 


-- 
~Randy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09 17:28       ` Vineet Gupta
@ 2017-06-09 18:49         ` Peter Zijlstra
  -1 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-06-09 18:49 UTC (permalink / raw)
  To: Vineet Gupta
  Cc: Will Deacon, Paul McKenney, Boqun Feng, linux-kernel,
	Ingo Molnar, Thomas Gleixner, rkuo, james.hogan, jejb, davem,
	cmetcalf, arcml

On Fri, Jun 09, 2017 at 10:28:50AM -0700, Vineet Gupta wrote:
> On 06/09/2017 04:13 AM, Peter Zijlstra wrote:
> > On Fri, Jun 09, 2017 at 01:05:06PM +0200, Peter Zijlstra wrote:
> > 
> > > The spinlock based atomics should be SC, that is, none of them appear to
> > > place extra barriers in atomic_cmpxchg() or any of the other SC atomic
> > > primitives and therefore seem to rely on their spinlock implementation
> > > being SC (I did not fully validate all that).
> > 
> > So I did see that ARC and PARISC have 'superfluous' smp_mb() calls
> > around their spinlock implementation.
> > 
> > That is, for spinlock semantics you only need one _after_ lock and one
> > _before_ unlock. But the atomic stuff relies on being SC and thus would
> > need one before and after both lock and unlock.
> 
> Right we discussed this a while back: https://lkml.org/lkml/2015/6/11/276
> 
> At the time when I tried removing these extra barriers, hackbench regressed.
> I'm about to get a new quad core 1GHz chip (vs. the FPGA before) and will
> re-experiment. Likely we don't need it otherwise I will add a comment of
> this "feature"
> 
> > But ARC could probably optimize (if they still care about that hardware)
> > by pulling out those barriers and putting it in the atomic
> > implementation.
> 
> A bit confused here. Reading the lkml posting for this thread, you posted 2
> patches, and they had to do with atomic_set() for EZChip platform which is
> really special (no ll/sc). The extra smp_mb() is related to ll/sc variants.
> Just trying to make sure that we are talking 2 different things here :-)

Could be I just got all my variants in a twist... wouldn't be the first
time ;-)

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09 11:13   ` Peter Zijlstra
  2017-06-09 17:28       ` Vineet Gupta
@ 2017-06-09 18:58     ` James Bottomley
  1 sibling, 0 replies; 45+ messages in thread
From: James Bottomley @ 2017-06-09 18:58 UTC (permalink / raw)
  To: Peter Zijlstra, Will Deacon, Paul McKenney, Boqun Feng
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, vgupta, rkuo,
	james.hogan, jejb, davem, cmetcalf, Parisc List

[adding parisc list]
On Fri, 2017-06-09 at 13:13 +0200, Peter Zijlstra wrote:
> On Fri, Jun 09, 2017 at 01:05:06PM +0200, Peter Zijlstra wrote:
> 
> > The spinlock based atomics should be SC, that is, none of them
> > appear to
> > place extra barriers in atomic_cmpxchg() or any of the other SC
> > atomic
> > primitives and therefore seem to rely on their spinlock
> > implementation
> > being SC (I did not fully validate all that).
> 
> So I did see that ARC and PARISC have 'superfluous' smp_mb() calls
> around their spinlock implementation.
> 
> That is, for spinlock semantics you only need one _after_ lock and 
> one _before_ unlock. But the atomic stuff relies on being SC and thus
> would need one before and after both lock and unlock.

Actually, for us that's not true.  You are correct in the above for
safety but not for performance: if we remove the barriers that are
unnecessary for safety, it can elongate our critical sections (the
spinlock can move up in the code stream and the spin unlock can move
down), which leads to performance regressions because we end up holding
locks longer than we need (we also have a lot of hot locks).

> Now, afaict PARISC doesn't even have memory barriers (it uses
> asm-generic/barrier.h) so that's a bit of a puzzle.

We disable relaxed ordering on our architecture which means the CPU
issue stream must match the instruction stream.  We've debated turning
on relaxed ordering, but decided it was more hassle than it's worth.

James


> But ARC could probably optimize (if they still care about that 
> hardware) by pulling out those barriers and putting it in the atomic
> implementation.
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-09 15:44 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Will Deacon
@ 2017-06-09 19:36   ` Peter Zijlstra
  2017-06-11 13:56     ` Boqun Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-06-09 19:36 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paul McKenney, Boqun Feng, linux-kernel, Ingo Molnar, Thomas Gleixner

On Fri, Jun 09, 2017 at 04:44:42PM +0100, Will Deacon wrote:

> > +++ b/Documentation/atomic_t.txt	2017-06-09 11:05:31.501599153 +0200
> > @@ -0,0 +1,147 @@
> > +
> > +On atomic types (atomic_t atomic64_t and atomic_long_t).
> > +
> > +The atomic type provides an interface to the architecture's means of atomic
> > +RmW operations between CPUs (it specifically does not order/work/etc. on
> > +IO).
> 
> We should be stronger here: atomics to IO could lead to kernel panics (i.e.
> raise a fatal abort), whereas this sounds like they just lose some ordering
> or atomicity guarantees.
> 
> > +The 'full' API consists of:
> 
> Need to mention the 64-bit and the long variants?

Makes a bit of a mess of things I felt, maybe I should be a little more
explicit and mention that everything applies to all 3 variants?


> > +Non RmW ops:
> > +
> > +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> > +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> > +smp_store_release() respectively.
> > +
> > +The one detail to this is that atomic_set() should be observable to the RmW
> > +ops. That is:
> > +
> > +  CPU0						CPU1
> > +
> > +  val = atomic_read(&X)
> > +  do {
> > +						atomic_set(&X, 0)
> > +	new = val + 1;
> > +  } while (!atomic_try_cmpxchg(&X, &val, new));
> > +
> > +Should cause the cmpxchg to *FAIL* (when @val != 0). This is typically true;
> > +on 'normal' platforms; a regular competing STORE will invalidate a LL/SC.
> 
> I see what you're getting at here, but the example is a bit weird because
> CPU1 might hold the store to X in a local store-buffer and not shoot down
> the cmpxchg immediately. I think you need something to show how the write
> to X has at least partially propagated.

Hurm.. would the example from metag atomic_set be better?

> > +The obvious case where this is not so is where we need to implement atomic ops
> > +with a spinlock hashtable; the typical solution is to then implement
> > +atomic_set() with atomic_xchg().
> 
> Looking at sparc32, atomic_set takes the hashed lock, so I can't see what
> goes wrong here: atomic_try_cmpxchg will get called with val !=0, but the
> comparison will fail because the value in memory will be 0. What am I
> missing?

This is the reason their atomic_set() is special and needs to take the
lock.

> > +
> > +
> > +RmW ops:
> > +
> > +These come in various forms:
> > +
> > + - plain operations without return value: atomic_{}()
> 
> Maybe just list the API here, instead of having the separate section
> previously?

That then leaves you with no place to put those comments. I wanted
to slice the API in different ways for each subject without endless
repetition.

> > + - operations which return the modified value: atomic_{}_return()
> > +
> > +   these are limited to the arithmetic operations because those are
> > +   reversible. Bitops are irreversible and therefore the modified value
> > +   is of dubious utility.
> > +
> > + - operations which return the original value: atomic_fetch_{}()
> > +
> > + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> > +
> > + - misc; the special purpose operations that are commonly used and would,
> > +   given the interface, normally be implemented using (try_)cmpxchg loops but
> > +   are time critical and can, (typically) on LL/SC architectures, be more
> > +   efficiently implemented.
> > +
> > +
> > +All these operations are SMP atomic; that is, the operations (for a single
> > +atomic variable) can be fully ordered and no intermediate state is lost or
> > +visible.
> > +
> > +
> > +Ordering:  (go read memory-barriers.txt first)
> > +
> > +The rule of thumb:
> > +
> > + - non-RmW operations are unordered;
> > +
> > + - RmW operations that have no return value are unordered;
> > +
> > + - RmW operations that have a return value are Sequentially Consistent;
> 
> I think it's stronger than that, because they also order non-RmW operations,
> whereas this makes it sound like there's just a total order over all RmW
> operations.

Right, what should I call it?

> > + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> > +   above rules apply.
> 
> We should make it clear that "unordered" here refers to accesses to other
> memory locations.
> 
> > +
> > +Except of course when an operation has an explicit ordering like:
> > +
> > + {}_relaxed: unordered
> > + {}_acquire: the R of the RmW is an ACQUIRE
> > + {}_release: the W of the RmW is a  RELEASE
> > +
> > +NOTE: our ACQUIRE/RELEASE are RCpc
> 
> The NOTE belongs in memory-barriers.txt

It is of course; I'll remove that line.

> > +The barriers:
> > +
> > +  smp_mb__{before,after}_atomic()
> > +
> > +only apply to the RmW ops and can be used to augment/upgrade the ordering
> > +inherit to the used atomic op. These barriers provide a full smp_mb().
> 
> inherit?

"inherent" is I think the word I wanted..

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-09 19:36   ` Peter Zijlstra
@ 2017-06-11 13:56     ` Boqun Feng
  2017-06-12 14:49       ` Peter Zijlstra
  0 siblings, 1 reply; 45+ messages in thread
From: Boqun Feng @ 2017-06-11 13:56 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner

Hi Peter,

On Fri, Jun 09, 2017 at 09:36:04PM +0200, Peter Zijlstra wrote:
[...]
> > > +Ordering:  (go read memory-barriers.txt first)
> > > +
> > > +The rule of thumb:
> > > +
> > > + - non-RmW operations are unordered;
> > > +
> > > + - RmW operations that have no return value are unordered;
> > > +
> > > + - RmW operations that have a return value are Sequentially Consistent;
> > 
> > I think it's stronger than that, because they also order non-RmW operations,
> > whereas this makes it sound like there's just a total order over all RmW
> > operations.
> 
> Right, what should I call it?
> 

I think the term we use to refer to this behavior is "fully-ordered"? Could
we give it a slight formal definition like:

a.	memory operations preceding and following the RmW operation is
	Sequentially Consistent.

b.	load or store part of the RmW operation is Sequentially
	Consistent with operations preceding or following.

Though, sounds like defining "fully-ordered" is the job for
memory-barriers.txt, but it's never done ;-)

Regards,
Boqun



> > > + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> > > +   above rules apply.
> > 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-11 13:56     ` Boqun Feng
@ 2017-06-12 14:49       ` Peter Zijlstra
  2017-06-13  6:39         ` Boqun Feng
                           ` (3 more replies)
  0 siblings, 4 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-06-12 14:49 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner

On Sun, Jun 11, 2017 at 09:56:32PM +0800, Boqun Feng wrote:

> I think the term we use to refer to this behavior is "fully-ordered"?

Right, that is what we used to call it, and the term even occurs in
memory-barriers.txt but isn't actually defined therein.

> Could we give it a slight formal definition like:
> 
> a.	memory operations preceding and following the RmW operation is
> 	Sequentially Consistent.
> 
> b.	load or store part of the RmW operation is Sequentially
> 	Consistent with operations preceding or following.
> 
> Though, sounds like defining "fully-ordered" is the job for
> memory-barriers.txt, but it's never done ;-)

Right, so while memory-barriers.txt uses the term 'fully ordered' it
doesn't appear to mean the same thing we need here.

Still, lacking anything better, I did the below. Note that I also
removed much of the atomic stuff from memory-barriers.txt in order to
avoid duplication and confusion (it too was severely stale).



Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/atomic_t.txt        |  182 ++++++++++++++++++++++++++++++++++++++
 Documentation/memory-barriers.txt |   86 -----------------
 2 files changed, 184 insertions(+), 84 deletions(-)

--- /dev/null
+++ b/Documentation/atomic_t.txt
@@ -0,0 +1,182 @@
+
+On atomic types (atomic_t atomic64_t and atomic_long_t).
+
+The atomic type provides an interface to the architecture's means of atomic
+RmW operations between CPUs (it specifically does not order/work/etc. on
+IO).
+
+The 'full' API consists of:
+
+Non RmW ops:
+
+  atomic_read(), atomic_set()
+  atomic_read_acquire(), atomic_set_release()
+
+
+RmW atomic operations:
+
+Arithmetic:
+
+  atomic_{add,sub,inc,dec}()
+  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
+  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
+
+
+Bitwise:
+
+  atomic_{and,or,xor,andnot}()
+  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()
+
+
+Swap:
+
+  atomic_xchg{,_relaxed,_acquire,_release}()
+  atomic_cmpxchg{,_relaxed,_acquire,_release}()
+  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
+
+
+Reference count (but please see refcount_t):
+
+  atomic_add_unless(), atomic_inc_not_zero()
+  atomic_sub_and_test(), atomic_dec_and_test()
+
+
+Misc:
+
+  atomic_inc_and_test(), atomic_add_negative()
+  atomic_dec_unless_positive(), atomic_inc_unless_negative()
+
+
+Barriers:
+
+  smp_mb__{before,after}_atomic()
+
+
+
+Non RmW ops:
+
+The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
+implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
+smp_store_release() respectively.
+
+The one detail to this is that atomic_set() should be observable to the RmW
+ops. That is:
+
+
+  PRE:
+  atomic_set(v, 1);
+
+  CPU0						CPU1
+  atomic_add_unless(v, 1, 0)			atomic_set(v, 0);
+
+  POST:
+  BUG_ON(v->counter == 2);
+
+
+In this case we would expect the atomic_set() from CPU1 to either happen
+before the atomic_add_unless(), in which case that latter one would no-op, or
+_after_ in which case we'd overwrite its result. In no case is "2" a valid
+outcome.
+
+This is typically true on 'normal' platforms, where a regular competing STORE
+will invalidate a LL/SC or fail a CMPXCHG.
+
+The obvious case where this is not so is when we need to implement atomic ops
+with a lock:
+
+
+  CPU0
+
+  atomic_add_unless(v, 1, 0);
+    lock();
+    ret = READ_ONCE(v->counter); // == 1
+						atomic_set(v, 0);
+    if (ret != u)				  WRITE_ONCE(v->counter, 0);
+      WRITE_ONCE(v->counter, ret + 1);
+    unlock();
+
+
+the typical solution is to then implement atomic_set() with atomic_xchg().
+
+
+RmW ops:
+
+These come in various forms:
+
+ - plain operations without return value: atomic_{}()
+
+ - operations which return the modified value: atomic_{}_return()
+
+   these are limited to the arithmetic operations because those are
+   reversible. Bitops are irreversible and therefore the modified value
+   is of dubious utility.
+
+ - operations which return the original value: atomic_fetch_{}()
+
+ - swap operations: xchg(), cmpxchg() and try_cmpxchg()
+
+ - misc; the special purpose operations that are commonly used and would,
+   given the interface, normally be implemented using (try_)cmpxchg loops but
+   are time critical and can, (typically) on LL/SC architectures, be more
+   efficiently implemented.
+
+
+All these operations are SMP atomic; that is, the operations (for a single
+atomic variable) can be fully ordered and no intermediate state is lost or
+visible.
+
+
+Ordering:  (go read memory-barriers.txt first)
+
+The rule of thumb:
+
+ - non-RmW operations are unordered;
+
+ - RmW operations that have no return value are unordered;
+
+ - RmW operations that have a return value are fully ordered;
+
+ - RmW operations that are conditional are unordered on FAILURE, otherwise the
+   above rules apply.
+
+Except of course when an operation has an explicit ordering like:
+
+ {}_relaxed: unordered
+ {}_acquire: the R of the RmW (or atomic_read) is an ACQUIRE
+ {}_release: the W of the RmW (or atomic_set)  is a  RELEASE
+
+
+Fully ordered primitives are ordered against everything prior and everything
+subsequent. They also imply transitivity. Therefore a fully ordered primitive
+is like having an smp_mb() before and an smp_mb() after the primitive.
+
+
+The barriers:
+
+  smp_mb__{before,after}_atomic()
+
+only apply to the RmW ops and can be used to augment/upgrade the ordering
+inherent to the used atomic op. These barriers provide a full smp_mb().
+
+These helper barriers exist because architectures have varying implicit
+ordering on their SMP atomic primitives. For example our TSO architectures
+provide fully ordered atomics and these barriers are no-ops.
+
+Thus:
+
+  atomic_fetch_add();
+
+is equivalent to:
+
+  smp_mb__before_atomic();
+  atomic_fetch_add_relaxed();
+  smp_mb__after_atomic();
+
+
+Further, while something like:
+
+  smp_mb__before_atomic();
+  atomic_dec(&X);
+
+is a 'typical' RELEASE pattern, the barrier is strictly stronger than
+a RELEASE.
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
      This means that ACQUIRE acts as a minimal "acquire" operation and
      RELEASE acts as a minimal "release" operation.
 
-A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
+A subset of the atomic operations described in atomic_t.txt have ACQUIRE
 and RELEASE variants in addition to fully-ordered and relaxed (no barrier
 semantics) definitions.  For compound atomics performing both a load and a
 store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
@@ -1876,8 +1876,7 @@ compiler and the CPU from reordering the
      This makes sure that the death mark on the object is perceived to be set
      *before* the reference counter is decremented.
 
-     See Documentation/atomic_ops.txt for more information.  See the "Atomic
-     operations" subsection for information on where to use these.
+     See Documentation/atomic_t.txt for more information.
 
 
  (*) lockless_dereference();
@@ -2503,87 +2502,6 @@ operations are noted specially as some o
 some don't, but they're very heavily relied on as a group throughout the
 kernel.
 
-Any atomic operation that modifies some state in memory and returns information
-about the state (old or new) implies an SMP-conditional general memory barrier
-(smp_mb()) on each side of the actual operation (with the exception of
-explicit lock operations, described later).  These include:
-
-	xchg();
-	atomic_xchg();			atomic_long_xchg();
-	atomic_inc_return();		atomic_long_inc_return();
-	atomic_dec_return();		atomic_long_dec_return();
-	atomic_add_return();		atomic_long_add_return();
-	atomic_sub_return();		atomic_long_sub_return();
-	atomic_inc_and_test();		atomic_long_inc_and_test();
-	atomic_dec_and_test();		atomic_long_dec_and_test();
-	atomic_sub_and_test();		atomic_long_sub_and_test();
-	atomic_add_negative();		atomic_long_add_negative();
-	test_and_set_bit();
-	test_and_clear_bit();
-	test_and_change_bit();
-
-	/* when succeeds */
-	cmpxchg();
-	atomic_cmpxchg();		atomic_long_cmpxchg();
-	atomic_add_unless();		atomic_long_add_unless();
-
-These are used for such things as implementing ACQUIRE-class and RELEASE-class
-operations and adjusting reference counters towards object destruction, and as
-such the implicit memory barrier effects are necessary.
-
-
-The following operations are potential problems as they do _not_ imply memory
-barriers, but might be used for implementing such things as RELEASE-class
-operations:
-
-	atomic_set();
-	set_bit();
-	clear_bit();
-	change_bit();
-
-With these the appropriate explicit memory barrier should be used if necessary
-(smp_mb__before_atomic() for instance).
-
-
-The following also do _not_ imply memory barriers, and so may require explicit
-memory barriers under some circumstances (smp_mb__before_atomic() for
-instance):
-
-	atomic_add();
-	atomic_sub();
-	atomic_inc();
-	atomic_dec();
-
-If they're used for statistics generation, then they probably don't need memory
-barriers, unless there's a coupling between statistical data.
-
-If they're used for reference counting on an object to control its lifetime,
-they probably don't need memory barriers because either the reference count
-will be adjusted inside a locked section, or the caller will already hold
-sufficient references to make the lock, and thus a memory barrier unnecessary.
-
-If they're used for constructing a lock of some description, then they probably
-do need memory barriers as a lock primitive generally has to do things in a
-specific order.
-
-Basically, each usage case has to be carefully considered as to whether memory
-barriers are needed or not.
-
-The following operations are special locking primitives:
-
-	test_and_set_bit_lock();
-	clear_bit_unlock();
-	__clear_bit_unlock();
-
-These implement ACQUIRE-class and RELEASE-class operations.  These should be
-used in preference to other operations when implementing locking primitives,
-because their implementations can be optimised on many architectures.
-
-[!] Note that special memory barrier primitives are available for these
-situations because on some CPUs the atomic instructions used imply full memory
-barriers, and so barrier instructions are superfluous in conjunction with them,
-and in such cases the special barrier primitives will be no-ops.
-
 See Documentation/atomic_ops.txt for more information.
 
 

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-12 14:49       ` Peter Zijlstra
@ 2017-06-13  6:39         ` Boqun Feng
  2017-06-14 12:33         ` Will Deacon
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 45+ messages in thread
From: Boqun Feng @ 2017-06-13  6:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
> On Sun, Jun 11, 2017 at 09:56:32PM +0800, Boqun Feng wrote:
> 
> > I think the term we use to refer to this behavior is "fully-ordered"?
> 
> Right, that is what we used to call it, and the term even occurs in
> memory-barriers.txt but isn't actually defined therein.
> 
> > Could we give it a slight formal definition like:
> > 
> > a.	memory operations preceding and following the RmW operation is
> > 	Sequentially Consistent.
> > 
> > b.	load or store part of the RmW operation is Sequentially
> > 	Consistent with operations preceding or following.
> > 
> > Though, sounds like defining "fully-ordered" is the job for
> > memory-barriers.txt, but it's never done ;-)
> 
> Right, so while memory-barriers.txt uses the term 'fully ordered' it
> doesn't appear to mean the same thing we need here.
> 
> Still, lacking anything better, I did the below. Note that I also
> removed much of the atomic stuff from memory-barriers.txt in order to
> avoid duplication and confusion (it too was severely stale).
> 

Agreed ;-)

> 
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/atomic_t.txt        |  182 ++++++++++++++++++++++++++++++++++++++
>  Documentation/memory-barriers.txt |   86 -----------------
>  2 files changed, 184 insertions(+), 84 deletions(-)
> 
> --- /dev/null
> +++ b/Documentation/atomic_t.txt
> @@ -0,0 +1,182 @@
> +
> +On atomic types (atomic_t atomic64_t and atomic_long_t).
> +
> +The atomic type provides an interface to the architecture's means of atomic
> +RmW operations between CPUs (it specifically does not order/work/etc. on
> +IO).
> +
> +The 'full' API consists of:
> +
> +Non RmW ops:

This is the first "Non RmW ops:", and..

> +
> +  atomic_read(), atomic_set()
> +  atomic_read_acquire(), atomic_set_release()
> +
> +
> +RmW atomic operations:
> +
> +Arithmetic:
> +
> +  atomic_{add,sub,inc,dec}()
> +  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
> +  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
> +
> +
> +Bitwise:
> +
> +  atomic_{and,or,xor,andnot}()
> +  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()
> +
> +
> +Swap:
> +
> +  atomic_xchg{,_relaxed,_acquire,_release}()
> +  atomic_cmpxchg{,_relaxed,_acquire,_release}()
> +  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
> +
> +
> +Reference count (but please see refcount_t):
> +
> +  atomic_add_unless(), atomic_inc_not_zero()
> +  atomic_sub_and_test(), atomic_dec_and_test()
> +
> +
> +Misc:
> +
> +  atomic_inc_and_test(), atomic_add_negative()
> +  atomic_dec_unless_positive(), atomic_inc_unless_negative()
> +
> +
> +Barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +
> +

I feel like some words or a dividing line are required here, indicating that we
end listing the API ops and begin to talk about more details (atomicity,
ordering, etc.). Otherwise, the following second "Non RmW ops:" may
confuse people a little bit. Thoughts?

Regards,
Boqun

> +Non RmW ops:
> +
> +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> +smp_store_release() respectively.
> +
> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:
> +
> +
> +  PRE:
> +  atomic_set(v, 1);
> +
> +  CPU0						CPU1
> +  atomic_add_unless(v, 1, 0)			atomic_set(v, 0);
> +
> +  POST:
> +  BUG_ON(v->counter == 2);
> +
> +
> +In this case we would expect the atomic_set() from CPU1 to either happen
> +before the atomic_add_unless(), in which case that latter one would no-op, or
> +_after_ in which case we'd overwrite its result. In no case is "2" a valid
> +outcome.
> +
> +This is typically true on 'normal' platforms, where a regular competing STORE
> +will invalidate a LL/SC or fail a CMPXCHG.
> +
> +The obvious case where this is not so is when we need to implement atomic ops
> +with a lock:
> +
> +
> +  CPU0
> +
> +  atomic_add_unless(v, 1, 0);
> +    lock();
> +    ret = READ_ONCE(v->counter); // == 1
> +						atomic_set(v, 0);
> +    if (ret != u)				  WRITE_ONCE(v->counter, 0);
> +      WRITE_ONCE(v->counter, ret + 1);
> +    unlock();
> +
> +
> +the typical solution is to then implement atomic_set() with atomic_xchg().
> +
> +
> +RmW ops:
> +
> +These come in various forms:
> +
> + - plain operations without return value: atomic_{}()
> +
> + - operations which return the modified value: atomic_{}_return()
> +
> +   these are limited to the arithmetic operations because those are
> +   reversible. Bitops are irreversible and therefore the modified value
> +   is of dubious utility.
> +
> + - operations which return the original value: atomic_fetch_{}()
> +
> + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> +
> + - misc; the special purpose operations that are commonly used and would,
> +   given the interface, normally be implemented using (try_)cmpxchg loops but
> +   are time critical and can, (typically) on LL/SC architectures, be more
> +   efficiently implemented.
> +
> +
> +All these operations are SMP atomic; that is, the operations (for a single
> +atomic variable) can be fully ordered and no intermediate state is lost or
> +visible.
> +
> +
> +Ordering:  (go read memory-barriers.txt first)
> +
> +The rule of thumb:
> +
> + - non-RmW operations are unordered;
> +
> + - RmW operations that have no return value are unordered;
> +
> + - RmW operations that have a return value are fully ordered;
> +
> + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> +   above rules apply.
> +
> +Except of course when an operation has an explicit ordering like:
> +
> + {}_relaxed: unordered
> + {}_acquire: the R of the RmW (or atomic_read) is an ACQUIRE
> + {}_release: the W of the RmW (or atomic_set)  is a  RELEASE
> +
> +
> +Fully ordered primitives are ordered against everything prior and everything
> +subsequenct. They also imply transitivity. Therefore a fully ordered primitive
> +is like having an smp_mb() before and an smp_mb() after the primitive.
> +
> +
> +The barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +only apply to the RmW ops and can be used to augment/upgrade the ordering
> +inherit to the used atomic op. These barriers provide a full smp_mb().
> +
> +These helper barriers exist because architectures have varying implicit
> +ordering on their SMP atomic primitives. For example our TSO architectures
> +provide full ordered atomics and these barriers are no-ops.
> +
> +Thus:
> +
> +  atomic_fetch_add();
> +
> +is equivalent to:
> +
> +  smp_mb__before_atomic();
> +  atomic_fetch_add_relaxed();
> +  smp_mb__after_atomic();
> +
> +
> +Further, while something like:
> +
> +  smp_mb__before_atomic();
> +  atomic_dec(&X);
> +
> +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> +a RELEASE.
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
>       This means that ACQUIRE acts as a minimal "acquire" operation and
>       RELEASE acts as a minimal "release" operation.
>  
> -A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
> +A subset of the atomic operations described in atomic_t.txt have ACQUIRE
>  and RELEASE variants in addition to fully-ordered and relaxed (no barrier
>  semantics) definitions.  For compound atomics performing both a load and a
>  store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
> @@ -1876,8 +1876,7 @@ compiler and the CPU from reordering the
>       This makes sure that the death mark on the object is perceived to be set
>       *before* the reference counter is decremented.
>  
> -     See Documentation/atomic_ops.txt for more information.  See the "Atomic
> -     operations" subsection for information on where to use these.
> +     See Documentation/atomic_t.txt for more information.
>  
>  
>   (*) lockless_dereference();
> @@ -2503,87 +2502,6 @@ operations are noted specially as some o
>  some don't, but they're very heavily relied on as a group throughout the
>  kernel.
>  
> -Any atomic operation that modifies some state in memory and returns information
> -about the state (old or new) implies an SMP-conditional general memory barrier
> -(smp_mb()) on each side of the actual operation (with the exception of
> -explicit lock operations, described later).  These include:
> -
> -	xchg();
> -	atomic_xchg();			atomic_long_xchg();
> -	atomic_inc_return();		atomic_long_inc_return();
> -	atomic_dec_return();		atomic_long_dec_return();
> -	atomic_add_return();		atomic_long_add_return();
> -	atomic_sub_return();		atomic_long_sub_return();
> -	atomic_inc_and_test();		atomic_long_inc_and_test();
> -	atomic_dec_and_test();		atomic_long_dec_and_test();
> -	atomic_sub_and_test();		atomic_long_sub_and_test();
> -	atomic_add_negative();		atomic_long_add_negative();
> -	test_and_set_bit();
> -	test_and_clear_bit();
> -	test_and_change_bit();
> -
> -	/* when succeeds */
> -	cmpxchg();
> -	atomic_cmpxchg();		atomic_long_cmpxchg();
> -	atomic_add_unless();		atomic_long_add_unless();
> -
> -These are used for such things as implementing ACQUIRE-class and RELEASE-class
> -operations and adjusting reference counters towards object destruction, and as
> -such the implicit memory barrier effects are necessary.
> -
> -
> -The following operations are potential problems as they do _not_ imply memory
> -barriers, but might be used for implementing such things as RELEASE-class
> -operations:
> -
> -	atomic_set();
> -	set_bit();
> -	clear_bit();
> -	change_bit();
> -
> -With these the appropriate explicit memory barrier should be used if necessary
> -(smp_mb__before_atomic() for instance).
> -
> -
> -The following also do _not_ imply memory barriers, and so may require explicit
> -memory barriers under some circumstances (smp_mb__before_atomic() for
> -instance):
> -
> -	atomic_add();
> -	atomic_sub();
> -	atomic_inc();
> -	atomic_dec();
> -
> -If they're used for statistics generation, then they probably don't need memory
> -barriers, unless there's a coupling between statistical data.
> -
> -If they're used for reference counting on an object to control its lifetime,
> -they probably don't need memory barriers because either the reference count
> -will be adjusted inside a locked section, or the caller will already hold
> -sufficient references to make the lock, and thus a memory barrier unnecessary.
> -
> -If they're used for constructing a lock of some description, then they probably
> -do need memory barriers as a lock primitive generally has to do things in a
> -specific order.
> -
> -Basically, each usage case has to be carefully considered as to whether memory
> -barriers are needed or not.
> -
> -The following operations are special locking primitives:
> -
> -	test_and_set_bit_lock();
> -	clear_bit_unlock();
> -	__clear_bit_unlock();
> -
> -These implement ACQUIRE-class and RELEASE-class operations.  These should be
> -used in preference to other operations when implementing locking primitives,
> -because their implementations can be optimised on many architectures.
> -
> -[!] Note that special memory barrier primitives are available for these
> -situations because on some CPUs the atomic instructions used imply full memory
> -barriers, and so barrier instructions are superfluous in conjunction with them,
> -and in such cases the special barrier primitives will be no-ops.
> -
>  See Documentation/atomic_ops.txt for more information.
>  
>  

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-12 14:49       ` Peter Zijlstra
  2017-06-13  6:39         ` Boqun Feng
@ 2017-06-14 12:33         ` Will Deacon
  2017-07-12 12:53         ` Boqun Feng
  2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
  3 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-06-14 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner

On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
> On Sun, Jun 11, 2017 at 09:56:32PM +0800, Boqun Feng wrote:
> 
> > I think the term we use to refer this behavior is "fully-ordered"?
> 
> Right, that is what we used to call it, and the term even occurs in
> memory-barriers.txt but isn't actually defined therein.
> 
> > Could we give it a slight formal definition like:
> > 
> > a.	memory operations preceding and following the RmW operation is
> > 	Sequentially Consistent.
> > 
> > b.	load or store part of the RmW operation is Sequentially
> > 	Consistent with operations preceding or following.
> > 
> > Though, sounds like defining "fully-ordered" is the job for
> > memory-barriers.txt, but it's never done ;-)
> 
> Right, so while memory-barriers.txt uses the term 'fully ordered' it
> doesn't appear to mean the same thing we need here.
> 
> Still, lacking anything better, I did the below. Note that I also
> removed much of the atomic stuff from memory-barrier.txt in order to
> avoid duplication and confusion (it too was severely stale).

A few more comments inline...

> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:

I'm afraid this one is still confusing me :)

> +  PRE:
> +  atomic_set(v, 1);
> +
> +  CPU0						CPU1
> +  atomic_add_unless(v, 1, 0)			atomic_set(v, 0);
> +
> +  POST:
> +  BUG_ON(v->counter == 2);
> +
> +
> +In this case we would expect the atomic_set() from CPU1 to either happen
> +before the atomic_add_unless(), in which case that latter one would no-op, or
> +_after_ in which case we'd overwrite its result. In no case is "2" a valid
> +outcome.

What do you mean by PRE and POST? Are they running on CPU0, or someplace
else (with barriers)? It sounds like you want to rule out:

  CPU1
  PRE
  CPU0
  POST

but it's tough to say whether or not that's actually forbidden.

> +This is typically true on 'normal' platforms, where a regular competing STORE
> +will invalidate a LL/SC or fail a CMPXCHG.
> +
> +The obvious case where this is not so is when we need to implement atomic ops
> +with a lock:
> +
> +
> +  CPU0
> +
> +  atomic_add_unless(v, 1, 0);
> +    lock();
> +    ret = READ_ONCE(v->counter); // == 1
> +						atomic_set(v, 0);
> +    if (ret != u)				  WRITE_ONCE(v->counter, 0);
> +      WRITE_ONCE(v->counter, ret + 1);
> +    unlock();
> +
> +
> +the typical solution is to then implement atomic_set() with atomic_xchg().
> +
> +
> +RmW ops:
> +
> +These come in various forms:
> +
> + - plain operations without return value: atomic_{}()
> +
> + - operations which return the modified value: atomic_{}_return()
> +
> +   these are limited to the arithmetic operations because those are
> +   reversible. Bitops are irreversible and therefore the modified value
> +   is of dubious utility.
> +
> + - operations which return the original value: atomic_fetch_{}()
> +
> + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> +
> + - misc; the special purpose operations that are commonly used and would,
> +   given the interface, normally be implemented using (try_)cmpxchg loops but
> +   are time critical and can, (typically) on LL/SC architectures, be more
> +   efficiently implemented.
> +
> +
> +All these operations are SMP atomic; that is, the operations (for a single
> +atomic variable) can be fully ordered and no intermediate state is lost or
> +visible.
> +
> +
> +Ordering:  (go read memory-barriers.txt first)
> +
> +The rule of thumb:
> +
> + - non-RmW operations are unordered;
> +
> + - RmW operations that have no return value are unordered;
> +
> + - RmW operations that have a return value are fully ordered;
> +
> + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> +   above rules apply.
> +
> +Except of course when an operation has an explicit ordering like:
> +
> + {}_relaxed: unordered
> + {}_acquire: the R of the RmW (or atomic_read) is an ACQUIRE
> + {}_release: the W of the RmW (or atomic_set)  is a  RELEASE
> +
> +
> +Fully ordered primitives are ordered against everything prior and everything
> +subsequenct. They also imply transitivity. Therefore a fully ordered primitive

subsequent

> +is like having an smp_mb() before and an smp_mb() after the primitive.

Actually, perhaps that's the best way to explain this: just say that
fully-ordered primitives behave as if they have an smp_mb() before and an
smp_mb() after. Defer the transitivity to memory-barriers.txt (especially
since it makes it sound like acquire/release have no transitivity at all).
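
To illustrate that reading (a minimal sketch, not text from the patch), a
fully ordered op such as atomic_inc_return() would then simply be read as:

  WRITE_ONCE(a, 1);
  smp_mb();                            /* implied before the RmW... */
  r0 = atomic_inc_return_relaxed(&v);
  smp_mb();                            /* ...and implied after it */
  r1 = READ_ONCE(b);

i.e. everything before it stays before and everything after stays after.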

> +
> +
> +The barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +only apply to the RmW ops and can be used to augment/upgrade the ordering
> +inherit to the used atomic op. These barriers provide a full smp_mb().
> +
> +These helper barriers exist because architectures have varying implicit
> +ordering on their SMP atomic primitives. For example our TSO architectures
> +provide full ordered atomics and these barriers are no-ops.
> +
> +Thus:
> +
> +  atomic_fetch_add();
> +
> +is equivalent to:
> +
> +  smp_mb__before_atomic();
> +  atomic_fetch_add_relaxed();
> +  smp_mb__after_atomic();
> +
> +
> +Further, while something like:
> +
> +  smp_mb__before_atomic();
> +  atomic_dec(&X);
> +
> +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> +a RELEASE.

There's also an ACQUIRE analogue here, and I think you can interwork the
{_acquire,_release} variants with the smp_mb__{before,after}_atomic
variants. On ARM64 the former will be stronger (RCsc), but the kernel memory
model doesn't distinguish. Agreed?
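
E.g. (an illustrative sketch, assuming x == y == 0 initially), a RELEASE
done with a _release op on one CPU pairing with an 'acquire' built from a
relaxed op plus the barrier on the other:

  CPU0                                CPU1

  WRITE_ONCE(x, 1);                   r0 = atomic_fetch_inc_relaxed(&y);
  atomic_set_release(&y, 1);          smp_mb__after_atomic();
                                      r1 = READ_ONCE(x);

where r0 == 1 (CPU1 observed the release store) should imply r1 == 1.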

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-06-12 14:49       ` Peter Zijlstra
  2017-06-13  6:39         ` Boqun Feng
  2017-06-14 12:33         ` Will Deacon
@ 2017-07-12 12:53         ` Boqun Feng
  2017-07-12 13:08           ` Peter Zijlstra
  2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
  3 siblings, 1 reply; 45+ messages in thread
From: Boqun Feng @ 2017-07-12 12:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner


On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
[...]
> -Any atomic operation that modifies some state in memory and returns information
> -about the state (old or new) implies an SMP-conditional general memory barrier
> -(smp_mb()) on each side of the actual operation (with the exception of
> -explicit lock operations, described later).  These include:
> -
> -	xchg();
> -	atomic_xchg();			atomic_long_xchg();
> -	atomic_inc_return();		atomic_long_inc_return();
> -	atomic_dec_return();		atomic_long_dec_return();
> -	atomic_add_return();		atomic_long_add_return();
> -	atomic_sub_return();		atomic_long_sub_return();
> -	atomic_inc_and_test();		atomic_long_inc_and_test();
> -	atomic_dec_and_test();		atomic_long_dec_and_test();
> -	atomic_sub_and_test();		atomic_long_sub_and_test();
> -	atomic_add_negative();		atomic_long_add_negative();
> -	test_and_set_bit();
> -	test_and_clear_bit();
> -	test_and_change_bit();
> -

The bit related operations are removed from memory-barriers.txt, I think
we'd better add them in atomic_t.txt? By "them", I mean:

	test_and_{set,clear,change}_bit() as RMW atomic

	{set,clear,change}_bit() as non-RMW atomic

	test_and_set_bit_lock()
	clear_bit_unlock() as non-RMW(but barrier-like) atomic
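
For the last pair, an illustrative sketch (hypothetical bitmap) of the
usual lock-style usage:

  unsigned long bits[1] = { 0 };           /* hypothetical bitmap */

  while (test_and_set_bit_lock(0, bits))   /* ACQUIRE when we win */
          cpu_relax();
  /* ... critical section ... */
  clear_bit_unlock(0, bits);               /* RELEASE */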

Regards,
Boqun


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-07-12 12:53         ` Boqun Feng
@ 2017-07-12 13:08           ` Peter Zijlstra
  2017-07-12 19:13             ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-12 13:08 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner

On Wed, Jul 12, 2017 at 08:53:47PM +0800, Boqun Feng wrote:
> On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
> [...]
> > -Any atomic operation that modifies some state in memory and returns information
> > -about the state (old or new) implies an SMP-conditional general memory barrier
> > -(smp_mb()) on each side of the actual operation (with the exception of
> > -explicit lock operations, described later).  These include:
> > -
> > -	xchg();
> > -	atomic_xchg();			atomic_long_xchg();
> > -	atomic_inc_return();		atomic_long_inc_return();
> > -	atomic_dec_return();		atomic_long_dec_return();
> > -	atomic_add_return();		atomic_long_add_return();
> > -	atomic_sub_return();		atomic_long_sub_return();
> > -	atomic_inc_and_test();		atomic_long_inc_and_test();
> > -	atomic_dec_and_test();		atomic_long_dec_and_test();
> > -	atomic_sub_and_test();		atomic_long_sub_and_test();
> > -	atomic_add_negative();		atomic_long_add_negative();
> > -	test_and_set_bit();
> > -	test_and_clear_bit();
> > -	test_and_change_bit();
> > -
> 
> The bit related operations are removed from memory-barriers.txt, I think
> we'd better add them in atomic_t.txt? By "them", I mean:
> 
> 	test_and_{set,clear,change}_bit() as RMW atomic
> 
> 	{set,clear,change}_bit() as non-RMW atomic
> 
> 	test_and_set_bit_lock()
> 	clear_bit_unlock() as non-RMW(but barrier-like) atomic

I was thinking maybe a separate file, as I was hoping to eventually
write a separate file on spinlocks too.

I'd like to keep the new thing purely about the atomic* family of
stuff, that's large enough as is.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
  2017-07-12 13:08           ` Peter Zijlstra
@ 2017-07-12 19:13             ` Paul E. McKenney
  0 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-07-12 19:13 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Will Deacon, linux-kernel, Ingo Molnar, Thomas Gleixner

On Wed, Jul 12, 2017 at 03:08:26PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2017 at 08:53:47PM +0800, Boqun Feng wrote:
> > On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
> > [...]
> > > -Any atomic operation that modifies some state in memory and returns information
> > > -about the state (old or new) implies an SMP-conditional general memory barrier
> > > -(smp_mb()) on each side of the actual operation (with the exception of
> > > -explicit lock operations, described later).  These include:
> > > -
> > > -	xchg();
> > > -	atomic_xchg();			atomic_long_xchg();
> > > -	atomic_inc_return();		atomic_long_inc_return();
> > > -	atomic_dec_return();		atomic_long_dec_return();
> > > -	atomic_add_return();		atomic_long_add_return();
> > > -	atomic_sub_return();		atomic_long_sub_return();
> > > -	atomic_inc_and_test();		atomic_long_inc_and_test();
> > > -	atomic_dec_and_test();		atomic_long_dec_and_test();
> > > -	atomic_sub_and_test();		atomic_long_sub_and_test();
> > > -	atomic_add_negative();		atomic_long_add_negative();
> > > -	test_and_set_bit();
> > > -	test_and_clear_bit();
> > > -	test_and_change_bit();
> > > -
> > 
> > The bit related operations are removed from memory-barriers.txt, I think
> > we'd better add them in atomic_t.txt? By "them", I mean:
> > 
> > 	test_and_{set,clear,change}_bit() as RMW atomic
> > 
> > 	{set,clear,change}_bit() as non-RMW atomic
> > 
> > 	test_and_set_bit_lock()
> > 	clear_bit_unlock() as non-RMW(but barrier-like) atomic
> 
> I was thinking maybe a separate file, as I was hoping to eventually
> write a separate file on spinlocks too.
> 
> I'd like to keep the the new thing purely about the atomic* family of
> stuff, that's large enough as is.

As long as wherever the information is kept actually gets updated when
new functions are added or old ones change, I am good.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-06-12 14:49       ` Peter Zijlstra
                           ` (2 preceding siblings ...)
  2017-07-12 12:53         ` Boqun Feng
@ 2017-07-26 11:53         ` Peter Zijlstra
  2017-07-26 12:47           ` Boqun Feng
  2017-07-26 16:28           ` Randy Dunlap
  3 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-26 11:53 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap


New version..


---
Subject: documentation,atomic: Add new documents
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon Jun 12 14:50:27 CEST 2017

Since we've vastly expanded the atomic_t interface in recent years the
existing documentation is woefully out of date and people seem to get
confused a bit.

Start a new document to hopefully better explain the current state of
affairs.

The old atomic_ops.txt also covers bitmaps and a few more details so
this is not a full replacement and we'll therefore keep that document
around until such a time that we've managed to write more text to cover
its entire.

Also please, ReST people, go away.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 Documentation/atomic_bitops.txt   |   66 ++++++++++++
 Documentation/atomic_t.txt        |  200 ++++++++++++++++++++++++++++++++++++++
 Documentation/memory-barriers.txt |   88 ----------------
 3 files changed, 269 insertions(+), 85 deletions(-)

--- /dev/null
+++ b/Documentation/atomic_bitops.txt
@@ -0,0 +1,66 @@
+
+On atomic bitops.
+
+
+While our bitmap_{}() functions are non-atomic, we have a number of operations
+operating on single bits in a bitmap that are atomic.
+
+
+API
+---
+
+The single bit operations are:
+
+Non RmW ops:
+
+  test_bit()
+
+RmW atomic operations without return value:
+
+  {set,clear,change}_bit()
+  clear_bit_unlock()
+
+RmW atomic operations with return value:
+
+  test_and_{set,clear,change}_bit()
+  test_and_set_bit_lock()
+
+Barriers:
+
+  smp_mb__{before,after}_atomic()
+
+
+All RmW atomic operations have a '__' prefixed variant which is non-atomic.
+
+
+SEMANTICS
+---------
+
+Non-atomic ops:
+
+In particular __clear_bit_unlock() suffers the same issue as atomic_set(),
+which is why the generic version maps to clear_bit_unlock(), see atomic_t.txt.
+
+
+RmW ops:
+
+The test_and_{}_bit() operations return the original value of the bit.
+
+
+ORDERING
+--------
+
+Like with atomic_t, the rule of thumb is:
+
+ - non-RmW operations are unordered;
+
+ - RmW operations that have no return value are unordered;
+
+ - RmW operations that have a return value are fully ordered.
+
+Except for test_and_set_bit_lock() which has ACQUIRE semantics and
+clear_bit_unlock() which has RELEASE semantics.
+
+Since a platform only has a single means of achieving atomic operations
+the same barriers as for atomic_t are used, see atomic_t.txt.
+
--- /dev/null
+++ b/Documentation/atomic_t.txt
@@ -0,0 +1,200 @@
+
+On atomic types (atomic_t atomic64_t and atomic_long_t).
+
+The atomic type provides an interface to the architecture's means of atomic
+RmW operations between CPUs (atomic operations on MMIO are not supported and
+can lead to fatal traps on some platforms).
+
+API
+---
+
+The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
+brevity):
+
+Non RmW ops:
+
+  atomic_read(), atomic_set()
+  atomic_read_acquire(), atomic_set_release()
+
+
+RmW atomic operations:
+
+Arithmetic:
+
+  atomic_{add,sub,inc,dec}()
+  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
+  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
+
+
+Bitwise:
+
+  atomic_{and,or,xor,andnot}()
+  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()
+
+
+Swap:
+
+  atomic_xchg{,_relaxed,_acquire,_release}()
+  atomic_cmpxchg{,_relaxed,_acquire,_release}()
+  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
+
+
+Reference count (but please see refcount_t):
+
+  atomic_add_unless(), atomic_inc_not_zero()
+  atomic_sub_and_test(), atomic_dec_and_test()
+
+
+Misc:
+
+  atomic_inc_and_test(), atomic_add_negative()
+  atomic_dec_unless_positive(), atomic_inc_unless_negative()
+
+
+Barriers:
+
+  smp_mb__{before,after}_atomic()
+
+
+
+SEMANTICS
+---------
+
+Non RmW ops:
+
+The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
+implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
+smp_store_release() respectively.
+
+The one detail to this is that atomic_set{}() should be observable to the RmW
+ops. That is:
+
+  C atomic-set
+
+  {
+    atomic_set(v, 1);
+  }
+
+  P1(atomic_t *v)
+  {
+    atomic_add_unless(v, 1, 0);
+  }
+
+  P2(atomic_t *v)
+  {
+    atomic_set(v, 0);
+  }
+
+  exists
+  (v=2)
+
+In this case we would expect the atomic_set() from CPU1 to either happen
+before the atomic_add_unless(), in which case that latter one would no-op, or
+_after_ in which case we'd overwrite its result. In no case is "2" a valid
+outcome.
+
+This is typically true on 'normal' platforms, where a regular competing STORE
+will invalidate a LL/SC or fail a CMPXCHG.
+
+The obvious case where this is not so is when we need to implement atomic ops
+with a lock:
+
+  CPU0						CPU1
+
+  atomic_add_unless(v, 1, 0);
+    lock();
+    ret = READ_ONCE(v->counter); // == 1
+						atomic_set(v, 0);
+    if (ret != u)				  WRITE_ONCE(v->counter, 0);
+      WRITE_ONCE(v->counter, ret + 1);
+    unlock();
+
+the typical solution is to then implement atomic_set{}() with atomic_xchg().
+
+
+RmW ops:
+
+These come in various forms:
+
+ - plain operations without return value: atomic_{}()
+
+ - operations which return the modified value: atomic_{}_return()
+
+   these are limited to the arithmetic operations because those are
+   reversible. Bitops are irreversible and therefore the modified value
+   is of dubious utility.
+
+ - operations which return the original value: atomic_fetch_{}()
+
+ - swap operations: xchg(), cmpxchg() and try_cmpxchg()
+
+ - misc; the special purpose operations that are commonly used and would,
+   given the interface, normally be implemented using (try_)cmpxchg loops but
+   are time critical and can, (typically) on LL/SC architectures, be more
+   efficiently implemented.
+
+All these operations are SMP atomic; that is, the operations (for a single
+atomic variable) can be fully ordered and no intermediate state is lost or
+visible.
+
+
+ORDERING  (go read memory-barriers.txt first)
+--------
+
+The rule of thumb:
+
+ - non-RmW operations are unordered;
+
+ - RmW operations that have no return value are unordered;
+
+ - RmW operations that have a return value are fully ordered;
+
+ - RmW operations that are conditional are unordered on FAILURE,
+   otherwise the above rules apply.
+
+Except of course when an operation has an explicit ordering like:
+
+ {}_relaxed: unordered
+ {}_acquire: the R of the RmW (or atomic_read) is an ACQUIRE
+ {}_release: the W of the RmW (or atomic_set)  is a  RELEASE
+
+Where 'unordered' is against other memory locations. Address dependencies are
+not defeated.
+
+Fully ordered primitives are ordered against everything prior and everything
+subsequent. Therefore a fully ordered primitive is like having an smp_mb()
+before and an smp_mb() after the primitive.
+
+
+The barriers:
+
+  smp_mb__{before,after}_atomic()
+
+only apply to the RmW ops and can be used to augment/upgrade the ordering
+inherent to the used atomic op. These barriers provide a full smp_mb().
+
+These helper barriers exist because architectures have varying implicit
+ordering on their SMP atomic primitives. For example our TSO architectures
+provide full ordered atomics and these barriers are no-ops.
+
+Thus:
+
+  atomic_fetch_add();
+
+is equivalent to:
+
+  smp_mb__before_atomic();
+  atomic_fetch_add_relaxed();
+  smp_mb__after_atomic();
+
+However the atomic_fetch_add() might be implemented more efficiently.
+
+Further, while something like:
+
+  smp_mb__before_atomic();
+  atomic_dec(&X);
+
+is a 'typical' RELEASE pattern, the barrier is strictly stronger than
+a RELEASE. Similarly for something like:
+
+
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
      This means that ACQUIRE acts as a minimal "acquire" operation and
      RELEASE acts as a minimal "release" operation.
 
-A subset of the atomic operations described in core-api/atomic_ops.rst have
+A subset of the atomic operations described in atomic_t.txt have ACQUIRE
 ACQUIRE and RELEASE variants in addition to fully-ordered and relaxed (no
 barrier semantics) definitions.  For compound atomics performing both a load
 and a store, ACQUIRE semantics apply only to the load and RELEASE semantics
@@ -1876,8 +1876,7 @@ compiler and the CPU from reordering the
      This makes sure that the death mark on the object is perceived to be set
      *before* the reference counter is decremented.
 
-     See Documentation/core-api/atomic_ops.rst for more information.  See the
-     "Atomic operations" subsection for information on where to use these.
+     See Documentation/atomic_t.txt for more information.
 
 
  (*) lockless_dereference();
@@ -2503,88 +2502,7 @@ operations are noted specially as some o
 some don't, but they're very heavily relied on as a group throughout the
 kernel.
 
-Any atomic operation that modifies some state in memory and returns information
-about the state (old or new) implies an SMP-conditional general memory barrier
-(smp_mb()) on each side of the actual operation (with the exception of
-explicit lock operations, described later).  These include:
-
-	xchg();
-	atomic_xchg();			atomic_long_xchg();
-	atomic_inc_return();		atomic_long_inc_return();
-	atomic_dec_return();		atomic_long_dec_return();
-	atomic_add_return();		atomic_long_add_return();
-	atomic_sub_return();		atomic_long_sub_return();
-	atomic_inc_and_test();		atomic_long_inc_and_test();
-	atomic_dec_and_test();		atomic_long_dec_and_test();
-	atomic_sub_and_test();		atomic_long_sub_and_test();
-	atomic_add_negative();		atomic_long_add_negative();
-	test_and_set_bit();
-	test_and_clear_bit();
-	test_and_change_bit();
-
-	/* when succeeds */
-	cmpxchg();
-	atomic_cmpxchg();		atomic_long_cmpxchg();
-	atomic_add_unless();		atomic_long_add_unless();
-
-These are used for such things as implementing ACQUIRE-class and RELEASE-class
-operations and adjusting reference counters towards object destruction, and as
-such the implicit memory barrier effects are necessary.
-
-
-The following operations are potential problems as they do _not_ imply memory
-barriers, but might be used for implementing such things as RELEASE-class
-operations:
-
-	atomic_set();
-	set_bit();
-	clear_bit();
-	change_bit();
-
-With these the appropriate explicit memory barrier should be used if necessary
-(smp_mb__before_atomic() for instance).
-
-
-The following also do _not_ imply memory barriers, and so may require explicit
-memory barriers under some circumstances (smp_mb__before_atomic() for
-instance):
-
-	atomic_add();
-	atomic_sub();
-	atomic_inc();
-	atomic_dec();
-
-If they're used for statistics generation, then they probably don't need memory
-barriers, unless there's a coupling between statistical data.
-
-If they're used for reference counting on an object to control its lifetime,
-they probably don't need memory barriers because either the reference count
-will be adjusted inside a locked section, or the caller will already hold
-sufficient references to make the lock, and thus a memory barrier unnecessary.
-
-If they're used for constructing a lock of some description, then they probably
-do need memory barriers as a lock primitive generally has to do things in a
-specific order.
-
-Basically, each usage case has to be carefully considered as to whether memory
-barriers are needed or not.
-
-The following operations are special locking primitives:
-
-	test_and_set_bit_lock();
-	clear_bit_unlock();
-	__clear_bit_unlock();
-
-These implement ACQUIRE-class and RELEASE-class operations.  These should be
-used in preference to other operations when implementing locking primitives,
-because their implementations can be optimised on many architectures.
-
-[!] Note that special memory barrier primitives are available for these
-situations because on some CPUs the atomic instructions used imply full memory
-barriers, and so barrier instructions are superfluous in conjunction with them,
-and in such cases the special barrier primitives will be no-ops.
-
-See Documentation/core-api/atomic_ops.rst for more information.
+See Documentation/atomic_t.txt for more information.
 
 
 ACCESSING DEVICES

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
@ 2017-07-26 12:47           ` Boqun Feng
  2017-07-31  9:05             ` Peter Zijlstra
  2017-07-26 16:28           ` Randy Dunlap
  1 sibling, 1 reply; 45+ messages in thread
From: Boqun Feng @ 2017-07-26 12:47 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap


On Wed, Jul 26, 2017 at 01:53:28PM +0200, Peter Zijlstra wrote:
> 
> New version..
> 
> 
> ---
> Subject: documentation,atomic: Add new documents
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon Jun 12 14:50:27 CEST 2017
> 
> Since we've vastly expanded the atomic_t interface in recent years the
> existing documentation is woefully out of date and people seem to get
> confused a bit.
> 
> Start a new document to hopefully better explain the current state of
> affairs.
> 
> The old atomic_ops.txt also covers bitmaps and a few more details so
> this is not a full replacement and we'll therefore keep that document
> around until such a time that we've managed to write more text to cover
> its entire.
> 

You seem to have an unfinished paragraph..

> Also please, ReST people, go away.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
[...]
> +
> +Further, while something like:
> +
> +  smp_mb__before_atomic();
> +  atomic_dec(&X);
> +
> +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> +a RELEASE. Similarly for something like:
> +

.. here. Maybe you planned to put a stronger ACQUIRE pattern?

> +
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
>       This means that ACQUIRE acts as a minimal "acquire" operation and
>       RELEASE acts as a minimal "release" operation.
>  
[...]
> -
> -[!] Note that special memory barrier primitives are available for these
> -situations because on some CPUs the atomic instructions used imply full memory
> -barriers, and so barrier instructions are superfluous in conjunction with them,
> -and in such cases the special barrier primitives will be no-ops.
> -
> -See Documentation/core-api/atomic_ops.rst for more information.
> +See Documentation/atomic_t.txt for more information.
>  

s/atomic_t.txt/atomic_{t,bitops}.txt/ ?

other than those two tiny things,

Reviewed-by: Boqun Feng <boqun.feng@gmail.com>

Regards,
Boqun

>  
>  ACCESSING DEVICES


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
  2017-07-26 12:47           ` Boqun Feng
@ 2017-07-26 16:28           ` Randy Dunlap
  1 sibling, 0 replies; 45+ messages in thread
From: Randy Dunlap @ 2017-07-26 16:28 UTC (permalink / raw)
  To: Peter Zijlstra, Boqun Feng
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar, Thomas Gleixner

nits... <chirping>

On 07/26/2017 04:53 AM, Peter Zijlstra wrote:
> ---
> Subject: documentation,atomic: Add new documents
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Mon Jun 12 14:50:27 CEST 2017
> 
> Since we've vastly expanded the atomic_t interface in recent years the
> existing documentation is woefully out of date and people seem to get
> confused a bit.
> 
> Start a new document to hopefully better explain the current state of
> affairs.
> 
> The old atomic_ops.txt also covers bitmaps and a few more details so
> this is not a full replacement and we'll therefore keep that document
> around until such a time that we've managed to write more text to cover
> its entire.
> 
> Also please, ReST people, go away.
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/atomic_bitops.txt   |   66 ++++++++++++
>  Documentation/atomic_t.txt        |  200 ++++++++++++++++++++++++++++++++++++++
>  Documentation/memory-barriers.txt |   88 ----------------
>  3 files changed, 269 insertions(+), 85 deletions(-)
> 
> --- /dev/null
> +++ b/Documentation/atomic_bitops.txt
> @@ -0,0 +1,66 @@
> +
> +On atomic bitops.
> +
> +
> +While our bitmap_{}() functions are non-atomic, we have a number of operations
> +operating on single bits in a bitmap that are atomic.
> +
> +
> +API
> +---
> +
> +The single bit operations are:
> +
> +Non RmW ops:

   Non-RmW ops:

> +
> +  test_bit()
> +

[]

> --- /dev/null
> +++ b/Documentation/atomic_t.txt
> @@ -0,0 +1,200 @@
> +
> +On atomic types (atomic_t atomic64_t and atomic_long_t).
> +
> +The atomic type provides an interface to the architecture's means of atomic
> +RmW operations between CPUs (atomic operations on MMIO are not supported and
> +can lead to fatal traps on some platforms).
> +
> +API
> +---
> +
> +The 'full' API consists of (atomic64_ and atomic_long_ prefixes omitted for
> +brevity):
> +
> +Non RmW ops:

   Non-RmW ops:

> +
> +  atomic_read(), atomic_set()
> +  atomic_read_acquire(), atomic_set_release()
> +
> +
> +RmW atomic operations:
> +
> +Arithmetic:
> +
> +  atomic_{add,sub,inc,dec}()
> +  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
> +  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
> +
> +
> +Bitwise:
> +
> +  atomic_{and,or,xor,andnot}()
> +  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()
> +
> +
> +Swap:
> +
> +  atomic_xchg{,_relaxed,_acquire,_release}()
> +  atomic_cmpxchg{,_relaxed,_acquire,_release}()
> +  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
> +
> +
> +Reference count (but please see refcount_t):
> +
> +  atomic_add_unless(), atomic_inc_not_zero()
> +  atomic_sub_and_test(), atomic_dec_and_test()
> +
> +
> +Misc:
> +
> +  atomic_inc_and_test(), atomic_add_negative()
> +  atomic_dec_unless_positive(), atomic_inc_unless_negative()
> +
> +
> +Barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +
> +
> +SEMANTICS
> +---------
> +
> +Non RmW ops:

   Non-RmW ops:

> +
> +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> +smp_store_release() respectively.
> +
> +The one detail to this is that atomic_set{}() should be observable to the RmW
> +ops. That is:
> +
> +  C atomic-set
> +
> +  {
> +    atomic_set(v, 1);
> +  }
> +
> +  P1(atomic_t *v)
> +  {
> +    atomic_add_unless(v, 1, 0);
> +  }
> +
> +  P2(atomic_t *v)
> +  {
> +    atomic_set(v, 0);
> +  }
> +
> +  exists
> +  (v=2)
> +
> +In this case we would expect the atomic_set() from CPU1 to either happen
> +before the atomic_add_unless(), in which case that latter one would no-op, or
> +_after_ in which case we'd overwrite its result. In no case is "2" a valid
> +outcome.
> +
> +This is typically true on 'normal' platforms, where a regular competing STORE
> +will invalidate a LL/SC or fail a CMPXCHG.
> +
> +The obvious case where this is not so is when we need to implement atomic ops
> +with a lock:
> +
> +  CPU0						CPU1
> +
> +  atomic_add_unless(v, 1, 0);
> +    lock();
> +    ret = READ_ONCE(v->counter); // == 1
> +						atomic_set(v, 0);
> +    if (ret != u)				  WRITE_ONCE(v->counter, 0);
> +      WRITE_ONCE(v->counter, ret + 1);
> +    unlock();
> +
> +the typical solution is to then implement atomic_set{}() with atomic_xchg().
> +
> +
> +RmW ops:

I prefer (and have usually seen) RMW, but it's your document -- er, text file. :)


> +
> +These come in various forms:
> +

[]

> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
>       This means that ACQUIRE acts as a minimal "acquire" operation and
>       RELEASE acts as a minimal "release" operation.
>  
> -A subset of the atomic operations described in core-api/atomic_ops.rst have
> +A subset of the atomic operations described in atomic_t.txt have ACQUIRE
>  ACQUIRE and RELEASE variants in addition to fully-ordered and relaxed (no

duplicate ACQUIRE.

>  barrier semantics) definitions.  For compound atomics performing both a load
>  and a store, ACQUIRE semantics apply only to the load and RELEASE semantics



-- 
~Randy

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-26 12:47           ` Boqun Feng
@ 2017-07-31  9:05             ` Peter Zijlstra
  2017-07-31 11:04               ` Boqun Feng
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-07-31  9:05 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Wed, Jul 26, 2017 at 08:47:50PM +0800, Boqun Feng wrote:

> > +
> > +Further, while something like:
> > +
> > +  smp_mb__before_atomic();
> > +  atomic_dec(&X);
> > +
> > +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> > +a RELEASE. Similarly for something like:
> > +
> 
> .. at here. Maybe you planned to put stronger ACQUIRE pattern?

Yes, although I struggled to find a sensible one. The problem is that
ACQUIRE is on loads and value returning atomics have an ACQUIRE variant,
so why would you ever want to use smp_mb__after_atomic() for this.


That is, the best I could come up with is something like:

	val = atomic_fetch_or_relaxed(1, &var);
	smp_mb__after_atomic();

But in that case we should've just written:

	val = atomic_fetch_or_acquire(1, &var);


Suggestions?

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-31  9:05             ` Peter Zijlstra
@ 2017-07-31 11:04               ` Boqun Feng
  2017-07-31 17:43                 ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Boqun Feng @ 2017-07-31 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Paul McKenney, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap


On Mon, Jul 31, 2017 at 11:05:35AM +0200, Peter Zijlstra wrote:
> On Wed, Jul 26, 2017 at 08:47:50PM +0800, Boqun Feng wrote:
> 
> > > +
> > > +Further, while something like:
> > > +
> > > +  smp_mb__before_atomic();
> > > +  atomic_dec(&X);
> > > +
> > > +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> > > +a RELEASE. Similarly for something like:
> > > +
> > 
> > .. at here. Maybe you planned to put stronger ACQUIRE pattern?
> 
> Yes, although I struggled to find a sensible one. The problem is that
> ACQUIRE is on loads and value returning atomics have an ACQUIRE variant,
> so why would you ever want to use smp_mb__after_atomic() for this.
> 
> 
> That is, the best I could come up with is something like:
> 
> 	val = atomic_fetch_or_relaxed(1, &var);
> 	smp_mb__after_atomic();
> 
> But in that case we should've just written:
> 
> 	val = atomic_fetch_or_acquire(1, &var);
> 

Agreed.

And besides, in memory-barriers.txt, the wording is:

 (*) smp_mb__before_atomic();
 (*) smp_mb__after_atomic();

     These are for use with atomic (such as add, subtract, increment and
     decrement) functions that don't return a value, especially when used for
     reference counting. 

So actually, using smp_mb__after_atomic() for ACQUIRE is a misuse.

> 
> Suggestions?

As a result, I think it's better we say smp_mb__{before,after}_atomic()
are only for 1) non-value-returning RmW atomic ops, 2)
{set,clear,change}_bit and 3) internal use of atomic primitives (e.g. the
generic version of fully ordered atomics).

1) prevents people from using it for an ACQUIRE, but allows it for a RELEASE.
1) & 2) makes atomic_t.txt consistent with memory-barriers.txt.
3) explains our usage of those barriers internally.
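
For 3), a rough sketch (not the actual kernel implementation) of how the
generic fully ordered variant is composed from the relaxed one:

  static inline int atomic_fetch_add(int i, atomic_t *v)
  {
          int ret;

          smp_mb__before_atomic();
          ret = atomic_fetch_add_relaxed(i, v);
          smp_mb__after_atomic();

          return ret;
  }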

Thoughts?

Regards,
Boqun


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-31 11:04               ` Boqun Feng
@ 2017-07-31 17:43                 ` Paul E. McKenney
  2017-08-01  2:14                   ` Boqun Feng
  2017-08-01  9:01                   ` Peter Zijlstra
  0 siblings, 2 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-07-31 17:43 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Mon, Jul 31, 2017 at 07:04:03PM +0800, Boqun Feng wrote:
> On Mon, Jul 31, 2017 at 11:05:35AM +0200, Peter Zijlstra wrote:
> > On Wed, Jul 26, 2017 at 08:47:50PM +0800, Boqun Feng wrote:
> > 
> > > > +
> > > > +Further, while something like:
> > > > +
> > > > +  smp_mb__before_atomic();
> > > > +  atomic_dec(&X);
> > > > +
> > > > +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> > > > +a RELEASE. Similarly for something like:
> > > > +
> > > 
> > > .. at here. Maybe you planned to put stronger ACQUIRE pattern?
> > 
> > Yes, although I struggled to find a sensible one. The problem is that
> > ACQUIRE is on loads and value returning atomics have an ACQUIRE variant,
> > so why would you ever want to use smp_mb__after_atomic() for this.
> > 
> > 
> > That is, the best I could come up with is something like:
> > 
> > 	val = atomic_fetch_or_relaxed(1, &var);
> > 	smp_mb__after_atomic();
> > 
> > But in that case we should've just written:
> > 
> > 	val = atomic_fetch_or_acquire(1, &var);
> > 
> 
> Agreed.
> 
> And besides, in memory-barriers.txt, the wording is:
> 
>  (*) smp_mb__before_atomic();
>  (*) smp_mb__after_atomic();
> 
>      These are for use with atomic (such as add, subtract, increment and
>      decrement) functions that don't return a value, especially when used for
>      reference counting. 
> 
> So actually, using smp_mb__after_atomic() for ACQUIRE is a misuse.

You lost me on this one.

Why wouldn't the following have ACQUIRE semantics?

	atomic_inc(&var);
	smp_mb__after_atomic();

Is the issue that there is no actual value returned or some such?

> > Suggestions?
> 
> As a result, I think it's better we say smp_mb__{before,after}_atomic()
> are only for 1) non-value-returning RmW atomic ops, 2)
> {set,clear,change}_bit and 3) internal use of atomic primitives(e.g. the
> generic version of fully ordered atomics).
> 
> 1) prevents people to use it for an ACQUIRE, but allows for a RELEASE.
> 1) & 2) makes atomic_t.txt consistent with memory-barriers.txt
> 3) explains our usage of those barriers internally.
> 
> Thoughts?

So if I have something like this, the assertion really can trigger?

	WRITE_ONCE(x, 1);		atomic_inc(&y);
	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
					r1 = READ_ONCE(x);


	WARN_ON(r0 == 0 && r1 == 0);

I must confess that I am not seeing why we would want to allow this
outcome.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-31 17:43                 ` Paul E. McKenney
@ 2017-08-01  2:14                   ` Boqun Feng
  2017-08-01  9:01                   ` Peter Zijlstra
  1 sibling, 0 replies; 45+ messages in thread
From: Boqun Feng @ 2017-08-01  2:14 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Will Deacon, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap


On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> On Mon, Jul 31, 2017 at 07:04:03PM +0800, Boqun Feng wrote:
> > On Mon, Jul 31, 2017 at 11:05:35AM +0200, Peter Zijlstra wrote:
> > > On Wed, Jul 26, 2017 at 08:47:50PM +0800, Boqun Feng wrote:
> > > 
> > > > > +
> > > > > +Further, while something like:
> > > > > +
> > > > > +  smp_mb__before_atomic();
> > > > > +  atomic_dec(&X);
> > > > > +
> > > > > +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> > > > > +a RELEASE. Similarly for something like:
> > > > > +
> > > > 
> > > > .. at here. Maybe you planned to put stronger ACQUIRE pattern?
> > > 
> > > Yes, although I struggled to find a sensible one. The problem is that
> > > ACQUIRE is on loads and value returning atomics have an ACQUIRE variant,
> > > so why would you ever want to use smp_mb__after_atomic() for this.
> > > 
> > > 
> > > That is, the best I could come up with is something like:
> > > 
> > > 	val = atomic_fetch_or_relaxed(1, &var);
> > > 	smp_mb__after_atomic();
> > > 
> > > But in that case we should've just written:
> > > 
> > > 	val = atomic_fetch_or_acquire(1, &var);
> > > 
> > 
> > Agreed.
> > 
> > And besides, in memory-barriers.txt, the wording is:
> > 
> >  (*) smp_mb__before_atomic();
> >  (*) smp_mb__after_atomic();
> > 
> >      These are for use with atomic (such as add, subtract, increment and
> >      decrement) functions that don't return a value, especially when used for
> >      reference counting. 
> > 
> > So actually, using smp_mb__after_atomic() for ACQUIRE is a misuse.
> 
> You lost me on this one.
> 
> Why wouldn't the following have ACQUIRE semantics?
> 
> 	atomic_inc(&var);
> 	smp_mb__after_atomic();
> 
> Is the issue that there is no actual value returned or some such?
> 

That "misuse" is a wrong word there ;-( 

I actually meant "the usage is correct but we don't encourage using
smp_mb__after_atomic() *only* for an ACQUIRE, because _acquire() could
always be used in that case, as Peter said".

In fact in your case, the ordering is stronger, both the load and store
part of the atomic op are ordered with the memory ops following it.

In short, I suggested we tell users to use
smp_mb__{before,after}_atomic() only when the _{release,acquire} ops don't
suffice, i.e. for an RCsc RELEASE, or for an smp_mb() that orders ops other
than the atomic op itself (like the one you use in __call_rcu_nocb_enqueue()).
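
For the latter case, something like (purely illustrative, hypothetical
names):

  WRITE_ONCE(q->entry, item);      /* hypothetical enqueue */
  atomic_inc(&q->len);
  smp_mb__after_atomic();          /* order both stores above... */
  if (READ_ONCE(q->need_wakeup))   /* ...against this load */
          wake_up_consumer(q);     /* hypothetical helper */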

But maybe this is too strict, and Peter said he would write something
about it on IRC, so I'm not that attached to this suggestion ;-)

Regards,
Boqun

> > > Suggestions?
> > 
> > As a result, I think it's better we say smp_mb__{before,after}_atomic()
> > are only for 1) non-value-returning RmW atomic ops, 2)
> > {set,clear,change}_bit and 3) internal use of atomic primitives(e.g. the
> > generic version of fully ordered atomics).
> > 
> > 1) prevents people to use it for an ACQUIRE, but allows for a RELEASE.
> > 1) & 2) makes atomic_t.txt consistent with memory-barriers.txt
> > 3) explains our usage of those barriers internally.
> > 
> > Thoughts?
> 
> So if I have something like this, the assertion really can trigger?
> 
> 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> 					r1 = READ_ONCE(x);
> 
> 
> 	WARN_ON(r0 == 0 && r1 == 0);
> 
> I must confess that I am not seeing why we would want to allow this
> outcome.
> 
> 							Thanx, Paul
> 


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-07-31 17:43                 ` Paul E. McKenney
  2017-08-01  2:14                   ` Boqun Feng
@ 2017-08-01  9:01                   ` Peter Zijlstra
  2017-08-01 10:19                     ` Will Deacon
  2017-08-01 13:35                     ` Paul E. McKenney
  1 sibling, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-08-01  9:01 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Will Deacon, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:

> Why wouldn't the following have ACQUIRE semantics?
> 
> 	atomic_inc(&var);
> 	smp_mb__after_atomic();
> 
> Is the issue that there is no actual value returned or some such?

Yes, so that the inc is a load-store, and thus there is a load, we lose
the value.

But I see your point I think. Irrespective of still having the value,
the ordering is preserved and nothing should pass across that.

> So if I have something like this, the assertion really can trigger?
> 
> 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> 					r1 = READ_ONCE(x);
> 
> 
> 	WARN_ON(r0 == 0 && r1 == 0);
> 
> I must confess that I am not seeing why we would want to allow this
> outcome.

No you are indeed quite right. I just wasn't creative enough. Thanks for
the inspiration.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01  9:01                   ` Peter Zijlstra
@ 2017-08-01 10:19                     ` Will Deacon
  2017-08-01 11:47                       ` Peter Zijlstra
  2017-08-01 13:35                     ` Paul E. McKenney
  1 sibling, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-08-01 10:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> 
> > Why wouldn't the following have ACQUIRE semantics?
> > 
> > 	atomic_inc(&var);
> > 	smp_mb__after_atomic();
> > 
> > Is the issue that there is no actual value returned or some such?
> 
> Yes, so that the inc is a load-store, and thus there is a load, we loose
> the value.
> 
> But I see your point I think. Irrespective of still having the value,
> the ordering is preserved and nothing should pass across that.
> 
> > So if I have something like this, the assertion really can trigger?
> > 
> > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > 					r1 = READ_ONCE(x);
> > 
> > 
> > 	WARN_ON(r0 == 0 && r1 == 0);
> > 
> > I must confess that I am not seeing why we would want to allow this
> > outcome.
> 
> No you are indeed quite right. I just wasn't creative enough. Thanks for
> the inspiration.

Just to close this out, we agree that an smp_rmb() instead of
smp_mb__after_atomic() would *not* forbid this outcome, right?
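
That is, the variant in question (Paul's test above, with smp_rmb() in
place of smp_mb__after_atomic()):

	WRITE_ONCE(x, 1);		atomic_inc(&y);
	r0 = xchg_release(&y, 5);	smp_rmb();
					r1 = READ_ONCE(x);

	WARN_ON(r0 == 0 && r1 == 0);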

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 10:19                     ` Will Deacon
@ 2017-08-01 11:47                       ` Peter Zijlstra
  2017-08-01 12:17                         ` Will Deacon
  0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2017-08-01 11:47 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paul E. McKenney, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > 
> > > Why wouldn't the following have ACQUIRE semantics?
> > > 
> > > 	atomic_inc(&var);
> > > 	smp_mb__after_atomic();
> > > 
> > > Is the issue that there is no actual value returned or some such?
> > 
> > Yes, so that the inc is a load-store, and thus there is a load, we lose
> > the value.
> > 
> > But I see your point I think. Irrespective of still having the value,
> > the ordering is preserved and nothing should pass across that.
> > 
> > > So if I have something like this, the assertion really can trigger?
> > > 
> > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > 					r1 = READ_ONCE(x);
> > > 
> > > 
> > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > 
> > > I must confess that I am not seeing why we would want to allow this
> > > outcome.
> > 
> > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > the inspiration.
> 
> Just to close this out, we agree that an smp_rmb() instead of
> smp_mb__after_atomic() would *not* forbid this outcome, right?

So that really hurts my brain. Per the normal rules that smp_rmb() would
order the read of @x against the last ll of @y and per ll/sc ordering
you then still don't get to make the WARN happen.

On IRC you explained that your 8.1 LSE instructions are not in fact
ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
really need to load the old value to compute the new value.

Not happy... :-(

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 11:47                       ` Peter Zijlstra
@ 2017-08-01 12:17                         ` Will Deacon
  2017-08-01 12:52                           ` Peter Zijlstra
  2017-08-01 16:14                           ` Paul E. McKenney
  0 siblings, 2 replies; 45+ messages in thread
From: Will Deacon @ 2017-08-01 12:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 01:47:44PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> > On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > > 
> > > > Why wouldn't the following have ACQUIRE semantics?
> > > > 
> > > > 	atomic_inc(&var);
> > > > 	smp_mb__after_atomic();
> > > > 
> > > > Is the issue that there is no actual value returned or some such?
> > > 
> > > Yes, so that the inc is a load-store, and thus there is a load, we lose
> > > the value.
> > > 
> > > But I see your point I think. Irrespective of still having the value,
> > > the ordering is preserved and nothing should pass across that.
> > > 
> > > > So if I have something like this, the assertion really can trigger?
> > > > 
> > > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > > 					r1 = READ_ONCE(x);
> > > > 
> > > > 
> > > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > > 
> > > > I must confess that I am not seeing why we would want to allow this
> > > > outcome.
> > > 
> > > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > > the inspiration.
> > 
> > Just to close this out, we agree that an smp_rmb() instead of
> > smp_mb__after_atomic() would *not* forbid this outcome, right?
> 
> So that really hurts my brain. Per the normal rules that smp_rmb() would
> order the read of @x against the last ll of @y and per ll/sc ordering
> you then still don't get to make the WARN happen.
> 
> On IRC you explained that your 8.1 LSE instructions are not in fact
> ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
> really need to load the old value to compute the new value.

To be clear, it's only the ST* variants of the LSE instructions that are
treated as a write for the purposes of memory ordering, so these are the
non-*_return variants. It's not unlikely that other architectures will
exhibit the same behaviour (e.g. Power, RISC-V), because the CPU can
treat non-return atomics as "fire-and-forget" and have them handled
elsewhere in the memory subsystem, causing them to be treated similarly
to posted writes.

For the code snippet above, the second thread has no idea about the value
of y and so smp_rmb() is the wrong thing to be using imo. It really cares
about ordering the store to y before the read of x, so needs a full mb (i.e.
the test is more like 'R' than 'MP').
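
For reference, the 'R' shape in kernel-litmus form (a sketch for
comparison, not one of the tests in this thread):

	C R-sketch

	{
	}

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		smp_mb();
		WRITE_ONCE(*y, 1);
	}

	P1(int *x, int *y)
	{
		WRITE_ONCE(*y, 2);
		smp_mb();
		r0 = READ_ONCE(*x);
	}

	exists
	(y=2 /\ 1:r0=0)

With smp_mb() on both sides that outcome is forbidden; weaken P1's
barrier to smp_rmb() and the write-to-read pair on P1 is no longer
ordered, which is the situation we end up in with the non-return atomic.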

Also, wouldn't this problem also arise if your atomics were built using a
spinlock where unlock had release semantics?

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 12:17                         ` Will Deacon
@ 2017-08-01 12:52                           ` Peter Zijlstra
  2017-08-01 16:14                           ` Paul E. McKenney
  1 sibling, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-08-01 12:52 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paul E. McKenney, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 01:17:13PM +0100, Will Deacon wrote:

> Also, wouldn't this problem also arise if your atomics were built using a
> spinlock where unlock had release semantics?

I'm hoping none of our spinlock-based atomics have weak ordering.
Spinlock-based atomics are a little crazy to begin with, making them
weak *shudder*.
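
For reference, the shape of such an implementation -- a simplified sketch
along the lines of the generic spinlock-hashed atomics (NR_LOCKS and
lock_for() are made up for illustration; this is not tree code):

	/* One lock per hash bucket; an atomic_t is protected by "its" lock. */
	static raw_spinlock_t atomic_locks[NR_LOCKS];

	static inline raw_spinlock_t *lock_for(const atomic_t *v)
	{
		/* hash the atomic's address to one of the locks */
		return &atomic_locks[(unsigned long)v % NR_LOCKS];
	}

	static inline void atomic_inc(atomic_t *v)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(lock_for(v), flags);
		v->counter++;
		raw_spin_unlock_irqrestore(lock_for(v), flags);
	}

As long as the underlying spinlock is SC, every such op is fully
serialised against the other ops on the same lock, which is why making
these weak would be extra crazy.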

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01  9:01                   ` Peter Zijlstra
  2017-08-01 10:19                     ` Will Deacon
@ 2017-08-01 13:35                     ` Paul E. McKenney
  1 sibling, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-01 13:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Boqun Feng, Will Deacon, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> 
> > Why wouldn't the following have ACQUIRE semantics?
> > 
> > 	atomic_inc(&var);
> > 	smp_mb__after_atomic();
> > 
> > Is the issue that there is no actual value returned or some such?
> 
> Yes, so that the inc is a load-store, and thus there is a load, we lose
> the value.
> 
> But I see your point I think. Irrespective of still having the value,
> the ordering is preserved and nothing should pass across that.
> 
> > So if I have something like this, the assertion really can trigger?
> > 
> > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > 					r1 = READ_ONCE(x);
> > 
> > 
> > 	WARN_ON(r0 == 0 && r1 == 0);
> > 
> > I must confess that I am not seeing why we would want to allow this
> > outcome.
> 
> No you are indeed quite right. I just wasn't creative enough. Thanks for
> the inspiration.

Whew!  You guys had me worried there for a bit.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 12:17                         ` Will Deacon
  2017-08-01 12:52                           ` Peter Zijlstra
@ 2017-08-01 16:14                           ` Paul E. McKenney
  2017-08-01 16:42                             ` Peter Zijlstra
                                               ` (2 more replies)
  1 sibling, 3 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-01 16:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 01:17:13PM +0100, Will Deacon wrote:
> On Tue, Aug 01, 2017 at 01:47:44PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> > > On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > > > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > > > 
> > > > > Why wouldn't the following have ACQUIRE semantics?
> > > > > 
> > > > > 	atomic_inc(&var);
> > > > > 	smp_mb__after_atomic();
> > > > > 
> > > > > Is the issue that there is no actual value returned or some such?
> > > > 
> > > > Yes, so that the inc is a load-store, and thus there is a load, we lose
> > > > the value.
> > > > 
> > > > But I see your point I think. Irrespective of still having the value,
> > > > the ordering is preserved and nothing should pass across that.
> > > > 
> > > > > So if I have something like this, the assertion really can trigger?
> > > > > 
> > > > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > > > 					r1 = READ_ONCE(x);
> > > > > 
> > > > > 
> > > > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > > > 
> > > > > I must confess that I am not seeing why we would want to allow this
> > > > > outcome.
> > > > 
> > > > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > > > the inspiration.
> > > 
> > > Just to close this out, we agree that an smp_rmb() instead of
> > > smp_mb__after_atomic() would *not* forbid this outcome, right?
> > 
> > So that really hurts my brain. Per the normal rules that smp_rmb() would
> > order the read of @x against the last ll of @y and per ll/sc ordering
> > you then still don't get to make the WARN happen.
> > 
> > On IRC you explained that your 8.1 LSE instructions are not in fact
> > ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
> > really need to load the old value to compute the new value.
> 
> To be clear, it's only the ST* variants of the LSE instructions that are
> treated as a write for the purposes of memory ordering, so these are the
> non-*_return variants. It's not unlikely that other architectures will
> exhibit the same behaviour (e.g. Power, RISC-V), because the CPU can
> treat non-return atomics as "fire-and-forget" and have them handled
> elsewhere in the memory subsystem, causing them to be treated similarly
> to posted writes.
> 
> For the code snippet above, the second thread has no idea about the value
> of y and so smp_rmb() is the wrong thing to be using imo. It really cares
> about ordering the store to y before the read of x, so needs a full mb (i.e.
> the test is more like 'R' than 'MP').
> 
> Also, wouldn't this problem also arise if your atomics were built using a
> spinlock where unlock had release semantics?

The current Linux kernel memory model forbids this outcome with smp_rmb(),
though I did have to work around the current lack of atomic_inc() using
xchg_relaxed(), so please review my litmus tests carefully.

	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus

	(*
	 * Expected result: Never.
	 *
	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
	 *
	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
	 *     r0 = xchg_release(&y, 5);       smp_rmb();
	 *                                     r1 = READ_ONCE(x);
	 *
	 *
	 *     WARN_ON(r0 == 0 && r1 == 0);
	 *)

	{
	}

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		r0 = xchg_release(y, 5);
	}

	P1(int *x, int *y)
	{
		r2 = xchg_relaxed(y, 1);
		smp_rmb();
		r1 = READ_ONCE(*x);
	}

	exists
	(0:r0=0 /\ 1:r1=0)

Here is what herd thinks:

	$ herd7 -bell strong-kernel.bell -cat weak-kernel.cat -macros linux.def ../litmus/manual/kernel/C-WillDeacon-MP+o-r+ai-rmb-o.litmus
	Test C-WillDeacon-MP+o-r+ai-rmb-o Allowed
	States 3
	0:r0=0; 1:r1=1;
	0:r0=1; 1:r1=0;
	0:r0=1; 1:r1=1;
	No
	Witnesses
	Positive: 0 Negative: 3
	Condition exists (0:r0=0 /\ 1:r1=0)
	Observation C-WillDeacon-MP+o-r+ai-rmb-o Never 0 3
	Hash=0c3e25a94b38708a2c5ea11ff52c8077

I get the same answer from strong-kernel.cat (which is our best-guess
envelope over hardware guarantees), weak-kernel.cat (which is simplified
based on what people actually use), and proposal.cat (which is a candidate
model with further simplifications).

I converted this (possibly incorrectly) to PowerPC assembly:

	PPC w-RMWl-r+w-RMWl-r.litmus
	""
	(*
	 * Does 3.0 Linux-kernel Power atomic_add_return() provide local 
	 * barrier that orders prior stores against subsequent loads?
	 * Use the atomic_add_return() in both threads, but to different variables.
	 * And use the trailing-lwsync variant of atomic_add_return().
	 *)
	(* 24-Aug-2011: ppcmem says "Sometimes" *)
	{
	0:r1=1; 0:r2=x; 0:r3=5; 0:r4=y;   0:r10=0 ; 0:r11=0;
	1:r1=1; 1:r2=x; 1:r3=5; 1:r4=y;   1:r10=0 ; 1:r11=0;
	}
	 P0                | P1                ;
	 stw r1,0(r2)      | lwarx  r11,r10,r4 ;
	 lwsync            | stwcx. r1,r10,r4  ;
	 lwarx  r11,r10,r4 | bne Fail1         ;
	 stwcx. r3,r10,r4  | lwsync            ;
	 bne Fail0         | lwz r3,0(r2)      ;
	 li r3,42          | Fail1:            ;
	 Fail0:            |                   ;


	exists
	(0:r11=0 /\ 0:r3=42 /\ 1:r3=0)

And ppcmem agrees with the linux-kernel memory model:

	[ . . . ]

	Found     82 : Prune count= 13946  seen_succs=  7453   7454 states 
	Found     83 : Prune count= 13997  seen_succs=  7490   7491 states 
	Found     84 : Prune count= 14007  seen_succs=  7506   7507 states 
	Found     85 : Prune count= 17229  seen_succs=  8889   8890 states 
	Found     86 : Prune count= 17235  seen_succs=  8897   8898 states 
	Test w-RMWl-r+w-RMWl-r Allowed
	States 9
	0:r3=5; 0:r11=0; 1:r3=0;
	0:r3=5; 0:r11=0; 1:r3=1;
	0:r3=5; 0:r11=0; 1:r3=5;
	0:r3=5; 0:r11=1; 1:r3=0;
	0:r3=5; 0:r11=1; 1:r3=1;
	0:r3=42; 0:r11=0; 1:r3=1;
	0:r3=42; 0:r11=0; 1:r3=5;
	0:r3=42; 0:r11=1; 1:r3=0;
	0:r3=42; 0:r11=1; 1:r3=1;
	No (allowed not found)
	Condition exists (0:r11=0 /\ 0:r3=42 /\ 1:r3=0)
	Hash=58fb07516ac5697580c33e06a354f667
	Observation w-RMWl-r+w-RMWl-r Never 0 9 

So if ARM really needs the litmus test with smp_rmb() to be allowed,
we need to adjust the Linux-kernel memory model appropriately.  Which
means that one of us needs to reach out to the usual suspects.  Would
you like to do that, or would you like me to?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 16:14                           ` Paul E. McKenney
@ 2017-08-01 16:42                             ` Peter Zijlstra
  2017-08-01 16:53                               ` Will Deacon
  2017-08-01 22:18                               ` Paul E. McKenney
  2017-08-01 18:37                             ` Paul E. McKenney
  2017-08-02  9:45                             ` Will Deacon
  2 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2017-08-01 16:42 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Will Deacon, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> So if ARM really needs the litmus test with smp_rmb() to be allowed,
> we need to adjust the Linux-kernel memory model appropriately.  Which
> means that one of us needs to reach out to the usual suspects.  Would
> you like to do that, or would you like me to?

I'm really sad ARMv8.1 LSE breaks this stuff... It is rather
counter-intuitive (then again, we _are_ talking barriers).

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 16:42                             ` Peter Zijlstra
@ 2017-08-01 16:53                               ` Will Deacon
  2017-08-01 22:18                               ` Paul E. McKenney
  1 sibling, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-08-01 16:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul E. McKenney, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 06:42:00PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> > So if ARM really needs the litmus test with smp_rmb() to be allowed,
> > we need to adjust the Linux-kernel memory model appropriately.  Which
> > means that one of us needs to reach out to the usual suspects.  Would
> > you like to do that, or would you like me to?
> 
> I'm really sad ARMv8.1 LSE breaks this stuff... It is rather
> counter-intuitive (then again, we _are_ talking barriers).

I can upgrade smp_rmb to smp_mb if I have to, but I still think that
code using smp_rmb() to order an operation that doesn't return a value is
weird.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 16:14                           ` Paul E. McKenney
  2017-08-01 16:42                             ` Peter Zijlstra
@ 2017-08-01 18:37                             ` Paul E. McKenney
  2017-08-02  9:45                             ` Will Deacon
  2 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-01 18:37 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap, stern

On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 01, 2017 at 01:17:13PM +0100, Will Deacon wrote:
> > On Tue, Aug 01, 2017 at 01:47:44PM +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> > > > On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > > > > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > > > > 
> > > > > > Why wouldn't the following have ACQUIRE semantics?
> > > > > > 
> > > > > > 	atomic_inc(&var);
> > > > > > 	smp_mb__after_atomic();
> > > > > > 
> > > > > > Is the issue that there is no actual value returned or some such?
> > > > > 
> > > > > Yes, so that the inc is a load-store, and thus there is a load, we lose
> > > > > the value.
> > > > > 
> > > > > But I see your point I think. Irrespective of still having the value,
> > > > > the ordering is preserved and nothing should pass across that.
> > > > > 
> > > > > > So if I have something like this, the assertion really can trigger?
> > > > > > 
> > > > > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > > > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > > > > 					r1 = READ_ONCE(x);
> > > > > > 
> > > > > > 
> > > > > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > > > > 
> > > > > > I must confess that I am not seeing why we would want to allow this
> > > > > > outcome.
> > > > > 
> > > > > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > > > > the inspiration.
> > > > 
> > > > Just to close this out, we agree that an smp_rmb() instead of
> > > > smp_mb__after_atomic() would *not* forbid this outcome, right?
> > > 
> > > So that really hurts my brain. Per the normal rules that smp_rmb() would
> > > order the read of @x against the last ll of @y and per ll/sc ordering
> > > you then still don't get to make the WARN happen.
> > > 
> > > On IRC you explained that your 8.1 LSE instructions are not in fact
> > > ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
> > > really need to load the old value to compute the new value.
> > 
> > To be clear, it's only the ST* variants of the LSE instructions that are
> > treated as a write for the purposes of memory ordering, so these are the
> > non-*_return variants. It's not unlikely that other architectures will
> > exhibit the same behaviour (e.g. Power, RISC-V), because the CPU can
> > treat non-return atomics as "fire-and-forget" and have them handled
> > elsewhere in the memory subsystem, causing them to be treated similarly
> > to posted writes.
> > 
> > For the code snippet above, the second thread has no idea about the value
> > of y and so smp_rmb() is the wrong thing to be using imo. It really cares
> > about ordering the store to y before the read of x, so needs a full mb (i.e.
> > the test is more like 'R' than 'MP').
> > 
> > Also, wouldn't this problem also arise if your atomics were built using a
> > spinlock where unlock had release semantics?

And responding more directly to the bit about spinlocks after a side
discussion with Alan Stern, both the xchg_release() and the atomic_inc()
are operating on the same variable, namely "y".  Even by the very weak
"roach motel" locking semantics, this outcome would be forbidden.

But if you have hardware that allows this, let's all discuss and get
it hashed out...
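
To spell that out with a sketch (assuming the atomics are emulated with a
per-variable lock; lock_of() is hypothetical), both critical sections are
on the same lock:

	/* CPU0, xchg_release(&y, 5) emulated: */
	WRITE_ONCE(x, 1);
	spin_lock(lock_of(&y));
	r0 = y;
	y = 5;
	spin_unlock(lock_of(&y));	/* the RELEASE */

	/* CPU1, atomic_inc(&y) emulated: */
	spin_lock(lock_of(&y));
	y++;
	spin_unlock(lock_of(&y));
	smp_rmb();			/* not even needed for this argument */
	r1 = READ_ONCE(x);

Either CPU1's critical section runs first, in which case the emulated
xchg reads y == 1 and r0 != 0, or CPU0's runs first, in which case CPU1
acquires the lock after CPU0's release and its later read of x must see
x == 1 (the store to x cannot escape past CPU0's unlock, and the load of
x cannot move above CPU1's lock), so r1 != 0.  Either way
r0 == 0 && r1 == 0 is impossible, even with roach-motel lock semantics.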

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 16:42                             ` Peter Zijlstra
  2017-08-01 16:53                               ` Will Deacon
@ 2017-08-01 22:18                               ` Paul E. McKenney
  2017-08-02  8:46                                 ` Will Deacon
  1 sibling, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-01 22:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 06:42:00PM +0200, Peter Zijlstra wrote:
> On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> > So if ARM really needs the litmus test with smp_rmb() to be allowed,
> > we need to adjust the Linux-kernel memory model appropriately.  Which
> > means that one of us needs to reach out to the usual suspects.  Would
> > you like to do that, or would you like me to?
> 
> I'm really sad ARMv8.1 LSE breaks this stuff... It is rather
> counter-intuitive (then again, we _are_ talking barriers).

No argument.

Then again, when we said that the Linux kernel memory model would
have a non-trivial rate of change, we weren't joking.

Will, is this the official description?

http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0557a.b/index.html

If so, is B6.1 what we should be looking at?

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 22:18                               ` Paul E. McKenney
@ 2017-08-02  8:46                                 ` Will Deacon
  0 siblings, 0 replies; 45+ messages in thread
From: Will Deacon @ 2017-08-02  8:46 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Tue, Aug 01, 2017 at 03:18:18PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 01, 2017 at 06:42:00PM +0200, Peter Zijlstra wrote:
> > On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> > > So if ARM really needs the litmus test with smp_rmb() to be allowed,
> > > we need to adjust the Linux-kernel memory model appropriately.  Which
> > > means that one of us needs to reach out to the usual suspects.  Would
> > > you like to do that, or would you like me to?
> > 
> > I'm really sad ARMv8.1 LSE breaks this stuff... It is rather
> > counter-intuitive (then again, we _are_ talking barriers).
> 
> No argument.
> 
> Then again, when we said that the Linux kernel memory model would
> have a non-trivial rate of change, we weren't joking.
> 
> Will, is this the official description?
> 
> http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0557a.b/index.html
> 
> If so, is B6.1 what we should be looking at?

Sorry it's so tricky to find. The architecture document is here:

https://static.docs.arm.com/ddi0487/b/DDI0487B_a_armv8_arm.pdf

and in section C3.2.13 ("Atomic memory operations") it states:

| The ST<OP> instructions are not regarded as doing a read for the purpose
| of a DMB LD barrier.

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-01 16:14                           ` Paul E. McKenney
  2017-08-01 16:42                             ` Peter Zijlstra
  2017-08-01 18:37                             ` Paul E. McKenney
@ 2017-08-02  9:45                             ` Will Deacon
  2017-08-02 16:17                               ` Paul E. McKenney
  2017-08-03 14:05                               ` Boqun Feng
  2 siblings, 2 replies; 45+ messages in thread
From: Will Deacon @ 2017-08-02  9:45 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Peter Zijlstra, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

Hi Paul,

On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 01, 2017 at 01:17:13PM +0100, Will Deacon wrote:
> > On Tue, Aug 01, 2017 at 01:47:44PM +0200, Peter Zijlstra wrote:
> > > On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> > > > On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > > > > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > > > > > So if I have something like this, the assertion really can trigger?
> > > > > > 
> > > > > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > > > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > > > > 					r1 = READ_ONCE(x);
> > > > > > 
> > > > > > 
> > > > > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > > > > 
> > > > > > I must confess that I am not seeing why we would want to allow this
> > > > > > outcome.
> > > > > 
> > > > > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > > > > the inspiration.
> > > > 
> > > > Just to close this out, we agree that an smp_rmb() instead of
> > > > smp_mb__after_atomic() would *not* forbid this outcome, right?
> > > 
> > > So that really hurts my brain. Per the normal rules that smp_rmb() would
> > > order the read of @x against the last ll of @y and per ll/sc ordering
> > > you then still don't get to make the WARN happen.
> > > 
> > > On IRC you explained that your 8.1 LSE instructions are not in fact
> > > ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
> > > really need to load the old value to compute the new value.
> > 
> > To be clear, it's only the ST* variants of the LSE instructions that are
> > treated as a write for the purposes of memory ordering, so these are the
> > non-*_return variants. It's not unlikely that other architectures will
> > exhibit the same behaviour (e.g. Power, RISC-V), because the CPU can
> > treat non-return atomics as "fire-and-forget" and have them handled
> > elsewhere in the memory subsystem, causing them to be treated similarly
> > to posted writes.
> > 
> > For the code snippet above, the second thread has no idea about the value
> > of y and so smp_rmb() is the wrong thing to be using imo. It really cares
> > about ordering the store to y before the read of x, so needs a full mb (i.e.
> > the test is more like 'R' than 'MP').
> > 
> > Also, wouldn't this problem also arise if your atomics were built using a
> > spinlock where unlock had release semantics?
> 
> The current Linux kernel memory model forbids this outcome with smp_rmb(),
> though I did have to work around the current lack of atomic_inc() using
> xchg_relaxed(), so please review my litmus tests carefully.

It's worth noting that we don't have the problem with any value-returning
atomics, so all flavours of xchg in this test would be forbidden on arm64
too.

> 	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> 
> 	(*
> 	 * Expected result: Never.
> 	 *
> 	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
> 	 *
> 	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
> 	 *     r0 = xchg_release(&y, 5);       smp_rmb();
> 	 *                                     r1 = READ_ONCE(x);
> 	 *
> 	 *
> 	 *     WARN_ON(r0 == 0 && r1 == 0);
> 	 *)
> 
> 	{
> 	}
> 
> 	P0(int *x, int *y)
> 	{
> 		WRITE_ONCE(*x, 1);
> 		r0 = xchg_release(y, 5);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 		r2 = xchg_relaxed(y, 1);
> 		smp_rmb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists
> 	(0:r0=0 /\ 1:r1=0)
> 
> Here is what herd thinks:
> 
> 	$ herd7 -bell strong-kernel.bell -cat weak-kernel.cat -macros linux.def ../litmus/manual/kernel/C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> 	Test C-WillDeacon-MP+o-r+ai-rmb-o Allowed
> 	States 3
> 	0:r0=0; 1:r1=1;
> 	0:r0=1; 1:r1=0;
> 	0:r0=1; 1:r1=1;
> 	No
> 	Witnesses
> 	Positive: 0 Negative: 3
> 	Condition exists (0:r0=0 /\ 1:r1=0)
> 	Observation C-WillDeacon-MP+o-r+ai-rmb-o Never 0 3
> 	Hash=0c3e25a94b38708a2c5ea11ff52c8077
> 
> I get the same answer from strong-kernel.cat (which is our best-guess
> envelope over hardware guarantees), weak-kernel.cat (which is simplified
> based on what people actually use), and proposal.cat (which is a candidate
> model with further simplifications).
> 
> I converted this (possibly incorrectly) to PowerPC assembly:
> 
> 	PPC w-RMWl-r+w-RMWl-r.litmus
> 	""
> 	(*
> 	 * Does 3.0 Linux-kernel Power atomic_add_return() provide local 
> 	 * barrier that orders prior stores against subsequent loads?
> 	 * Use the atomic_add_return() in both threads, but to different variables.
> 	 * And use the trailing-lwsync variant of atomic_add_return().
> 	 *)
> 	(* 24-Aug-2011: ppcmem says "Sometimes" *)
> 	{
> 	0:r1=1; 0:r2=x; 0:r3=5; 0:r4=y;   0:r10=0 ; 0:r11=0;
> 	1:r1=1; 1:r2=x; 1:r3=5; 1:r4=y;   1:r10=0 ; 1:r11=0;
> 	}
> 	 P0                | P1                ;
> 	 stw r1,0(r2)      | lwarx  r11,r10,r4 ;
> 	 lwsync            | stwcx. r1,r10,r4  ;
> 	 lwarx  r11,r10,r4 | bne Fail1         ;
> 	 stwcx. r3,r10,r4  | lwsync            ;
> 	 bne Fail0         | lwz r3,0(r2)      ;
> 	 li r3,42          | Fail1:            ;
> 	 Fail0:            |                   ;
> 
> 
> 	exists
> 	(0:r11=0 /\ 0:r3=42 /\ 1:r3=0)
> 
> And ppcmem agrees with the linux-kernel memory model:
> 
> 	[ . . . ]
> 
> 	Found     82 : Prune count= 13946  seen_succs=  7453   7454 states 
> 	Found     83 : Prune count= 13997  seen_succs=  7490   7491 states 
> 	Found     84 : Prune count= 14007  seen_succs=  7506   7507 states 
> 	Found     85 : Prune count= 17229  seen_succs=  8889   8890 states 
> 	Found     86 : Prune count= 17235  seen_succs=  8897   8898 states 
> 	Test w-RMWl-r+w-RMWl-r Allowed
> 	States 9
> 	0:r3=5; 0:r11=0; 1:r3=0;
> 	0:r3=5; 0:r11=0; 1:r3=1;
> 	0:r3=5; 0:r11=0; 1:r3=5;
> 	0:r3=5; 0:r11=1; 1:r3=0;
> 	0:r3=5; 0:r11=1; 1:r3=1;
> 	0:r3=42; 0:r11=0; 1:r3=1;
> 	0:r3=42; 0:r11=0; 1:r3=5;
> 	0:r3=42; 0:r11=1; 1:r3=0;
> 	0:r3=42; 0:r11=1; 1:r3=1;
> 	No (allowed not found)
> 	Condition exists (0:r11=0 /\ 0:r3=42 /\ 1:r3=0)
> 	Hash=58fb07516ac5697580c33e06a354f667
> 	Observation w-RMWl-r+w-RMWl-r Never 0 9 
> 
> So if ARM really needs the litmus test with smp_rmb() to be allowed,
> we need to adjust the Linux-kernel memory model appropriately.  Which
> means that one of us needs to reach out to the usual suspects.  Would
> you like to do that, or would you like me to?

If you don't mind doing it, then that would be great, thanks. Do shout if
you need me to help with anything, though!

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-02  9:45                             ` Will Deacon
@ 2017-08-02 16:17                               ` Paul E. McKenney
  2017-08-03 14:05                               ` Boqun Feng
  1 sibling, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-02 16:17 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Boqun Feng, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Wed, Aug 02, 2017 at 10:45:32AM +0100, Will Deacon wrote:
> Hi Paul,
> 
> On Tue, Aug 01, 2017 at 09:14:12AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 01, 2017 at 01:17:13PM +0100, Will Deacon wrote:
> > > On Tue, Aug 01, 2017 at 01:47:44PM +0200, Peter Zijlstra wrote:
> > > > On Tue, Aug 01, 2017 at 11:19:00AM +0100, Will Deacon wrote:
> > > > > On Tue, Aug 01, 2017 at 11:01:21AM +0200, Peter Zijlstra wrote:
> > > > > > On Mon, Jul 31, 2017 at 10:43:45AM -0700, Paul E. McKenney wrote:
> > > > > > > So if I have something like this, the assertion really can trigger?
> > > > > > > 
> > > > > > > 	WRITE_ONCE(x, 1);		atomic_inc(&y);
> > > > > > > 	r0 = xchg_release(&y, 5);	smp_mb__after_atomic();
> > > > > > > 					r1 = READ_ONCE(x);
> > > > > > > 
> > > > > > > 
> > > > > > > 	WARN_ON(r0 == 0 && r1 == 0);
> > > > > > > 
> > > > > > > I must confess that I am not seeing why we would want to allow this
> > > > > > > outcome.
> > > > > > 
> > > > > > No you are indeed quite right. I just wasn't creative enough. Thanks for
> > > > > > the inspiration.
> > > > > 
> > > > > Just to close this out, we agree that an smp_rmb() instead of
> > > > > smp_mb__after_atomic() would *not* forbid this outcome, right?
> > > > 
> > > > So that really hurts my brain. Per the normal rules that smp_rmb() would
> > > > order the read of @x against the last ll of @y and per ll/sc ordering
> > > > you then still don't get to make the WARN happen.
> > > > 
> > > > On IRC you explained that your 8.1 LSE instructions are not in fact
> > > > ordered by a smp_rmb, only by smp_wmb, which is 'surprising' since you
> > > > really need to load the old value to compute the new value.
> > > 
> > > To be clear, it's only the ST* variants of the LSE instructions that are
> > > treated as a write for the purposes of memory ordering, so these are the
> > > non-*_return variants. It's not unlikely that other architectures will
> > > exhibit the same behaviour (e.g. Power, RISC-V), because the CPU can
> > > treat non-return atomics as "fire-and-forget" and have them handled
> > > elsewhere in the memory subsystem, causing them to be treated similarly
> > > to posted writes.
> > > 
> > > For the code snippet above, the second thread has no idea about the value
> > > of y and so smp_rmb() is the wrong thing to be using imo. It really cares
> > > about ordering the store to y before the read of x, so needs a full mb (i.e.
> > > the test is more like 'R' than 'MP').
> > > 
> > > Also, wouldn't this problem also arise if your atomics were built using a
> > > spinlock where unlock had release semantics?
> > 
> > The current Linux kernel memory model forbids this outcome with smp_rmb(),
> > though I did have to work around the current lack of atomic_inc() using
> > xchg_relaxed(), so please review my litmus tests carefully.
> 
> It's worth noting that we don't have the problem with any value-returning
> atomics, so all flavours of xchg in this test would be forbidden on arm64
> too.

Plus after upgrading to the latest and greatest version of herd,
atomic_inc() worked just fine.  (Hey, I -try- to keep up!)  The updated
litmus test is here:

https://github.com/paulmckrcu/litmus/blob/master/manual/kernel/C-WillDeacon-MP%2Bo-r%2Bai-rmb-o.litmus

Same outcome.  Alan Stern is looking into what might be adjusted.
Of course, there is no guarantee that this will turn out to be reasonable
or for that matter acceptable to the usual suspects, but if feasible we
should at least see what this does to the model.

> > 	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> > 
> > 	(*
> > 	 * Expected result: Never.
> > 	 *
> > 	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
> > 	 *
> > 	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
> > 	 *     r0 = xchg_release(&y, 5);       smp_rmb();
> > 	 *                                     r1 = READ_ONCE(x);
> > 	 *
> > 	 *
> > 	 *     WARN_ON(r0 == 0 && r1 == 0);
> > 	 *)
> > 
> > 	{
> > 	}
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 		WRITE_ONCE(*x, 1);
> > 		r0 = xchg_release(y, 5);
> > 	}
> > 
> > 	P1(int *x, int *y)
> > 	{
> > 		r2 = xchg_relaxed(y, 1);
> > 		smp_rmb();
> > 		r1 = READ_ONCE(*x);
> > 	}
> > 
> > 	exists
> > 	(0:r0=0 /\ 1:r1=0)
> > 
> > Here is what herd thinks:
> > 
> > 	$ herd7 -bell strong-kernel.bell -cat weak-kernel.cat -macros linux.def ../litmus/manual/kernel/C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> > 	Test C-WillDeacon-MP+o-r+ai-rmb-o Allowed
> > 	States 3
> > 	0:r0=0; 1:r1=1;
> > 	0:r0=1; 1:r1=0;
> > 	0:r0=1; 1:r1=1;
> > 	No
> > 	Witnesses
> > 	Positive: 0 Negative: 3
> > 	Condition exists (0:r0=0 /\ 1:r1=0)
> > 	Observation C-WillDeacon-MP+o-r+ai-rmb-o Never 0 3
> > 	Hash=0c3e25a94b38708a2c5ea11ff52c8077
> > 
> > I get the same answer from strong-kernel.cat (which is our best-guess
> > envelope over hardware guarantees), weak-kernel.cat (which is simplified
> > based on what people actually use), and proposal.cat (which is a candidate
> > model with further simplifications).
> > 
> > I converted this (possibly incorrectly) to PowerPC assembly:
> > 
> > 	PPC w-RMWl-r+w-RMWl-r.litmus
> > 	""
> > 	(*
> > 	 * Does 3.0 Linux-kernel Power atomic_add_return() provide local 
> > 	 * barrier that orders prior stores against subsequent loads?
> > 	 * Use the atomic_add_return() in both threads, but to different variables.
> > 	 * And use the trailing-lwsync variant of atomic_add_return().
> > 	 *)
> > 	(* 24-Aug-2011: ppcmem says "Sometimes" *)
> > 	{
> > 	0:r1=1; 0:r2=x; 0:r3=5; 0:r4=y;   0:r10=0 ; 0:r11=0;
> > 	1:r1=1; 1:r2=x; 1:r3=5; 1:r4=y;   1:r10=0 ; 1:r11=0;
> > 	}
> > 	 P0                | P1                ;
> > 	 stw r1,0(r2)      | lwarx  r11,r10,r4 ;
> > 	 lwsync            | stwcx. r1,r10,r4  ;
> > 	 lwarx  r11,r10,r4 | bne Fail1         ;
> > 	 stwcx. r3,r10,r4  | lwsync            ;
> > 	 bne Fail0         | lwz r3,0(r2)      ;
> > 	 li r3,42          | Fail1:            ;
> > 	 Fail0:            |                   ;
> > 
> > 
> > 	exists
> > 	(0:r11=0 /\ 0:r3=42 /\ 1:r3=0)
> > 
> > And ppcmem agrees with the linux-kernel memory model:
> > 
> > 	[ . . . ]
> > 
> > 	Found     82 : Prune count= 13946  seen_succs=  7453   7454 states 
> > 	Found     83 : Prune count= 13997  seen_succs=  7490   7491 states 
> > 	Found     84 : Prune count= 14007  seen_succs=  7506   7507 states 
> > 	Found     85 : Prune count= 17229  seen_succs=  8889   8890 states 
> > 	Found     86 : Prune count= 17235  seen_succs=  8897   8898 states 
> > 	Test w-RMWl-r+w-RMWl-r Allowed
> > 	States 9
> > 	0:r3=5; 0:r11=0; 1:r3=0;
> > 	0:r3=5; 0:r11=0; 1:r3=1;
> > 	0:r3=5; 0:r11=0; 1:r3=5;
> > 	0:r3=5; 0:r11=1; 1:r3=0;
> > 	0:r3=5; 0:r11=1; 1:r3=1;
> > 	0:r3=42; 0:r11=0; 1:r3=1;
> > 	0:r3=42; 0:r11=0; 1:r3=5;
> > 	0:r3=42; 0:r11=1; 1:r3=0;
> > 	0:r3=42; 0:r11=1; 1:r3=1;
> > 	No (allowed not found)
> > 	Condition exists (0:r11=0 /\ 0:r3=42 /\ 1:r3=0)
> > 	Hash=58fb07516ac5697580c33e06a354f667
> > 	Observation w-RMWl-r+w-RMWl-r Never 0 9 
> > 
> > So if ARM really needs the litmus test with smp_rmb() to be allowed,
> > we need to adjust the Linux-kernel memory model appropriately.  Which
> > means that one of us needs to reach out to the usual suspects.  Would
> > you like to do that, or would you like me to?
> 
> If you don't mind doing it, then that would be great, thanks. Do shout if
> you need me to help with anything, though!

You will be copied, just to cut out the timezone delays if nothing else.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-02  9:45                             ` Will Deacon
  2017-08-02 16:17                               ` Paul E. McKenney
@ 2017-08-03 14:05                               ` Boqun Feng
  2017-08-03 14:55                                 ` Paul E. McKenney
  1 sibling, 1 reply; 45+ messages in thread
From: Boqun Feng @ 2017-08-03 14:05 UTC (permalink / raw)
  To: Will Deacon
  Cc: Paul E. McKenney, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

Hi Will,

On Wed, Aug 02, 2017 at 10:45:32AM +0100, Will Deacon wrote:
[...]
> 
> It's worth noting that we don't have the problem with any value-returning
> atomics, so all flavours of xchg in this test would be forbidden on arm64
> too.
> 
> > 	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> > 
> > 	(*
> > 	 * Expected result: Never.
> > 	 *
> > 	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
> > 	 *
> > 	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
> > 	 *     r0 = xchg_release(&y, 5);       smp_rmb();
> > 	 *                                     r1 = READ_ONCE(x);
> > 	 *
> > 	 *
> > 	 *     WARN_ON(r0 == 0 && r1 == 0);
> > 	 *)
> > 
> > 	{
> > 	}
> > 
> > 	P0(int *x, int *y)
> > 	{
> > 		WRITE_ONCE(*x, 1);
> > 		r0 = xchg_release(y, 5);
> > 	}
> > 
> > 	P1(int *x, int *y)
> > 	{
> > 		r2 = xchg_relaxed(y, 1);
> > 		smp_rmb();
> > 		r1 = READ_ONCE(*x);
> > 	}
> > 
> > 	exists
> > 	(0:r0=0 /\ 1:r1=0)
> > 

How about a litmus test like this?

 	C C-AMO-global-transitivity.litmus
 
 	{
 	}
 
 	P0(int *x, int *y)
 	{
 		WRITE_ONCE(*x, 1);
 		r0 = xchg_release(y, 5);
 	}
 
 	P1(int *y, int *z)
 	{
 		atomic_inc(y);
 		smp_mb();
 		r1 = READ_ONCE(*z);
 	}

	P2(int *x, int *z)
	{
		WRITE_ONCE(*z, 1);
		smp_mb();
		r2 = READ_ONCE(*x);
	}
 
 	exists
 	(0:r0=0 /\ 1:r1=0 /\ 2:r2=0 )

Should we forbid the outcome in the exists-clause? I ask because I want
to know whether we can just treat atomic_inc() as a store, because if I
replace atomic_inc() with a WRITE_ONCE(*y, 6), IIUC, the current model says
this could happen.
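
For concreteness, the plain-store variant I mean (a sketch; same test as
above with only P1's first access changed):

	C C-AMO-as-plain-store.litmus

	{
	}

	P0(int *x, int *y)
	{
		WRITE_ONCE(*x, 1);
		r0 = xchg_release(y, 5);
	}

	P1(int *y, int *z)
	{
		WRITE_ONCE(*y, 6);
		smp_mb();
		r1 = READ_ONCE(*z);
	}

	P2(int *x, int *z)
	{
		WRITE_ONCE(*z, 1);
		smp_mb();
		r2 = READ_ONCE(*x);
	}

	exists
	(0:r0=0 /\ 1:r1=0 /\ 2:r2=0)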

Thoughts?

Regards,
Boqun

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-03 14:05                               ` Boqun Feng
@ 2017-08-03 14:55                                 ` Paul E. McKenney
  2017-08-03 16:12                                   ` Will Deacon
  0 siblings, 1 reply; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-03 14:55 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Will Deacon, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Thu, Aug 03, 2017 at 10:05:16PM +0800, Boqun Feng wrote:
> Hi Will,
> 
> On Wed, Aug 02, 2017 at 10:45:32AM +0100, Will Deacon wrote:
> [...]
> > 
> > It's worth noting that we don't have the problem with any value-returning
> > atomics, so all flavours of xchg in this test would be forbidden on arm64
> > too.
> > 
> > > 	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> > > 
> > > 	(*
> > > 	 * Expected result: Never.
> > > 	 *
> > > 	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
> > > 	 *
> > > 	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
> > > 	 *     r0 = xchg_release(&y, 5);       smp_rmb();
> > > 	 *                                     r1 = READ_ONCE(x);
> > > 	 *
> > > 	 *
> > > 	 *     WARN_ON(r0 == 0 && r1 == 0);
> > > 	 *)
> > > 
> > > 	{
> > > 	}
> > > 
> > > 	P0(int *x, int *y)
> > > 	{
> > > 		WRITE_ONCE(*x, 1);
> > > 		r0 = xchg_release(y, 5);
> > > 	}
> > > 
> > > 	P1(int *x, int *y)
> > > 	{
> > > 		r2 = xchg_relaxed(y, 1);
> > > 		smp_rmb();
> > > 		r1 = READ_ONCE(*x);
> > > 	}
> > > 
> > > 	exists
> > > 	(0:r0=0 /\ 1:r1=0)
> > > 
> 
> How about a litmus test like this?
> 
>  	C C-AMO-global-transitivity.litmus
>  
>  	{
>  	}
>  
>  	P0(int *x, int *y)
>  	{
>  		WRITE_ONCE(*x, 1);
>  		r0 = xchg_release(y, 5);
>  	}
>  
>  	P1(int *y, int *z)
>  	{
>  		atomic_inc(y);
>  		smp_mb();

I am going to guess that the smp_mb() enforces the needed ordering,
but Will will let me know.  ;-)

							Thanx, Paul

>  		r1 = READ_ONCE(*z);
>  	}
> 
> 	P2(int *x, int *z)
> 	{
> 		WRITE_ONCE(*z, 1);
> 		smp_mb();
> 		r2 = READ_ONCE(*x);
> 	}
>  
>  	exists
>  	(0:r0=0 /\ 1:r1=0 /\ 2:r2=0 )
> 
> Should we forbid the outcome in the exists-clause? I ask because I want
> to know whether we can just treat atomic_inc() as a store, because if I
> replace atomic_inc() with a WRITE_ONCE(*y, 6), IIUC, the current model says
> this could happen.
> 
> Thoughts?
> 
> Regards,
> Boqun

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-03 14:55                                 ` Paul E. McKenney
@ 2017-08-03 16:12                                   ` Will Deacon
  2017-08-03 16:58                                     ` Paul E. McKenney
  0 siblings, 1 reply; 45+ messages in thread
From: Will Deacon @ 2017-08-03 16:12 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Thu, Aug 03, 2017 at 07:55:14AM -0700, Paul E. McKenney wrote:
> On Thu, Aug 03, 2017 at 10:05:16PM +0800, Boqun Feng wrote:
> > Hi Will,
> > 
> > On Wed, Aug 02, 2017 at 10:45:32AM +0100, Will Deacon wrote:
> > [...]
> > > 
> > > It's worth noting that we don't have the problem with any value-returning
> > > atomics, so all flavours of xchg in this test would be forbidden on arm64
> > > too.
> > > 
> > > > 	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> > > > 
> > > > 	(*
> > > > 	 * Expected result: Never.
> > > > 	 *
> > > > 	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
> > > > 	 *
> > > > 	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
> > > > 	 *     r0 = xchg_release(&y, 5);       smp_rmb();
> > > > 	 *                                     r1 = READ_ONCE(x);
> > > > 	 *
> > > > 	 *
> > > > 	 *     WARN_ON(r0 == 0 && r1 == 0);
> > > > 	 *)
> > > > 
> > > > 	{
> > > > 	}
> > > > 
> > > > 	P0(int *x, int *y)
> > > > 	{
> > > > 		WRITE_ONCE(*x, 1);
> > > > 		r0 = xchg_release(y, 5);
> > > > 	}
> > > > 
> > > > 	P1(int *x, int *y)
> > > > 	{
> > > > 		r2 = xchg_relaxed(y, 1);
> > > > 		smp_rmb();
> > > > 		r1 = READ_ONCE(*x);
> > > > 	}
> > > > 
> > > > 	exists
> > > > 	(0:r0=0 /\ 1:r1=0)
> > > > 
> > 
> > How about a litmus test like this?
> > 
> >  	C C-AMO-global-transitivity.litmus
> >  
> >  	{
> >  	}
> >  
> >  	P0(int *x, int *y)
> >  	{
> >  		WRITE_ONCE(*x, 1);
> >  		r0 = xchg_release(y, 5);
> >  	}
> >  
> >  	P1(int *y, int *z)
> >  	{
> >  		atomic_inc(y);
> >  		smp_mb();
> 
> I am going to guess that the smp_mb() enforces the needed ordering,
> but Will will let me know.  ;-)

Yup, that would be forbidden on arm64, and would also be forbidden if
you used WRITE_ONCE instead of atomic_inc (remember that we recently
became multi-copy atomic).

Will

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [RFC][PATCH v3]: documentation,atomic: Add new documents
  2017-08-03 16:12                                   ` Will Deacon
@ 2017-08-03 16:58                                     ` Paul E. McKenney
  0 siblings, 0 replies; 45+ messages in thread
From: Paul E. McKenney @ 2017-08-03 16:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: Boqun Feng, Peter Zijlstra, linux-kernel, Ingo Molnar,
	Thomas Gleixner, Randy Dunlap

On Thu, Aug 03, 2017 at 05:12:24PM +0100, Will Deacon wrote:
> On Thu, Aug 03, 2017 at 07:55:14AM -0700, Paul E. McKenney wrote:
> > On Thu, Aug 03, 2017 at 10:05:16PM +0800, Boqun Feng wrote:
> > > Hi Will,
> > > 
> > > On Wed, Aug 02, 2017 at 10:45:32AM +0100, Will Deacon wrote:
> > > [...]
> > > > 
> > > > It's worth noting that we don't have the problem with any value-returning
> > > > atomics, so all flavours of xchg in this test would be forbidden on arm64
> > > > too.
> > > > 
> > > > > 	C C-WillDeacon-MP+o-r+ai-rmb-o.litmus
> > > > > 
> > > > > 	(*
> > > > > 	 * Expected result: Never.
> > > > > 	 *
> > > > > 	 * Desired litmus test, with atomic_inc() emulated by xchg_relaxed():
> > > > > 	 *
> > > > > 	 *     WRITE_ONCE(x, 1);               atomic_inc(&y);
> > > > > 	 *     r0 = xchg_release(&y, 5);       smp_rmb();
> > > > > 	 *                                     r1 = READ_ONCE(x);
> > > > > 	 *
> > > > > 	 *
> > > > > 	 *     WARN_ON(r0 == 0 && r1 == 0);
> > > > > 	 *)
> > > > > 
> > > > > 	{
> > > > > 	}
> > > > > 
> > > > > 	P0(int *x, int *y)
> > > > > 	{
> > > > > 		WRITE_ONCE(*x, 1);
> > > > > 		r0 = xchg_release(y, 5);
> > > > > 	}
> > > > > 
> > > > > 	P1(int *x, int *y)
> > > > > 	{
> > > > > 		r2 = xchg_relaxed(y, 1);
> > > > > 		smp_rmb();
> > > > > 		r1 = READ_ONCE(*x);
> > > > > 	}
> > > > > 
> > > > > 	exists
> > > > > 	(0:r0=0 /\ 1:r1=0)
> > > > > 
> > > 
> > > How about a litmus test like this?
> > > 
> > >  	C C-AMO-global-transitivity.litmus
> > >  
> > >  	{
> > >  	}
> > >  
> > >  	P0(int *x, int *y)
> > >  	{
> > >  		WRITE_ONCE(*x, 1);
> > >  		r0 = xchg_release(y, 5);
> > >  	}
> > >  
> > >  	P1(int *y, int *z)
> > >  	{
> > >  		atomic_inc(y);
> > >  		smp_mb();
> > 
> > I am going to guess that the smp_mb() enforces the needed ordering,
> > but Will will let me know.  ;-)
> 
> Yup, that would be forbidden on arm64, and would also be forbidden if
> you used WRITE_ONCE instead of atomic_inc (remember that we recently
> became multi-copy atomic).

Thank you for the confirmation!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [tip:locking/core] locking/atomic: Fix atomic_set_release() for 'funny' architectures
  2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
  2017-06-09 11:13   ` Peter Zijlstra
  2017-06-09 14:03   ` Chris Metcalf
@ 2017-08-10 12:10   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 45+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-08-10 12:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: torvalds, boqun.feng, mingo, linux-kernel, paulmck, peterz,
	will.deacon, tglx, hpa

Commit-ID:  9d664c0aec3bfdb77fcf7de61cfe1febbecdd389
Gitweb:     http://git.kernel.org/tip/9d664c0aec3bfdb77fcf7de61cfe1febbecdd389
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Fri, 9 Jun 2017 13:05:06 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 10 Aug 2017 12:28:54 +0200

locking/atomic: Fix atomic_set_release() for 'funny' architectures

Those architectures that have a special atomic_set implementation also
need a special atomic_set_release(), because for the very same reason
WRITE_ONCE() is broken for them, smp_store_release() is too.

The vast majority are architectures that have a spinlock-hash-based atomic
implementation, except hexagon, which seems to have a hardware 'feature'.

The spinlock-based atomics should be SC, that is, none of them appear to
place extra barriers in atomic_cmpxchg() or any of the other SC atomic
primitives and therefore seem to rely on their spinlock implementation
being SC (I did not fully validate all that).

Therefore, the normal atomic_set() is SC and can be used as
atomic_set_release().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Chris Metcalf <cmetcalf@mellanox.com> [for tile]
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: davem@davemloft.net
Cc: james.hogan@imgtec.com
Cc: jejb@parisc-linux.org
Cc: rkuo@codeaurora.org
Cc: vgupta@synopsys.com
Link: http://lkml.kernel.org/r/20170609110506.yod47flaav3wgoj5@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/arc/include/asm/atomic.h         | 2 ++
 arch/hexagon/include/asm/atomic.h     | 2 ++
 arch/metag/include/asm/atomic_lock1.h | 2 ++
 arch/parisc/include/asm/atomic.h      | 2 ++
 arch/sparc/include/asm/atomic_32.h    | 2 ++
 arch/tile/include/asm/atomic_32.h     | 2 ++
 include/asm-generic/atomic64.h        | 2 ++
 7 files changed, 14 insertions(+)

diff --git a/arch/arc/include/asm/atomic.h b/arch/arc/include/asm/atomic.h
index 54b54da..1185928 100644
--- a/arch/arc/include/asm/atomic.h
+++ b/arch/arc/include/asm/atomic.h
@@ -123,6 +123,8 @@ static inline void atomic_set(atomic_t *v, int i)
 	atomic_ops_unlock(flags);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 #endif
 
 /*
diff --git a/arch/hexagon/include/asm/atomic.h b/arch/hexagon/include/asm/atomic.h
index a62ba36..fb3dfb2 100644
--- a/arch/hexagon/include/asm/atomic.h
+++ b/arch/hexagon/include/asm/atomic.h
@@ -42,6 +42,8 @@ static inline void atomic_set(atomic_t *v, int new)
 	);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 /**
  * atomic_read - reads a word, atomically
  * @v: pointer to atomic value
diff --git a/arch/metag/include/asm/atomic_lock1.h b/arch/metag/include/asm/atomic_lock1.h
index 6c1380a..eee779f 100644
--- a/arch/metag/include/asm/atomic_lock1.h
+++ b/arch/metag/include/asm/atomic_lock1.h
@@ -37,6 +37,8 @@ static inline int atomic_set(atomic_t *v, int i)
 	return i;
 }
 
+#define atomic_set_release(v, i) atomic_set((v), (i))
+
 #define ATOMIC_OP(op, c_op)						\
 static inline void atomic_##op(int i, atomic_t *v)			\
 {									\
diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 5394b9c..17b98a8 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -65,6 +65,8 @@ static __inline__ void atomic_set(atomic_t *v, int i)
 	_atomic_spin_unlock_irqrestore(v, flags);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 static __inline__ int atomic_read(const atomic_t *v)
 {
 	return READ_ONCE((v)->counter);
diff --git a/arch/sparc/include/asm/atomic_32.h b/arch/sparc/include/asm/atomic_32.h
index ee3f11c..7643e97 100644
--- a/arch/sparc/include/asm/atomic_32.h
+++ b/arch/sparc/include/asm/atomic_32.h
@@ -29,6 +29,8 @@ int atomic_xchg(atomic_t *, int);
 int __atomic_add_unless(atomic_t *, int, int);
 void atomic_set(atomic_t *, int);
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 #define atomic_read(v)          ACCESS_ONCE((v)->counter)
 
 #define atomic_add(i, v)	((void)atomic_add_return( (int)(i), (v)))
diff --git a/arch/tile/include/asm/atomic_32.h b/arch/tile/include/asm/atomic_32.h
index a937742..53a423e 100644
--- a/arch/tile/include/asm/atomic_32.h
+++ b/arch/tile/include/asm/atomic_32.h
@@ -101,6 +101,8 @@ static inline void atomic_set(atomic_t *v, int n)
 	_atomic_xchg(&v->counter, n);
 }
 
+#define atomic_set_release(v, i)	atomic_set((v), (i))
+
 /* A 64bit atomic type */
 
 typedef struct {
diff --git a/include/asm-generic/atomic64.h b/include/asm-generic/atomic64.h
index dad68bf..8d28eb0 100644
--- a/include/asm-generic/atomic64.h
+++ b/include/asm-generic/atomic64.h
@@ -21,6 +21,8 @@ typedef struct {
 extern long long atomic64_read(const atomic64_t *v);
 extern void	 atomic64_set(atomic64_t *v, long long i);
 
+#define atomic64_set_release(v, i)	atomic64_set((v), (i))
+
 #define ATOMIC64_OP(op)							\
 extern void	 atomic64_##op(long long a, atomic64_t *v);
 

^ permalink raw reply related	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2017-08-10 12:15 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-09  9:24 [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Peter Zijlstra
2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
2017-06-09 11:13   ` Peter Zijlstra
2017-06-09 17:28     ` Vineet Gupta
2017-06-09 17:28       ` Vineet Gupta
2017-06-09 18:49       ` Peter Zijlstra
2017-06-09 18:49         ` Peter Zijlstra
2017-06-09 18:58     ` James Bottomley
2017-06-09 14:03   ` Chris Metcalf
2017-08-10 12:10   ` [tip:locking/core] locking/atomic: " tip-bot for Peter Zijlstra
2017-06-09 15:44 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Will Deacon
2017-06-09 19:36   ` Peter Zijlstra
2017-06-11 13:56     ` Boqun Feng
2017-06-12 14:49       ` Peter Zijlstra
2017-06-13  6:39         ` Boqun Feng
2017-06-14 12:33         ` Will Deacon
2017-07-12 12:53         ` Boqun Feng
2017-07-12 13:08           ` Peter Zijlstra
2017-07-12 19:13             ` Paul E. McKenney
2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
2017-07-26 12:47           ` Boqun Feng
2017-07-31  9:05             ` Peter Zijlstra
2017-07-31 11:04               ` Boqun Feng
2017-07-31 17:43                 ` Paul E. McKenney
2017-08-01  2:14                   ` Boqun Feng
2017-08-01  9:01                   ` Peter Zijlstra
2017-08-01 10:19                     ` Will Deacon
2017-08-01 11:47                       ` Peter Zijlstra
2017-08-01 12:17                         ` Will Deacon
2017-08-01 12:52                           ` Peter Zijlstra
2017-08-01 16:14                           ` Paul E. McKenney
2017-08-01 16:42                             ` Peter Zijlstra
2017-08-01 16:53                               ` Will Deacon
2017-08-01 22:18                               ` Paul E. McKenney
2017-08-02  8:46                                 ` Will Deacon
2017-08-01 18:37                             ` Paul E. McKenney
2017-08-02  9:45                             ` Will Deacon
2017-08-02 16:17                               ` Paul E. McKenney
2017-08-03 14:05                               ` Boqun Feng
2017-08-03 14:55                                 ` Paul E. McKenney
2017-08-03 16:12                                   ` Will Deacon
2017-08-03 16:58                                     ` Paul E. McKenney
2017-08-01 13:35                     ` Paul E. McKenney
2017-07-26 16:28           ` Randy Dunlap
2017-06-09 18:15 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Randy Dunlap
