All of lore.kernel.org
 help / color / mirror / Atom feed
From: Boqun Feng <boqun.feng@gmail.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, Ingo Molnar <mingo@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [RFC][PATCH]: documentation,atomic: Add a new atomic_t document
Date: Tue, 13 Jun 2017 14:39:05 +0800	[thread overview]
Message-ID: <20170613063905.yartsltbdnjwgk63@tardis> (raw)
In-Reply-To: <20170612144929.3wiwtbqopsfpm3qk@hirez.programming.kicks-ass.net>

On Mon, Jun 12, 2017 at 04:49:29PM +0200, Peter Zijlstra wrote:
> On Sun, Jun 11, 2017 at 09:56:32PM +0800, Boqun Feng wrote:
> 
> > I think the term we use to refer this behavior is "fully-ordered"?
> 
> Right, that is what we used to call it, and the term even occurs in
> memory-barriers.txt but isn't actually defined therein.
> 
> > Could we give it a slight formal definition like:
> > 
> > a.	memory operations preceding and following the RmW operation is
> > 	Sequentially Consistent.
> > 
> > b.	load or store part of the RmW operation is Sequentially
> > 	Consistent with operations preceding or following.
> > 
> > Though, sounds like defining "fully-ordered" is the job for
> > memory-barriers.txt, but it's never done ;-)
> 
> Right, so while memory-barriers.txt uses the term 'fully ordered' it
> doesn't appear to mean the same thing we need here.
> 
> Still, lacking anything better, I did the below. Note that I also
> removed much of the atomic stuff from memory-barrier.txt in order to
> avoid duplication and confusion (it too was severely stale).
> 

Agreed ;-)

> 
> 
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  Documentation/atomic_t.txt        |  182 ++++++++++++++++++++++++++++++++++++++
>  Documentation/memory-barriers.txt |   86 -----------------
>  2 files changed, 184 insertions(+), 84 deletions(-)
> 
> --- /dev/null
> +++ b/Documentation/atomic_t.txt
> @@ -0,0 +1,182 @@
> +
> +On atomic types (atomic_t atomic64_t and atomic_long_t).
> +
> +The atomic type provides an interface to the architecture's means of atomic
> +RmW operations between CPUs (it specifically does not order/work/etc. on
> +IO).
> +
> +The 'full' API consists of:
> +
> +Non RmW ops:

This is the first "Non RmW ops:", and..

> +
> +  atomic_read(), atomic_set()
> +  atomic_read_acquire(), atomic_set_release()
> +
> +
> +RmW atomic operations:
> +
> +Arithmetic:
> +
> +  atomic_{add,sub,inc,dec}()
> +  atomic_{add,sub,inc,dec}_return{,_relaxed,_acquire,_release}()
> +  atomic_fetch_{add,sub,inc,dec}{,_relaxed,_acquire,_release}()
> +
> +
> +Bitwise:
> +
> +  atomic_{and,or,xor,andnot}()
> +  atomic_fetch_{and,or,xor,andnot}{,_relaxed,_acquire,_release}()
> +
> +
> +Swap:
> +
> +  atomic_xchg{,_relaxed,_acquire,_release}()
> +  atomic_cmpxchg{,_relaxed,_acquire,_release}()
> +  atomic_try_cmpxchg{,_relaxed,_acquire,_release}()
> +
> +
> +Reference count (but please see refcount_t):
> +
> +  atomic_add_unless(), atomic_inc_not_zero()
> +  atomic_sub_and_test(), atomic_dec_and_test()
> +
> +
> +Misc:
> +
> +  atomic_inc_and_test(), atomic_add_negative()
> +  atomic_dec_unless_positive(), atomic_inc_unless_negative()
> +
> +
> +Barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +
> +

I feel like some words or a cutting line required here, indicating we
end listing the api ops and begin to talk more details(atomicity,
ordering, etc.). Otherwise, the following second "Non RmW ops:" may
confuse people a little bit. Thoughts?

Regards,
Boqun

> +Non RmW ops:
> +
> +The non-RmW ops are (typically) regular LOADs and STOREs and are canonically
> +implemented using READ_ONCE(), WRITE_ONCE(), smp_load_acquire() and
> +smp_store_release() respectively.
> +
> +The one detail to this is that atomic_set() should be observable to the RmW
> +ops. That is:
> +
> +
> +  PRE:
> +  atomic_set(v, 1);
> +
> +  CPU0						CPU1
> +  atomic_add_unless(v, 1, 0)			atomic_set(v, 0);
> +
> +  POST:
> +  BUG_ON(v->counter == 2);
> +
> +
> +In this case we would expect the atomic_set() from CPU1 to either happen
> +before the atomic_add_unless(), in which case that latter one would no-op, or
> +_after_ in which case we'd overwrite its result. In no case is "2" a valid
> +outcome.
> +
> +This is typically true on 'normal' platforms, where a regular competing STORE
> +will invalidate a LL/SC or fail a CMPXCHG.
> +
> +The obvious case where this is not so is when we need to implement atomic ops
> +with a lock:
> +
> +
> +  CPU0
> +
> +  atomic_add_unless(v, 1, 0);
> +    lock();
> +    ret = READ_ONCE(v->counter); // == 1
> +						atomic_set(v, 0);
> +    if (ret != u)				  WRITE_ONCE(v->counter, 0);
> +      WRITE_ONCE(v->counter, ret + 1);
> +    unlock();
> +
> +
> +the typical solution is to then implement atomic_set() with atomic_xchg().
> +
> +
> +RmW ops:
> +
> +These come in various forms:
> +
> + - plain operations without return value: atomic_{}()
> +
> + - operations which return the modified value: atomic_{}_return()
> +
> +   these are limited to the arithmetic operations because those are
> +   reversible. Bitops are irreversible and therefore the modified value
> +   is of dubious utility.
> +
> + - operations which return the original value: atomic_fetch_{}()
> +
> + - swap operations: xchg(), cmpxchg() and try_cmpxchg()
> +
> + - misc; the special purpose operations that are commonly used and would,
> +   given the interface, normally be implemented using (try_)cmpxchg loops but
> +   are time critical and can, (typically) on LL/SC architectures, be more
> +   efficiently implemented.
> +
> +
> +All these operations are SMP atomic; that is, the operations (for a single
> +atomic variable) can be fully ordered and no intermediate state is lost or
> +visible.
> +
> +
> +Ordering:  (go read memory-barriers.txt first)
> +
> +The rule of thumb:
> +
> + - non-RmW operations are unordered;
> +
> + - RmW operations that have no return value are unordered;
> +
> + - RmW operations that have a return value are fully ordered;
> +
> + - RmW operations that are conditional are unordered on FAILURE, otherwise the
> +   above rules apply.
> +
> +Except of course when an operation has an explicit ordering like:
> +
> + {}_relaxed: unordered
> + {}_acquire: the R of the RmW (or atomic_read) is an ACQUIRE
> + {}_release: the W of the RmW (or atomic_set)  is a  RELEASE
> +
> +
> +Fully ordered primitives are ordered against everything prior and everything
> +subsequenct. They also imply transitivity. Therefore a fully ordered primitive
> +is like having an smp_mb() before and an smp_mb() after the primitive.
> +
> +
> +The barriers:
> +
> +  smp_mb__{before,after}_atomic()
> +
> +only apply to the RmW ops and can be used to augment/upgrade the ordering
> +inherit to the used atomic op. These barriers provide a full smp_mb().
> +
> +These helper barriers exist because architectures have varying implicit
> +ordering on their SMP atomic primitives. For example our TSO architectures
> +provide full ordered atomics and these barriers are no-ops.
> +
> +Thus:
> +
> +  atomic_fetch_add();
> +
> +is equivalent to:
> +
> +  smp_mb__before_atomic();
> +  atomic_fetch_add_relaxed();
> +  smp_mb__after_atomic();
> +
> +
> +Further, while something like:
> +
> +  smp_mb__before_atomic();
> +  atomic_dec(&X);
> +
> +is a 'typical' RELEASE pattern, the barrier is strictly stronger than
> +a RELEASE.
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -498,7 +498,7 @@ VARIETIES OF MEMORY BARRIER
>       This means that ACQUIRE acts as a minimal "acquire" operation and
>       RELEASE acts as a minimal "release" operation.
>  
> -A subset of the atomic operations described in atomic_ops.txt have ACQUIRE
> +A subset of the atomic operations described in atomic_t.txt have ACQUIRE
>  and RELEASE variants in addition to fully-ordered and relaxed (no barrier
>  semantics) definitions.  For compound atomics performing both a load and a
>  store, ACQUIRE semantics apply only to the load and RELEASE semantics apply
> @@ -1876,8 +1876,7 @@ compiler and the CPU from reordering the
>       This makes sure that the death mark on the object is perceived to be set
>       *before* the reference counter is decremented.
>  
> -     See Documentation/atomic_ops.txt for more information.  See the "Atomic
> -     operations" subsection for information on where to use these.
> +     See Documentation/atomic_t.txt for more information.
>  
>  
>   (*) lockless_dereference();
> @@ -2503,87 +2502,6 @@ operations are noted specially as some o
>  some don't, but they're very heavily relied on as a group throughout the
>  kernel.
>  
> -Any atomic operation that modifies some state in memory and returns information
> -about the state (old or new) implies an SMP-conditional general memory barrier
> -(smp_mb()) on each side of the actual operation (with the exception of
> -explicit lock operations, described later).  These include:
> -
> -	xchg();
> -	atomic_xchg();			atomic_long_xchg();
> -	atomic_inc_return();		atomic_long_inc_return();
> -	atomic_dec_return();		atomic_long_dec_return();
> -	atomic_add_return();		atomic_long_add_return();
> -	atomic_sub_return();		atomic_long_sub_return();
> -	atomic_inc_and_test();		atomic_long_inc_and_test();
> -	atomic_dec_and_test();		atomic_long_dec_and_test();
> -	atomic_sub_and_test();		atomic_long_sub_and_test();
> -	atomic_add_negative();		atomic_long_add_negative();
> -	test_and_set_bit();
> -	test_and_clear_bit();
> -	test_and_change_bit();
> -
> -	/* when succeeds */
> -	cmpxchg();
> -	atomic_cmpxchg();		atomic_long_cmpxchg();
> -	atomic_add_unless();		atomic_long_add_unless();
> -
> -These are used for such things as implementing ACQUIRE-class and RELEASE-class
> -operations and adjusting reference counters towards object destruction, and as
> -such the implicit memory barrier effects are necessary.
> -
> -
> -The following operations are potential problems as they do _not_ imply memory
> -barriers, but might be used for implementing such things as RELEASE-class
> -operations:
> -
> -	atomic_set();
> -	set_bit();
> -	clear_bit();
> -	change_bit();
> -
> -With these the appropriate explicit memory barrier should be used if necessary
> -(smp_mb__before_atomic() for instance).
> -
> -
> -The following also do _not_ imply memory barriers, and so may require explicit
> -memory barriers under some circumstances (smp_mb__before_atomic() for
> -instance):
> -
> -	atomic_add();
> -	atomic_sub();
> -	atomic_inc();
> -	atomic_dec();
> -
> -If they're used for statistics generation, then they probably don't need memory
> -barriers, unless there's a coupling between statistical data.
> -
> -If they're used for reference counting on an object to control its lifetime,
> -they probably don't need memory barriers because either the reference count
> -will be adjusted inside a locked section, or the caller will already hold
> -sufficient references to make the lock, and thus a memory barrier unnecessary.
> -
> -If they're used for constructing a lock of some description, then they probably
> -do need memory barriers as a lock primitive generally has to do things in a
> -specific order.
> -
> -Basically, each usage case has to be carefully considered as to whether memory
> -barriers are needed or not.
> -
> -The following operations are special locking primitives:
> -
> -	test_and_set_bit_lock();
> -	clear_bit_unlock();
> -	__clear_bit_unlock();
> -
> -These implement ACQUIRE-class and RELEASE-class operations.  These should be
> -used in preference to other operations when implementing locking primitives,
> -because their implementations can be optimised on many architectures.
> -
> -[!] Note that special memory barrier primitives are available for these
> -situations because on some CPUs the atomic instructions used imply full memory
> -barriers, and so barrier instructions are superfluous in conjunction with them,
> -and in such cases the special barrier primitives will be no-ops.
> -
>  See Documentation/atomic_ops.txt for more information.
>  
>  

  reply	other threads:[~2017-06-13  6:38 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-09  9:24 [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Peter Zijlstra
2017-06-09 11:05 ` [RFC][PATCH] atomic: Fix atomic_set_release() for 'funny' architectures Peter Zijlstra
2017-06-09 11:13   ` Peter Zijlstra
2017-06-09 17:28     ` Vineet Gupta
2017-06-09 17:28       ` Vineet Gupta
2017-06-09 18:49       ` Peter Zijlstra
2017-06-09 18:49         ` Peter Zijlstra
2017-06-09 18:58     ` James Bottomley
2017-06-09 14:03   ` Chris Metcalf
2017-08-10 12:10   ` [tip:locking/core] locking/atomic: " tip-bot for Peter Zijlstra
2017-06-09 15:44 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Will Deacon
2017-06-09 19:36   ` Peter Zijlstra
2017-06-11 13:56     ` Boqun Feng
2017-06-12 14:49       ` Peter Zijlstra
2017-06-13  6:39         ` Boqun Feng [this message]
2017-06-14 12:33         ` Will Deacon
2017-07-12 12:53         ` Boqun Feng
2017-07-12 13:08           ` Peter Zijlstra
2017-07-12 19:13             ` Paul E. McKenney
2017-07-26 11:53         ` [RFC][PATCH v3]: documentation,atomic: Add new documents Peter Zijlstra
2017-07-26 12:47           ` Boqun Feng
2017-07-31  9:05             ` Peter Zijlstra
2017-07-31 11:04               ` Boqun Feng
2017-07-31 17:43                 ` Paul E. McKenney
2017-08-01  2:14                   ` Boqun Feng
2017-08-01  9:01                   ` Peter Zijlstra
2017-08-01 10:19                     ` Will Deacon
2017-08-01 11:47                       ` Peter Zijlstra
2017-08-01 12:17                         ` Will Deacon
2017-08-01 12:52                           ` Peter Zijlstra
2017-08-01 16:14                           ` Paul E. McKenney
2017-08-01 16:42                             ` Peter Zijlstra
2017-08-01 16:53                               ` Will Deacon
2017-08-01 22:18                               ` Paul E. McKenney
2017-08-02  8:46                                 ` Will Deacon
2017-08-01 18:37                             ` Paul E. McKenney
2017-08-02  9:45                             ` Will Deacon
2017-08-02 16:17                               ` Paul E. McKenney
2017-08-03 14:05                               ` Boqun Feng
2017-08-03 14:55                                 ` Paul E. McKenney
2017-08-03 16:12                                   ` Will Deacon
2017-08-03 16:58                                     ` Paul E. McKenney
2017-08-01 13:35                     ` Paul E. McKenney
2017-07-26 16:28           ` Randy Dunlap
2017-06-09 18:15 ` [RFC][PATCH]: documentation,atomic: Add a new atomic_t document Randy Dunlap

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170613063905.yartsltbdnjwgk63@tardis \
    --to=boqun.feng@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.