* [PATCH v3] refcount_t vs. atomic_t ordering differences
@ 2017-11-29 12:36 Elena Reshetova
2017-11-29 12:36 ` [PATCH] refcount_t: documentation for memory " Elena Reshetova
0 siblings, 1 reply; 9+ messages in thread
From: Elena Reshetova @ 2017-11-29 12:36 UTC (permalink / raw)
To: peterz; +Cc: linux-kernel, keescook, david, Elena Reshetova
Changes in v3:
- fixes from Kees incorporated apart from concrete examples
for practical cases.
- document converted to rst and linked under core-api
Changes in v2:
- typos and english are fixed based on Randy Dunlap's
proof reading
- structure of document improved:
* definitions now in the beginning
* confusing examples removed
* less redundancy overall and more up-to-the-point text
- definitions try to follow LKMM defined in
github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
Elena Reshetova (1):
refcount_t: documentation for memory ordering differences
Documentation/core-api/index.rst | 1 +
Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
2 files changed, 130 insertions(+)
create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
--
2.7.4
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH] refcount_t: documentation for memory ordering differences
2017-11-29 12:36 [PATCH v3] refcount_t vs. atomic_t ordering differences Elena Reshetova
@ 2017-11-29 12:36 ` Elena Reshetova
2017-11-30 1:36 ` Kees Cook
2017-12-01 20:34 ` Randy Dunlap
0 siblings, 2 replies; 9+ messages in thread
From: Elena Reshetova @ 2017-11-29 12:36 UTC (permalink / raw)
To: peterz; +Cc: linux-kernel, keescook, david, Elena Reshetova
Some functions from refcount_t API provide different
memory ordering guarantees that their atomic counterparts.
This adds a document outlining these differences.
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
Documentation/core-api/index.rst | 1 +
Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
2 files changed, 130 insertions(+)
create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index d5bbe03..d4d54b0 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -14,6 +14,7 @@ Core utilities
kernel-api
assoc_array
atomic_ops
+ refcount-vs-atomic
cpu_hotplug
local_ops
workqueue
diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
new file mode 100644
index 0000000..5619d48
--- /dev/null
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -0,0 +1,129 @@
+===================================
+refcount_t API compared to atomic_t
+===================================
+
+The goal of refcount_t API is to provide a minimal API for implementing
+an object's reference counters. While a generic architecture-independent
+implementation from lib/refcount.c uses atomic operations underneath,
+there are a number of differences between some of the refcount_*() and
+atomic_*() functions with regards to the memory ordering guarantees.
+This document outlines the differences and provides respective examples
+in order to help maintainers validate their code against the change in
+these memory ordering guarantees.
+
+memory-barriers.txt and atomic_t.txt provide more background to the
+memory ordering in general and for atomic operations specifically.
+
+Relevant types of memory ordering
+=================================
+
+**Note**: the following section only covers some of the memory
+ordering types that are relevant for the atomics and reference
+counters and used through this document. For a much broader picture
+please consult memory-barriers.txt document.
+
+In the absence of any memory ordering guarantees (i.e. fully unordered)
+atomics & refcounters only provide atomicity and
+program order (po) relation (on the same CPU). It guarantees that
+each atomic_*() and refcount_*() operation is atomic and instructions
+are executed in program order on a single CPU.
+This is implemented using READ_ONCE()/WRITE_ONCE() and
+compare-and-swap primitives.
+
+A strong (full) memory ordering guarantees that all prior loads and
+stores (all po-earlier instructions) on the same CPU are completed
+before any po-later instruction is executed on the same CPU.
+It also guarantees that all po-earlier stores on the same CPU
+and all propagated stores from other CPUs must propagate to all
+other CPUs before any po-later instruction is executed on the original
+CPU (A-cumulative property). This is implemented using smp_mb().
+
+A RELEASE memory ordering guarantees that all prior loads and
+stores (all po-earlier instructions) on the same CPU are completed
+before the operation. It also guarantees that all po-earlier
+stores on the same CPU and all propagated stores from other CPUs
+must propagate to all other CPUs before the release operation
+(A-cumulative property). This is implemented using smp_store_release().
+
+A control dependency (on success) for refcounters guarantees that
+if a reference for an object was successfully obtained (reference
+counter increment or addition happened, function returned true),
+then further stores are ordered against this operation.
+Control dependency on stores are not implemented using any explicit
+barriers, but rely on CPU not to speculate on stores. This is only
+a single CPU relation and provides no guarantees for other CPUs.
+
+
+Comparison of functions
+=======================
+
+case 1) - non-"Read/Modify/Write" (RMW) ops
+-------------------------------------------
+
+Function changes:
+ atomic_set() --> refcount_set()
+ atomic_read() --> refcount_read()
+
+Memory ordering guarantee changes:
+ none (both fully unordered)
+
+case 2) - increment-based ops that return no value
+--------------------------------------------------
+
+Function changes:
+ atomic_inc() --> refcount_inc()
+ atomic_add() --> refcount_add()
+
+Memory ordering guarantee changes:
+ none (both fully unordered)
+
+
+case 3) - decrement-based RMW ops that return no value
+------------------------------------------------------
+Function changes:
+ atomic_dec() --> refcount_dec()
+
+Memory ordering guarantee changes:
+ fully unordered --> RELEASE ordering
+
+
+case 4) - increment-based RMW ops that return a value
+-----------------------------------------------------
+
+Function changes:
+ atomic_inc_not_zero() --> refcount_inc_not_zero()
+ no atomic counterpart --> refcount_add_not_zero()
+
+Memory ordering guarantees changes:
+ fully ordered --> control dependency on success for stores
+
+*Note*: we really assume here that necessary ordering is provided as a result
+of obtaining pointer to the object!
+
+
+case 5) - decrement-based RMW ops that return a value
+-----------------------------------------------------
+
+Function changes:
+ atomic_dec_and_test() --> refcount_dec_and_test()
+ atomic_sub_and_test() --> refcount_sub_and_test()
+ no atomic counterpart --> refcount_dec_if_one()
+ atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(&var)
+
+Memory ordering guarantees changes:
+ fully ordered --> RELEASE ordering + control dependency
+
+Note: atomic_add_unless() only provides full order on success.
+
+
+case 6) - lock-based RMW
+------------------------
+
+Function changes:
+
+ atomic_dec_and_lock() --> refcount_dec_and_lock()
+ atomic_dec_and_mutex_lock() --> refcount_dec_and_mutex_lock()
+
+Memory ordering guarantees changes:
+ fully ordered --> RELEASE ordering + control dependency +
+ hold spin_lock() on success
--
2.7.4
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] refcount_t: documentation for memory ordering differences
2017-11-29 12:36 ` [PATCH] refcount_t: documentation for memory " Elena Reshetova
@ 2017-11-30 1:36 ` Kees Cook
2017-12-05 10:36 ` Reshetova, Elena
2017-12-01 20:34 ` Randy Dunlap
1 sibling, 1 reply; 9+ messages in thread
From: Kees Cook @ 2017-11-30 1:36 UTC (permalink / raw)
To: Elena Reshetova
Cc: Peter Zijlstra, LKML, Dave Chinner, linux-doc, Jonathan Corbet
On Wed, Nov 29, 2017 at 4:36 AM, Elena Reshetova
<elena.reshetova@intel.com> wrote:
> Some functions from refcount_t API provide different
> memory ordering guarantees that their atomic counterparts.
> This adds a document outlining these differences.
>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
Thanks for the improvements!
I have some markup changes to add, but I'll send that as a separate patch.
Acked-by: Kees Cook <keescook@chromium.org>
-Kees
> ---
> Documentation/core-api/index.rst | 1 +
> Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
> 2 files changed, 130 insertions(+)
> create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
>
> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> index d5bbe03..d4d54b0 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -14,6 +14,7 @@ Core utilities
> kernel-api
> assoc_array
> atomic_ops
> + refcount-vs-atomic
> cpu_hotplug
> local_ops
> workqueue
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> new file mode 100644
> index 0000000..5619d48
> --- /dev/null
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -0,0 +1,129 @@
> +===================================
> +refcount_t API compared to atomic_t
> +===================================
> +
> +The goal of refcount_t API is to provide a minimal API for implementing
> +an object's reference counters. While a generic architecture-independent
> +implementation from lib/refcount.c uses atomic operations underneath,
> +there are a number of differences between some of the refcount_*() and
> +atomic_*() functions with regards to the memory ordering guarantees.
> +This document outlines the differences and provides respective examples
> +in order to help maintainers validate their code against the change in
> +these memory ordering guarantees.
> +
> +memory-barriers.txt and atomic_t.txt provide more background to the
> +memory ordering in general and for atomic operations specifically.
> +
> +Relevant types of memory ordering
> +=================================
> +
> +**Note**: the following section only covers some of the memory
> +ordering types that are relevant for the atomics and reference
> +counters and used through this document. For a much broader picture
> +please consult memory-barriers.txt document.
> +
> +In the absence of any memory ordering guarantees (i.e. fully unordered)
> +atomics & refcounters only provide atomicity and
> +program order (po) relation (on the same CPU). It guarantees that
> +each atomic_*() and refcount_*() operation is atomic and instructions
> +are executed in program order on a single CPU.
> +This is implemented using READ_ONCE()/WRITE_ONCE() and
> +compare-and-swap primitives.
> +
> +A strong (full) memory ordering guarantees that all prior loads and
> +stores (all po-earlier instructions) on the same CPU are completed
> +before any po-later instruction is executed on the same CPU.
> +It also guarantees that all po-earlier stores on the same CPU
> +and all propagated stores from other CPUs must propagate to all
> +other CPUs before any po-later instruction is executed on the original
> +CPU (A-cumulative property). This is implemented using smp_mb().
> +
> +A RELEASE memory ordering guarantees that all prior loads and
> +stores (all po-earlier instructions) on the same CPU are completed
> +before the operation. It also guarantees that all po-earlier
> +stores on the same CPU and all propagated stores from other CPUs
> +must propagate to all other CPUs before the release operation
> +(A-cumulative property). This is implemented using smp_store_release().
> +
> +A control dependency (on success) for refcounters guarantees that
> +if a reference for an object was successfully obtained (reference
> +counter increment or addition happened, function returned true),
> +then further stores are ordered against this operation.
> +Control dependency on stores are not implemented using any explicit
> +barriers, but rely on CPU not to speculate on stores. This is only
> +a single CPU relation and provides no guarantees for other CPUs.
> +
> +
> +Comparison of functions
> +=======================
> +
> +case 1) - non-"Read/Modify/Write" (RMW) ops
> +-------------------------------------------
> +
> +Function changes:
> + atomic_set() --> refcount_set()
> + atomic_read() --> refcount_read()
> +
> +Memory ordering guarantee changes:
> + none (both fully unordered)
> +
> +case 2) - increment-based ops that return no value
> +--------------------------------------------------
> +
> +Function changes:
> + atomic_inc() --> refcount_inc()
> + atomic_add() --> refcount_add()
> +
> +Memory ordering guarantee changes:
> + none (both fully unordered)
> +
> +
> +case 3) - decrement-based RMW ops that return no value
> +------------------------------------------------------
> +Function changes:
> + atomic_dec() --> refcount_dec()
> +
> +Memory ordering guarantee changes:
> + fully unordered --> RELEASE ordering
> +
> +
> +case 4) - increment-based RMW ops that return a value
> +-----------------------------------------------------
> +
> +Function changes:
> + atomic_inc_not_zero() --> refcount_inc_not_zero()
> + no atomic counterpart --> refcount_add_not_zero()
> +
> +Memory ordering guarantees changes:
> + fully ordered --> control dependency on success for stores
> +
> +*Note*: we really assume here that necessary ordering is provided as a result
> +of obtaining pointer to the object!
> +
> +
> +case 5) - decrement-based RMW ops that return a value
> +-----------------------------------------------------
> +
> +Function changes:
> + atomic_dec_and_test() --> refcount_dec_and_test()
> + atomic_sub_and_test() --> refcount_sub_and_test()
> + no atomic counterpart --> refcount_dec_if_one()
> + atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(&var)
> +
> +Memory ordering guarantees changes:
> + fully ordered --> RELEASE ordering + control dependency
> +
> +Note: atomic_add_unless() only provides full order on success.
> +
> +
> +case 6) - lock-based RMW
> +------------------------
> +
> +Function changes:
> +
> + atomic_dec_and_lock() --> refcount_dec_and_lock()
> + atomic_dec_and_mutex_lock() --> refcount_dec_and_mutex_lock()
> +
> +Memory ordering guarantees changes:
> + fully ordered --> RELEASE ordering + control dependency +
> + hold spin_lock() on success
> --
> 2.7.4
>
--
Kees Cook
Pixel Security
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] refcount_t: documentation for memory ordering differences
2017-11-29 12:36 ` [PATCH] refcount_t: documentation for memory " Elena Reshetova
2017-11-30 1:36 ` Kees Cook
@ 2017-12-01 20:34 ` Randy Dunlap
2017-12-03 6:20 ` Andrea Parri
2017-12-05 7:36 ` Reshetova, Elena
1 sibling, 2 replies; 9+ messages in thread
From: Randy Dunlap @ 2017-12-01 20:34 UTC (permalink / raw)
To: Elena Reshetova, peterz; +Cc: linux-kernel, keescook, david
On 11/29/2017 04:36 AM, Elena Reshetova wrote:
> Some functions from refcount_t API provide different
> memory ordering guarantees that their atomic counterparts.
> This adds a document outlining these differences.
>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> ---
> Documentation/core-api/index.rst | 1 +
> Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
> 2 files changed, 130 insertions(+)
> create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> new file mode 100644
> index 0000000..5619d48
> --- /dev/null
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -0,0 +1,129 @@
> +===================================
> +refcount_t API compared to atomic_t
> +===================================
> +
> +The goal of refcount_t API is to provide a minimal API for implementing
> +an object's reference counters. While a generic architecture-independent
> +implementation from lib/refcount.c uses atomic operations underneath,
> +there are a number of differences between some of the refcount_*() and
> +atomic_*() functions with regards to the memory ordering guarantees.
> +This document outlines the differences and provides respective examples
> +in order to help maintainers validate their code against the change in
> +these memory ordering guarantees.
> +
> +memory-barriers.txt and atomic_t.txt provide more background to the
> +memory ordering in general and for atomic operations specifically.
> +
> +Relevant types of memory ordering
> +=================================
> +
> +**Note**: the following section only covers some of the memory
> +ordering types that are relevant for the atomics and reference
> +counters and used through this document. For a much broader picture
> +please consult memory-barriers.txt document.
> +
> +In the absence of any memory ordering guarantees (i.e. fully unordered)
> +atomics & refcounters only provide atomicity and
> +program order (po) relation (on the same CPU). It guarantees that
> +each atomic_*() and refcount_*() operation is atomic and instructions
> +are executed in program order on a single CPU.
> +This is implemented using READ_ONCE()/WRITE_ONCE() and
> +compare-and-swap primitives.
> +
> +A strong (full) memory ordering guarantees that all prior loads and
> +stores (all po-earlier instructions) on the same CPU are completed
> +before any po-later instruction is executed on the same CPU.
> +It also guarantees that all po-earlier stores on the same CPU
> +and all propagated stores from other CPUs must propagate to all
> +other CPUs before any po-later instruction is executed on the original
> +CPU (A-cumulative property). This is implemented using smp_mb().
I don't know what "A-cumulative property" means, and google search didn't
either.
Is it non-cumulative, similar to typical vs. atypical, where atypical
roughly means non-typical. Or is it accumlative (something being
accumulated, summed up, gathered up)?
Or is it something else.. TBD?
> +A RELEASE memory ordering guarantees that all prior loads and
> +stores (all po-earlier instructions) on the same CPU are completed
> +before the operation. It also guarantees that all po-earlier
> +stores on the same CPU and all propagated stores from other CPUs
> +must propagate to all other CPUs before the release operation
> +(A-cumulative property). This is implemented using smp_store_release().
thanks.
--
~Randy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] refcount_t: documentation for memory ordering differences
2017-12-01 20:34 ` Randy Dunlap
@ 2017-12-03 6:20 ` Andrea Parri
2017-12-03 6:27 ` Andrea Parri
2017-12-03 17:27 ` Randy Dunlap
2017-12-05 7:36 ` Reshetova, Elena
1 sibling, 2 replies; 9+ messages in thread
From: Andrea Parri @ 2017-12-03 6:20 UTC (permalink / raw)
To: Randy Dunlap
Cc: Elena Reshetova, peterz, linux-kernel, keescook, david,
Alan Stern, Paul E. McKenney
On Fri, Dec 01, 2017 at 12:34:23PM -0800, Randy Dunlap wrote:
> On 11/29/2017 04:36 AM, Elena Reshetova wrote:
> > Some functions from refcount_t API provide different
> > memory ordering guarantees that their atomic counterparts.
> > This adds a document outlining these differences.
> >
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > ---
> > Documentation/core-api/index.rst | 1 +
> > Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
> > 2 files changed, 130 insertions(+)
> > create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
>
> > diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> > new file mode 100644
> > index 0000000..5619d48
> > --- /dev/null
> > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > @@ -0,0 +1,129 @@
> > +===================================
> > +refcount_t API compared to atomic_t
> > +===================================
> > +
> > +The goal of refcount_t API is to provide a minimal API for implementing
> > +an object's reference counters. While a generic architecture-independent
> > +implementation from lib/refcount.c uses atomic operations underneath,
> > +there are a number of differences between some of the refcount_*() and
> > +atomic_*() functions with regards to the memory ordering guarantees.
> > +This document outlines the differences and provides respective examples
> > +in order to help maintainers validate their code against the change in
> > +these memory ordering guarantees.
> > +
> > +memory-barriers.txt and atomic_t.txt provide more background to the
> > +memory ordering in general and for atomic operations specifically.
> > +
> > +Relevant types of memory ordering
> > +=================================
> > +
> > +**Note**: the following section only covers some of the memory
> > +ordering types that are relevant for the atomics and reference
> > +counters and used through this document. For a much broader picture
> > +please consult memory-barriers.txt document.
> > +
> > +In the absence of any memory ordering guarantees (i.e. fully unordered)
> > +atomics & refcounters only provide atomicity and
> > +program order (po) relation (on the same CPU). It guarantees that
> > +each atomic_*() and refcount_*() operation is atomic and instructions
> > +are executed in program order on a single CPU.
> > +This is implemented using READ_ONCE()/WRITE_ONCE() and
> > +compare-and-swap primitives.
> > +
> > +A strong (full) memory ordering guarantees that all prior loads and
> > +stores (all po-earlier instructions) on the same CPU are completed
> > +before any po-later instruction is executed on the same CPU.
> > +It also guarantees that all po-earlier stores on the same CPU
> > +and all propagated stores from other CPUs must propagate to all
> > +other CPUs before any po-later instruction is executed on the original
> > +CPU (A-cumulative property). This is implemented using smp_mb().
>
> I don't know what "A-cumulative property" means, and google search didn't
> either.
The description above seems to follow the (informal) definition given in:
https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
(c.f., in part., Sect. 13-14)
and formalized by the LKMM. (The notion of A-cumulativity also appears, in
different contexts, in some memory consistency literature, e.g.,
http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/index.html
http://www.cl.cam.ac.uk/~pes20/armv8-mca/
https://arxiv.org/abs/1308.6810 )
A typical illustration of A-cumulativity (for smp_store_release(), say) is
given with the following program:
int x = 0;
int y = 0;
void thread0()
{
WRITE_ONCE(x, 1);
}
void thread1()
{
int r0;
r0 = READ_ONCE(x);
smp_store_release(&y, 1);
}
void thread2()
{
int r1;
int r2;
r1 = READ_ONCE(y);
smp_rmb();
r2 = READ_ONCE(x);
}
(This is a variation of the so called "message-passing" pattern, where the
stores are "distributed" over two threads; see also
https://github.com/aparri/memory-model/blob/master/litmus-tests/WRC%2Bpooncerelease%2Brmbonceonce%2BOnce.litmus )
The question we want to address is whether the final state
(r0 == 1 && r1 == 1 && r2 == 0)
can be reached/is allowed, and the answer is no (due to the A-cumulativity
of the store-release).
By contrast, dependencies provides no (A-)cumulativity; for example, if we
modify the previous program by replacing the store-release with a data dep.
as follows:
int x = 0;
int y = 0;
void thread0()
{
WRITE_ONCE(x, 1);
}
void thread1()
{
int r0;
r0 = READ_ONCE(x);
WRITE_ONCE(x, r0);
}
void thread2()
{
int r1;
int r2;
r1 = READ_ONCE(y);
smp_rmb();
r2 = READ_ONCE(x);
}
then that same final state is allowed (and observed on some PPC machines).
Andrea
>
> Is it non-cumulative, similar to typical vs. atypical, where atypical
> roughly means non-typical. Or is it accumlative (something being
> accumulated, summed up, gathered up)?
>
> Or is it something else.. TBD?
>
> > +A RELEASE memory ordering guarantees that all prior loads and
> > +stores (all po-earlier instructions) on the same CPU are completed
> > +before the operation. It also guarantees that all po-earlier
> > +stores on the same CPU and all propagated stores from other CPUs
> > +must propagate to all other CPUs before the release operation
> > +(A-cumulative property). This is implemented using smp_store_release().
>
> thanks.
> --
> ~Randy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] refcount_t: documentation for memory ordering differences
2017-12-03 6:20 ` Andrea Parri
@ 2017-12-03 6:27 ` Andrea Parri
2017-12-03 17:27 ` Randy Dunlap
1 sibling, 0 replies; 9+ messages in thread
From: Andrea Parri @ 2017-12-03 6:27 UTC (permalink / raw)
To: Randy Dunlap
Cc: Elena Reshetova, peterz, linux-kernel, keescook, david,
Alan Stern, Paul E. McKenney
On Sun, Dec 03, 2017 at 07:20:03AM +0100, Andrea Parri wrote:
> On Fri, Dec 01, 2017 at 12:34:23PM -0800, Randy Dunlap wrote:
> > On 11/29/2017 04:36 AM, Elena Reshetova wrote:
> > > Some functions from refcount_t API provide different
> > > memory ordering guarantees that their atomic counterparts.
> > > This adds a document outlining these differences.
> > >
> > > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > > ---
> > > Documentation/core-api/index.rst | 1 +
> > > Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
> > > 2 files changed, 130 insertions(+)
> > > create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
> >
> > > diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> > > new file mode 100644
> > > index 0000000..5619d48
> > > --- /dev/null
> > > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > > @@ -0,0 +1,129 @@
> > > +===================================
> > > +refcount_t API compared to atomic_t
> > > +===================================
> > > +
> > > +The goal of refcount_t API is to provide a minimal API for implementing
> > > +an object's reference counters. While a generic architecture-independent
> > > +implementation from lib/refcount.c uses atomic operations underneath,
> > > +there are a number of differences between some of the refcount_*() and
> > > +atomic_*() functions with regards to the memory ordering guarantees.
> > > +This document outlines the differences and provides respective examples
> > > +in order to help maintainers validate their code against the change in
> > > +these memory ordering guarantees.
> > > +
> > > +memory-barriers.txt and atomic_t.txt provide more background to the
> > > +memory ordering in general and for atomic operations specifically.
> > > +
> > > +Relevant types of memory ordering
> > > +=================================
> > > +
> > > +**Note**: the following section only covers some of the memory
> > > +ordering types that are relevant for the atomics and reference
> > > +counters and used through this document. For a much broader picture
> > > +please consult memory-barriers.txt document.
> > > +
> > > +In the absence of any memory ordering guarantees (i.e. fully unordered)
> > > +atomics & refcounters only provide atomicity and
> > > +program order (po) relation (on the same CPU). It guarantees that
> > > +each atomic_*() and refcount_*() operation is atomic and instructions
> > > +are executed in program order on a single CPU.
> > > +This is implemented using READ_ONCE()/WRITE_ONCE() and
> > > +compare-and-swap primitives.
> > > +
> > > +A strong (full) memory ordering guarantees that all prior loads and
> > > +stores (all po-earlier instructions) on the same CPU are completed
> > > +before any po-later instruction is executed on the same CPU.
> > > +It also guarantees that all po-earlier stores on the same CPU
> > > +and all propagated stores from other CPUs must propagate to all
> > > +other CPUs before any po-later instruction is executed on the original
> > > +CPU (A-cumulative property). This is implemented using smp_mb().
> >
> > I don't know what "A-cumulative property" means, and google search didn't
> > either.
>
> The description above seems to follow the (informal) definition given in:
>
> https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> (c.f., in part., Sect. 13-14)
>
> and formalized by the LKMM. (The notion of A-cumulativity also appears, in
> different contexts, in some memory consistency literature, e.g.,
>
> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/index.html
> http://www.cl.cam.ac.uk/~pes20/armv8-mca/
> https://arxiv.org/abs/1308.6810 )
>
> A typical illustration of A-cumulativity (for smp_store_release(), say) is
> given with the following program:
>
> int x = 0;
> int y = 0;
>
> void thread0()
> {
> WRITE_ONCE(x, 1);
> }
>
> void thread1()
> {
> int r0;
>
> r0 = READ_ONCE(x);
> smp_store_release(&y, 1);
> }
>
> void thread2()
> {
> int r1;
> int r2;
>
> r1 = READ_ONCE(y);
> smp_rmb();
> r2 = READ_ONCE(x);
> }
>
> (This is a variation of the so called "message-passing" pattern, where the
> stores are "distributed" over two threads; see also
>
> https://github.com/aparri/memory-model/blob/master/litmus-tests/WRC%2Bpooncerelease%2Brmbonceonce%2BOnce.litmus )
>
> The question we want to address is whether the final state
>
> (r0 == 1 && r1 == 1 && r2 == 0)
>
> can be reached/is allowed, and the answer is no (due to the A-cumulativity
> of the store-release).
>
> By contrast, dependencies provides no (A-)cumulativity; for example, if we
> modify the previous program by replacing the store-release with a data dep.
> as follows:
>
> int x = 0;
> int y = 0;
>
> void thread0()
> {
> WRITE_ONCE(x, 1);
> }
>
> void thread1()
> {
> int r0;
>
> r0 = READ_ONCE(x);
> WRITE_ONCE(x, r0);
should have been "WRITE_ONCE(y, r0);"
Andrea
> }
>
> void thread2()
> {
> int r1;
> int r2;
>
> r1 = READ_ONCE(y);
> smp_rmb();
> r2 = READ_ONCE(x);
> }
>
> then that same final state is allowed (and observed on some PPC machines).
>
> Andrea
>
>
> >
> > Is it non-cumulative, similar to typical vs. atypical, where atypical
> > roughly means non-typical. Or is it accumlative (something being
> > accumulated, summed up, gathered up)?
> >
> > Or is it something else.. TBD?
> >
> > > +A RELEASE memory ordering guarantees that all prior loads and
> > > +stores (all po-earlier instructions) on the same CPU are completed
> > > +before the operation. It also guarantees that all po-earlier
> > > +stores on the same CPU and all propagated stores from other CPUs
> > > +must propagate to all other CPUs before the release operation
> > > +(A-cumulative property). This is implemented using smp_store_release().
> >
> > thanks.
> > --
> > ~Randy
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] refcount_t: documentation for memory ordering differences
2017-12-03 6:20 ` Andrea Parri
2017-12-03 6:27 ` Andrea Parri
@ 2017-12-03 17:27 ` Randy Dunlap
1 sibling, 0 replies; 9+ messages in thread
From: Randy Dunlap @ 2017-12-03 17:27 UTC (permalink / raw)
To: Andrea Parri
Cc: Elena Reshetova, peterz, linux-kernel, keescook, david,
Alan Stern, Paul E. McKenney
On 12/02/2017 10:20 PM, Andrea Parri wrote:
> On Fri, Dec 01, 2017 at 12:34:23PM -0800, Randy Dunlap wrote:
>> On 11/29/2017 04:36 AM, Elena Reshetova wrote:
>>> Some functions from refcount_t API provide different
>>> memory ordering guarantees that their atomic counterparts.
>>> This adds a document outlining these differences.
>>>
>>> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
>>> ---
>>> Documentation/core-api/index.rst | 1 +
>>> Documentation/core-api/refcount-vs-atomic.rst | 129 ++++++++++++++++++++++++++
>>> 2 files changed, 130 insertions(+)
>>> create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
>>
>>> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
>>> new file mode 100644
>>> index 0000000..5619d48
>>> --- /dev/null
>>> +++ b/Documentation/core-api/refcount-vs-atomic.rst
>>> @@ -0,0 +1,129 @@
>>> +===================================
>>> +refcount_t API compared to atomic_t
>>> +===================================
>>> +
>>> +The goal of refcount_t API is to provide a minimal API for implementing
>>> +an object's reference counters. While a generic architecture-independent
>>> +implementation from lib/refcount.c uses atomic operations underneath,
>>> +there are a number of differences between some of the refcount_*() and
>>> +atomic_*() functions with regards to the memory ordering guarantees.
>>> +This document outlines the differences and provides respective examples
>>> +in order to help maintainers validate their code against the change in
>>> +these memory ordering guarantees.
>>> +
>>> +memory-barriers.txt and atomic_t.txt provide more background to the
>>> +memory ordering in general and for atomic operations specifically.
>>> +
>>> +Relevant types of memory ordering
>>> +=================================
>>> +
>>> +**Note**: the following section only covers some of the memory
>>> +ordering types that are relevant for the atomics and reference
>>> +counters and used through this document. For a much broader picture
>>> +please consult memory-barriers.txt document.
>>> +
>>> +In the absence of any memory ordering guarantees (i.e. fully unordered)
>>> +atomics & refcounters only provide atomicity and
>>> +program order (po) relation (on the same CPU). It guarantees that
>>> +each atomic_*() and refcount_*() operation is atomic and instructions
>>> +are executed in program order on a single CPU.
>>> +This is implemented using READ_ONCE()/WRITE_ONCE() and
>>> +compare-and-swap primitives.
>>> +
>>> +A strong (full) memory ordering guarantees that all prior loads and
>>> +stores (all po-earlier instructions) on the same CPU are completed
>>> +before any po-later instruction is executed on the same CPU.
>>> +It also guarantees that all po-earlier stores on the same CPU
>>> +and all propagated stores from other CPUs must propagate to all
>>> +other CPUs before any po-later instruction is executed on the original
>>> +CPU (A-cumulative property). This is implemented using smp_mb().
>>
>> I don't know what "A-cumulative property" means, and google search didn't
>> either.
>
> The description above seems to follow the (informal) definition given in:
>
> https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> (c.f., in part., Sect. 13-14)
>
> and formalized by the LKMM. (The notion of A-cumulativity also appears, in
> different contexts, in some memory consistency literature, e.g.,
>
> http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/index.html
> http://www.cl.cam.ac.uk/~pes20/armv8-mca/
> https://arxiv.org/abs/1308.6810 )
Got it. Thanks.
--
~Randy
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH] refcount_t: documentation for memory ordering differences
2017-12-01 20:34 ` Randy Dunlap
2017-12-03 6:20 ` Andrea Parri
@ 2017-12-05 7:36 ` Reshetova, Elena
1 sibling, 0 replies; 9+ messages in thread
From: Reshetova, Elena @ 2017-12-05 7:36 UTC (permalink / raw)
To: Randy Dunlap, peterz; +Cc: linux-kernel, keescook, david
On 11/29/2017 04:36 AM, Elena Reshetova wrote:
> > Some functions from refcount_t API provide different
> > memory ordering guarantees that their atomic counterparts.
> > This adds a document outlining these differences.
> >
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > ---
> > Documentation/core-api/index.rst | 1 +
> > Documentation/core-api/refcount-vs-atomic.rst | 129
> ++++++++++++++++++++++++++
> > 2 files changed, 130 insertions(+)
> > create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
>
> > diff --git a/Documentation/core-api/refcount-vs-atomic.rst
> b/Documentation/core-api/refcount-vs-atomic.rst
> > new file mode 100644
> > index 0000000..5619d48
> > --- /dev/null
> > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > @@ -0,0 +1,129 @@
> > +===================================
> > +refcount_t API compared to atomic_t
> > +===================================
> > +
> > +The goal of refcount_t API is to provide a minimal API for implementing
> > +an object's reference counters. While a generic architecture-independent
> > +implementation from lib/refcount.c uses atomic operations underneath,
> > +there are a number of differences between some of the refcount_*() and
> > +atomic_*() functions with regards to the memory ordering guarantees.
> > +This document outlines the differences and provides respective examples
> > +in order to help maintainers validate their code against the change in
> > +these memory ordering guarantees.
> > +
> > +memory-barriers.txt and atomic_t.txt provide more background to the
> > +memory ordering in general and for atomic operations specifically.
> > +
> > +Relevant types of memory ordering
> > +=================================
> > +
> > +**Note**: the following section only covers some of the memory
> > +ordering types that are relevant for the atomics and reference
> > +counters and used through this document. For a much broader picture
> > +please consult memory-barriers.txt document.
> > +
> > +In the absence of any memory ordering guarantees (i.e. fully unordered)
> > +atomics & refcounters only provide atomicity and
> > +program order (po) relation (on the same CPU). It guarantees that
> > +each atomic_*() and refcount_*() operation is atomic and instructions
> > +are executed in program order on a single CPU.
> > +This is implemented using READ_ONCE()/WRITE_ONCE() and
> > +compare-and-swap primitives.
> > +
> > +A strong (full) memory ordering guarantees that all prior loads and
> > +stores (all po-earlier instructions) on the same CPU are completed
> > +before any po-later instruction is executed on the same CPU.
> > +It also guarantees that all po-earlier stores on the same CPU
> > +and all propagated stores from other CPUs must propagate to all
> > +other CPUs before any po-later instruction is executed on the original
> > +CPU (A-cumulative property). This is implemented using smp_mb().
>
> I don't know what "A-cumulative property" means, and google search didn't
> either.
>
> Is it non-cumulative, similar to typical vs. atypical, where atypical
> roughly means non-typical. Or is it accumlative (something being
> accumulated, summed up, gathered up)?
>
> Or is it something else.. TBD?
Sorry, I should have mentioned also explicitly in this document where the terms are
coming from. I have mentioned in cover letter, but failed to say here.
I will fix it.
Thank you for catching! I see that reply was already given to this by Andrea.
Best Regards,
Elena
>
> > +A RELEASE memory ordering guarantees that all prior loads and
> > +stores (all po-earlier instructions) on the same CPU are completed
> > +before the operation. It also guarantees that all po-earlier
> > +stores on the same CPU and all propagated stores from other CPUs
> > +must propagate to all other CPUs before the release operation
> > +(A-cumulative property). This is implemented using smp_store_release().
>
> thanks.
> --
> ~Randy
^ permalink raw reply [flat|nested] 9+ messages in thread
* RE: [PATCH] refcount_t: documentation for memory ordering differences
2017-11-30 1:36 ` Kees Cook
@ 2017-12-05 10:36 ` Reshetova, Elena
0 siblings, 0 replies; 9+ messages in thread
From: Reshetova, Elena @ 2017-12-05 10:36 UTC (permalink / raw)
To: Kees Cook; +Cc: Peter Zijlstra, LKML, Dave Chinner, linux-doc, Jonathan Corbet
On Wed, Nov 29, 2017 at 4:36 AM, Elena Reshetova
> <elena.reshetova@intel.com> wrote:
> > Some functions from refcount_t API provide different
> > memory ordering guarantees that their atomic counterparts.
> > This adds a document outlining these differences.
> >
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
>
> Thanks for the improvements!
>
> I have some markup changes to add, but I'll send that as a separate patch.
Thank you Kees! I guess I was too minimal on my markup use, so doc was pretty plain
before. I have just joined your changes with mine and put both of our sign-off
to the resulting patch. I think this way it is easier for reviewers since ultimately
content is the same.
I will now fix one more thing Randy noticed and then send it to linux-doc and
Jon Corbet.
Best Regards,
Elena.
>
> Acked-by: Kees Cook <keescook@chromium.org>
>
> -Kees
>
> > ---
> > Documentation/core-api/index.rst | 1 +
> > Documentation/core-api/refcount-vs-atomic.rst | 129
> ++++++++++++++++++++++++++
> > 2 files changed, 130 insertions(+)
> > create mode 100644 Documentation/core-api/refcount-vs-atomic.rst
> >
> > diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> > index d5bbe03..d4d54b0 100644
> > --- a/Documentation/core-api/index.rst
> > +++ b/Documentation/core-api/index.rst
> > @@ -14,6 +14,7 @@ Core utilities
> > kernel-api
> > assoc_array
> > atomic_ops
> > + refcount-vs-atomic
> > cpu_hotplug
> > local_ops
> > workqueue
> > diff --git a/Documentation/core-api/refcount-vs-atomic.rst
> b/Documentation/core-api/refcount-vs-atomic.rst
> > new file mode 100644
> > index 0000000..5619d48
> > --- /dev/null
> > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > @@ -0,0 +1,129 @@
> > +===================================
> > +refcount_t API compared to atomic_t
> > +===================================
> > +
> > +The goal of refcount_t API is to provide a minimal API for implementing
> > +an object's reference counters. While a generic architecture-independent
> > +implementation from lib/refcount.c uses atomic operations underneath,
> > +there are a number of differences between some of the refcount_*() and
> > +atomic_*() functions with regards to the memory ordering guarantees.
> > +This document outlines the differences and provides respective examples
> > +in order to help maintainers validate their code against the change in
> > +these memory ordering guarantees.
> > +
> > +memory-barriers.txt and atomic_t.txt provide more background to the
> > +memory ordering in general and for atomic operations specifically.
> > +
> > +Relevant types of memory ordering
> > +=================================
> > +
> > +**Note**: the following section only covers some of the memory
> > +ordering types that are relevant for the atomics and reference
> > +counters and used through this document. For a much broader picture
> > +please consult memory-barriers.txt document.
> > +
> > +In the absence of any memory ordering guarantees (i.e. fully unordered)
> > +atomics & refcounters only provide atomicity and
> > +program order (po) relation (on the same CPU). It guarantees that
> > +each atomic_*() and refcount_*() operation is atomic and instructions
> > +are executed in program order on a single CPU.
> > +This is implemented using READ_ONCE()/WRITE_ONCE() and
> > +compare-and-swap primitives.
> > +
> > +A strong (full) memory ordering guarantees that all prior loads and
> > +stores (all po-earlier instructions) on the same CPU are completed
> > +before any po-later instruction is executed on the same CPU.
> > +It also guarantees that all po-earlier stores on the same CPU
> > +and all propagated stores from other CPUs must propagate to all
> > +other CPUs before any po-later instruction is executed on the original
> > +CPU (A-cumulative property). This is implemented using smp_mb().
> > +
> > +A RELEASE memory ordering guarantees that all prior loads and
> > +stores (all po-earlier instructions) on the same CPU are completed
> > +before the operation. It also guarantees that all po-earlier
> > +stores on the same CPU and all propagated stores from other CPUs
> > +must propagate to all other CPUs before the release operation
> > +(A-cumulative property). This is implemented using smp_store_release().
> > +
> > +A control dependency (on success) for refcounters guarantees that
> > +if a reference for an object was successfully obtained (reference
> > +counter increment or addition happened, function returned true),
> > +then further stores are ordered against this operation.
> > +Control dependency on stores are not implemented using any explicit
> > +barriers, but rely on CPU not to speculate on stores. This is only
> > +a single CPU relation and provides no guarantees for other CPUs.
> > +
> > +
> > +Comparison of functions
> > +=======================
> > +
> > +case 1) - non-"Read/Modify/Write" (RMW) ops
> > +-------------------------------------------
> > +
> > +Function changes:
> > + atomic_set() --> refcount_set()
> > + atomic_read() --> refcount_read()
> > +
> > +Memory ordering guarantee changes:
> > + none (both fully unordered)
> > +
> > +case 2) - increment-based ops that return no value
> > +--------------------------------------------------
> > +
> > +Function changes:
> > + atomic_inc() --> refcount_inc()
> > + atomic_add() --> refcount_add()
> > +
> > +Memory ordering guarantee changes:
> > + none (both fully unordered)
> > +
> > +
> > +case 3) - decrement-based RMW ops that return no value
> > +------------------------------------------------------
> > +Function changes:
> > + atomic_dec() --> refcount_dec()
> > +
> > +Memory ordering guarantee changes:
> > + fully unordered --> RELEASE ordering
> > +
> > +
> > +case 4) - increment-based RMW ops that return a value
> > +-----------------------------------------------------
> > +
> > +Function changes:
> > + atomic_inc_not_zero() --> refcount_inc_not_zero()
> > + no atomic counterpart --> refcount_add_not_zero()
> > +
> > +Memory ordering guarantees changes:
> > + fully ordered --> control dependency on success for stores
> > +
> > +*Note*: we really assume here that necessary ordering is provided as a result
> > +of obtaining pointer to the object!
> > +
> > +
> > +case 5) - decrement-based RMW ops that return a value
> > +-----------------------------------------------------
> > +
> > +Function changes:
> > + atomic_dec_and_test() --> refcount_dec_and_test()
> > + atomic_sub_and_test() --> refcount_sub_and_test()
> > + no atomic counterpart --> refcount_dec_if_one()
> > + atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(&var)
> > +
> > +Memory ordering guarantees changes:
> > + fully ordered --> RELEASE ordering + control dependency
> > +
> > +Note: atomic_add_unless() only provides full order on success.
> > +
> > +
> > +case 6) - lock-based RMW
> > +------------------------
> > +
> > +Function changes:
> > +
> > + atomic_dec_and_lock() --> refcount_dec_and_lock()
> > + atomic_dec_and_mutex_lock() --> refcount_dec_and_mutex_lock()
> > +
> > +Memory ordering guarantees changes:
> > + fully ordered --> RELEASE ordering + control dependency +
> > + hold spin_lock() on success
> > --
> > 2.7.4
> >
>
>
>
> --
> Kees Cook
> Pixel Security
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-12-05 10:36 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-29 12:36 [PATCH v3] refcount_t vs. atomic_t ordering differences Elena Reshetova
2017-11-29 12:36 ` [PATCH] refcount_t: documentation for memory " Elena Reshetova
2017-11-30 1:36 ` Kees Cook
2017-12-05 10:36 ` Reshetova, Elena
2017-12-01 20:34 ` Randy Dunlap
2017-12-03 6:20 ` Andrea Parri
2017-12-03 6:27 ` Andrea Parri
2017-12-03 17:27 ` Randy Dunlap
2017-12-05 7:36 ` Reshetova, Elena
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).