* [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
@ 2019-01-28 12:09 Elena Reshetova
  2019-01-28 14:29 ` Andrea Parri
  2019-01-29 14:00 ` Dmitry Vyukov
  0 siblings, 2 replies; 12+ messages in thread
From: Elena Reshetova @ 2019-01-28 12:09 UTC (permalink / raw)
  To: peterz; +Cc: linux-kernel, keescook, andrea.parri, Elena Reshetova

This adds an smp_acquire__after_ctrl_dep() barrier on successful
decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
variants and therefore gives stronger memory ordering guarantees than
prior versions of these functions.

Co-Developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 Documentation/core-api/refcount-vs-atomic.rst | 28 +++++++++++++++++++++++----
 arch/x86/include/asm/refcount.h               | 21 ++++++++++++++++----
 lib/refcount.c                                | 16 ++++++++++-----
 3 files changed, 52 insertions(+), 13 deletions(-)

diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
index 322851b..95d4b4e 100644
--- a/Documentation/core-api/refcount-vs-atomic.rst
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -54,6 +54,14 @@ must propagate to all other CPUs before the release operation
 (A-cumulative property). This is implemented using
 :c:func:`smp_store_release`.
 
+An ACQUIRE memory ordering guarantees that all post loads and
+stores (all po-later instructions) on the same CPU are
+completed after the acquire operation. It also guarantees that all
+po-later stores on the same CPU and all propagated stores from other CPUs
+must propagate to all other CPUs after the acquire operation
+(A-cumulative property). This is implemented using
+:c:func:`smp_acquire__after_ctrl_dep`.
+
 A control dependency (on success) for refcounters guarantees that
 if a reference for an object was successfully obtained (reference
 counter increment or addition happened, function returned true),
@@ -119,24 +127,36 @@ Memory ordering guarantees changes:
    result of obtaining pointer to the object!
 
 
-case 5) - decrement-based RMW ops that return a value
------------------------------------------------------
+case 5) - generic dec/sub decrement-based RMW ops that return a value
+---------------------------------------------------------------------
 
 Function changes:
 
  * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
  * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> RELEASE ordering + ACQUIRE ordering and control dependency
+   on success.  
+
+
+case 6) other decrement-based RMW ops that return a value
+---------------------------------------------------------
+
+Function changes:
+
  * no atomic counterpart --> :c:func:`refcount_dec_if_one`
  * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
 
 Memory ordering guarantees changes:
 
- * fully ordered --> RELEASE ordering + control dependency
+ * fully ordered --> RELEASE ordering + control dependency 
 
 .. note:: :c:func:`atomic_add_unless` only provides full order on success.
 
 
-case 6) - lock-based RMW
+case 7) - lock-based RMW
 ------------------------
 
 Function changes:
diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index dbaed55..ab8f584 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
 static __always_inline __must_check
 bool refcount_sub_and_test(unsigned int i, refcount_t *r)
 {
-	return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+	bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
 					 REFCOUNT_CHECK_LT_ZERO,
 					 r->refs.counter, e, "er", i, "cx");
+
+    if (ret) {
+               smp_acquire__after_ctrl_dep();
+               return true;
+    }
+
+    return false;
 }
 
 static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
-	return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
-					REFCOUNT_CHECK_LT_ZERO,
-					r->refs.counter, e, "cx");
+	bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+                   REFCOUNT_CHECK_LT_ZERO,
+                   r->refs.counter, e, "cx");
+    if (ret) {
+               smp_acquire__after_ctrl_dep();
+               return true;
+    }
+
+    return false;
 }
 
 static __always_inline __must_check
diff --git a/lib/refcount.c b/lib/refcount.c
index ebcf8cd..732feac 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -33,6 +33,9 @@
  * Note that the allocator is responsible for ordering things between free()
  * and alloc().
  *
+ * The decrements dec_and_test() and sub_and_test() also provide acquire
+ * ordering on success. 
+ *
  */
 
 #include <linux/mutex.h>
@@ -164,8 +167,7 @@ EXPORT_SYMBOL(refcount_inc_checked);
  * at UINT_MAX.
  *
  * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
+ * before, and provides an acquire ordering on success such that free() must come after.
  *
  * Use of this function is not recommended for the normal reference counting
  * use case in which references are taken and released one at a time.  In these
@@ -190,7 +192,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
 
 	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
 
-	return !new;
+    if (!new) {
+               smp_acquire__after_ctrl_dep();
+               return true;
+    }
+    return false;
+
 }
 EXPORT_SYMBOL(refcount_sub_and_test_checked);
 
@@ -202,8 +209,7 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
  * decrement when saturated at UINT_MAX.
  *
  * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
+ * before, and provides an acquire ordering on success such that free() must come after.
  *
  * Return: true if the resulting refcount is 0, false otherwise
  */
-- 
2.7.4


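For context, the usage pattern this change targets looks roughly like the sketch below (the struct and function names are made up for illustration; only refcount_dec_and_test() and kfree() are real kernel interfaces). The RELEASE ordering of the decrement ensures each CPU's own accesses to the object happen before it drops its reference; the ACQUIRE ordering added here on the successful 1 -> 0 transition ensures the teardown cannot be reordered before the final decrement, so it observes everything other CPUs did before dropping theirs.

	#include <linux/refcount.h>
	#include <linux/slab.h>

	struct foo {
		refcount_t ref;
		int data;			/* payload protected by the refcount */
	};

	static void foo_put(struct foo *f)
	{
		/*
		 * RELEASE: this CPU's earlier accesses to *f are ordered
		 * before the decrement.  ACQUIRE on success: the teardown
		 * below is ordered after the decrement that hit zero.
		 */
		if (refcount_dec_and_test(&f->ref))
			kfree(f);
	}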

* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-28 12:09 [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants Elena Reshetova
@ 2019-01-28 14:29 ` Andrea Parri
  2019-01-29  9:51   ` Peter Zijlstra
  2019-01-29 14:00 ` Dmitry Vyukov
  1 sibling, 1 reply; 12+ messages in thread
From: Andrea Parri @ 2019-01-28 14:29 UTC (permalink / raw)
  To: Elena Reshetova; +Cc: peterz, linux-kernel, keescook, Alan Stern, Dmitry Vyukov

On Mon, Jan 28, 2019 at 02:09:37PM +0200, Elena Reshetova wrote:
> This adds an smp_acquire__after_ctrl_dep() barrier on successful
> decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> variants and therefore gives stronger memory ordering guarantees than
> prior versions of these functions.
> 
> Co-Developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>

+ Alan, Dmitry; they might also deserve a Suggested-by:  ;-)

[...]


> +An ACQUIRE memory ordering guarantees that all post loads and
> +stores (all po-later instructions) on the same CPU are
> +completed after the acquire operation. It also guarantees that all
> +po-later stores on the same CPU and all propagated stores from other CPUs
> +must propagate to all other CPUs after the acquire operation
> +(A-cumulative property).

Mmh, this property (A-cumulativity) isn't really associated to ACQUIREs
in the LKMM; I'd suggest to simply remove the last sentence.

[...]


> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index dbaed55..ab8f584 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
>  static __always_inline __must_check
>  bool refcount_sub_and_test(unsigned int i, refcount_t *r)
>  {
> -	return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> +	bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
>  					 REFCOUNT_CHECK_LT_ZERO,
>  					 r->refs.counter, e, "er", i, "cx");
> +
> +    if (ret) {
> +               smp_acquire__after_ctrl_dep();
> +               return true;
> +    }
> +
> +    return false;

There appears to be some white-space damage (here and in other places);
checkpatch.pl should point these and other style problems out.

  Andrea


>  }
>  
>  static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
>  {
> -	return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> -					REFCOUNT_CHECK_LT_ZERO,
> -					r->refs.counter, e, "cx");
> +	bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> +                   REFCOUNT_CHECK_LT_ZERO,
> +                   r->refs.counter, e, "cx");
> +    if (ret) {
> +               smp_acquire__after_ctrl_dep();
> +               return true;
> +    }
> +
> +    return false;
>  }
>  
>  static __always_inline __must_check
> diff --git a/lib/refcount.c b/lib/refcount.c
> index ebcf8cd..732feac 100644
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -33,6 +33,9 @@
>   * Note that the allocator is responsible for ordering things between free()
>   * and alloc().
>   *
> + * The decrements dec_and_test() and sub_and_test() also provide acquire
> + * ordering on success. 
> + *
>   */
>  
>  #include <linux/mutex.h>
> @@ -164,8 +167,7 @@ EXPORT_SYMBOL(refcount_inc_checked);
>   * at UINT_MAX.
>   *
>   * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
>   *
>   * Use of this function is not recommended for the normal reference counting
>   * use case in which references are taken and released one at a time.  In these
> @@ -190,7 +192,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
>  
>  	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
>  
> -	return !new;
> +    if (!new) {
> +               smp_acquire__after_ctrl_dep();
> +               return true;
> +    }
> +    return false;
> +
>  }
>  EXPORT_SYMBOL(refcount_sub_and_test_checked);
>  
> @@ -202,8 +209,7 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
>   * decrement when saturated at UINT_MAX.
>   *
>   * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
>   *
>   * Return: true if the resulting refcount is 0, false otherwise
>   */
> -- 
> 2.7.4
> 


* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-28 14:29 ` Andrea Parri
@ 2019-01-29  9:51   ` Peter Zijlstra
  2019-01-29 13:39     ` Reshetova, Elena
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2019-01-29  9:51 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Elena Reshetova, linux-kernel, keescook, Alan Stern, Dmitry Vyukov

On Mon, Jan 28, 2019 at 03:29:10PM +0100, Andrea Parri wrote:

> > diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> > index dbaed55..ab8f584 100644
> > --- a/arch/x86/include/asm/refcount.h
> > +++ b/arch/x86/include/asm/refcount.h
> > @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
> >  static __always_inline __must_check
> >  bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> >  {
> > -	return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > +	bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> >  					 REFCOUNT_CHECK_LT_ZERO,
> >  					 r->refs.counter, e, "er", i, "cx");
> > +
> > +    if (ret) {
> > +               smp_acquire__after_ctrl_dep();
> > +               return true;
> > +    }
> > +
> > +    return false;
> 
> There appears to be some white-space damage (here and in other places);
> checkpatch.pl should point these and other style problems out.

It's worse...

patch: **** malformed patch at line 81: diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h

And yes, there's a lot of whitespace damage all around. Lots of trailing
spaces too.



* RE: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-29  9:51   ` Peter Zijlstra
@ 2019-01-29 13:39     ` Reshetova, Elena
  0 siblings, 0 replies; 12+ messages in thread
From: Reshetova, Elena @ 2019-01-29 13:39 UTC (permalink / raw)
  To: Peter Zijlstra, Andrea Parri
  Cc: linux-kernel, keescook, Alan Stern, Dmitry Vyukov

 
> On Mon, Jan 28, 2019 at 03:29:10PM +0100, Andrea Parri wrote:
> 
> > > diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> > > index dbaed55..ab8f584 100644
> > > --- a/arch/x86/include/asm/refcount.h
> > > +++ b/arch/x86/include/asm/refcount.h
> > > @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
> > >  static __always_inline __must_check
> > >  bool refcount_sub_and_test(unsigned int i, refcount_t *r)
> > >  {
> > > -	return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > > +	bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> > >
> REFCOUNT_CHECK_LT_ZERO,
> > >  					 r-
> >refs.counter, e, "er", i, "cx");
> > > +
> > > +    if (ret) {
> > > +               smp_acquire__after_ctrl_dep();
> > > +               return true;
> > > +    }
> > > +
> > > +    return false;
> >
> > There appears to be some white-space damage (here and in other places);
> > checkpatch.pl should point these and other style problems out.
> 
> It's worse...
> 
> patch: **** malformed patch at line 81: diff --git
> a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> 
> And yes, there's a lot of whitespace damage all around. Lots of trailing
> spaces too.


I am very sorry about this; something is really wrong with my system. On top of all of
the above, I haven't even received Andrea's reply in my inbox, nor this patch itself.

I will fix all the whitespace and trailing-space issues and address this comment from Andrea:

"Mmh, this property (A-cumulativity) isn't really associated to ACQUIREs
in the LKMM; I'd suggest to simply remove the last sentence."

Anything else that needs fixing, content-wise? 

Best Regards,
Elena.



* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-28 12:09 [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants Elena Reshetova
  2019-01-28 14:29 ` Andrea Parri
@ 2019-01-29 14:00 ` Dmitry Vyukov
  2019-01-29 17:37   ` Reshetova, Elena
  1 sibling, 1 reply; 12+ messages in thread
From: Dmitry Vyukov @ 2019-01-29 14:00 UTC (permalink / raw)
  To: Elena Reshetova; +Cc: Peter Zijlstra, LKML, Kees Cook, Andrea Parri

On Mon, Jan 28, 2019 at 1:10 PM Elena Reshetova
<elena.reshetova@intel.com> wrote:
>
> This adds an smp_acquire__after_ctrl_dep() barrier on successful
> decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> variants and therefore gives stronger memory ordering guarantees than
> prior versions of these functions.
>
> Co-Developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> ---
>  Documentation/core-api/refcount-vs-atomic.rst | 28 +++++++++++++++++++++++----
>  arch/x86/include/asm/refcount.h               | 21 ++++++++++++++++----
>  lib/refcount.c                                | 16 ++++++++++-----
>  3 files changed, 52 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> index 322851b..95d4b4e 100644
> --- a/Documentation/core-api/refcount-vs-atomic.rst
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -54,6 +54,14 @@ must propagate to all other CPUs before the release operation
>  (A-cumulative property). This is implemented using
>  :c:func:`smp_store_release`.
>
> +An ACQUIRE memory ordering guarantees that all post loads and
> +stores (all po-later instructions) on the same CPU are
> +completed after the acquire operation. It also guarantees that all
> +po-later stores on the same CPU and all propagated stores from other CPUs
> +must propagate to all other CPUs after the acquire operation
> +(A-cumulative property). This is implemented using
> +:c:func:`smp_acquire__after_ctrl_dep`.

The second part starting from "It also guarantees that". I am not sure
I understand what it means. Is it just a copy-paste from RELEASE? I am
not sure ACQUIRE provides anything like this.


> +
>  A control dependency (on success) for refcounters guarantees that
>  if a reference for an object was successfully obtained (reference
>  counter increment or addition happened, function returned true),
> @@ -119,24 +127,36 @@ Memory ordering guarantees changes:
>     result of obtaining pointer to the object!
>
>
> -case 5) - decrement-based RMW ops that return a value
> ------------------------------------------------------
> +case 5) - generic dec/sub decrement-based RMW ops that return a value
> +---------------------------------------------------------------------
>
>  Function changes:
>
>   * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
>   * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
> +
> +Memory ordering guarantees changes:
> +
> + * fully ordered --> RELEASE ordering + ACQUIRE ordering and control dependency
> +   on success.

Is ACQUIRE strictly stronger than control dependency?
It generally looks so unless there is something very subtle that I am
missing. If so, should we replace it with just "RELEASE ordering +
ACQUIRE ordering on success"? Looks simpler with less magic trickery.


> +
> +
> +case 6) other decrement-based RMW ops that return a value
> +---------------------------------------------------------
> +
> +Function changes:
> +
>   * no atomic counterpart --> :c:func:`refcount_dec_if_one`
>   * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
>
>  Memory ordering guarantees changes:
>
> - * fully ordered --> RELEASE ordering + control dependency
> + * fully ordered --> RELEASE ordering + control dependency
>
>  .. note:: :c:func:`atomic_add_unless` only provides full order on success.
>
>
> -case 6) - lock-based RMW
> +case 7) - lock-based RMW
>  ------------------------
>
>  Function changes:
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index dbaed55..ab8f584 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -67,16 +67,29 @@ static __always_inline void refcount_dec(refcount_t *r)
>  static __always_inline __must_check
>  bool refcount_sub_and_test(unsigned int i, refcount_t *r)
>  {
> -       return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> +       bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
>                                          REFCOUNT_CHECK_LT_ZERO,
>                                          r->refs.counter, e, "er", i, "cx");
> +
> +    if (ret) {
> +               smp_acquire__after_ctrl_dep();
> +               return true;
> +    }
> +
> +    return false;
>  }
>
>  static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
>  {
> -       return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> -                                       REFCOUNT_CHECK_LT_ZERO,
> -                                       r->refs.counter, e, "cx");
> +       bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> +                   REFCOUNT_CHECK_LT_ZERO,
> +                   r->refs.counter, e, "cx");
> +    if (ret) {
> +               smp_acquire__after_ctrl_dep();
> +               return true;
> +    }
> +
> +    return false;
>  }
>
>  static __always_inline __must_check
> diff --git a/lib/refcount.c b/lib/refcount.c
> index ebcf8cd..732feac 100644
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -33,6 +33,9 @@
>   * Note that the allocator is responsible for ordering things between free()
>   * and alloc().
>   *
> + * The decrements dec_and_test() and sub_and_test() also provide acquire
> + * ordering on success.
> + *
>   */
>
>  #include <linux/mutex.h>
> @@ -164,8 +167,7 @@ EXPORT_SYMBOL(refcount_inc_checked);
>   * at UINT_MAX.
>   *
>   * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
>   *
>   * Use of this function is not recommended for the normal reference counting
>   * use case in which references are taken and released one at a time.  In these
> @@ -190,7 +192,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
>
>         } while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
>
> -       return !new;
> +    if (!new) {
> +               smp_acquire__after_ctrl_dep();
> +               return true;
> +    }
> +    return false;
> +
>  }
>  EXPORT_SYMBOL(refcount_sub_and_test_checked);
>
> @@ -202,8 +209,7 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
>   * decrement when saturated at UINT_MAX.
>   *
>   * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free() must come after.
>   *
>   * Return: true if the resulting refcount is 0, false otherwise
>   */
> --
> 2.7.4
>


* RE: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-29 14:00 ` Dmitry Vyukov
@ 2019-01-29 17:37   ` Reshetova, Elena
  2019-01-30  3:33     ` Andrea Parri
  0 siblings, 1 reply; 12+ messages in thread
From: Reshetova, Elena @ 2019-01-29 17:37 UTC (permalink / raw)
  To: Dmitry Vyukov; +Cc: Peter Zijlstra, LKML, Kees Cook, Andrea Parri


> On Mon, Jan 28, 2019 at 1:10 PM Elena Reshetova
> <elena.reshetova@intel.com> wrote:
> >
> > This adds an smp_acquire__after_ctrl_dep() barrier on successful
> > decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> > variants and therefore gives stronger memory ordering guarantees than
> > prior versions of these functions.
> >
> > Co-Developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> > ---
> >  Documentation/core-api/refcount-vs-atomic.rst | 28
> +++++++++++++++++++++++----
> >  arch/x86/include/asm/refcount.h               | 21 ++++++++++++++++----
> >  lib/refcount.c                                | 16 ++++++++++-----
> >  3 files changed, 52 insertions(+), 13 deletions(-)
> >
> > diff --git a/Documentation/core-api/refcount-vs-atomic.rst
> b/Documentation/core-api/refcount-vs-atomic.rst
> > index 322851b..95d4b4e 100644
> > --- a/Documentation/core-api/refcount-vs-atomic.rst
> > +++ b/Documentation/core-api/refcount-vs-atomic.rst
> > @@ -54,6 +54,14 @@ must propagate to all other CPUs before the release
> operation
> >  (A-cumulative property). This is implemented using
> >  :c:func:`smp_store_release`.
> >
> > +An ACQUIRE memory ordering guarantees that all post loads and
> > +stores (all po-later instructions) on the same CPU are
> > +completed after the acquire operation. It also guarantees that all
> > +po-later stores on the same CPU and all propagated stores from other CPUs
> > +must propagate to all other CPUs after the acquire operation
> > +(A-cumulative property). This is implemented using
> > +:c:func:`smp_acquire__after_ctrl_dep`.
> 
> The second part starting from "It also guarantees that". I am not sure
> I understand what it means. Is it just a copy-paste from RELEASE? I am
> not sure ACQUIRE provides anything like this.
> 	

So, you are saying that ACQUIRE does not guarantee that "po-later stores
on the same CPU and all propagated stores from other CPUs
must propagate to all other CPUs after the acquire operation "? 
I was reading about acquire before posting this and trying to understand,
and this was my conclusion that it should provide this, but I can easily be wrong
on this. 

Andrea, Peter, could you please comment?


> 
> > +
> >  A control dependency (on success) for refcounters guarantees that
> >  if a reference for an object was successfully obtained (reference
> >  counter increment or addition happened, function returned true),
> > @@ -119,24 +127,36 @@ Memory ordering guarantees changes:
> >     result of obtaining pointer to the object!
> >
> >
> > -case 5) - decrement-based RMW ops that return a value
> > ------------------------------------------------------
> > +case 5) - generic dec/sub decrement-based RMW ops that return a value
> > +---------------------------------------------------------------------
> >
> >  Function changes:
> >
> >   * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
> >   * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
> > +
> > +Memory ordering guarantees changes:
> > +
> > + * fully ordered --> RELEASE ordering + ACQUIRE ordering and control
> dependency
> > +   on success.
> 
> Is ACQUIRE strictly stronger than control dependency?

In my understanding yes.

> It generally looks so unless there is something very subtle that I am
> missing. If so, should we replace it with just "RELEASE ordering +
> ACQUIRE ordering on success"? Looks simpler with less magic trickery.

I was just trying to mention all the applicable orderings/guarantees. 
I can remove "control dependency" part if it is easier for people to understand
(the main goal of documentation).

Best Regards,
Elena.


* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-29 17:37   ` Reshetova, Elena
@ 2019-01-30  3:33     ` Andrea Parri
  2019-01-30 10:19       ` Reshetova, Elena
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Parri @ 2019-01-30  3:33 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Dmitry Vyukov, Peter Zijlstra, LKML, Kees Cook, Alan Stern

> So, you are saying that ACQUIRE does not guarantee that "po-later stores
> on the same CPU and all propagated stores from other CPUs
> must propagate to all other CPUs after the acquire operation "? 
> I was reading about acquire before posting this and trying to understand,
> and this was my conclusion that it should provide this, but I can easily be wrong
> on this. 
> 
> Andrea, Peter, could you please comment?

Short version:  I am not convinced by the above sentence, and I suggest
to remove it (as done in

  http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).

---
To elaborate:  I think that we should first discuss the meaning of that
"[...] after the acquire operation (does)",  because there is no notion
of "ACQUIRE (or more generally, load) propagation" in the LKMM:

Stores propagate (after being executed) to other CPUs.  Loads _execute_
(possibly multiple times /speculatively, but this is irrelevant for the
discussion below).

A detailed, but still informal, description of these concepts is in:

  tools/memory-model/Documentation/explanation.txt

(c.f., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
them with an example:

	{ initially: x=0, y=0; }

	CPU0			CPU1
	--------------------------------------
	LOAD-ACQUIRE x=0	LOAD y=1
	STORE y=1

In this scenario,

  a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
     executes (this is guaranteed by the ACQUIRE),

  b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
     CPU1 (a store cannot be propagated before being executed),

  c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
     executes (since CPU1 "sees the store"). 

The example also illustrates the following property:

  ACQUIRE guarantees that po-later stores on the same CPU must
  propagate to all other CPUs after the acquire _executes_.

(combine (a) and (b) ).

OTOH, please notice that:

  ACQUIRE does _NOT_ guarantee that all propagated stores from
  other CPUs (to the CPU executing the ACQUIRE) must propagate
  to all other CPUs after the acquire operation _executes_.

In fact, we've already seen how full barriers can be used to break such
"guarantee"; for example, in

	{ initially: x=0, y=0; }

	CPU0			CPU1			...
	---------------------------------------------------
	STORE x=1		LOAD x=1	
				FULL-BARRIER
				LOAD-ACQUIRE y=0

the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.

Does this make sense?


> > Is ACQUIRE strictly stronger than control dependency?
> 
> In my understanding yes.

+1 (or we have a problem)


>
> > It generally looks so unless there is something very subtle that I am
> > missing. If so, should we replace it with just "RELEASE ordering +
> > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> 
> I was just trying to mention all the applicable orderings/guarantees. 
> I can remove "control dependency" part if it is easier for people to understand
> (the main goal of documentation).

This sounds like a good idea; thank you, Dmitry, for pointing this out.

  Andrea


> 
> Best Regards,
> Elena.

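The release/acquire pairing this explanation leads to is the same one the refcount change relies on, and it can be compressed into a small, self-contained C sketch (the variables and functions below are invented for illustration; only the kernel's READ_ONCE()/WRITE_ONCE() and smp_store_release()/smp_load_acquire() primitives are real):

	#include <linux/compiler.h>	/* READ_ONCE(), WRITE_ONCE() */
	#include <asm/barrier.h>	/* smp_store_release(), smp_load_acquire() */

	static int x, y;

	static void writer(void)		/* e.g. a CPU dropping one reference  */
	{
		WRITE_ONCE(x, 1);		/* earlier accesses to the object     */
		smp_store_release(&y, 1);	/* RELEASE: the decrement             */
	}

	static void reader(void)		/* e.g. the CPU doing the final put   */
	{
		int r0, r1;

		r0 = smp_load_acquire(&y);	/* ACQUIRE: observes the decrement    */
		r1 = READ_ONCE(x);		/* teardown-side access to the object */

		/*
		 * The outcome r0 == 1 && r1 == 0 is forbidden: once the
		 * acquire sees y == 1, it must also see the writer's earlier
		 * store to x.
		 */
	}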

* RE: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-30  3:33     ` Andrea Parri
@ 2019-01-30 10:19       ` Reshetova, Elena
  2019-01-30 10:34         ` Dmitry Vyukov
  0 siblings, 1 reply; 12+ messages in thread
From: Reshetova, Elena @ 2019-01-30 10:19 UTC (permalink / raw)
  To: Andrea Parri; +Cc: Dmitry Vyukov, Peter Zijlstra, LKML, Kees Cook, Alan Stern

 > So, you are saying that ACQUIRE does not guarantee that "po-later stores
> > on the same CPU and all propagated stores from other CPUs
> > must propagate to all other CPUs after the acquire operation "?
> > I was reading about acquire before posting this and trying to understand,
> > and this was my conclusion that it should provide this, but I can easily be wrong
> > on this.
> >
> > Andrea, Peter, could you please comment?
> 
> Short version:  I am not convinced by the above sentence, and I suggest
> to remove it (as done in
> 
>   http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).

Sorry, I misunderstood your previous email on this. I somehow read it as saying
that "A-cumulative property" is a notion that is not used in the LKMM for ACQUIRE,
so I should not mention the notion but the guarantees would stay; however, it is the
guarantees themselves that are also wrong, which is much worse.

> 
> ---
> To elaborate:  I think that we should first discuss the meaning of that
> "[...] after the acquire operation (does)",  because there is no notion
> of "ACQUIRE (or more generally, load) propagation" in the LKMM:
> 
> Stores propagate (after being executed) to other CPUs.  Loads _execute_
> (possibly multiple times /speculatively, but this is irrelevant for the
> discussion below).
> 
> A detailed, but still informal, description of these concepts is in:
> 
>   tools/memory-model/Documentation/explanation.txt
> 
> (c.f., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
> them with an example:
> 
> 	{ initially: x=0, y=0; }
> 
> 	CPU0			CPU1
> 	--------------------------------------
> 	LOAD-ACQUIRE x=0	LOAD y=1
> 	STORE y=1
> 
> In this scenario,
> 
>   a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
>      executes (this is guaranteed by the ACQUIRE),
> 
>   b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
>      CPU1 (a store cannot be propagated before being executed),
> 
>   c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
>      executes (since CPU1 "sees the store").
> 
> The example also illustrates the following property:
> 
>   ACQUIRE guarantees that po-later stores on the same CPU must
>   propagate to all other CPUs after the acquire _executes_.
> 
> (combine (a) and (b) ).
> 
> OTOH, please notice that:
> 
>   ACQUIRE does _NOT_ guarantee that all propagated stores from
>   other CPUs (to the CPU executing the ACQUIRE) must propagate
>   to all other CPUs after the acquire operation _executes_.

Thank you very much Andrea, this example and explanation clarify it nicely!
So ACQUIRE only really affects the current CPU's "view of the world" and the
propagation of operations from it, and not anything else, which is actually very logical.

My initial confusion was because I was thinking of ACQUIRE as a pair
for RELEASE, i.e. that it should provide complementary guarantees to the
RELEASE ones, just on po-later operations.

> 
> In fact, we've already seen how full barriers can be used to break such
> "guarantee"; for example, in
> 
> 	{ initially: x=0, y=0; }
> 
> 	CPU0			CPU1
> 		...
> 	---------------------------------------------------
> 	STORE x=1		LOAD x=1
> 				FULL-BARRIER
> 				LOAD-ACQUIRE y=0
> 
> the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
> to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.
> 
> Does this make sense?

Yes, thank you again! I think it will still take me a long while to become familiar
with all these notions and not get confused even by simple things.

> 
> 
> > > Is ACQUIRE strictly stronger than control dependency?
> >
> > In my understanding yes.
> 
> +1 (or we have a problem)
> 
> 
> >
> > > It generally looks so unless there is something very subtle that I am
> > > missing. If so, should we replace it with just "RELEASE ordering +
> > > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> >
> > I was just trying to mention all the applicable orderings/guarantees.
> > I can remove "control dependency" part if it is easier for people to understand
> > (the main goal of documentation).
> 
> This sounds like a good idea; thank you, Dmitry, for pointing this out.

I will remove it. So the rule is that we always mention the strongest type of barrier
when we state ordering guarantees, right?

Best Regards,
Elena.


* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-30 10:19       ` Reshetova, Elena
@ 2019-01-30 10:34         ` Dmitry Vyukov
  0 siblings, 0 replies; 12+ messages in thread
From: Dmitry Vyukov @ 2019-01-30 10:34 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Andrea Parri, Peter Zijlstra, LKML, Kees Cook, Alan Stern

On Wed, Jan 30, 2019 at 11:19 AM Reshetova, Elena
<elena.reshetova@intel.com> wrote:
>
>  > So, you are saying that ACQUIRE does not guarantee that "po-later stores
> > > on the same CPU and all propagated stores from other CPUs
> > > must propagate to all other CPUs after the acquire operation "?
> > > I was reading about acquire before posting this and trying to understand,
> > > and this was my conclusion that it should provide this, but I can easily be wrong
> > > on this.
> > >
> > > Andrea, Peter, could you please comment?
> >
> > Short version:  I am not convinced by the above sentence, and I suggest
> > to remove it (as done in
> >
> >   http://lkml.kernel.org/r/20190128142910.GA7232@andrea ).
>
> Sorry, I misunderstood your previous email on this. I somehow misread it
> that " A-cumulative property" as a notion that is not used in LKMM for ACQUIRE,
> so I should not mention the notion, but the guarantees stay, but it is guarantees
> that are also wrong, which is much worse.
>
> >
> > ---
> > To elaborate:  I think that we should first discuss the meaning of that
> > "[...] after the acquire operation (does)",  because there is no notion
> > of "ACQUIRE (or more generally, load) propagation" in the LKMM:
> >
> > Stores propagate (after being executed) to other CPUs.  Loads _execute_
> > (possibly multiple times /speculatively, but this is irrelevant for the
> > discussion below).
> >
> > A detailed, but still informal, description of these concepts is in:
> >
> >   tools/memory-model/Documentation/explanation.txt
> >
> > (c.f., in particular, section "AN OPERATIONAL MODEL"); I can illustrate
> > them with an example:
> >
> >       { initially: x=0, y=0; }
> >
> >       CPU0                    CPU1
> >       --------------------------------------
> >       LOAD-ACQUIRE x=0        LOAD y=1
> >       STORE y=1
> >
> > In this scenario,
> >
> >   a) CPU0's "LOAD-ACQUIRE x=0" executes before CPU0's "STORE y=1"
> >      executes (this is guaranteed by the ACQUIRE),
> >
> >   b) CPU0's "STORE y=1" executes before "STORE y=1" propagates to
> >      CPU1 (a store cannot be propagated before being executed),
> >
> >   c) CPU0's "STORE y=1" propagates to CPU1 before CPU1's "LOAD y=1"
> >      executes (since CPU1 "sees the store").
> >
> > The example also illustrates the following property:
> >
> >   ACQUIRE guarantees that po-later stores on the same CPU must
> >   propagate to all other CPUs after the acquire _executes_.
> >
> > (combine (a) and (b) ).
> >
> > OTOH, please notice that:
> >
> >   ACQUIRE does _NOT_ guarantee that all propagated stores from
> >   other CPUs (to the CPU executing the ACQUIRE) must propagate
> >   to all other CPUs after the acquire operation _executes_.
>
> Thank you very much Andrea, this example and explanation clarifies it nicely!
> So Acquire only really affects the current CPU "view of the world" and operation
> propagation from it, and not anything else, which is actually very logical.
>
> My initial confusion was because I was thinking of ACQUIRE as a pair
> for RELEASE, i.e. it should provide a complementary guarantees to
>  RELEASE ones, just on po-later operations.
>
> >
> > In fact, we've already seen how full barriers can be used to break such
> > "guarantee"; for example, in
> >
> >       { initially: x=0, y=0; }
> >
> >       CPU0                    CPU1
> >               ...
> >       ---------------------------------------------------
> >       STORE x=1               LOAD x=1
> >                               FULL-BARRIER
> >                               LOAD-ACQUIRE y=0
> >
> > the full barrier forces CPU0's "STORE x=1" (seen by/propagated to CPU1)
> > to be propagated to all CPUs _before_ "LOAD-ACQUIRE y=0" is executed.
> >
> > Does this make sense?
>
> Yes, thank you again! I think it would take me still a long while to be familiar
> with all these notions and not to be confused even in simple things.
>
> >
> >
> > > > Is ACQUIRE strictly stronger than control dependency?
> > >
> > > In my understanding yes.
> >
> > +1 (or we have a problem)
> >
> >
> > >
> > > > It generally looks so unless there is something very subtle that I am
> > > > missing. If so, should we replace it with just "RELEASE ordering +
> > > > ACQUIRE ordering on success"? Looks simpler with less magic trickery.
> > >
> > > I was just trying to mention all the applicable orderings/guarantees.
> > > I can remove "control dependency" part if it is easier for people to understand
> > > (the main goal of documentation).
> >
> > This sounds like a good idea; thank you, Dmitry, for pointing this out.
>
> I will remove it. So, the rule that we always mention the strongest type of barrier
> When we mention some ordering guarantees, right?


My reasoning here was that a control dependency is just a very subtle
thing, so I think it's better if people simply don't see it at all and don't
start thinking in terms of control dependencies until absolutely
necessary.

I am not sure how to generalize this. There are not too many other
cases where one barrier type is a full superset of another. E.g.
rmb/wmb are orthogonal to acquire/release.

But if we take a full barrier, then, yes, it definitely makes sense to
just say that an operation provides a full barrier rather than a full
barrier, an acquire barrier, a release barrier, a read barrier, a write
barrier, a control dependency, ... :)

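To make the difference concrete: a plain control dependency only orders the load the branch depends on against later stores inside the branch; later loads may still be speculated ahead of it, which is exactly what smp_acquire__after_ctrl_dep() forbids by upgrading the control dependency to ACQUIRE. A short sketch (the function and the pointers are placeholders, not code from the patch):

	#include <linux/compiler.h>	/* READ_ONCE(), WRITE_ONCE() */
	#include <asm/barrier.h>	/* smp_acquire__after_ctrl_dep() */

	static void ctrl_dep_vs_acquire(int *flag, int *a, int *b)
	{
		int r;

		if (READ_ONCE(*flag)) {
			/* Ordered by the control dependency alone: a store
			 * cannot be issued before the branch is resolved. */
			WRITE_ONCE(*a, 1);

			/* NOT ordered by the control dependency alone: this
			 * load may be speculated before the load of *flag. */
			r = READ_ONCE(*b);
		}

		if (READ_ONCE(*flag)) {
			/* Upgrade the control dependency to ACQUIRE. */
			smp_acquire__after_ctrl_dep();

			/* Now both the store and the load below are ordered
			 * after the load of *flag. */
			WRITE_ONCE(*a, 1);
			r = READ_ONCE(*b);
		}
	}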

* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-30 12:31   ` Andrea Parri
@ 2019-01-30 14:21     ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2019-01-30 14:21 UTC (permalink / raw)
  To: Andrea Parri; +Cc: Elena Reshetova, linux-kernel, dvyukov, stern, keescook

On Wed, Jan 30, 2019 at 01:31:31PM +0100, Andrea Parri wrote:
> On Wed, Jan 30, 2019 at 01:18:51PM +0200, Elena Reshetova wrote:
> > This adds an smp_acquire__after_ctrl_dep() barrier on successful
> > decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> > variants and therefore gives stronger memory ordering guarantees than
> > prior versions of these functions.
> > 
> > Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
> 
> Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>

Thanks, got it queued now.


* Re: [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-30 11:18 ` [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants Elena Reshetova
@ 2019-01-30 12:31   ` Andrea Parri
  2019-01-30 14:21     ` Peter Zijlstra
  0 siblings, 1 reply; 12+ messages in thread
From: Andrea Parri @ 2019-01-30 12:31 UTC (permalink / raw)
  To: Elena Reshetova; +Cc: peterz, linux-kernel, dvyukov, stern, keescook

On Wed, Jan 30, 2019 at 01:18:51PM +0200, Elena Reshetova wrote:
> This adds an smp_acquire__after_ctrl_dep() barrier on successful
> decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
> variants and therefore gives stronger memory ordering guarantees than
> prior versions of these functions.
> 
> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>

Reviewed-by: Andrea Parri <andrea.parri@amarulasolutions.com>

  Andrea


> ---
>  Documentation/core-api/refcount-vs-atomic.rst | 24 +++++++++++++++++++++---
>  arch/x86/include/asm/refcount.h               | 22 ++++++++++++++++++----
>  lib/refcount.c                                | 18 +++++++++++++-----
>  3 files changed, 52 insertions(+), 12 deletions(-)
> 
> diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
> index 322851b..976e85a 100644
> --- a/Documentation/core-api/refcount-vs-atomic.rst
> +++ b/Documentation/core-api/refcount-vs-atomic.rst
> @@ -54,6 +54,13 @@ must propagate to all other CPUs before the release operation
>  (A-cumulative property). This is implemented using
>  :c:func:`smp_store_release`.
>  
> +An ACQUIRE memory ordering guarantees that all post loads and
> +stores (all po-later instructions) on the same CPU are
> +completed after the acquire operation. It also guarantees that all
> +po-later stores on the same CPU must propagate to all other CPUs
> +after the acquire operation executes. This is implemented using
> +:c:func:`smp_acquire__after_ctrl_dep`.
> +
>  A control dependency (on success) for refcounters guarantees that
>  if a reference for an object was successfully obtained (reference
>  counter increment or addition happened, function returned true),
> @@ -119,13 +126,24 @@ Memory ordering guarantees changes:
>     result of obtaining pointer to the object!
>  
>  
> -case 5) - decrement-based RMW ops that return a value
> ------------------------------------------------------
> +case 5) - generic dec/sub decrement-based RMW ops that return a value
> +---------------------------------------------------------------------
>  
>  Function changes:
>  
>   * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
>   * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
> +
> +Memory ordering guarantees changes:
> +
> + * fully ordered --> RELEASE ordering + ACQUIRE ordering on success
> +
> +
> +case 6) other decrement-based RMW ops that return a value
> +---------------------------------------------------------
> +
> +Function changes:
> +
>   * no atomic counterpart --> :c:func:`refcount_dec_if_one`
>   * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
>  
> @@ -136,7 +154,7 @@ Memory ordering guarantees changes:
>  .. note:: :c:func:`atomic_add_unless` only provides full order on success.
>  
>  
> -case 6) - lock-based RMW
> +case 7) - lock-based RMW
>  ------------------------
>  
>  Function changes:
> diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
> index dbaed55..232f856 100644
> --- a/arch/x86/include/asm/refcount.h
> +++ b/arch/x86/include/asm/refcount.h
> @@ -67,16 +67,30 @@ static __always_inline void refcount_dec(refcount_t *r)
>  static __always_inline __must_check
>  bool refcount_sub_and_test(unsigned int i, refcount_t *r)
>  {
> -	return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
> +	bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
>  					 REFCOUNT_CHECK_LT_ZERO,
>  					 r->refs.counter, e, "er", i, "cx");
> +
> +	if (ret) {
> +		smp_acquire__after_ctrl_dep();
> +		return true;
> +	}
> +
> +	return false;
>  }
>  
>  static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
>  {
> -	return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> -					REFCOUNT_CHECK_LT_ZERO,
> -					r->refs.counter, e, "cx");
> +	bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
> +					 REFCOUNT_CHECK_LT_ZERO,
> +					 r->refs.counter, e, "cx");
> +
> +	if (ret) {
> +		smp_acquire__after_ctrl_dep();
> +		return true;
> +	}
> +
> +	return false;
>  }
>  
>  static __always_inline __must_check
> diff --git a/lib/refcount.c b/lib/refcount.c
> index ebcf8cd..6e904af 100644
> --- a/lib/refcount.c
> +++ b/lib/refcount.c
> @@ -33,6 +33,9 @@
>   * Note that the allocator is responsible for ordering things between free()
>   * and alloc().
>   *
> + * The decrements dec_and_test() and sub_and_test() also provide acquire
> + * ordering on success.
> + *
>   */
>  
>  #include <linux/mutex.h>
> @@ -164,8 +167,8 @@ EXPORT_SYMBOL(refcount_inc_checked);
>   * at UINT_MAX.
>   *
>   * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free()
> + * must come after.
>   *
>   * Use of this function is not recommended for the normal reference counting
>   * use case in which references are taken and released one at a time.  In these
> @@ -190,7 +193,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
>  
>  	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
>  
> -	return !new;
> +	if (!new) {
> +		smp_acquire__after_ctrl_dep();
> +		return true;
> +	}
> +	return false;
> +
>  }
>  EXPORT_SYMBOL(refcount_sub_and_test_checked);
>  
> @@ -202,8 +210,8 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
>   * decrement when saturated at UINT_MAX.
>   *
>   * Provides release memory ordering, such that prior loads and stores are done
> - * before, and provides a control dependency such that free() must come after.
> - * See the comment on top.
> + * before, and provides an acquire ordering on success such that free()
> + * must come after.
>   *
>   * Return: true if the resulting refcount is 0, false otherwise
>   */
> -- 
> 2.7.4
> 


* [PATCH] refcount_t: add ACQUIRE ordering on success for dec(sub)_and_test variants
  2019-01-30 11:18 [PATCH v2] Adding smp_acquire__after_ctrl_dep barrier Elena Reshetova
@ 2019-01-30 11:18 ` Elena Reshetova
  2019-01-30 12:31   ` Andrea Parri
  0 siblings, 1 reply; 12+ messages in thread
From: Elena Reshetova @ 2019-01-30 11:18 UTC (permalink / raw)
  To: peterz
  Cc: linux-kernel, dvyukov, andrea.parri, stern, keescook, Elena Reshetova

This adds an smp_acquire__after_ctrl_dep() barrier on successful
decrease of refcounter value from 1 to 0 for refcount_dec(sub)_and_test
variants and therefore gives stronger memory ordering guarantees than
prior versions of these functions.

Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 Documentation/core-api/refcount-vs-atomic.rst | 24 +++++++++++++++++++++---
 arch/x86/include/asm/refcount.h               | 22 ++++++++++++++++++----
 lib/refcount.c                                | 18 +++++++++++++-----
 3 files changed, 52 insertions(+), 12 deletions(-)

diff --git a/Documentation/core-api/refcount-vs-atomic.rst b/Documentation/core-api/refcount-vs-atomic.rst
index 322851b..976e85a 100644
--- a/Documentation/core-api/refcount-vs-atomic.rst
+++ b/Documentation/core-api/refcount-vs-atomic.rst
@@ -54,6 +54,13 @@ must propagate to all other CPUs before the release operation
 (A-cumulative property). This is implemented using
 :c:func:`smp_store_release`.
 
+An ACQUIRE memory ordering guarantees that all post loads and
+stores (all po-later instructions) on the same CPU are
+completed after the acquire operation. It also guarantees that all
+po-later stores on the same CPU must propagate to all other CPUs
+after the acquire operation executes. This is implemented using
+:c:func:`smp_acquire__after_ctrl_dep`.
+
 A control dependency (on success) for refcounters guarantees that
 if a reference for an object was successfully obtained (reference
 counter increment or addition happened, function returned true),
@@ -119,13 +126,24 @@ Memory ordering guarantees changes:
    result of obtaining pointer to the object!
 
 
-case 5) - decrement-based RMW ops that return a value
------------------------------------------------------
+case 5) - generic dec/sub decrement-based RMW ops that return a value
+---------------------------------------------------------------------
 
 Function changes:
 
  * :c:func:`atomic_dec_and_test` --> :c:func:`refcount_dec_and_test`
  * :c:func:`atomic_sub_and_test` --> :c:func:`refcount_sub_and_test`
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> RELEASE ordering + ACQUIRE ordering on success
+
+
+case 6) other decrement-based RMW ops that return a value
+---------------------------------------------------------
+
+Function changes:
+
  * no atomic counterpart --> :c:func:`refcount_dec_if_one`
  * ``atomic_add_unless(&var, -1, 1)`` --> ``refcount_dec_not_one(&var)``
 
@@ -136,7 +154,7 @@ Memory ordering guarantees changes:
 .. note:: :c:func:`atomic_add_unless` only provides full order on success.
 
 
-case 6) - lock-based RMW
+case 7) - lock-based RMW
 ------------------------
 
 Function changes:
diff --git a/arch/x86/include/asm/refcount.h b/arch/x86/include/asm/refcount.h
index dbaed55..232f856 100644
--- a/arch/x86/include/asm/refcount.h
+++ b/arch/x86/include/asm/refcount.h
@@ -67,16 +67,30 @@ static __always_inline void refcount_dec(refcount_t *r)
 static __always_inline __must_check
 bool refcount_sub_and_test(unsigned int i, refcount_t *r)
 {
-	return GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
+	bool ret = GEN_BINARY_SUFFIXED_RMWcc(LOCK_PREFIX "subl",
 					 REFCOUNT_CHECK_LT_ZERO,
 					 r->refs.counter, e, "er", i, "cx");
+
+	if (ret) {
+		smp_acquire__after_ctrl_dep();
+		return true;
+	}
+
+	return false;
 }
 
 static __always_inline __must_check bool refcount_dec_and_test(refcount_t *r)
 {
-	return GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
-					REFCOUNT_CHECK_LT_ZERO,
-					r->refs.counter, e, "cx");
+	bool ret = GEN_UNARY_SUFFIXED_RMWcc(LOCK_PREFIX "decl",
+					 REFCOUNT_CHECK_LT_ZERO,
+					 r->refs.counter, e, "cx");
+
+	if (ret) {
+		smp_acquire__after_ctrl_dep();
+		return true;
+	}
+
+	return false;
 }
 
 static __always_inline __must_check
diff --git a/lib/refcount.c b/lib/refcount.c
index ebcf8cd..6e904af 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -33,6 +33,9 @@
  * Note that the allocator is responsible for ordering things between free()
  * and alloc().
  *
+ * The decrements dec_and_test() and sub_and_test() also provide acquire
+ * ordering on success.
+ *
  */
 
 #include <linux/mutex.h>
@@ -164,8 +167,8 @@ EXPORT_SYMBOL(refcount_inc_checked);
  * at UINT_MAX.
  *
  * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
+ * before, and provides an acquire ordering on success such that free()
+ * must come after.
  *
  * Use of this function is not recommended for the normal reference counting
  * use case in which references are taken and released one at a time.  In these
@@ -190,7 +193,12 @@ bool refcount_sub_and_test_checked(unsigned int i, refcount_t *r)
 
 	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
 
-	return !new;
+	if (!new) {
+		smp_acquire__after_ctrl_dep();
+		return true;
+	}
+	return false;
+
 }
 EXPORT_SYMBOL(refcount_sub_and_test_checked);
 
@@ -202,8 +210,8 @@ EXPORT_SYMBOL(refcount_sub_and_test_checked);
  * decrement when saturated at UINT_MAX.
  *
  * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
+ * before, and provides an acquire ordering on success such that free()
+ * must come after.
  *
  * Return: true if the resulting refcount is 0, false otherwise
  */
-- 
2.7.4


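For readers following the lib/refcount.c side of the change rather than the x86 one, the pattern it applies can be distilled into a stand-alone sketch (simplified: the saturation and underflow checks of the real refcount_sub_and_test_checked() are elided, and the function name is invented):

	#include <linux/types.h>	/* bool */
	#include <linux/atomic.h>	/* atomic_t, atomic_try_cmpxchg_release() */
	#include <asm/barrier.h>	/* smp_acquire__after_ctrl_dep() */

	static bool dec_and_test_sketch(atomic_t *refs)
	{
		int val = atomic_read(refs);
		int new;

		do {
			new = val - 1;
			/* saturation/underflow handling elided for clarity */
		} while (!atomic_try_cmpxchg_release(refs, &val, new));

		if (!new) {
			/*
			 * The branch on the cmpxchg result is a control
			 * dependency; this barrier upgrades it to ACQUIRE,
			 * ordering the caller's teardown after the final
			 * decrement.
			 */
			smp_acquire__after_ctrl_dep();
			return true;
		}
		return false;
	}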
