* [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
@ 2017-10-23 11:09 Elena Reshetova
  2017-10-23 13:12 ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Elena Reshetova @ 2017-10-23 11:09 UTC (permalink / raw)
  To: peterz
  Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel, Elena Reshetova

Currently the arch-independent implementation of refcount_t in
lib/refcount.c provides weaker memory ordering guarantees
compared to its analogous atomic_t implementations.
While this should not be a problem for most actual uses
of refcounters, it is more understandable for everyone
(and less error-prone for future users) to provide exactly
the same memory ordering guarantees as atomics.

If speed is a concern, then either a more efficient
arch-dependent refcount_t implementation should be used or, if
there are enough users in the future, we might need to provide
both strict and relaxed refcount_t APIs.

Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Elena Reshetova <elena.reshetova@intel.com>
---
 lib/refcount.c | 71 +++++-----------------------------------------------------
 1 file changed, 6 insertions(+), 65 deletions(-)

diff --git a/lib/refcount.c b/lib/refcount.c
index 5d0582a..cc6946e 100644
--- a/lib/refcount.c
+++ b/lib/refcount.c
@@ -8,29 +8,7 @@
  * there. This avoids wrapping the counter and causing 'spurious'
  * use-after-free issues.
  *
- * Memory ordering rules are slightly relaxed wrt regular atomic_t functions
- * and provide only what is strictly required for refcounts.
- *
- * The increments are fully relaxed; these will not provide ordering. The
- * rationale is that whatever is used to obtain the object we're increasing the
- * reference count on will provide the ordering. For locked data structures,
- * its the lock acquire, for RCU/lockless data structures its the dependent
- * load.
- *
- * Do note that inc_not_zero() provides a control dependency which will order
- * future stores against the inc, this ensures we'll never modify the object
- * if we did not in fact acquire a reference.
- *
- * The decrements will provide release order, such that all the prior loads and
- * stores will be issued before, it also provides a control dependency, which
- * will order us against the subsequent free().
- *
- * The control dependency is against the load of the cmpxchg (ll/sc) that
- * succeeded. This means the stores aren't fully ordered, but this is fine
- * because the 1->0 transition indicates no concurrency.
- *
- * Note that the allocator is responsible for ordering things between free()
- * and alloc().
+ * Memory ordering rules are exactly the same as with regular atomic_t functions
  *
  */
 
@@ -46,10 +24,6 @@
  *
  * Will saturate at UINT_MAX and WARN.
  *
- * Provides no memory ordering, it is assumed the caller has guaranteed the
- * object memory to be stable (RCU, etc.). It does provide a control dependency
- * and thereby orders future stores. See the comment on top.
- *
  * Use of this function is not recommended for the normal reference counting
  * use case in which references are taken and released one at a time.  In these
  * cases, refcount_inc(), or one of its variants, should instead be used to
@@ -72,7 +46,7 @@ bool refcount_add_not_zero(unsigned int i, refcount_t *r)
 		if (new < val)
 			new = UINT_MAX;
 
-	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
+	} while (!atomic_try_cmpxchg(&r->refs, &val, new));
 
 	WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
 
@@ -87,10 +61,6 @@ EXPORT_SYMBOL(refcount_add_not_zero);
  *
  * Similar to atomic_add(), but will saturate at UINT_MAX and WARN.
  *
- * Provides no memory ordering, it is assumed the caller has guaranteed the
- * object memory to be stable (RCU, etc.). It does provide a control dependency
- * and thereby orders future stores. See the comment on top.
- *
  * Use of this function is not recommended for the normal reference counting
  * use case in which references are taken and released one at a time.  In these
  * cases, refcount_inc(), or one of its variants, should instead be used to
@@ -108,10 +78,6 @@ EXPORT_SYMBOL(refcount_add);
  *
  * Similar to atomic_inc_not_zero(), but will saturate at UINT_MAX and WARN.
  *
- * Provides no memory ordering, it is assumed the caller has guaranteed the
- * object memory to be stable (RCU, etc.). It does provide a control dependency
- * and thereby orders future stores. See the comment on top.
- *
  * Return: true if the increment was successful, false otherwise
  */
 bool refcount_inc_not_zero(refcount_t *r)
@@ -127,7 +93,7 @@ bool refcount_inc_not_zero(refcount_t *r)
 		if (unlikely(!new))
 			return true;
 
-	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));
+	} while (!atomic_try_cmpxchg(&r->refs, &val, new));
 
 	WARN_ONCE(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");
 
@@ -141,9 +107,6 @@ EXPORT_SYMBOL(refcount_inc_not_zero);
  *
  * Similar to atomic_inc(), but will saturate at UINT_MAX and WARN.
  *
- * Provides no memory ordering, it is assumed the caller already has a
- * reference on the object.
- *
  * Will WARN if the refcount is 0, as this represents a possible use-after-free
  * condition.
  */
@@ -162,10 +125,6 @@ EXPORT_SYMBOL(refcount_inc);
  * ultimately leak on underflow and will fail to decrement when saturated
  * at UINT_MAX.
  *
- * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
- *
  * Use of this function is not recommended for the normal reference counting
  * use case in which references are taken and released one at a time.  In these
  * cases, refcount_dec(), or one of its variants, should instead be used to
@@ -187,7 +146,7 @@ bool refcount_sub_and_test(unsigned int i, refcount_t *r)
 			return false;
 		}
 
-	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
+	} while (!atomic_try_cmpxchg(&r->refs, &val, new));
 
 	return !new;
 }
@@ -200,10 +159,6 @@ EXPORT_SYMBOL(refcount_sub_and_test);
  * Similar to atomic_dec_and_test(), it will WARN on underflow and fail to
  * decrement when saturated at UINT_MAX.
  *
- * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
- *
  * Return: true if the resulting refcount is 0, false otherwise
  */
 bool refcount_dec_and_test(refcount_t *r)
@@ -218,9 +173,6 @@ EXPORT_SYMBOL(refcount_dec_and_test);
  *
  * Similar to atomic_dec(), it will WARN on underflow and fail to decrement
  * when saturated at UINT_MAX.
- *
- * Provides release memory ordering, such that prior loads and stores are done
- * before.
  */
 void refcount_dec(refcount_t *r)
 {
@@ -236,9 +188,6 @@ EXPORT_SYMBOL(refcount_dec);
  * No atomic_t counterpart, it attempts a 1 -> 0 transition and returns the
  * success thereof.
  *
- * Like all decrement operations, it provides release memory order and provides
- * a control dependency.
- *
  * It can be used like a try-delete operator; this explicit case is provided
  * and not cmpxchg in generic, because that would allow implementing unsafe
  * operations.
@@ -249,7 +198,7 @@ bool refcount_dec_if_one(refcount_t *r)
 {
 	int val = 1;
 
-	return atomic_try_cmpxchg_release(&r->refs, &val, 0);
+	return atomic_try_cmpxchg(&r->refs, &val, 0);
 }
 EXPORT_SYMBOL(refcount_dec_if_one);
 
@@ -281,7 +230,7 @@ bool refcount_dec_not_one(refcount_t *r)
 			return true;
 		}
 
-	} while (!atomic_try_cmpxchg_release(&r->refs, &val, new));
+	} while (!atomic_try_cmpxchg(&r->refs, &val, new));
 
 	return true;
 }
@@ -296,10 +245,6 @@ EXPORT_SYMBOL(refcount_dec_not_one);
  * Similar to atomic_dec_and_mutex_lock(), it will WARN on underflow and fail
  * to decrement when saturated at UINT_MAX.
  *
- * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
- *
  * Return: true and hold mutex if able to decrement refcount to 0, false
  *         otherwise
  */
@@ -327,10 +272,6 @@ EXPORT_SYMBOL(refcount_dec_and_mutex_lock);
  * Similar to atomic_dec_and_lock(), it will WARN on underflow and fail to
  * decrement when saturated at UINT_MAX.
  *
- * Provides release memory ordering, such that prior loads and stores are done
- * before, and provides a control dependency such that free() must come after.
- * See the comment on top.
- *
  * Return: true and hold spinlock if able to decrement refcount to 0, false
  *         otherwise
  */
-- 
2.7.4

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-10-23 11:09 [PATCH] refcount: provide same memory ordering guarantees as in atomic_t Elena Reshetova
@ 2017-10-23 13:12 ` Peter Zijlstra
  2017-10-27  6:49   ` Reshetova, Elena
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-10-23 13:12 UTC (permalink / raw)
  To: Elena Reshetova; +Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel

On Mon, Oct 23, 2017 at 02:09:44PM +0300, Elena Reshetova wrote:
> Currently the arch-independent implementation of refcount_t in
> lib/refcount.c provides weaker memory ordering guarantees
> compared to its analogous atomic_t implementations.
> While this should not be a problem for most actual uses
> of refcounters, it is more understandable for everyone
> (and less error-prone for future users) to provide exactly
> the same memory ordering guarantees as atomics.
> 
> If speed is a concern, then either a more efficient
> arch-dependent refcount_t implementation should be used or, if
> there are enough users in the future, we might need to provide
> both strict and relaxed refcount_t APIs.
> 
> Suggested-by: Kees Cook <keescook@chromium.org>

NAK

* RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-10-23 13:12 ` Peter Zijlstra
@ 2017-10-27  6:49   ` Reshetova, Elena
  2017-10-27 13:56     ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Reshetova, Elena @ 2017-10-27  6:49 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel

> On Mon, Oct 23, 2017 at 02:09:44PM +0300, Elena Reshetova wrote:
> > Currently the arch-independent implementation of refcount_t in
> > lib/refcount.c provides weaker memory ordering guarantees
> > compared to its analogous atomic_t implementations.
> > While this should not be a problem for most actual uses
> > of refcounters, it is more understandable for everyone
> > (and less error-prone for future users) to provide exactly
> > the same memory ordering guarantees as atomics.
> >
> > If speed is a concern, then either a more efficient
> > arch-dependent refcount_t implementation should be used or, if
> > there are enough users in the future, we might need to provide
> > both strict and relaxed refcount_t APIs.
> >
> > Suggested-by: Kees Cook <keescook@chromium.org>
> 
> NAK

Could we possibly have a bit more elaborate discussion on this? 

Or alternatively, what is the correct way, for a certain variable (one that behaves like
a standard refcounter), to check whether the relaxed memory ordering is ok?
This is what Thomas was asking me to do for core kernel conversions and this
is what I don't know how to do correctly. Also, I got exactly the same question
from the xfs maintainer, so if we provide an API that we hope will be used correctly in
the future, we should have a way for people to verify it. 

Maybe it is just me, but I would think that having a way to verify that your code
is ok with this refcounter-specific relaxed memory ordering generalizes to any memory ordering
requirement; refcounters are just a subset with certain properties. So is
the full answer then to figure out how to verify any memory ordering requirement 
of your code? 

We can also document this logic in more docs or even maybe try to create better
documentation for the current memory ordering bits since they are not the easiest read
at the moment. Otherwise this might be just a bad enough reason for many people to
avoid the refcount_t type if it is just easier to say "let's take just the old atomic, we know it 
somehow worked before"....

Best Regards,
Elena.

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-10-27  6:49   ` Reshetova, Elena
@ 2017-10-27 13:56     ` Peter Zijlstra
  2017-11-02 11:04       ` Reshetova, Elena
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-10-27 13:56 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel, Will Deacon

On Fri, Oct 27, 2017 at 06:49:55AM +0000, Reshetova, Elena wrote:
> Could we possibly have a bit more elaborate discussion on this? 
> 
> Or alternatively, what is the correct way, for a certain
> variable (one that behaves like a standard refcounter), to check
> whether the relaxed memory ordering is ok? 

The refcount_t includes sufficient ordering for normal reference
counting. It provides a happens-before relation wrt. dropping the
reference, such that all object accesses happen before we drop the
reference; because at that point the object can go away and any further
access would be UAF.

So if the thing being replaced is purely a reference count, all should
be well.
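
To make that concrete, here is a minimal sketch of a typical put path (the
"obj" type and obj_put() are made-up names, not code from anywhere):

	struct obj {
		refcount_t ref;
		int data;
	};

	void obj_put(struct obj *o)
	{
		o->data++;	/* a last access to the object... */

		/*
		 * ...must be ordered before the reference drop: the
		 * release in refcount_dec_and_test() guarantees that,
		 * and its control dependency orders the free() after
		 * the 1->0 transition.
		 */
		if (refcount_dec_and_test(&o->ref))
			kfree(o);
	}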

> This is what Thomas was asking me to do for core kernel conversions
> and this is what I don't know how to do correctly.

Thomas asked you to verify that nothing relies on the stronger ordering
provided by atomic_dec_and_test(). This means you have to audit the code
with an eye towards memory ordering.

Now the futex one is actually OK. Thomas just wants you to mention that
refcount_dec_and_test is weaker and explain that in this case that is
fine since pi_state::refcount is in fact a pure refcount and
put_pi_state() does not in fact rely on further ordering.

Just mentioning the above gives the maintainer:

  1) the information he needs to perform review, mentioning memory
  ordering changes means he needs to look at it.

  2) you identify the place where it changes (put_pi_state) which again
  aids him in review by limiting the amount of code he needs to double
  check.

  3) that you actually looked at the problem; giving more trust you
  actually know wth you're doing.

> Also, I got exactly the same question from the xfs maintainer,
> so if we provide an API that we hope will be used correctly in the
> future, we should have a way for people to verify it. 

I actually replied to Dave (but have not checked if there's further
email on the thread). I object to the claim that the code is correct if
he cannot explain the memory ordering.

Aside from that; if/when he explain the ordering requirements of those
sites and those do indeed require more than refcount_dec_and_test()
provides we should add a comment exactly explaining the ordering along
with additional barriers, such as smp_mb__after_atomic().
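
For instance, a site that really did rely on the stronger ordering could
keep it explicit along these lines (a sketch only; "obj" and free_obj()
are made up, and the comment must state the real requirement):

	if (refcount_dec_and_test(&obj->ref)) {
		/*
		 * refcount_dec_and_test() only implies release
		 * ordering; this site also relies on ordering the
		 * decrement against later accesses (document why!),
		 * so upgrade to a full barrier.
		 */
		smp_mb__after_atomic();
		free_obj(obj);
	}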

> Maybe it is just me, but I would think that having a way to verify
> that your code is ok with this refcounter-specific relaxed memory
> ordering generalizes to any memory ordering requirement; refcounters are
> just a subset with certain properties. So is the full answer then
> to figure out how to verify any memory ordering requirement of your
> code? 

Yes; and this can be quite tricky, but undocumented ordering
requirements are a bug that need to be fixed. Mostly the code is
'simple' and refcount really does the right and sufficient thing.

If the maintainer knows or suspects more ordering is required he can
help out; but for that to happen you have to, at the very least, do 1&2
above.

> We can also document this logic in more docs or even maybe try to
> create better documentation for the current memory ordering bits since
> they are not the easiest read at the moment. Otherwise this might be
> just a bad enough reason for many people to avoid the refcount_t type if it
> is just easier to say "let's take just the old atomic, we know it somehow
> worked before"....

That's just voodoo programming; and while that is rife I have absolutely
no sympathy for it. Memory ordering is hard, but that doesn't mean we
should just blindly sprinkle memory barriers around until it 'works'.

So while memory-barriers.txt is a longish document, it is readable with
a bit of time and effort. There are also tools being developed that can
help people validate their code.

There are enough people around that actually get this stuff (it really
is not _that_ hard, but it does require a functional brain) so if you
can describe the problem with sufficient detail we can help out getting
the ordering right (or state it's broken :-).

* RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-10-27 13:56     ` Peter Zijlstra
@ 2017-11-02 11:04       ` Reshetova, Elena
  2017-11-02 13:57         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Reshetova, Elena @ 2017-11-02 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel, Will Deacon

Sorry for the delayed reply, but I was actually reading and trying to understand
all the involved notions, so it took a while...

> On Fri, Oct 27, 2017 at 06:49:55AM +0000, Reshetova, Elena wrote:
> > Could we possibly have a bit more elaborate discussion on this?
> >
> > Or alternatively, what is the correct way, for a certain
> > variable (one that behaves like a standard refcounter), to check
> > whether the relaxed memory ordering is ok?
> 
> The refcount_t includes sufficient ordering for normal reference
> counting. It provides a happens-before relation wrt. dropping the
> reference, such that all object accesses happen before we drop the
> reference; because at that point the object can go away and any further
> access would be UAF.
> 
> So if the thing being replaced is purely a reference count, all should
> be well.

Yes, I understand this, but people are asking for a totally different level of 
detail and in-depth analysis, and given that there are changes, they have a
point. So, that's why I am trying to figure out the "whole story" below.

> 
> > This is what Thomas was asking me to do for core kernel conversions
> > and this is what I don't know how to do correctly.
> 
> Thomas asked you to verify that nothing relies on the stronger ordering
> provided by atomic_dec_and_test(). This means you have to audit the code
> with an eye towards memory ordering.
> 
> Now the futex one is actually OK. Thomas just wants you to mention that
> refcount_dec_and_test is weaker and explain that in this case that is
> fine since pi_state::refcount is in fact a pure refcount and
> put_pi_state() does not in fact rely on further ordering.
> 
> Just mentioning the above gives the maintainer:
> 
>   1) the information he needs to perform review, mentioning memory
>   ordering changes means he needs to look at it.
> 
>   2) you identify the place where it changes (put_pi_state) which again
>   aids him in review by limiting the amount of code he needs to double
>   check.
> 
>   3) that you actually looked at the problem; giving more trust you
>   actually know wth you're doing.

Well, refcount_dec_and_test() is not the only function that has different
memory ordering specifics. So, the full answer then for any arbitrary case
according to your points above would be something like:

for each substituted function (atomic_* --> refcount_*) that actually 
 has changes in memory ordering *** perform the following:
  - mention the difference
  - mention the actual code place where the change would take effect
   (various put() and similar functions)
  - potentially try to understand if it would make a difference for 
  this code (again, here is where I am not sure how to do it properly)


*** the actual list of these functions to me looks like:
 1) atomic_inc_not_zero -> refcount_inc_not_zero. 
The change is from the underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed().
The first one implies an SMP-conditional general memory barrier
(smp_mb()) on each side of the actual operation (at least according to the documentation;
in reality it goes through so many changes, ifdefs and conditionals that one gets lost 
easily in the process). 
The second one according to the comment implies no memory barriers whatsoever, BUT 
"provides a control dependency which will order future stores against the inc".
So, every store (not load) operation (I guess we are talking here only about store operations
that touch the same object, but I wonder how that is defined in terms of memory location? 
(overlapping?)) that comes inside an "if (refcount_inc_not_zero()) {}" clause would only
be executed if the function returns true. 
So, practically what we might "lose" here are any updates on the object protected by this
refcounter done by another CPU. But for this purpose we expect the developer to take
some other lock/memory barrier into use, right? We only care about incrementing the
refcount atomically and making sure we don't do anything with the object unless it is
ready for us to be used. If yes, then I guess it might be a big
change for code that previously relied on atomic-provided smp_mb() barriers and now
instead needs to take explicit locks/barriers by itself. 

2) atomic_** -> refcount_add_not_zero. Fortunately these are super rare and need to be
looked at case by case, depending on the actual atomic function substituted. 

3) atomic_add() --> refcount_add(). This should not make any change since both do not
provide memory ordering at all, but there is a comment in refcount.c that says that
refcount_add() "provides a control dependency and thereby orders future stores".
How is that done given that refcount_add() is a void-returning function??
I am fully confused by this one. 

4) atomic_dec () --> refcount_dec (). This one we have discussed extensively
before. Again, the first one implies an SMP-conditional general memory barrier
(smp_mb()) on each side of the actual operation. The second one only provides
"release" ordering, meaning that prior loads and stores must both be completed
before the operation. 
So, is it correct to express the difference in this way:

atomic_dec ():                   refcount_dec ():

smp_mb();                        smp_mb();
do_atomic_dec_operation;         do_atomic_dec_operation;
smp_mb();                        /* note: no barrier here!
                                  * we assume that we hold the lock
                                  * or something like it on the object
                                  * anyway, otherwise the code is broken
                                  * to start with */
5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4)
but in addition we have a control flow barrier.
So, is it correct to express the difference in this way:

atomic_dec_and_test ():                  refcount_dec_and_test ():

smp_mb();                                smp_mb();
if (do_atomic_dec_and_test) {            if (do_atomic_dec_and_test) {
    /* 1 --> 0 transition happened */        /* 1 --> 0 transition happened */
    smp_mb();                                /* in the refcounter case we assume
    do_anything including free();             * we are the last user of the object,
                                              * so no concurrent access can happen */
                                             /* control_flow_barrier here */ <------ which one btw????
                                             /* only need to guarantee that we free
                                              * after reaching zero, so we are good here */
}                                        }

6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5)
7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(). 
Should be identical to 4)


Lock functions such as refcount_dec_and_lock() & refcount_dec_and_mutex_lock()
provide exactly the same guarantees as their atomic counterparts. 

Is the above a correct view on this? I think that instead of putting this whole info
in the individual commits, how about making some documentation subsection
(in memory-barriers or atomics or on its own, I don't know what is better) with
examples like the above? Then in each individual commit I can point to the respective line
in the doc and say that here is a particular memory guarantee change, and in the case
of this particular code it would apply to your put_pi_state() or whatever else function.
Otherwise keeping this info only in commits does not benefit any future users of refcount. 

> 
> > Also, I got exactly the same question from the xfs maintainer,
> > so if we provide an API that we hope will be used correctly in the
> > future, we should have a way for people to verify it.
> 
> I actually replied to Dave (but have not checked if there's further
> email on the thread). I object to the claim that the code is correct if
> he cannot explain the memory ordering.
> 
> Aside from that; if/when he explain the ordering requirements of those
> sites and those do indeed require more than refcount_dec_and_test()
> provides we should add a comment exactly explaining the ordering along
> with additional barriers, such as smp_mb__after_atomic().
> 
> > Maybe it is just me, but I would think that having a way to verify
> > that your code is ok with this refcounter-specific relaxed memory
> > ordering generalizes to any memory ordering requirement; refcounters are
> > just a subset with certain properties. So is the full answer then
> > to figure out how to verify any memory ordering requirement of your
> > code?
> 
> Yes; and this can be quite tricky, but undocumented ordering
> requirements are a bug that need to be fixed. Mostly the code is
> 'simple' and refcount really does the right and sufficient thing.
> 
> If the maintainer knows or suspects more ordering is required he can
> help out; but for that to happen you have to, at the very least, do 1&2
> above.

As I said, I want to do more if we are going this way. That's why I am trying to
create some kind of easy-to-read doc that would explain the differences to people
and would not necessarily require reading the whole memory-barriers.txt
doc (more on my reading of it below). 

> 
> > We can also document this logic in more docs or even maybe try to
> > create better documentation for the current memory ordering bits since
> > they are not the easiest read at the moment. Otherwise this might be
> > just a bad enough reason for many people to avoid the refcount_t type if it
> > is just easier to say "let's take just the old atomic, we know it somehow
> > worked before"....
> 
> That's just voodoo programming; and while that is rife I have absolutely
> no sympathy for it. Memory ordering is hard, but that doesn't mean we
> should just blindly sprinkle memory barriers around until it 'works'.
> 
> So while memory-barriers.txt is a longish document, it is readable with
> a bit of time and effort. There are also tools being developed that can
> help people validate their code.

Could you please point to the tools? Might help my understanding also. 

So, I actually went and tried to __slowly__ read the memory-barriers.txt,
which indeed makes a lot of sense when you read it in full (not partially
like I did it before). But the issue with this doc is not so much the length
as the not always sequential way of giving the information to an unprepared reader. 
I have many examples where I wondered for a while what was meant by a 
certain term or notion somewhere at the beginning of the doc, 
only to discover a fully specified answer somewhere after line 1800. 
I do have many suggestions that might help to improve
the readability of the document, but it would take me at least a day to write 
them down (and it is going to be a long read for anyone), 
so before I go do it, I want to know if anyone is interested to hear :)
Anyway, this would be a separate exercise to do if people want. 

Best Regards,
Elena.

> 
> There are enough people around that actually get this stuff (it really
> is not _that_ hard, but it does require a functional brain) so if you
> can describe the problem with sufficient detail we can help out getting
> the ordering right (or state it's broken :-).

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 11:04       ` Reshetova, Elena
@ 2017-11-02 13:57         ` Peter Zijlstra
  2017-11-02 15:40           ` Alan Stern
                             ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-02 13:57 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel,
	Will Deacon, Paul McKenney, stern, parri.andrea, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 11:04:53AM +0000, Reshetova, Elena wrote:

> Well, refcount_dec_and_test() is not the only function that has different
> memory ordering specifics. So, the full answer then for any arbitrary case
> according to your points above would be something like:
> 
> for each substituted function (atomic_* --> refcount_*) that actually 
>  has changes in memory ordering *** perform the following:
>   - mention the difference
>   - mention the actual code place where the change would take effect
>    (various put() and similar functions)
>   - potentially try to understand if it would make a difference for 
>   this code (again, here is where I am not sure how to do it properly)
> 
> 
> *** the actual list of these functions to me looks like:

>  1) atomic_inc_not_zero -> refcount_inc_not_zero.  The change is from the
>  underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed(). The first
>  one implies an SMP-conditional general memory barrier (smp_mb()) on each
>  side of the actual operation (at least according to the documentation;
>  in reality it goes through so many changes, ifdefs and conditionals that
>  one gets lost easily in the process). 

That matches _inc(), which doesn't imply anything.

The reasoning being that you must already have obtained a pointer to the
object; and that will necessarily include enough ordering to observe the
object. Therefore increasing the refcount doesn't require further
constraints.

> The second one according to the comment implies no memory barriers
> whatsoever, BUT "provides a control dependency which will order future
> stores against the inc". So, every store (not load) operation (I guess
> we are talking here only about store operations that touch the same
> object, but I wonder how that is defined in terms of memory location? 

Memory location is irrelevant.

> (overlapping?))  that comes inside an "if (refcount_inc_not_zero()) {}" clause
> would only be executed if the function returns true.

The point is that a CPU must NOT speculate on stores. So while it can
speculate a branch, any store inside the branch must not become visible
until it can commit to that branch.

The whole point being that possible modifications to the object to which
we've _conditionally_ acquired a reference, will only happen after we're
sure to indeed have acquired this reference.

Otherwise it's similar to _inc().
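
Concretely, for a made-up object (just a sketch of the pattern):

	if (refcount_inc_not_zero(&obj->ref)) {
		/*
		 * This store cannot become visible before the
		 * increment above is known to have succeeded: the CPU
		 * may speculate into the branch, but it must not
		 * commit the store until it commits to the branch.
		 */
		obj->field = 1;
	}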

> So, practically what we might "lose" here are any updates on the
> object protected by this refcounter done by another CPU. But for this
> purpose we expect the developer to take some other lock/memory barrier
> into use, right?

Correct, object modification had better be serialized. Refcounts cannot
(even with atomic_t) help with that.
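
That is, something like this (made-up names again), where the refcount
only keeps the object alive and the lock serializes the modification:

	if (refcount_inc_not_zero(&obj->ref)) {
		spin_lock(&obj->lock);
		obj->field = new_val;
		spin_unlock(&obj->lock);
		obj_put(obj);
	}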

> We only care about incrementing the refcount atomically and making sure we
> don't do anything with the object unless it is ready for us to be used.

Just so..

> If yes, then I guess it might be a big change for code that
> previously relied on atomic-provided smp_mb() barriers and now instead
> needs to take explicit locks/barriers by itself. 

Right, however such memory ordering should be explicitly documented.
Unknown and hidden memory ordering is a straight up bug, because
modifications to the code (be they introducing refcount_t or anything
else) can easily break things.

> 2) atomic_** -> refcount_add_not_zero. Fortunately these are super
> rare and need to be looked at case by case, depending on the actual
> atomic function substituted. 

See inc_not_zero.

> 3) atomic_add() --> refcount_add(). This should not make any change
> since both do not provide memory ordering at all, but there is a
> comment in refcount.c that says that refcount_add() "provides a
> control dependency and thereby orders future stores".  How is that done
> given that refcount_add() is a void-returning function??  I am fully
> confused by this one. 

Weird, mostly comes from being implemented using add_not_zero I suppose.

> 4) atomic_dec () --> refcount_dec (). This one we have discussed
> extensively before. Again, the first one implies an SMP-conditional general
> memory barrier (smp_mb()) on each side of the actual operation. 

No, atomic_dec() does not in fact imply anything..

> The second one only provides "release" ordering, meaning that prior
> loads and stores must both be completed before the operation. 
> So, is it correct to express the difference in this way:
> 
> atomic_dec ():                   refcount_dec ():
> 
> smp_mb();                        smp_mb();
> do_atomic_dec_operation;         do_atomic_dec_operation;
> smp_mb();                        /* note: no barrier here! */


No, on two points: atomic_dec() does not imply _anything_ and while
refcount_dec() does the release, that is distinctly different from
smp_mb() before.

  C C-peterz-release-vs-mb

  {
  }

  P0(int *a, int *b, int *c)
  {
	  WRITE_ONCE(*a, 1);
  //	smp_mb();
	  smp_store_release(b, 1);
	  WRITE_ONCE(*c, 1);
  }

  P1(int *a, int *b, int *c)
  {
	  r3 = READ_ONCE(*c);
	  smp_rmb();
	  r2 = smp_load_acquire(b);
	  r1 = READ_ONCE(*a);
  }

  exists (1:r1=0 /\ 1:r2=0 /\ 1:r3=1)

That happens without the smp_mb(), once you put that back its no longer
allowed.

The reason is that smp_store_release() does not constrain the store to
@c and that is allowed to happen before, whereas with smp_mb() that
store can not happen before the store to @a.

smp_store_release() only affects the prior load/stores not anything
later, whereas smp_mb() imposes full order.

> 5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4)
> but in addition we have a control flow barrier.

No, this one was in fact a full barrier and is now a release.

> So, is it correct to express the difference in this way:
> 
> atomic_dec_and_test ():                  refcount_dec_and_test ():
> 
> smp_mb();                                smp_mb();
> if (do_atomic_dec_and_test) {            if (do_atomic_dec_and_test) {
>     /* 1 --> 0 transition happened */        /* 1 --> 0 transition happened */
>     smp_mb();                                /* in the refcounter case we assume
>     do_anything including free();             * we are the last user of the object,
>                                               * so no concurrent access can happen */
>                                              /* control_flow_barrier here */ <------ which one btw????
>                                              /* only need to guarantee that we free
>                                               * after reaching zero, so we are good here */
> }                                        }
  smp_mb();

or better:

  if ( ({ smp_mb(); ret = do_atomic_dec_and_test(); smp_mb(); ret; }) )

The difference being that the smp_mb happens on both branches of the
condition (or in fact before the branch).

Same note as the above; the smp_mb() on the refcount_dec_and_test() side
is strictly stronger than the release.

> 6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5)

Yes, with the same note as for add*, these really should not be used.

> 7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one(). 
> Should be identical to 4)

Correct; was fully ordered, is now a release.

> Lock functions such as refcount_dec_and_lock() &
> refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> their atomic counterparts. 

Nope. The atomic_dec_and_lock() provides smp_mb() while
refcount_dec_and_lock() merely orders all prior load/store's against all
later load/store's.

The difference is subtle and involves at least 3 CPUs. I can't seem to
write up anything simple, keeps turning into monsters :/ Will, Paul,
have you got anything simple around?

> Is the above a correct view on this? I think that instead of putting
> this whole info in the individual commits, how about making some
> documentation subsection (in memory-barriers or atomics or on its own,
> I don't know what is better) with examples like the above? 
> 
> Then in each individual commit I can point to the respective line in
> the doc and say that here is a particular memory guarantee change, and
> in the case of this particular code it would apply to your
> put_pi_state() or whatever else function.  Otherwise keeping this info
> only in commits does not benefit any future users of refcount. 

Sure you can write a refcount-vs-atomic.txt file to call out these
differences.

> > So while memory-barriers.txt is a longish document, it is readable with
> > a bit of time and effort. There are also tools being developed that can
> > help people validate their code.
> 
> Could you please point to the tools? Might help my understanding also. 

https://github.com/aparri/memory-model

> So, I actually went and tried to __slowly__ read the
> memory-barriers.txt, which indeed makes a lot of sense when you read it
> in full (not partially like I did it before). But the issue with this
> doc is not so much the length as the not always sequential way of giving
> the information to an unprepared reader. 

Fair enough; that document has grown over the years as has our
understanding on the subject. I would imagine it is indeed somewhat
inconsistent.

Note that there's work done on better documents and updates to this one.
One document that might be good to read (I have not in fact had time to
read it myself yet :-():

  https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt

> I have many examples where I wondered for a while what was meant by a 
> certain term or notion somewhere at the beginning of the doc, 
> only to discover a fully specified answer somewhere after line 1800. 
> 
> I do have many suggestions that might help to improve the readability
> of the document, but it would take me at least a day to write them down
> (and it is going to be a long read for anyone), so before I go do it,
> I want to know if anyone is interested to hear :) Anyway, this would be
> a separate exercise to do if people want. 

Yes, in general feedback on that document is appreciated. Cc'ed a number
of more memory ordering people.

That said; we have to rely on maintainers to be aware of 'some' memory
ordering requirements, otherwise it will be exceedingly hard to find
them.

One tell-tale sign is if there are otherwise unmatched memory barriers
in the code. In that case they could end up being matched by atomic or
lock ops. And at that point we need to be careful and reverse engineer
what ordering is required.

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 13:57         ` Peter Zijlstra
@ 2017-11-02 15:40           ` Alan Stern
  2017-11-02 16:02             ` Peter Zijlstra
  2017-11-03 11:55           ` Reshetova, Elena
  2017-11-13  9:09           ` Reshetova, Elena
  2 siblings, 1 reply; 32+ messages in thread
From: Alan Stern @ 2017-11-02 15:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, 2 Nov 2017, Peter Zijlstra wrote:

> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > their atomic counterparts. 
> 
> Nope. The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior load/store's against all
> later load/store's.

In fact there is no guaranteed ordering when refcount_dec_and_lock()  
returns false; it provides ordering only if the return value is true.  
In which case it provides acquire ordering (thanks to the spin_lock),
and both release ordering and a control dependency (thanks to the
refcount_dec_and_test).
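
In the usual pattern that is exactly what you want; for example (a
sketch, with a made-up object that lives on some locked list):

	if (refcount_dec_and_lock(&o->ref, &list_lock)) {
		/*
		 * Acquire (from the lock) plus release and the control
		 * dependency (from the final dec) make it safe to
		 * unlink and free the object here.
		 */
		list_del(&o->node);
		spin_unlock(&list_lock);
		kfree(o);
	}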

> The difference is subtle and involves at least 3 CPUs. I can't seem to
> write up anything simple, keeps turning into monsters :/ Will, Paul,
> have you got anything simple around?

The combination of acquire + release is not the same as smp_mb, because 
they allow things to pass by in one direction.  Example:

C C-refcount-vs-atomic-dec-and-lock

{
}

P0(int *x, int *y, refcount_t *r)
{
	refcount_set(r, 1);
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, refcount_t *r, spinlock_t *s)
{
	int rx, ry;
	bool r1;

	ry = READ_ONCE(*y);
	r1 = refcount_dec_and_lock(r, s);
	if (r1)
		rx = READ_ONCE(*x);
}

exists (1:ry=1 /\ 1:r1=1 /\ 1:rx=0)

This is allowed.  The idea is that the CPU can take:

	Read y
	Acquire
	Release
	Read x

and execute the first read after the Acquire and the second read before 
the Release:

	Acquire
	Read y
	Read x
	Release

and then the CPU can reorder the reads:

	Acquire
	Read x
	Read y
	Release

If the program had used atomic_dec_and_lock() instead, which provides a 
full smp_mb barrier, this outcome would not be possible.

Alan Stern

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 15:40           ` Alan Stern
@ 2017-11-02 16:02             ` Peter Zijlstra
  2017-11-02 16:45               ` Peter Zijlstra
  2017-11-02 17:08               ` Alan Stern
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-02 16:02 UTC (permalink / raw)
  To: Alan Stern
  Cc: Reshetova, Elena, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> 
> > > Lock functions such as refcount_dec_and_lock() &
> > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > their atomic counterparts. 
> > 
> > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > refcount_dec_and_lock() merely orders all prior load/store's against all
> > later load/store's.
> 
> In fact there is no guaranteed ordering when refcount_dec_and_lock()  
> returns false; 

It should provide a release:

 - if !=1, dec_not_one will provide release
 - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
   dec_and_test will provide the release, even if the test fails and we
   unlock again it should still dec.

The one exception is when the counter is saturated, but in that case
we'll never free the object and the ordering is moot in any case.
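
That follows the shape of the function (roughly the lib/refcount.c
structure; a sketch, not the verbatim source):

	bool refcount_dec_and_lock(refcount_t *r, spinlock_t *lock)
	{
		if (refcount_dec_not_one(r))
			return false;	/* != 1: decremented, release provided */

		spin_lock(lock);	/* == 1: dec_not_one was a no-op */
		if (!refcount_dec_and_test(r)) {
			spin_unlock(lock);
			return false;	/* raced: still decremented, still a release */
		}

		return true;		/* 1 -> 0: return with the lock held */
	}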

> it provides ordering only if the return value is true.  
> In which case it provides acquire ordering (thanks to the spin_lock),
> and both release ordering and a control dependency (thanks to the
> refcount_dec_and_test).
> 
> > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > have you got anything simple around?
> 
> The combination of acquire + release is not the same as smp_mb, because 

acquire+release is nothing, it's release+acquire that I meant, which
should order things locally, but now that you've got me looking at it
again, we don't in fact do that.

So refcount_dec_and_lock() will provide a release, irrespective of the
return value (assuming we're not saturated). If it returns true, it also
does an acquire for the lock.

But combined they're acquire+release, which is unfortunate.. it means
the lock section and the refcount stuff overlap, but I don't suppose
that's actually a problem. Need to consider more.

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 16:02             ` Peter Zijlstra
@ 2017-11-02 16:45               ` Peter Zijlstra
  2017-11-02 17:08               ` Alan Stern
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-02 16:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Reshetova, Elena, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 05:02:37PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > 
> > > > Lock functions such as refcount_dec_and_lock() &
> > > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > > their atomic counterparts. 
> > > 
> > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > refcount_dec_and_lock() merely orders all prior load/store's against all
> > > later load/store's.
> > 
> > In fact there is no guaranteed ordering when refcount_dec_and_lock()  
> > returns false; 
> 
> It should provide a release:
> 
>  - if !=1, dec_not_one will provide release
>  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
>    dec_and_test will provide the release, even if the test fails and we
>    unlock again it should still dec.
> 
> The one exception is when the counter is saturated, but in that case
> we'll never free the object and the ordering is moot in any case.
> 
> > it provides ordering only if the return value is true.  
> > In which case it provides acquire ordering (thanks to the spin_lock),
> > and both release ordering and a control dependency (thanks to the
> > refcount_dec_and_test).
> > 
> > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > have you got anything simple around?
> > 
> > The combination of acquire + release is not the same as smp_mb, because 
> 
> acquire+release is nothing, it's release+acquire that I meant, which
> should order things locally, but now that you've got me looking at it
> again, we don't in fact do that.
> 
> So refcount_dec_and_lock() will provide a release, irrespective of the
> return value (assuming we're not saturated). If it returns true, it also
> does an acquire for the lock.
> 
> But combined they're acquire+release, which is unfortunate.. it means
> the lock section and the refcount stuff overlap, but I don't suppose
> that's actually a problem. Need to consider more.

Right, so in that case we have refcount==0 and are guaranteed no
concurrency. So it's fine.

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 16:02             ` Peter Zijlstra
  2017-11-02 16:45               ` Peter Zijlstra
@ 2017-11-02 17:08               ` Alan Stern
  2017-11-02 17:16                 ` Will Deacon
  2017-11-02 17:45                 ` Andrea Parri
  1 sibling, 2 replies; 32+ messages in thread
From: Alan Stern @ 2017-11-02 17:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Reshetova, Elena, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, 2 Nov 2017, Peter Zijlstra wrote:

> On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > 
> > > > Lock functions such as refcount_dec_and_lock() &
> > > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > > their atomic counterparts. 
> > > 
> > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > refcount_dec_and_lock() merely orders all prior load/store's against all
> > > later load/store's.
> > 
> > In fact there is no guaranteed ordering when refcount_dec_and_lock()  
> > returns false; 
> 
> It should provide a release:
> 
>  - if !=1, dec_not_one will provide release
>  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
>    dec_and_test will provide the release, even if the test fails and we
>    unlock again it should still dec.
> 
> The one exception is when the counter is saturated, but in that case
> we'll never free the object and the ordering is moot in any case.

Also if the counter is 0, but that will never happen if the 
refcounting is correct.

> > it provides ordering only if the return value is true.  
> > In which case it provides acquire ordering (thanks to the spin_lock),
> > and both release ordering and a control dependency (thanks to the
> > refcount_dec_and_test).
> > 
> > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > have you got anything simple around?
> > 
> > The combination of acquire + release is not the same as smp_mb, because 
> 
> acquire+release is nothing, it's release+acquire that I meant, which
> should order things locally, but now that you've got me looking at it
> again, we don't in fact do that.
> 
> So refcount_dec_and_lock() will provide a release, irrespective of the
> return value (assuming we're not saturated). If it returns true, it also
> does an acquire for the lock.
> 
> But combined they're acquire+release, which is unfortunate.. it means
> the lock section and the refcount stuff overlap, but I don't suppose
> that's actually a problem. Need to consider more.

Right.  To address your point: release + acquire isn't the same as a
full barrier either.  The SB pattern illustrates the difference:

	P0		P1
	Write x=1	Write y=1
	Release a	smp_mb
	Acquire b	Read x=0
	Read y=0

This would not be allowed if the release + acquire sequence was 
replaced by smp_mb.  But as it stands, this is allowed because nothing 
prevents the CPU from interchanging the order of the release and the 
acquire -- and then you're back to the acquire + release case.

However, there is one circumstance where this interchange isn't 
allowed: when the release and acquire access the same memory 
location.  Thus:

	P0(int *x, int *y, int *a)
	{
		int r0;

		WRITE_ONCE(*x, 1);
		smp_store_release(a, 1);
		smp_load_acquire(a);
		r0 = READ_ONCE(*y);
	}

	P1(int *x, int *y)
	{
		int r1;

		WRITE_ONCE(*y, 1);
		smp_mb();
		r1 = READ_ONCE(*x);
	}

	exists (0:r0=0 /\ 1:r1=0)

This is forbidden.  It would remain forbidden even if the smp_mb in P1 
were replaced by a similar release/acquire pair for the same memory 
location.

To see the difference between smp_mb and release/acquire requires three 
threads:

	P0		P1		P2
	Write x=1	Read y=1	Read z=1
	Release a	data dep.	smp_rmb
	Acquire a	Write z=1	Read x=0
	Write y=1

The Linux Kernel Memory Model allows this execution, although as far as 
I know, no existing hardware will do it.  But with smp_mb in P0, the 
execution would be forbidden.

None of this should be a problem for refcount_dec_and_lock, assuming it 
is used purely for reference counting.

Alan Stern

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 17:08               ` Alan Stern
@ 2017-11-02 17:16                 ` Will Deacon
  2017-11-02 17:26                   ` Peter Zijlstra
  2017-11-02 20:21                   ` Alan Stern
  2017-11-02 17:45                 ` Andrea Parri
  1 sibling, 2 replies; 32+ messages in thread
From: Will Deacon @ 2017-11-02 17:16 UTC (permalink / raw)
  To: Alan Stern
  Cc: Peter Zijlstra, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> 
> > On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > > 
> > > > > Lock functions such as refcount_dec_and_lock() &
> > > > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > > > their atomic counterparts. 
> > > > 
> > > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > > refcount_dec_and_lock() merely orders all prior load/store's against all
> > > > later load/store's.
> > > 
> > > In fact there is no guaranteed ordering when refcount_dec_and_lock()  
> > > returns false; 
> > 
> > It should provide a release:
> > 
> >  - if !=1, dec_not_one will provide release
> >  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
> >    dec_and_test will provide the release, even if the test fails and we
> >    unlock again it should still dec.
> > 
> > The one exception is when the counter is saturated, but in that case
> > we'll never free the object and the ordering is moot in any case.
> 
> Also if the counter is 0, but that will never happen if the 
> refcounting is correct.
> 
> > > it provides ordering only if the return value is true.  
> > > In which case it provides acquire ordering (thanks to the spin_lock),
> > > and both release ordering and a control dependency (thanks to the
> > > refcount_dec_and_test).
> > > 
> > > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > > have you got anything simple around?
> > > 
> > > The combination of acquire + release is not the same as smp_mb, because 
> > 
> > acquire+release is nothing, it's release+acquire that I meant, which
> > should order things locally, but now that you've got me looking at it
> > again, we don't in fact do that.
> > 
> > So refcount_dec_and_lock() will provide a release, irrespective of the
> > return value (assuming we're not saturated). If it returns true, it also
> > does an acquire for the lock.
> > 
> > But combined they're acquire+release, which is unfortunate.. it means
> > the lock section and the refcount stuff overlap, but I don't suppose
> > that's actually a problem. Need to consider more.
> 
> Right.  To address your point: release + acquire isn't the same as a
> full barrier either.  The SB pattern illustrates the difference:
> 
> 	P0		P1
> 	Write x=1	Write y=1
> 	Release a	smp_mb
> 	Acquire b	Read x=0
> 	Read y=0
> 
> This would not be allowed if the release + acquire sequence was 
> replaced by smp_mb.  But as it stands, this is allowed because nothing 
> prevents the CPU from interchanging the order of the release and the 
> acquire -- and then you're back to the acquire + release case.
> 
> However, there is one circumstance where this interchange isn't 
> allowed: when the release and acquire access the same memory 
> location.  Thus:
> 
> 	P0(int *x, int *y, int *a)
> 	{
> 		int r0;
> 
> 		WRITE_ONCE(*x, 1);
> 		smp_store_release(a, 1);
> 		smp_load_acquire(a);
> 		r0 = READ_ONCE(*y);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 		int r1;
> 
> 		WRITE_ONCE(*y, 1);
> 		smp_mb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists (0:r0=0 /\ 1:r1=0)
> 
> This is forbidden.  It would remain forbidden even if the smp_mb in P1 
> were replaced by a similar release/acquire pair for the same memory 
> location.

Isn't this allowed on x86 mapping smp_mb() to mfence, store-release to plain
store and load-acquire to plain load? All we're saying is that you can forward
from a release to an acquire, which is fine for RCpc semantics.

e.g.

X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"
Generator=diyone7 (version 7.46+3)
Prefetch=0:x=F,0:y=T,1:y=F,1:x=T
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
{
}
 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;
exists
(0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)

which herd says is allowed:

Test SB+mfence+po-rfi-po Allowed
States 4
0:EAX=0; 1:EAX=1; 1:EBX=0;
0:EAX=0; 1:EAX=1; 1:EBX=1;
0:EAX=1; 1:EAX=1; 1:EBX=0;
0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok
Witnesses
Positive: 1 Negative: 3
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Observation SB+mfence+po-rfi-po Sometimes 1 3
Time SB+mfence+po-rfi-po 0.00
Hash=0f983e2d7579e5c04c332f9ac620c31f

and I can reproduce using litmus to actually run it on my x86 box:

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Results for SB+mfence+po-rfi-po.litmus %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
X86 SB+mfence+po-rfi-po
"MFencedWR Fre PodWW Rfi PodRR Fre"

{}

 P0          | P1          ;
 MOV [x],$1  | MOV [y],$1  ;
 MFENCE      | MOV [z],$1  ;
 MOV EAX,[y] | MOV EAX,[z] ;
             | MOV EBX,[x] ;

exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
Generated assembler
#START _litmus_P1
	movl $1,(%r8,%rcx)
	movl $1,(%r9,%rcx)
	movl (%r9,%rcx),%eax
	movl (%rdi,%rcx),%edx
#START _litmus_P0
	movl $1,(%rdx,%rcx)
	mfence
	movl (%rdi,%rcx),%eax

Test SB+mfence+po-rfi-po Allowed
Histogram (4 states)
8     *>0:EAX=0; 1:EAX=1; 1:EBX=0;
1999851:>0:EAX=1; 1:EAX=1; 1:EBX=0;
1999549:>0:EAX=0; 1:EAX=1; 1:EBX=1;
592   :>0:EAX=1; 1:EAX=1; 1:EBX=1;
Ok

Witnesses
Positive: 8, Negative: 3999992
Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
Hash=0f983e2d7579e5c04c332f9ac620c31f
Generator=diyone7 (version 7.46+3)
Com=Fr Fr
Orig=MFencedWR Fre PodWW Rfi PodRR Fre
Observation SB+mfence+po-rfi-po Sometimes 8 3999992
Time SB+mfence+po-rfi-po 0.17

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 17:16                 ` Will Deacon
@ 2017-11-02 17:26                   ` Peter Zijlstra
  2017-11-02 20:21                   ` Alan Stern
  1 sibling, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-02 17:26 UTC (permalink / raw)
  To: Will Deacon
  Cc: Alan Stern, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 05:16:44PM +0000, Will Deacon wrote:
> On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:

> > Right.  To address your point: release + acquire isn't the same as a
> > full barrier either.  The SB pattern illustrates the difference:
> > 
> > 	P0		P1
> > 	Write x=1	Write y=1
> > 	Release a	smp_mb
> > 	Acquire b	Read x=0
> > 	Read y=0
> > 
> > This would not be allowed if the release + acquire sequence was 
> > replaced by smp_mb.  But as it stands, this is allowed because nothing 
> > prevents the CPU from interchanging the order of the release and the 
> > acquire -- and then you're back to the acquire + release case.
> > 
> > However, there is one circumstance where this interchange isn't 
> > allowed: when the release and acquire access the same memory 
> > location.  Thus:
> > 
> > 	P0(int *x, int *y, int *a)
> > 	{
> > 		int r0;
> > 
> > 		WRITE_ONCE(*x, 1);
> > 		smp_store_release(a, 1);
> > 		smp_load_acquire(a);
> > 		r0 = READ_ONCE(*y);
> > 	}
> > 
> > 	P1(int *x, int *y)
> > 	{
> > 		int r1;
> > 
> > 		WRITE_ONCE(*y, 1);
> > 		smp_mb();
> > 		r1 = READ_ONCE(*x);
> > 	}
> > 
> > 	exists (0:r0=0 /\ 1:r1=0)
> > 
> > This is forbidden.  It would remain forbidden even if the smp_mb in P1 
> > were replaced by a similar release/acquire pair for the same memory 
> > location.
> 
> Isn't this allowed on x86 mapping smp_mb() to mfence, store-release to plain
> store and load-acquire to plain load? All we're saying is that you can forward
> from a release to an acquire, which is fine for RCpc semantics.

Yeah, as it happens I talked to Will about that exact case while writing
that email :-); that's why he has that thing handy.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 17:08               ` Alan Stern
  2017-11-02 17:16                 ` Will Deacon
@ 2017-11-02 17:45                 ` Andrea Parri
  2017-11-02 20:28                   ` Alan Stern
  1 sibling, 1 reply; 32+ messages in thread
From: Andrea Parri @ 2017-11-02 17:45 UTC (permalink / raw)
  To: Alan Stern
  Cc: Peter Zijlstra, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Will Deacon, Paul McKenney, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 01:08:52PM -0400, Alan Stern wrote:
> On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> 
> > On Thu, Nov 02, 2017 at 11:40:35AM -0400, Alan Stern wrote:
> > > On Thu, 2 Nov 2017, Peter Zijlstra wrote:
> > > 
> > > > > Lock functions such as refcount_dec_and_lock() &
> > > > > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > > > > their atomic counterparts. 
> > > > 
> > > > Nope. The atomic_dec_and_lock() provides smp_mb() while
> > > > refcount_dec_and_lock() merely orders all prior load/store's against all
> > > > later load/store's.
> > > 
> > > In fact there is no guaranteed ordering when refcount_dec_and_lock()  
> > > returns false; 
> > 
> > It should provide a release:
> > 
> >  - if !=1, dec_not_one will provide release
> >  - if ==1, dec_not_one will no-op, but then we'll acquire the lock and
> >    dec_and_test will provide the release; even if the test fails and we
> >    unlock again, it should still dec.
> > 
> > The one exception is when the counter is saturated, but in that case
> > we'll never free the object and the ordering is moot in any case.
> 
> Also if the counter is 0, but that will never happen if the 
> refcounting is correct.
> 
> > > it provides ordering only if the return value is true.  
> > > In which case it provides acquire ordering (thanks to the spin_lock),
> > > and both release ordering and a control dependency (thanks to the
> > > refcount_dec_and_test).
> > > 
> > > > The difference is subtle and involves at least 3 CPUs. I can't seem to
> > > > write up anything simple, keeps turning into monsters :/ Will, Paul,
> > > > have you got anything simple around?
> > > 
> > > The combination of acquire + release is not the same as smp_mb, because 
> > 
> > acquire+release is nothing; it's release+acquire that I meant, which
> > should order things locally, but now that you've got me looking at it
> > again, we don't in fact do that.
> > 
> > So refcount_dec_and_lock() will provide a release, irrespective of the
> > return value (assuming we're not saturated). If it returns true, it also
> > does an acquire for the lock.
> > 
> > But combined they're acquire+release, which is unfortunate... it means
> > the lock section and the refcount stuff overlap, but I don't suppose
> > that's actually a problem. Need to consider more.
> 
> Right.  To address your point: release + acquire isn't the same as a
> full barrier either.  The SB pattern illustrates the difference:
> 
> 	P0		P1
> 	Write x=1	Write y=1
> 	Release a	smp_mb
> 	Acquire b	Read x=0
> 	Read y=0
> 
> This would not be allowed if the release + acquire sequence was 
> replaced by smp_mb.  But as it stands, this is allowed because nothing 
> prevents the CPU from interchanging the order of the release and the 
> acquire -- and then you're back to the acquire + release case.
> 
> However, there is one circumstance where this interchange isn't 
> allowed: when the release and acquire access the same memory 
> location.  Thus:
> 
> 	P0(int *x, int *y, int *a)
> 	{
> 		int r0;
> 
> 		WRITE_ONCE(*x, 1);
> 		smp_store_release(a, 1);
> 		smp_load_acquire(a);
> 		r0 = READ_ONCE(*y);
> 	}
> 
> 	P1(int *x, int *y)
> 	{
> 		int r1;
> 
> 		WRITE_ONCE(*y, 1);
> 		smp_mb();
> 		r1 = READ_ONCE(*x);
> 	}
> 
> 	exists (0:r0=0 /\ 1:r1=0)
> 
> This is forbidden.  It would remain forbidden even if the smp_mb in P1 
> were replaced by a similar release/acquire pair for the same memory 
> location.

Hopefully, the LKMM does not agree with this assessment... ;-)


> 
> To see the difference between smp_mb and release/acquire requires three 
> threads:
> 
> 	P0		P1		P2
> 	Write x=1	Read y=1	Read z=1
> 	Release a	data dep.	smp_rmb
> 	Acquire a	Write z=1	Read x=0
> 	Write y=1
> 
> The Linux Kernel Memory Model allows this execution, although as far as 
> I know, no existing hardware will do it.  But with smp_mb in P0, the 
> execution would be forbidden.

Here's a two-thread example showing that "(w)mb is _not_ rfi-rel-acq":

C rfi-rel-acq-is-not-mb

{}

P0(int *x, int *y, int *a)
{
	int r1;

	WRITE_ONCE(*x, 1);
	smp_store_release(a, 1);
	r1 = smp_load_acquire(a);
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	smp_rmb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)

  Andrea


> 
> None of this should be a problem for refcount_dec_and_lock, assuming it 
> is used purely for reference counting.
> 
> Alan Stern
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 17:16                 ` Will Deacon
  2017-11-02 17:26                   ` Peter Zijlstra
@ 2017-11-02 20:21                   ` Alan Stern
  2017-11-15 18:05                     ` Will Deacon
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Stern @ 2017-11-02 20:21 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, 2 Nov 2017, Will Deacon wrote:

> > Right.  To address your point: release + acquire isn't the same as a
> > full barrier either.  The SB pattern illustrates the difference:
> > 
> > 	P0		P1
> > 	Write x=1	Write y=1
> > 	Release a	smp_mb
> > 	Acquire b	Read x=0
> > 	Read y=0
> > 
> > This would not be allowed if the release + acquire sequence was 
> > replaced by smp_mb.  But as it stands, this is allowed because nothing 
> > prevents the CPU from interchanging the order of the release and the 
> > acquire -- and then you're back to the acquire + release case.
> > 
> > However, there is one circumstance where this interchange isn't 
> > allowed: when the release and acquire access the same memory 
> > location.  Thus:
> > 
> > 	P0(int *x, int *y, int *a)
> > 	{
> > 		int r0;
> > 
> > 		WRITE_ONCE(*x, 1);
> > 		smp_store_release(a, 1);
> > 		smp_load_acquire(a);
> > 		r0 = READ_ONCE(*y);
> > 	}
> > 
> > 	P1(int *x, int *y)
> > 	{
> > 		int r1;
> > 
> > 		WRITE_ONCE(*y, 1);
> > 		smp_mb();
> > 		r1 = READ_ONCE(*x);
> > 	}
> > 
> > 	exists (0:r0=0 /\ 1:r1=0)
> > 
> > This is forbidden.  It would remain forbidden even if the smp_mb in P1 
> > were replaced by a similar release/acquire pair for the same memory 
> > location.

I have to apologize; this was totally wrong.  This test is not
forbidden under the LKMM, and it certainly isn't forbidden if the
smp_mb is replaced by a release/acquire pair.

I was trying to think of something completely different.  If you have a
release/acquire to the same address, it creates a happens-before
ordering:

	Access x
	Release a
	Acquire a
	Access y

Here the access to x happens-before the access to y.  This is true
even on x86, even in the presence of forwarding -- the CPU still has to
execute the instructions in order.  But if the release and acquire are
to different addresses:

	Access x
	Release a
	Acquire b
	Access y

then there is no happens-before ordering for x and y -- the CPU can
execute the last two instructions before the first two.  x86 and
PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
it can't.)

But happens-before is much weaker than a strong fence.  So in short, 
release + acquire, even to the same address, is no replacement for 
smp_mb().

> Isn't this allowed on x86 mapping smp_mb() to mfence, store-release to plain
> store and load-acquire to plain load? All we're saying is that you can forward
> from a release to an acquire, which is fine for RCpc semantics.
> 
> e.g.
> 
> X86 SB+mfence+po-rfi-po
> "MFencedWR Fre PodWW Rfi PodRR Fre"
> Generator=diyone7 (version 7.46+3)
> Prefetch=0:x=F,0:y=T,1:y=F,1:x=T
> Com=Fr Fr
> Orig=MFencedWR Fre PodWW Rfi PodRR Fre
> {
> }
>  P0          | P1          ;
>  MOV [x],$1  | MOV [y],$1  ;
>  MFENCE      | MOV [z],$1  ;
>  MOV EAX,[y] | MOV EAX,[z] ;
>              | MOV EBX,[x] ;
> exists
> (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
> 
> which herd says is allowed:
> 
> Test SB+mfence+po-rfi-po Allowed
> States 4
> 0:EAX=0; 1:EAX=1; 1:EBX=0;
> 0:EAX=0; 1:EAX=1; 1:EBX=1;
> 0:EAX=1; 1:EAX=1; 1:EBX=0;
> 0:EAX=1; 1:EAX=1; 1:EBX=1;
> Ok
> Witnesses
> Positive: 1 Negative: 3
> Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
> Observation SB+mfence+po-rfi-po Sometimes 1 3
> Time SB+mfence+po-rfi-po 0.00
> Hash=0f983e2d7579e5c04c332f9ac620c31f
> 
> and I can reproduce using litmus to actually run it on my x86 box:
> 
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> % Results for SB+mfence+po-rfi-po.litmus %
> %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
> X86 SB+mfence+po-rfi-po
> "MFencedWR Fre PodWW Rfi PodRR Fre"
> 
> {}
> 
>  P0          | P1          ;
>  MOV [x],$1  | MOV [y],$1  ;
>  MFENCE      | MOV [z],$1  ;
>  MOV EAX,[y] | MOV EAX,[z] ;
>              | MOV EBX,[x] ;
> 
> exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0)
> Generated assembler
> #START _litmus_P1
> 	movl $1,(%r8,%rcx)
> 	movl $1,(%r9,%rcx)
> 	movl (%r9,%rcx),%eax
> 	movl (%rdi,%rcx),%edx
> #START _litmus_P0
> 	movl $1,(%rdx,%rcx)
> 	mfence
> 	movl (%rdi,%rcx),%eax
> 
> Test SB+mfence+po-rfi-po Allowed
> Histogram (4 states)
> 8     *>0:EAX=0; 1:EAX=1; 1:EBX=0;
> 1999851:>0:EAX=1; 1:EAX=1; 1:EBX=0;
> 1999549:>0:EAX=0; 1:EAX=1; 1:EBX=1;
> 592   :>0:EAX=1; 1:EAX=1; 1:EBX=1;
> Ok
> 
> Witnesses
> Positive: 8, Negative: 3999992
> Condition exists (0:EAX=0 /\ 1:EAX=1 /\ 1:EBX=0) is validated
> Hash=0f983e2d7579e5c04c332f9ac620c31f
> Generator=diyone7 (version 7.46+3)
> Com=Fr Fr
> Orig=MFencedWR Fre PodWW Rfi PodRR Fre
> Observation SB+mfence+po-rfi-po Sometimes 8 3999992
> Time SB+mfence+po-rfi-po 0.17

Yes, you are quite correct.  Thanks for pointing out my mistake.

Alan Stern

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 17:45                 ` Andrea Parri
@ 2017-11-02 20:28                   ` Alan Stern
  0 siblings, 0 replies; 32+ messages in thread
From: Alan Stern @ 2017-11-02 20:28 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Peter Zijlstra, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Will Deacon, Paul McKenney, boqun.feng,
	dhowells, david

On Thu, 2 Nov 2017, Andrea Parri wrote:

> > This is forbidden.  It would remain forbidden even if the smp_mb in P1 
> > were replaced by a similar release/acquire pair for the same memory 
> > location.
> 
> Hopefully, the LKMM does not agree with this assessment... ;-)

No, it doesn't.

> Here's a two-threads example showing that "(w)mb is _not_ rfi-rel-acq":
> 
> C rfi-rel-acq-is-not-mb
> 
> {}
> 
> P0(int *x, int *y, int *a)
> {
> 	int r1;
> 
> 	WRITE_ONCE(*x, 1);
> 	smp_store_release(a, 1);
> 	r1 = smp_load_acquire(a);
> 	WRITE_ONCE(*y, 1);
> }
> 
> P1(int *x, int *y)
> {
> 	int r0;
> 	int r1;
> 
> 	r0 = READ_ONCE(*y);
> 	smp_rmb();
> 	r1 = READ_ONCE(*x);
> }
> 
> exists (1:r0=1 /\ 1:r1=0)

Right.  There is a happens-before edge between the two WRITE_ONCE calls
in P0 but no cumul-fence edge, and therefore the test is allowed.  
Here is an example where a happens-before edge suffices to provide
ordering:

P0(int *x, int *y)
{
	WRITE_ONCE(*x, 1);
	smp_wmb();
	WRITE_ONCE(*y, 1);
}

P1(int *x, int *y, int *a)
{
	int rx, ry, ra;

	ry = READ_ONCE(*y);
	smp_store_release(a, 1);
	ra = smp_load_acquire(a);
	rx = READ_ONCE(*x);
}

exists (1:rx=0 /\ 1:ry=1)

This test is forbidden, but it would be allowed if the release and 
acquire accessed different locations.

Alan Stern

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 13:57         ` Peter Zijlstra
  2017-11-02 15:40           ` Alan Stern
@ 2017-11-03 11:55           ` Reshetova, Elena
  2017-11-13  9:09           ` Reshetova, Elena
  2 siblings, 0 replies; 32+ messages in thread
From: Reshetova, Elena @ 2017-11-03 11:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel,
	Will Deacon, Paul McKenney, stern, parri.andrea, boqun.feng,
	dhowells, david


> On Thu, Nov 02, 2017 at 11:04:53AM +0000, Reshetova, Elena wrote:
> 
> > Well refcount_dec_and_test() is not the only function that has different
> > memory ordering specifics. So, the full answer then for any arbitrary case
> > according to your points above would be something like:
> >
> > for each substituted function (atomic_* --> refcount_*) that actually
> >  has changes in memory ordering *** perform the following:
> >   - mention the difference
> >   - mention the actual code places the change would affect (the
> >    various put etc. functions)
> >   - potentially try to understand if it would make a difference for
> >   this code (again here is where I am not sure how to do it properly)
> >
> >
> > *** the actual list of these functions to me looks like:
> 
> >  1) atomic_inc_not_zero -> refcount_inc_not_zero.  Change is from
> >  underlying atomic_cmpxchg() to atomic_try_cmpxchg_relaxed().  The first
> >  one implies an SMP-conditional general memory barrier (smp_mb()) on each
> >  side of the actual operation (at least according to the documentation; in
> >  reality it goes through so many changes, ifdefs and conditionals that
> >  one gets lost easily in the process).
> 
> That matches _inc(), which doesn't imply anything.

So you are saying that atomic_inc_not_zero has the same guarantees (meaning
no guarantees) as atomic_inc?  If yes, then I am now confused here, because
atomic_inc_not_zero is based on atomic_cmpxchg(), which according to this
https://elixir.free-electrons.com/linux/latest/source/Documentation/memory-barriers.txt#L2527
does imply smp_mb()....

> 
> The reasoning being that you must already have obtained a pointer to the
> object; and that will necessarily include enough ordering to observe the
> object. Therefore increasing the refcount doesn't require further
> constraints.
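
To illustrate the point with a lockless lookup (a hedged sketch; the
names table, idx and obj are made up):

	rcu_read_lock();
	obj = rcu_dereference(table[idx]);	/* dependent load orders against obj's init */
	if (obj && !refcount_inc_not_zero(&obj->ref))
		obj = NULL;			/* object already on its way out */
	rcu_read_unlock();

Here the ordering needed to safely observe the object comes from the
dependent load, so the increment itself needs no further ordering.
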
> 
> > The second one according to the comment implies no memory barriers
> > whatsoever, BUT "provides a control dependency which will order future
> > stores against the inc".  So, every store (not load) operation (I guess
> > we are talking here only about store operations that touch the same
> > object, but I wonder how it is defined in terms of memory location?
> 
> Memory location is irrelevant.

I was just trying to understand which "stores" the control dependency
applies to here. Do you mean it applies to absolutely all stores on all
objects? I guess we are talking about the same object here; I am just
trying to understand how the object is defined in terms of memory location.
> 
> > (overlapping?)  that comes inside "if (refcount_inc_not_zero()) {}" because it
> > would only be executed if the function returns true.
> 
> The point is that a CPU must NOT speculate on stores. So while it can
> speculate a branch, any store inside the branch must not become visible
> until it can commit to that branch.
> 
> The whole point being that possible modifications to the object to which
> we've _conditionally_ acquired a reference, will only happen after we're
> sure to indeed have acquired this reference.
> 
> Otherwise it's similar to _inc().

Yes, now I understand this part. 
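
So, concretely, the pattern is something like this (a minimal sketch; obj
is a hypothetical object):

	if (refcount_inc_not_zero(&obj->ref)) {
		/* the CPU must not speculate stores, so this store cannot
		 * become visible before the increment is known to have
		 * succeeded */
		WRITE_ONCE(obj->in_use, 1);
	}
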

> 
> > So, practically what we might "lose" here are any updates on the
> > object protected by this refcounter done by another CPU. But for this
> > purpose we expect the developer to take some other lock/memory barrier
> > into use, right?
> 
> Correct, object modification had better be serialized. Refcounts cannot
> (even with atomic_t) help with that.
> 
> > We only care about incrementing the refcount atomically and making sure we
> > don't do anything with the object unless it is ready to be used.
> 
> Just so..
> 
> > If yes, then I guess it might be a big change for the code that
> > previously relied on atomic-provided smp_mb() barriers and now instead
> > needs to take explicit locks/barriers by itself.
> 
> Right, however such memory ordering should be explicitly documented.
> Unknown and hidden memory ordering is a straight up bug, because
> modifications to the code (be they introducing refcount_t or anything
> else) can easily break things.

Yes, this is what has been mentioned before many times, but again reality might
be different, so it is better to be prepared here also. 
> 
> > 2) atomic_** -> refcount_add_not_zero. Fortunately these are super
> > rare and need to be looked at case by case, depending on the actual
> > atomic function substituted.
> 
> See inc_not_zero.
> 
> > 3) atomic_add() --> refcount_add(). This should not make any change
> > since both do not provide memory ordering at all, but there is a
> > comment in refcount.c that says that refcount_add "provide a
> > control dependency and thereby orders future stores".  How is that done
> > given that refcount_add is a void-returning function??  I am fully
> > confused by this one.
> 
> Weird, mostly comes from being implemented using add_not_zero I suppose.

Yes, the underlying op is add_not_zero, but is it still correct to talk about any control 
dependencies here? How would that possibly look in the code? What is the surrounding
if statement? 

> 
> > 4) atomic_dec () --> refcount_dec (). This one we have discussed
> > extensively before. Again, the first one implies an SMP-conditional
> > general memory barrier (smp_mb()) on each side of the actual operation.
> 
> No, atomic_dec() does not in fact imply anything..
> 
> > Second one only provides "release" ordering, meaning that both prior
> > loads and stores must be completed before the operation.
> > So, is it correct to express the difference in this way:
> >
> > atomic_dec ():			refcount_dec ():
> >
> > smp_mb();			smp_mb();
> > do_atomic_dec_operation;		do_atomic_dec_operation;
> > smp_mb();			/* note: no barrier at all here! */
> 
> 
> No, on two points: atomic_dec() does not imply _anything_ and while
> refcount_dec() does the release, that is distinctly different from
> smp_mb() before.

Oh yes, sorry, I got this confused. So here we actually have the opposite situation:
the refcount variant provides a bit more ordering here than the atomic variant.
> 
>   C C-peterz-release-vs-mb
> 
>   {
>   }
> 
>   P0(int *a, int *b, int *c)
>   {
> 	  WRITE_ONCE(*a, 1);
>   //	smp_mb();
> 	  smp_store_release(b, 1);
> 	  WRITE_ONCE(*c, 1);
>   }
> 
>   P1(int *a, int *b, int *c)
>   {
> 	  r3 = READ_ONCE(*c);
> 	  smp_rmb();
> 	  r2 = smp_load_acquire(b);
> 	  r1 = READ_ONCE(*a);
>   }
> 
>   exists (1:r1=0 /\ 1:r2=0 /\ 1:r3=1)
> 
> That happens without the smp_mb(); once you put that back it's no longer
> allowed.
> 
> The reason is that smp_store_release() does not constrain the store to
> @c and that is allowed to happen before, whereas with smp_mb() that
> store can not happen before the store to @a.
> 
> smp_store_release() only affects the prior load/stores, not anything
> later, whereas smp_mb() imposes full order.

This is a good example, thank you; it helps understanding greatly. 

> 
> > 5) atomic_dec_and_test () --> refcount_dec_and_test (). Same as 4)
> > but in addition we have a control flow barrier.
> 
> No, this one was in fact a full barrier and is now a release.
> 
> > So, is it correct to express the difference in this way:
> >
> > atomic_dec_and_test ():                        refcount_dec_and_test ():
> >
> > smp_mb();                                      smp_mb();
> > if (do_atomic_dec_and_test) {                  if (do_atomic_dec_and_test) {
> >     /* 1 --> 0 transition happened */              /* 1 --> 0 transition happened */
> >     smp_mb();                                      /* in the refcount case we assume
> >     do_anything including free();                     we are the last user of the
> >                                                       object, so no concurrent access
> >                                                       can happen */
> >                                                    /* control_flow_barrier here */ <---- which one btw????
> >                                                    /* only need to guarantee that we
> >                                                       free after reaching zero, so we
> >                                                       are good here */
> > }                                              }
>   smp_mb();
> 
> or better:
> 
>   if ( ({ smp_mb(); ret = do_atomic_dec_and_test(); smp_mb(); ret; }) )
> 
> The difference being that the smp_mb happens on both branches of the
> condition (or in fact before the branch).
> 
> Same note as the above; the smp_mb() on the refcount_dec_and_test() side
> is strictly stronger than the release.
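
To make the refcount side concrete, the usual 1->0 path looks like this
(a minimal sketch; obj is a hypothetical object):

	void obj_put(struct obj *obj)
	{
		/* release ordering: all prior loads/stores to *obj complete
		 * before the decrement; the control dependency then orders
		 * the free after a successful 1->0 transition */
		if (refcount_dec_and_test(&obj->ref))
			kfree(obj);
	}
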

What is the correct way to show the control flow barriers in the example? 
The memory-barriers.txt only uses READ/WRITE_ONCE() notation. 

> 
> > 6) atomic_sub_and_test () --> refcount_sub_and_test (). Identical to 5)
> 
> Yes, with the same note as for add*, these really should not be used.
> 
> > 7) atomic_add_unless(&var, -1, 1) --> refcount_dec_not_one().
> > Should be identical to 4)
> 
> Correct; was fully ordered, is now a release.
> 
> > Lock functions such as refcount_dec_and_lock() &
> > refcount_dec_and_mutex_lock() provide exactly the same guarantees as
> > their atomic counterparts.
> 
> Nope. The atomic_dec_and_lock() provides smp_mb() while
> refcount_dec_and_lock() merely orders all prior load/store's against all
> later load/store's.

Yes, I confused the whole thing again. Yes, atomic_dec_and_lock()
is based on atomic_add_unless() which implies smp_mb(). 
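
For reference, roughly the shape in question (a simplified sketch of
lib/dec_and_lock.c):

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
	{
		/* subtract 1 unless the counter is 1; fully ordered */
		if (atomic_add_unless(atomic, -1, 1))
			return 0;

		/* otherwise take the lock and decrement the slow way */
		spin_lock(lock);
		if (atomic_dec_and_test(atomic))
			return 1;	/* hit 0: return with the lock held */
		spin_unlock(lock);
		return 0;
	}
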


> 
> The difference is subtle and involves at least 3 CPUs. I can't seem to
> write up anything simple, keeps turning into monsters :/ Will, Paul,
> have you got anything simple around?
> 
> > Is the above a correct view on this? I think that instead of putting
> > this whole info in the individual commits, how about making some
> > documentation subsection (in memory-barriers or atomics or on its own,
> > don't know what is better) with examples like above?
> 
> > Then in each individual commit I can point to the respective line in
> > the doc and say that here is a particular memory guarantee change and
> > in the case of this particular code it would apply to your
> > put_pi_state() or whatever other function.  Otherwise keeping this info
> > only in commits does not benefit any future users of refcount.
> 
> Sure you can write a refcount-vs-atomic.txt file to call out these
> differences.

Ok, I will try collecting the relevant examples from this thread as it goes 
and will try to send a first version on Monday. 

> 
> > > So while memory-barriers.txt is a longish document, it is readable with
> > > a bit of time and effort. There are also tools being developed that can
> > > help people validate their code.
> >
> > Could you please point to the tools? Might help my understanding also.
> 
> https://github.com/aparri/memory-model

Thank you, will try to check!
> 
> > So, I actually went and tried to __slowly__ read
> > memory-barriers.txt, which indeed makes a lot of sense when you read it
> > in full (not partially like I did before). But the issue with this
> > doc is not so much the length as the not always consistent way it
> > presents the information to an unprepared reader.
> 
> Fair enough; that document has grown over the years as has our
> understanding on the subject. I would imagine it is indeed somewhat
> inconsistent.
> 
> Note that there's work done on better documents and updates to this one.
> One document that might be good to read (I have not in fact had time to
> read it myself yet :-():
> 
>   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt

Ok, let me try this one. Anyhow, it would take a while for all this info to 
settle properly in my head; there are a lot of details to keep in mind all
the time....

> 
> > I have many examples where I wondered for a while what was meant by a
> > certain term or notion somewhere at the beginning of the doc,
> > only to discover a fully specified answer somewhere after line 1800.
> 
> > I do have many suggestions that might help to improve the readability
> > of the document but it would take me at least a day to write them down
> > (and it is going to be a long read for anyone), so before I go do it,
> > I want to know if anyone is interested to hear :) Anyway this would be
> > a separate exercise to do if people want.
> 
> Yes, in general feedback on that document is appreciated. Cc'ed a number
> of memory ordering people.
> 
> That said, we have to rely on maintainers to be aware of 'some' memory
> ordering requirements, otherwise it will be exceedingly hard to find
> them.

You mean hard to find requirements or maintainers that keep all these details
in their heads? :) 
 
> 
> One tell-tale sign is if there are otherwise unmatched memory barriers
> in the code. In that case they could end up being matched by atomic or
> lock ops. And at that point we need to be careful and reverse engineer
> what ordering is required.

So, you mean that if I see somewhere that there is, let's say, an smp_rmb(), 
but then can't find any matching smp_wmb() or smp_mb(), then I should check if the 
pairing has been done "underneath" by one of the atomic functions. I think this is 
a good tip for checking, actually; I will start looking into these.
It might also be a good tip to put into the doc as a suggestion for people to 
double-check their code when/if they convert.  


Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 13:57         ` Peter Zijlstra
  2017-11-02 15:40           ` Alan Stern
  2017-11-03 11:55           ` Reshetova, Elena
@ 2017-11-13  9:09           ` Reshetova, Elena
  2017-11-13 13:19             ` Paul E. McKenney
  2017-11-16 13:44             ` Michal Hocko
  2 siblings, 2 replies; 32+ messages in thread
From: Reshetova, Elena @ 2017-11-13  9:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, gregkh, keescook, tglx, mingo, ishkamiel,
	Will Deacon, Paul McKenney, stern, parri.andrea, boqun.feng,
	dhowells, david

<snip>

> Note that there's work done on better documents and updates to this one.
> One document that might be good to read (I have not in fact had time to
> read it myself yet :-():
> 
>   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> 

I have just finished reading over this and must say that this is excellent. 
If I had started reading on this topic from this doc and then moved
to the other in-tree docs, including memory-barriers.txt, I would
have had far fewer issues/questions and it would have been much less of a bumpy
read. 

Is there any plan to include it in the official kernel doc tree? I really think it
would be very helpful for others, even just to explain the notations, properties 
and language people use when talking about these issues, and to give examples. 

I will try to improve a bit the new doc I have previously sent a patch for in the 
spirit of this reading. 


Best Regards,
Elena.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-13  9:09           ` Reshetova, Elena
@ 2017-11-13 13:19             ` Paul E. McKenney
  2017-11-13 16:01               ` Reshetova, Elena
  2017-11-16 13:44             ` Michal Hocko
  1 sibling, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2017-11-13 13:19 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, stern, parri.andrea, boqun.feng,
	dhowells, david

On Mon, Nov 13, 2017 at 09:09:57AM +0000, Reshetova, Elena wrote:
> <snip>
> 
> > Note that there's work done on better documents and updates to this one.
> > One document that might be good to read (I have not in fact had time to
> > read it myself yet :-():
> > 
> >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> 
> I have just finished reading over this and must say that this is excellent. 
> If I had started reading on this topic from this doc and then moved
> to the other in-tree docs, including memory-barriers.txt, I would
> have had far fewer issues/questions and it would have been much less of a bumpy
> read. 

Glad you like it!  May we have your Acked-by?

> Is there any plan to include it in the official kernel doc tree? I really think it
> would be very helpful for others, even just to explain the notations, properties 
> and language people use when talking about these issues, and to give examples. 

Yes, we do plan to submit it for inclusion.

							Thanx, Paul

> I will try to improve a bit the new doc I have previously sent a patch for in the 
> spirit of this reading. 
> 
> 
> Best Regards,
> Elena.
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-13 13:19             ` Paul E. McKenney
@ 2017-11-13 16:01               ` Reshetova, Elena
  2017-11-13 16:26                 ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Reshetova, Elena @ 2017-11-13 16:01 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, stern, parri.andrea, boqun.feng,
	dhowells, david


> On Mon, Nov 13, 2017 at 09:09:57AM +0000, Reshetova, Elena wrote:
> > <snip>
> >
> > > Note that there's work done on better documents and updates to this one.
> > > One document that might be good to read (I have not in fact had time to
> > > read it myself yet :-():
> > >
> > >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> >
> > I have just finished reading over this and must say that this is excellent.
> > If I had started reading on this topic from this doc and then moved
> > to the other in-tree docs, including memory-barriers.txt, I would
> > have had far fewer issues/questions and it would have been much less of a bumpy
> > read.
> 
> Glad you like it!  May we have your Acked-by?

I think my Acked-by has little value in this case since I am really a beginner in it
myself, but I would strongly suggest to Peter and others to consider inclusion 
of this doc into the tree since I do see a value in it. Again, I am not really an
important person here :)

> 
> > Is there any plan to include it in the official kernel doc tree? I really think it
> > would be very helpful for others, even just to explain the notations, properties
> > and language people use when talking about these issues, and to give examples.
> 
> Yes, we do plan to submit it for inclusion.

Great, I think it would help people!

Best Regards,
Elena.

> 
> 
> 		Thanx, Paul
> 
> > I will try to improve a bit the new doc I have previously sent a patch for in the
> > spirit of this reading.
> >
> >
> > Best Regards,
> > Elena.
> >

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-13 16:01               ` Reshetova, Elena
@ 2017-11-13 16:26                 ` Paul E. McKenney
  2017-11-14 11:23                   ` Reshetova, Elena
  0 siblings, 1 reply; 32+ messages in thread
From: Paul E. McKenney @ 2017-11-13 16:26 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, stern, parri.andrea, boqun.feng,
	dhowells, david

On Mon, Nov 13, 2017 at 04:01:11PM +0000, Reshetova, Elena wrote:
> 
> > On Mon, Nov 13, 2017 at 09:09:57AM +0000, Reshetova, Elena wrote:
> > > <snip>
> > >
> > > > Note that there's work done on better documents and updates to this one.
> > > > One document that might be good to read (I have not in fact had time to
> > > > read it myself yet :-():
> > > >
> > > >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> > >
> > > I have just finished reading over this and must say that this is excellent.
> > > If I had started reading on this topic from this doc and then moved
> > > to the other in-tree docs, including memory-barriers.txt, I would
> > > have had far fewer issues/questions and it would have been much less of a bumpy
> > > read.
> > 
> > Glad you like it!  May we have your Acked-by?
> 
> I think my Acked-by has little value in this case since I am really a beginner in it
> myself, but I would strongly suggest to Peter and others to consider inclusion 
> of this doc into the tree since I do see a value in it. Again, I am not really an
> important person here :)

I respectfully disagree.

The fact that this file was helpful to a self-described beginner
such as yourself is -way- more important than it being helpful to us
battlescarred veterans.

Besides, Peter already agreed, perhaps in a moment of weakness, to
allow his name to be added to this patch.

In any case, it is your choice, but I personally would value your
Acked-by.

> > > Is there any plan to include it in the official kernel doc tree? I really think it
> > > would be very helpful for others, even just to explain the notations, properties
> > > and language people use when talking about these issues, and to give examples.
> > 
> > Yes, we do plan to submit it for inclusion.
> 
> Great, I think it would help people!

Here is hoping!  ;-)

							Thanx, Paul

> Best Regards,
> Elena.
> 
> > 
> > 
> > 		Thanx, Paul
> > 
> > > I will try to improve a bit the new doc I have previously sent a patch for in the
> > > spirit of this reading.
> > >
> > >
> > > Best Regards,
> > > Elena.
> > >
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* RE: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-13 16:26                 ` Paul E. McKenney
@ 2017-11-14 11:23                   ` Reshetova, Elena
  2017-11-14 17:24                     ` Paul E. McKenney
  0 siblings, 1 reply; 32+ messages in thread
From: Reshetova, Elena @ 2017-11-14 11:23 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, stern, parri.andrea, boqun.feng,
	dhowells, david

> On Mon, Nov 13, 2017 at 04:01:11PM +0000, Reshetova, Elena wrote:
> >
> > > On Mon, Nov 13, 2017 at 09:09:57AM +0000, Reshetova, Elena wrote:
> > > > <snip>
> > > >
> > > > > Note that there's work done on better documents and updates to this one.
> > > > > One document that might be good to read (I have not in fact had time to
> > > > > read it myself yet :-():
> > > > >
> > > > >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> > > >
> > > > I have just finished reading over this and must say that this is excellent.
> > > > If I had started reading on this topic from this doc and then moved
> > > > to the other in-tree docs, including memory-barriers.txt, I would
> > > > have had far fewer issues/questions and it would have been much less of a bumpy
> > > > read.
> > >
> > > Glad you like it!  May we have your Acked-by?
> >
> > I think my Acked-by has little value in this case since I am really a beginner in it
> > myself, but I would strongly suggest to Peter and others to consider inclusion
> > of this doc into the tree since I do see a value in it. Again, I am not really an
> > important person here :)
> 
> I respectfully disagree.
> 
> The fact that this file was helpful to a self-described beginner
> such as yourself is -way- more important than it being helpful to us
> battlescarred veterans.
> 
> Besides, Peter already agreed, perhaps in a moment of weakness, to
> allow his name to be added to this patch.
> 
> In any case, it is your choice, but I personally would value your
> Acked-by.

Sure, please feel free to add it if it is of any help/value!

Best Regards,
Elena.

> 
> > > > Is there any plan to include it in the official kernel doc tree? I really think it
> > > > would be very helpful for others, even just to explain the notations, properties
> > > > and language people use when talking about these issues, and to give examples.
> > >
> > > Yes, we do plan to submit it for inclusion.
> >
> > Great, I think it would help people!
> 
> Here is hoping!  ;-)
> 
> 
> 		Thanx, Paul
> 
> > Best Regards,
> > Elena.
> >
> > >
> > >
> > > 		Thanx, Paul
> > >
> > > > I will try to improve a bit the new doc I have previously sent a patch for in the
> > > > spirit of this reading.
> > > >
> > > >
> > > > Best Regards,
> > > > Elena.
> > > >
> >

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-14 11:23                   ` Reshetova, Elena
@ 2017-11-14 17:24                     ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2017-11-14 17:24 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, stern, parri.andrea, boqun.feng,
	dhowells, david

On Tue, Nov 14, 2017 at 11:23:53AM +0000, Reshetova, Elena wrote:
> > On Mon, Nov 13, 2017 at 04:01:11PM +0000, Reshetova, Elena wrote:
> > >
> > > > On Mon, Nov 13, 2017 at 09:09:57AM +0000, Reshetova, Elena wrote:
> > > > > <snip>
> > > > >
> > > > > > Note that there's work done on better documents and updates to this one.
> > > > > > One document that might be good to read (I have not in fact had time to
> > > > > > read it myself yet :-():
> > > > > >
> > > > > >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> > > > >
> > > > > I have just finished reading over this and must say that this is excellent.
> > > > > If I had started reading on this topic from this doc and then moved
> > > > > to the other in-tree docs, including memory-barriers.txt, I would
> > > > > have had far fewer issues/questions and it would have been much less of a bumpy
> > > > > read.
> > > >
> > > > Glad you like it!  May we have your Acked-by?
> > >
> > > I think my Acked-by has little value in this case since I am really a beginner in it
> > > myself, but I would strongly suggest to Peter and others to consider inclusion
> > > of this doc into the tree since I do see a value in it. Again, I am not really an
> > > important person here :)
> > 
> > I respectfully disagree.
> > 
> > The fact that this file was helpful to a self-described beginner
> > such as yourself is -way- more important than it being helpful to us
> > battlescarred veterans.
> > 
> > Besides, Peter already agreed, perhaps in a moment of weakness, to
> > allow his name to be added to this patch.
> > 
> > In any case, it is your choice, but I personally would value your
> > Acked-by.
> 
> Sure, please feel free to add it if it is of any help/value!

Very good, I have added it, thank you!

							Thanx, Paul

> Best Regards,
> Elena.
> 
> > 
> > > > > Is there any plan to include it in the official kernel doc tree? I really think it
> > > > > would be very helpful for others, even just to explain the notations, properties
> > > > > and language people use when talking about these issues, and to give examples.
> > > >
> > > > Yes, we do plan to submit it for inclusion.
> > >
> > > Great, I think it would help people!
> > 
> > Here is hoping!  ;-)
> > 
> > 
> > 		Thanx, Paul
> > 
> > > Best Regards,
> > > Elena.
> > >
> > > >
> > > >
> > > > 		Thanx, Paul
> > > >
> > > > > I will try to improve a bit the new doc I have previously sent a patch for in the
> > > > > spirit of this reading.
> > > > >
> > > > >
> > > > > Best Regards,
> > > > > Elena.
> > > > >
> > >
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-02 20:21                   ` Alan Stern
@ 2017-11-15 18:05                     ` Will Deacon
  2017-11-15 19:15                       ` Alan Stern
  0 siblings, 1 reply; 32+ messages in thread
From: Will Deacon @ 2017-11-15 18:05 UTC (permalink / raw)
  To: Alan Stern
  Cc: Peter Zijlstra, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Thu, Nov 02, 2017 at 04:21:56PM -0400, Alan Stern wrote:
> I was trying to think of something completely different.  If you have a
> release/acquire to the same address, it creates a happens-before
> ordering:
> 
> 	Access x
> 	Release a
> 	Acquire a
> 	Access y
> 
> Here the access to x happens-before the access to y.  This is true
> even on x86, even in the presence of forwarding -- the CPU still has to
> execute the instructions in order.  But if the release and acquire are
> to different addresses:
> 
> 	Access x
> 	Release a
> 	Acquire b
> 	Access y
> 
> then there is no happens-before ordering for x and y -- the CPU can
> execute the last two instructions before the first two.  x86 and
> PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
> it can't.)

Release/Acquire are RCsc on ARMv8, so they are ordered irrespective of
address.

Will

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-15 18:05                     ` Will Deacon
@ 2017-11-15 19:15                       ` Alan Stern
  2017-11-15 20:03                         ` Peter Zijlstra
  0 siblings, 1 reply; 32+ messages in thread
From: Alan Stern @ 2017-11-15 19:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Wed, 15 Nov 2017, Will Deacon wrote:

> On Thu, Nov 02, 2017 at 04:21:56PM -0400, Alan Stern wrote:
> > I was trying to think of something completely different.  If you have a
> > release/acquire to the same address, it creates a happens-before
> > ordering:
> > 
> > 	Access x
> > 	Release a
> > 	Acquire a
> > 	Access y
> > 
> > Here the access to x happens-before the access to y.  This is true
> > even on x86, even in the presence of forwarding -- the CPU still has to
> > execute the instructions in order.  But if the release and acquire are
> > to different addresses:
> > 
> > 	Access x
> > 	Release a
> > 	Acquire b
> > 	Access y
> > 
> > then there is no happens-before ordering for x and y -- the CPU can
> > execute the last two instructions before the first two.  x86 and
> > PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
> > it can't.)
> 
> Release/Acquire are RCsc on ARMv8, so they are ordered irrespective of
> address.

Ah, okay, thanks.

In any case, we have considered removing this ordering constraint
(store-release followed by load-acquire for the same location) from the
Linux-kernel memory model.  I'm not aware of any code in the kernel
that depends on it.  Do any of you happen to know of any examples?

Alan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-15 19:15                       ` Alan Stern
@ 2017-11-15 20:03                         ` Peter Zijlstra
  2017-11-15 20:22                           ` Alan Stern
  2017-11-15 21:01                           ` Andrea Parri
  0 siblings, 2 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-15 20:03 UTC (permalink / raw)
  To: Alan Stern
  Cc: Will Deacon, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Wed, Nov 15, 2017 at 02:15:19PM -0500, Alan Stern wrote:
> On Wed, 15 Nov 2017, Will Deacon wrote:
> 
> > On Thu, Nov 02, 2017 at 04:21:56PM -0400, Alan Stern wrote:
> > > I was trying to think of something completely different.  If you have a
> > > release/acquire to the same address, it creates a happens-before
> > > ordering:
> > > 
> > > 	Access x
> > > 	Release a
> > > 	Acquire a
> > > 	Access y
> > > 
> > > Here the access to x happens-before the access to y.  This is true
> > > even on x86, even in the presence of forwarding -- the CPU still has to
> > > execute the instructions in order.  But if the release and acquire are
> > > to different addresses:
> > > 
> > > 	Access x
> > > 	Release a
> > > 	Acquire b
> > > 	Access y
> > > 
> > > then there is no happens-before ordering for x and y -- the CPU can
> > > execute the last two instructions before the first two.  x86 and
> > > PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
> > > it can't.)
> > 
> > Release/Acquire are RCsc on ARMv8, so they are ordered irrespective of
> > address.
> 
> Ah, okay, thanks.
> 
> In any case, we have considered removing this ordering constraint
> (store-release followed by load-acquire for the same location) from the
> Linux-kernel memory model.

Why? It's a perfectly sensible construct.

> I'm not aware of any code in the kernel that depends on it.  Do any of
> you happen to know of any examples?

All locks? Something like:

	spin_lock(&x)
	/* foo */
	spin_unlock(&x)
	spin_lock(&x)
	/* bar */
	spin_unlock(&x);

Has a fairly high foo happens-before bar expectation level.

And in specific things like:

  135e8c9250dd5
  ecf7d01c229d1

which use the release of rq->lock paired with the next acquire of the
same rq->lock to match with an smp_rmb().

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-15 20:03                         ` Peter Zijlstra
@ 2017-11-15 20:22                           ` Alan Stern
  2017-11-16  8:46                             ` Peter Zijlstra
  2017-11-15 21:01                           ` Andrea Parri
  1 sibling, 1 reply; 32+ messages in thread
From: Alan Stern @ 2017-11-15 20:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Will Deacon, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Wed, 15 Nov 2017, Peter Zijlstra wrote:

> On Wed, Nov 15, 2017 at 02:15:19PM -0500, Alan Stern wrote:
> > On Wed, 15 Nov 2017, Will Deacon wrote:
> > 
> > > On Thu, Nov 02, 2017 at 04:21:56PM -0400, Alan Stern wrote:
> > > > I was trying to think of something completely different.  If you have a
> > > > release/acquire to the same address, it creates a happens-before
> > > > ordering:
> > > > 
> > > > 	Access x
> > > > 	Release a
> > > > 	Acquire a
> > > > 	Access y
> > > > 
> > > > Here the access to x happens-before the access to y.  This is true
> > > > even on x86, even in the presence of forwarding -- the CPU still has to
> > > > execute the instructions in order.  But if the release and acquire are
> > > > to different addresses:
> > > > 
> > > > 	Access x
> > > > 	Release a
> > > > 	Acquire b
> > > > 	Access y
> > > > 
> > > > then there is no happens-before ordering for x and y -- the CPU can
> > > > execute the last two instructions before the first two.  x86 and
> > > > PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
> > > > it can't.)
> > > 
> > > Release/Acquire are RCsc on ARMv8, so they are ordered irrespective of
> > > address.
> > 
> > Ah, okay, thanks.
> > 
> > In any case, we have considered removing this ordering constraint
> > (store-release followed by load-acquire for the same location) from the
> > Linux-kernel memory model.
> 
> Why? It's a perfectly sensible construct.
> 
> > I'm not aware of any code in the kernel that depends on it.  Do any of
> > you happen to know of any examples?
> 
> All locks? Something like:
> 
> 	spin_lock(&x)
> 	/* foo */
> 	spin_unlock(&x)
> 	spin_lock(&x)
> 	/* bar */
> 	spin_unlock(&x);
> 
> Has a fairly high foo happens-before bar expectation level.
> 
> And in specific things like:
> 
>   135e8c9250dd5
>   ecf7d01c229d1
> 
> which use the release of rq->lock paired with the next acquire of the
> same rq->lock to match with an smp_rmb().

You know, sometimes I feel like I'm losing my mind.

Yes, of course -- this was in fact the original reason for adding that
constraint to the memory model in the first place!  An unlock-to-lock
link between two CPUs would naturally create an ordering relation, and
we wanted the same to be true when everything occurred on a single CPU.

I'll shut up now...

Alan

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-15 20:03                         ` Peter Zijlstra
  2017-11-15 20:22                           ` Alan Stern
@ 2017-11-15 21:01                           ` Andrea Parri
  2017-11-16  8:58                             ` Peter Zijlstra
  1 sibling, 1 reply; 32+ messages in thread
From: Andrea Parri @ 2017-11-15 21:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alan Stern, Will Deacon, Reshetova, Elena, linux-kernel, gregkh,
	keescook, tglx, mingo, ishkamiel, Paul McKenney, boqun.feng,
	dhowells, david

On Wed, Nov 15, 2017 at 09:03:07PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2017 at 02:15:19PM -0500, Alan Stern wrote:
> > On Wed, 15 Nov 2017, Will Deacon wrote:
> > 
> > > On Thu, Nov 02, 2017 at 04:21:56PM -0400, Alan Stern wrote:
> > > > I was trying to think of something completely different.  If you have a
> > > > release/acquire to the same address, it creates a happens-before
> > > > ordering:
> > > > 
> > > > 	Access x
> > > > 	Release a
> > > > 	Acquire a
> > > > 	Access y
> > > > 
> > > > Here the access to x happens-before the access to y.  This is true
> > > > even on x86, even in the presence of forwarding -- the CPU still has to
> > > > execute the instructions in order.  But if the release and acquire are
> > > > to different addresses:
> > > > 
> > > > 	Access x
> > > > 	Release a
> > > > 	Acquire b
> > > > 	Access y
> > > > 
> > > > then there is no happens-before ordering for x and y -- the CPU can
> > > > execute the last two instructions before the first two.  x86 and
> > > > PowerPC won't do this, but I believe ARMv8 can.  (Please correct me if
> > > > it can't.)
> > > 
> > > Release/Acquire are RCsc on ARMv8, so they are ordered irrespective of
> > > address.
> > 
> > Ah, okay, thanks.
> > 
> > In any case, we have considered removing this ordering constraint
> > (store-release followed by load-acquire for the same location) from the
> > Linux-kernel memory model.
> 
> Why? It's a perfectly sensible construct.
> 
> > I'm not aware of any code in the kernel that depends on it.  Do any of
> > you happen to know of any examples?
> 
> All locks? Something like:
> 
> 	spin_lock(&x)
> 	/* foo */
> 	spin_unlock(&x)
> 	spin_lock(&x)
> 	/* bar */
> 	spin_unlock(&x);
> 
> Has a fairly high foo happens-before bar expectation level.
> 
> And in specific things like:
> 
>   135e8c9250dd5
>   ecf7d01c229d1
> 
> which use the release of rq->lock paired with the next acquire of the
> same rq->lock to match with an smp_rmb().

Those cycles are currently forbidden by LKMM _when_ you consider the
smp_mb__after_spinlock() from schedule().  See rfi-rel-acq-is-not-mb
from my previous email and Alan's remarks about cumul-fence.
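
For reference, the construct in question, roughly as it appears in
schedule():

	raw_spin_lock(&rq->lock);
	smp_mb__after_spinlock();	/* upgrades the lock's ACQUIRE to a full barrier */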

  Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-15 20:22                           ` Alan Stern
@ 2017-11-16  8:46                             ` Peter Zijlstra
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-16  8:46 UTC (permalink / raw)
  To: Alan Stern
  Cc: Will Deacon, Reshetova, Elena, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Paul McKenney, parri.andrea, boqun.feng,
	dhowells, david

On Wed, Nov 15, 2017 at 03:22:33PM -0500, Alan Stern wrote:
> You know, sometimes I feel like I'm losing my mind.

Hehe, I'm very familiar with that feeling ;-)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-15 21:01                           ` Andrea Parri
@ 2017-11-16  8:58                             ` Peter Zijlstra
  2017-11-16 10:00                               ` Andrea Parri
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Zijlstra @ 2017-11-16  8:58 UTC (permalink / raw)
  To: Andrea Parri
  Cc: Alan Stern, Will Deacon, Reshetova, Elena, linux-kernel, gregkh,
	keescook, tglx, mingo, ishkamiel, Paul McKenney, boqun.feng,
	dhowells, david

On Wed, Nov 15, 2017 at 10:01:11PM +0100, Andrea Parri wrote:

> > And in specific things like:
> > 
> >   135e8c9250dd5
> >   ecf7d01c229d1
> > 
> > which use the release of rq->lock paired with the next acquire of the
> > same rq->lock to match with an smp_rmb().
> 
> Those cycles are currently forbidden by LKMM _when_ you consider the
> smp_mb__after_spinlock() from schedule().  See rfi-rel-acq-is-not-mb
> from my previous email and Alan's remarks about cumul-fence.

I'm not sure I get your point; and you all seem to forget I do not in
fact speak the ordering lingo. So I have no idea what
rfi-blah-blah or cumul-fence mean.

I know rel-acq isn't smp_mb() and I don't think any of the above patches
need it to be. They just need it to be a local ordering, no?

Even without smp_mb__after_spinlock() we get that:

	spin_lock(&s)
	x = 1
	spin_unlock(&s)
	spin_lock(&s)
	y = 1
	spin_unlock(&s)

guarantees that x happens-before y, right?

And that should be sufficient to then order something else against, like
for example:

	r2 = y
	smp_rmb()
	r1 = x

no?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-16  8:58                             ` Peter Zijlstra
@ 2017-11-16 10:00                               ` Andrea Parri
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Parri @ 2017-11-16 10:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alan Stern, Will Deacon, Reshetova, Elena, linux-kernel, gregkh,
	keescook, tglx, mingo, ishkamiel, Paul McKenney, boqun.feng,
	dhowells, david

On Thu, Nov 16, 2017 at 09:58:04AM +0100, Peter Zijlstra wrote:
> On Wed, Nov 15, 2017 at 10:01:11PM +0100, Andrea Parri wrote:
> 
> > > And in specific things like:
> > > 
> > >   135e8c9250dd5
> > >   ecf7d01c229d1
> > > 
> > > which use the release of rq->lock paired with the next acquire of the
> > > same rq->lock to match with an smp_rmb().
> > 
> > Those cycles are currently forbidden by LKMM _when_ you consider the
> > smp_mb__after_spinlock() from schedule().  See rfi-rel-acq-is-not-mb
> > from my previous email and Alan's remarks about cumul-fence.
> 
> I'm not sure I get your point; and you all seem to forget I do not in
> fact speak the ordering lingo. So I have no idea what
> rfi-blah-blah or cumul-fence mean.

Let me expand on my comment. Consider the following test:

C T1

{}

P0(int *x, int *y, spinlock_t *s)
{
	spin_lock(s);
	WRITE_ONCE(*x, 1);
	spin_unlock(s);
	spin_lock(s);
	WRITE_ONCE(*y, 1);
	spin_unlock(s);
}

P1(int *x, int *y)
{
	int r0;
	int r1;

	r0 = READ_ONCE(*y);
	smp_rmb();
	r1 = READ_ONCE(*x);
}

exists (1:r0=1 /\ 1:r1=0)

According to LKMM, the store to x happens before the store to y, but
there is no guarantee that the former store propagates (to P1) before
the latter, which is what we would need in order to forbid that state.
As a result, the state in the "exists" clause is _allowed_ by LKMM.

The LKMM encodes happens-before (or execution) ordering with a relation
named "hb", while it encodes "propagation ordering" with "cumul-fence".

  Andrea


> 
> I know rel-acq isn't smp_mb() and I don't think any of the above patches
> need it to be. They just need it to be a local ordering, no?
> 
> Even without smp_mb__after_spinlock() we get that:
> 
> 	spin_lock(&s)
> 	x = 1
> 	spin_unlock(&s)
> 	spin_lock(&s)
> 	y = 1
> 	spin_unlock(&s)
> 
> guarantees that x happens-before y, right?
> 
> And that should be sufficient to then order something else against, like
> for example:
> 
> 	r2 = y
> 	smp_rmb()
> 	r1 = x
> 
> no?
> 
> 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-13  9:09           ` Reshetova, Elena
  2017-11-13 13:19             ` Paul E. McKenney
@ 2017-11-16 13:44             ` Michal Hocko
  2017-11-16 15:29               ` Paul E. McKenney
  1 sibling, 1 reply; 32+ messages in thread
From: Michal Hocko @ 2017-11-16 13:44 UTC (permalink / raw)
  To: Reshetova, Elena
  Cc: Peter Zijlstra, linux-kernel, gregkh, keescook, tglx, mingo,
	ishkamiel, Will Deacon, Paul McKenney, stern, parri.andrea,
	boqun.feng, dhowells, david

On Mon 13-11-17 09:09:57, Reshetova, Elena wrote:
> <snip>
> 
> > Note that there's work done on better documents and updates to this one.
> > One document that might be good to read (I have not in fact had time to
> > read it myself yet :-():
> > 
> >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> > 
> 
> I have just finished reading over this and must say that this is excellent. 

I fully second this. The main problem with
Documentation/memory-barriers.txt is that it jumps from the "easy to
follow" parts to the "blow your head" parts just too quickly. The above
explanation, on the other hand, builds the picture from the basics and
piles new layers on top of the previous ones, so I found it much easier
to grasp. I cannot really speak for its correctness in all aspects, but
it certainly makes a lot of sense to me. If you are really interested,
feel free to add:
Acked-by: Michal Hocko <mhocko@suse.com>
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH] refcount: provide same memory ordering guarantees as in atomic_t
  2017-11-16 13:44             ` Michal Hocko
@ 2017-11-16 15:29               ` Paul E. McKenney
  0 siblings, 0 replies; 32+ messages in thread
From: Paul E. McKenney @ 2017-11-16 15:29 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Reshetova, Elena, Peter Zijlstra, linux-kernel, gregkh, keescook,
	tglx, mingo, ishkamiel, Will Deacon, stern, parri.andrea,
	boqun.feng, dhowells, david

On Thu, Nov 16, 2017 at 02:44:30PM +0100, Michal Hocko wrote:
> On Mon 13-11-17 09:09:57, Reshetova, Elena wrote:
> > <snip>
> > 
> > > Note that there's work done on better documents and updates to this one.
> > > One document that might be good to read (I have not in fact had time to
> > > read it myself yet :-():
> > > 
> > >   https://github.com/aparri/memory-model/blob/master/Documentation/explanation.txt
> > > 
> > 
> > I have just finished reading over this and must say that this is excellent. 
> 
> I fully second this. The main problem with
> Documentation/memory-barriers.txt is that it jumps from the "easy to
> follow" parts to the "blow your head" parts just too quickly. The above
> explanation, on the other hand, builds the picture from the basics and
> piles new layers on top of the previous ones, so I found it much easier
> to grasp. I cannot really speak for its correctness in all aspects, but
> it certainly makes a lot of sense to me. If you are really interested,
> feel free to add:
> Acked-by: Michal Hocko <mhocko@suse.com>

Thank you, props to Alan for the writing, and I have applied your ack!

							Thanx, Paul

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2017-11-16 15:29 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-23 11:09 [PATCH] refcount: provide same memory ordering guarantees as in atomic_t Elena Reshetova
2017-10-23 13:12 ` Peter Zijlstra
2017-10-27  6:49   ` Reshetova, Elena
2017-10-27 13:56     ` Peter Zijlstra
2017-11-02 11:04       ` Reshetova, Elena
2017-11-02 13:57         ` Peter Zijlstra
2017-11-02 15:40           ` Alan Stern
2017-11-02 16:02             ` Peter Zijlstra
2017-11-02 16:45               ` Peter Zijlstra
2017-11-02 17:08               ` Alan Stern
2017-11-02 17:16                 ` Will Deacon
2017-11-02 17:26                   ` Peter Zijlstra
2017-11-02 20:21                   ` Alan Stern
2017-11-15 18:05                     ` Will Deacon
2017-11-15 19:15                       ` Alan Stern
2017-11-15 20:03                         ` Peter Zijlstra
2017-11-15 20:22                           ` Alan Stern
2017-11-16  8:46                             ` Peter Zijlstra
2017-11-15 21:01                           ` Andrea Parri
2017-11-16  8:58                             ` Peter Zijlstra
2017-11-16 10:00                               ` Andrea Parri
2017-11-02 17:45                 ` Andrea Parri
2017-11-02 20:28                   ` Alan Stern
2017-11-03 11:55           ` Reshetova, Elena
2017-11-13  9:09           ` Reshetova, Elena
2017-11-13 13:19             ` Paul E. McKenney
2017-11-13 16:01               ` Reshetova, Elena
2017-11-13 16:26                 ` Paul E. McKenney
2017-11-14 11:23                   ` Reshetova, Elena
2017-11-14 17:24                     ` Paul E. McKenney
2017-11-16 13:44             ` Michal Hocko
2017-11-16 15:29               ` Paul E. McKenney

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).