* wake_q memory ordering
@ 2019-10-10 10:41 Manfred Spraul
2019-10-10 11:42 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Manfred Spraul @ 2019-10-10 10:41 UTC (permalink / raw)
To: Waiman Long, Davidlohr Bueso, Linux Kernel Mailing List, Peter Zijlstra
Cc: 1vier1, Paul E. McKenney
Hi,
Waiman Long noticed that the memory barriers in sem_lock() are not
really documented, and while adding documentation, I ended up with one
case where I'm not certain about the wake_q code:
Questions:
- Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
ordering guarantee?
- Is it ok that wake_up_q just writes wake_q->next, shouldn't
smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
provides any ordering.
Example:
- CPU2 never touches lock a. It is just an unrelated wake_q user that also
wants to wake up task 1234.
- I've noticed already that smp_store_acquire() doesn't exist.
So smp_store_mb() is required. But from a semantic point of view, we
would need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
- May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
or should the function be safe by itself?
CPU1: /current=1234, inside do_semtimedop()/
g_wakee = current;
current->state = TASK_INTERRUPTIBLE;
spin_unlock(a);
CPU2: / arbitrary kernel thread that uses wake_q /
wake_q_add(&unrelated_q, 1234);
wake_up_q(&unrelated_q);
<...ongoing>
CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
spin_lock(a);
wake_q_add(,g_wakee);
< within wake_q_add() >:
smp_mb__before_atomic();
if (unlikely(cmpxchg_relaxed(&node->next,
NULL, WAKE_Q_TAIL)))
return false; /* -> this happens */
CPU2:
<within wake_up_q>
1234->wake_q.next = NULL; <<<<<<<<< Ok? Is store_acquire() missing? >>>>>>>>>>>>
wake_up_process(1234);
< within wake_up_process/try_to_wake_up():
raw_spin_lock_irqsave()
smp_mb__after_spinlock()
if (1234->state == TASK_RUNNING) return;
>
rewritten:
start condition: A = 1; B = 0;
CPU1:
B = 1;
RELEASE, unlock LockX;
CPU3:
lock LockX, ACQUIRE
if (LOAD A == 1) return; /* using cmpxchg_relaxed */
CPU2:
A = 0;
ACQUIRE, lock LockY
smp_mb__after_spinlock();
READ B
Question: is A = 1, B = 0 possible?
--
Manfred
* Re: wake_q memory ordering
From: Peter Zijlstra @ 2019-10-10 11:42 UTC (permalink / raw)
To: Manfred Spraul
Cc: Waiman Long, Davidlohr Bueso, Linux Kernel Mailing List, 1vier1,
Paul E. McKenney
On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> Hi,
>
> Waiman Long noticed that the memory barriers in sem_lock() are not really
> documented, and while adding documentation, I ended up with one case where
> I'm not certain about the wake_q code:
>
> Questions:
> - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
> ordering guarantee?
Yep. Either the atomic instruction implies ordering (e.g. the x86 LOCK
prefix) or it doesn't (most RISC LL/SC). If it does,
smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
unconditional; if it does not, then smp_mb__{before,after}_atomic() are
unconditional barriers.
IOW, the only way to get a cmpxchg without barriers on failure is with
LL/SC, and in that case smp_mb__{before,after}_atomic() are
unconditional.
For instance, the way ARM64 does cmpxchg() is:
	cmpxchg(p, o, n)
	{
		do {
			v = LL(p);
			if (v != o)
				return v;
		} while (!SC_RELEASE(p, n));
		smp_mb();
		return v;
	}
And you'll note how on success the store-release constraints all prior
memory operations, and the smp_mb() constraints all later memops. But on
failure there's not a barrier to be found.
> - Is it ok that wake_up_q just writes wake_q->next, shouldn't
> smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
> happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
> provides any ordering.
There is no such thing as store_acquire; it is either load_acquire or
store_release. But just like how we can write load-acquire like
load+smp_mb(), so too I suppose we could write store-acquire like
store+smp_mb(), and that is exactly what is there (through the implied
barrier of wake_up_process()).
(arguably it should've been WRITE_ONCE() I suppose)
>
> Example:
> - CPU2 never touches lock a. It is just an unrelated wake_q user that also
> wants to wake up task 1234.
> - I've noticed already that smp_store_acquire() doesn't exist.
> So smp_store_mb() is required. But from a semantic point of view, we would
> need an ACQUIRE: the wake_up_process() must happen after cmpxchg().
> - May wake_up_q() rely on the spinlocks/memory barriers in try_to_wake_up,
> or should the function be safe by itself?
>
> CPU1: /current=1234, inside do_semtimedop()/
> g_wakee = current;
> current->state = TASK_INTERRUPTIBLE;
> spin_unlock(a);
>
> CPU2: / arbitrary kernel thread that uses wake_q /
> wake_q_add(&unrelated_q, 1234);
> wake_up_q(&unrelated_q);
> <...ongoing>
>
> CPU3: / do_semtimedop() + wake_up_sem_queue_prepare() /
> spin_lock(a);
> wake_q_add(,g_wakee);
> < within wake_q_add() >:
> smp_mb__before_atomic();
> if (unlikely(cmpxchg_relaxed(&node->next, NULL,
> WAKE_Q_TAIL)))
> return false; /* -> this happens */
>
> CPU2:
> <within wake_up_q>
> 1234->wake_q.next = NULL; <<<<<<<<< Ok? Is store_acquire() missing? >>>>>>>>>>>>
/* smp_mb(); implied by the following wake_up_process() */
> wake_up_process(1234);
> < within wake_up_process/try_to_wake_up():
> raw_spin_lock_irqsave()
> smp_mb__after_spinlock()
> if (1234->state == TASK_RUNNING) return;
> >
>
>
> rewritten:
>
> start condition: A = 1; B = 0;
>
> CPU1:
> B = 1;
> RELEASE, unlock LockX;
>
> CPU3:
> lock LockX, ACQUIRE
> if (LOAD A == 1) return; /* using cmpxchg_relaxed */
>
> CPU2:
> A = 0;
> ACQUIRE, lock LockY
> smp_mb__after_spinlock();
> READ B
>
> Question: is A = 1, B = 0 possible?
Your example is incomplete (there is no A=1 assignment for example), but
I'm thinking I can guess where that should go given the earlier text.
I don't think this is broken.
* Re: wake_q memory ordering
From: Manfred Spraul @ 2019-10-10 12:13 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Waiman Long, Davidlohr Bueso, Linux Kernel Mailing List, 1vier1,
Paul E. McKenney
Hi Peter,
On 10/10/19 1:42 PM, Peter Zijlstra wrote:
> On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
>> Hi,
>>
>> Waiman Long noticed that the memory barriers in sem_lock() are not really
>> documented, and while adding documentation, I ended up with one case where
>> I'm not certain about the wake_q code:
>>
>> Questions:
>> - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>> ordering guarantee?
> Yep. Either the atomic instruction implies ordering (eg. x86 LOCK
> prefix) or it doesn't (most RISC LL/SC), if it does,
> smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
> unconditional, if it does not, then smp_mb__{before,after}_atomic() are
> unconditional barriers.
And _relaxed() differs from "normal" cmpxchg only for LL/SC
architectures, correct?
Therefore smp_mb__{before,after}_atomic() may be combined with
cmpxchg_relaxed, to form a full memory barrier, on all archs.
[...]
>> - Is it ok that wake_up_q just writes wake_q->next, shouldn't
>> smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
>> happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
>> provides any ordering.
> There is no such thing as store_acquire, it is either load_acquire or
> store_release. But just like how we can write load-acquire like
> load+smp_mb(), so too I suppose we could write store-acquire like
> store+smp_mb(), and that is exactly what is there (through the implied
> barrier of wake_up_process()).
Thanks for confirming my assumption:
The code is correct, due to the implied barrier inside wake_up_process().
[...]
>> rewritten:
>>
>> start condition: A = 1; B = 0;
>>
>> CPU1:
>> B = 1;
>> RELEASE, unlock LockX;
>>
>> CPU3:
>> lock LockX, ACQUIRE
>> if (LOAD A == 1) return; /* using cmpxchg_relaxed */
>>
>> CPU2:
>> A = 0;
>> ACQUIRE, lock LockY
>> smp_mb__after_spinlock();
>> READ B
>>
>> Question: is A = 1, B = 0 possible?
> Your example is incomplete (there is no A=1 assignment for example), but
> I'm thinking I can guess where that should go given the earlier text.
A=1 is listed as start condition. Way before, someone did wake_q_add().
> I don't think this is broken.
Thanks.
--
Manfred
* Re: wake_q memory ordering
From: Peter Zijlstra @ 2019-10-10 12:32 UTC (permalink / raw)
To: Manfred Spraul
Cc: Waiman Long, Davidlohr Bueso, Linux Kernel Mailing List, 1vier1,
Paul E. McKenney
On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
> Hi Peter,
>
> On 10/10/19 1:42 PM, Peter Zijlstra wrote:
> > On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
> > > Hi,
> > >
> > > Waiman Long noticed that the memory barriers in sem_lock() are not really
> > > documented, and while adding documentation, I ended up with one case where
> > > I'm not certain about the wake_q code:
> > >
> > > Questions:
> > > - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
> > > ordering guarantee?
> > Yep. Either the atomic instruction implies ordering (eg. x86 LOCK
> > prefix) or it doesn't (most RISC LL/SC), if it does,
> > smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
> > unconditional, if it does not, then smp_mb__{before,after}_atomic() are
> > unconditional barriers.
>
> And _relaxed() differs from "normal" cmpxchg only for LL/SC architectures,
> correct?
Indeed.
> Therefore smp_mb__{before,after}_atomic() may be combined with
> cmpxchg_relaxed, to form a full memory barrier, on all archs.
Just so.
> > > - Is it ok that wake_up_q just writes wake_q->next, shouldn't
> > > smp_store_acquire() be used? I.e.: guarantee that wake_up_process()
> > > happens after cmpxchg_relaxed(), assuming that a failed cmpxchg_relaxed
> > > provides any ordering.
> > There is no such thing as store_acquire, it is either load_acquire or
> > store_release. But just like how we can write load-acquire like
> > load+smp_mb(), so too I suppose we could write store-acquire like
> > store+smp_mb(), and that is exactly what is there (through the implied
> > barrier of wake_up_process()).
>
> Thanks for confirming my assumption:
> The code is correct, due to the implied barrier inside wake_up_process().
It has a comment there, trying to state this.
task->wake_q.next = NULL;
/*
* wake_up_process() executes a full barrier, which pairs with
* the queueing in wake_q_add() so as not to miss wakeups.
*/
wake_up_process(task);
* Re: wake_q memory ordering
From: Davidlohr Bueso @ 2019-10-10 19:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Manfred Spraul, Waiman Long, Linux Kernel Mailing List, 1vier1,
Paul E. McKenney
On Thu, 10 Oct 2019, Peter Zijlstra wrote:
>On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
>> Hi Peter,
>>
>> On 10/10/19 1:42 PM, Peter Zijlstra wrote:
>> > On Thu, Oct 10, 2019 at 12:41:11PM +0200, Manfred Spraul wrote:
>> > > Hi,
>> > >
>> > > Waiman Long noticed that the memory barriers in sem_lock() are not really
>> > > documented, and while adding documentation, I ended up with one case where
>> > > I'm not certain about the wake_q code:
>> > >
>> > > Questions:
>> > > - Does smp_mb__before_atomic() + a (failed) cmpxchg_relaxed provide an
>> > > ordering guarantee?
>> > Yep. Either the atomic instruction implies ordering (eg. x86 LOCK
>> > prefix) or it doesn't (most RISC LL/SC), if it does,
>> > smp_mb__{before,after}_atomic() are a NO-OP and the ordering is
>> > unconditional, if it does not, then smp_mb__{before,after}_atomic() are
>> > unconditional barriers.
>>
>> And _relaxed() differs from "normal" cmpxchg only for LL/SC architectures,
>> correct?
>
>Indeed.
>
>> Therefore smp_mb__{before,after}_atomic() may be combined with
>> cmpxchg_relaxed, to form a full memory barrier, on all archs.
>
>Just so.
We might want something like this?
----8<---------------------------------------------------------
From: Davidlohr Bueso <dave@stgolabs.net>
Subject: [PATCH] Documentation/memory-barriers.txt: Mention smp_mb__{before,after}_atomic() and CAS
Explicitly mention possible usages to guarantee serialization even upon
failed cmpxchg (or similar) calls along with smp_mb__{before,after}_atomic().
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
Documentation/memory-barriers.txt | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..5d2873d4b442 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1890,6 +1890,18 @@ There are some more advanced barrier functions:
This makes sure that the death mark on the object is perceived to be set
*before* the reference counter is decremented.
+ Similarly, these barriers can be used to guarantee serialization for atomic
+ RMW calls on architectures which may not imply memory barriers upon failure.
+
+ obj->next = NULL;
+ smp_mb__before_atomic()
+ if (cmpxchg(&obj->ptr, NULL, val))
+ return;
+
+ This makes sure that the store to the next pointer always has smp_store_mb()
+ semantics. As such, smp_mb__{before,after}_atomic() calls allow optimizing
+ the barrier usage by finer grained serialization.
+
See Documentation/atomic_{t,bitops}.txt for more information.
--
2.16.4
* Re: wake_q memory ordering
From: Manfred Spraul @ 2019-10-11 8:57 UTC (permalink / raw)
To: Davidlohr Bueso, Peter Zijlstra
Cc: Waiman Long, Linux Kernel Mailing List, 1vier1, Paul E. McKenney
Hi Davidlohr,
On 10/10/19 9:25 PM, Davidlohr Bueso wrote:
> On Thu, 10 Oct 2019, Peter Zijlstra wrote:
>
>> On Thu, Oct 10, 2019 at 02:13:47PM +0200, Manfred Spraul wrote:
>>
>>> Therefore smp_mb__{before,after}_atomic() may be combined with
>>> cmpxchg_relaxed, to form a full memory barrier, on all archs.
>>
>> Just so.
>
> We might want something like this?
>
> ----8<---------------------------------------------------------
>
> From: Davidlohr Bueso <dave@stgolabs.net>
> Subject: [PATCH] Documentation/memory-barriers.txt: Mention
> smp_mb__{before,after}_atomic() and CAS
>
> Explicitly mention possible usages to guarantee serialization even upon
> failed cmpxchg (or similar) calls along with
> smp_mb__{before,after}_atomic().
>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
> Documentation/memory-barriers.txt | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/Documentation/memory-barriers.txt
> b/Documentation/memory-barriers.txt
> index 1adbb8a371c7..5d2873d4b442 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1890,6 +1890,18 @@ There are some more advanced barrier functions:
> This makes sure that the death mark on the object is perceived to
> be set
> *before* the reference counter is decremented.
>
> + Similarly, these barriers can be used to guarantee serialization
> for atomic
> + RMW calls on architectures which may not imply memory barriers
> upon failure.
> +
> + obj->next = NULL;
> + smp_mb__before_atomic()
> + if (cmpxchg(&obj->ptr, NULL, val))
> + return;
> +
> + This makes sure that the store to the next pointer always has
> smp_store_mb()
> + semantics. As such, smp_mb__{before,after}_atomic() calls allow
> optimizing
> + the barrier usage by finer grained serialization.
> +
> See Documentation/atomic_{t,bitops}.txt for more information.
>
>
I don't know. The new documentation would not have answered my question
(is it ok to combine smp_mb__before_atomic() with atomic_relaxed()?),
and it copies content already present in atomic_t.txt.
Thus I would prefer that the first sentence of the paragraph be replaced:
the list of operations should end with "...", and it should match what
is in atomic_t.txt.
Ok?
--
Manfred
[-- Attachment #2: 0004-Documentation-memory-barriers.txt-Clarify-cmpxchg.patch --]
[-- Type: text/x-patch, Size: 2495 bytes --]
From 8df60211228042672ba0cd89c3566c5145e8b203 Mon Sep 17 00:00:00 2001
From: Manfred Spraul <manfred@colorfullife.com>
Date: Fri, 11 Oct 2019 10:33:26 +0200
Subject: [PATCH 4/4] Documentation/memory-barriers.txt: Clarify cmpxchg()
The documentation in memory-barriers.txt claims that
smp_mb__{before,after}_atomic() are for atomic ops that do not return a
value.
This is misleading and doesn't match the example in atomic_t.txt,
and e.g. smp_mb__before_atomic() may be, and is, used together with
cmpxchg_relaxed() in the wake_q code.
The purpose of e.g. smp_mb__before_atomic() is to "upgrade" a following
RMW atomic operation to a full memory barrier.
The return code of the atomic operation has no impact, so all of the
following examples are valid:
1)
smp_mb__before_atomic();
atomic_add();
2)
smp_mb__before_atomic();
atomic_xchg_relaxed();
3)
smp_mb__before_atomic();
atomic_fetch_add_relaxed();
Invalid would be:
smp_mb__before_atomic();
atomic_set();
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Peter Zijlstra <peterz@infradead.org>
---
Documentation/memory-barriers.txt | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 1adbb8a371c7..52076b057400 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1873,12 +1873,13 @@ There are some more advanced barrier functions:
(*) smp_mb__before_atomic();
(*) smp_mb__after_atomic();
- These are for use with atomic (such as add, subtract, increment and
- decrement) functions that don't return a value, especially when used for
- reference counting. These functions do not imply memory barriers.
+ These are for use with atomic RMW functions (such as add, subtract,
+ increment, decrement, failed conditional operations, ...) that do
+ not imply memory barriers, but where the code needs a memory barrier,
+ for example when used for reference counting.
- These are also used for atomic bitop functions that do not return a
- value (such as set_bit and clear_bit).
+ These are also used for atomic RMW bitop functions that do not imply
+ a full memory barrier (such as set_bit and clear_bit).
As an example, consider a piece of code that marks an object as being dead
and then decrements the object's reference count:
--
2.21.0
* Re: wake_q memory ordering
From: Davidlohr Bueso @ 2019-10-11 15:46 UTC (permalink / raw)
To: Manfred Spraul
Cc: Peter Zijlstra, Waiman Long, Linux Kernel Mailing List, 1vier1,
Paul E. McKenney
On Fri, 11 Oct 2019, Manfred Spraul wrote:
>I don't know. The new documentation would not have answered my
>question (is it ok to combine smp_mb__before_atomic() with
>atomic_relaxed()?). And it copies content already present in
>atomic_t.txt.
Well, the _relaxed (and release/acquire) extensions refer to a
_successful_ operation (LL/SC), and whether it has barriers on
each of the sides before and after. I thought you were mainly
worried about the failed CAS scenario, not the relaxed itself.
I don't know how this copies content from atomic_t.txt, at no
point does it talk about failed CAS.
>
>Thus: I would prefer if the first sentence of the paragraph is
>replaced: The list of operations should end with "...", and it should
>match what is in atomic_t.txt
I'll see about combining some of your changes in patch 5/5 of
your new series, but have to say I still prefer my documentation
change.
Thanks,
Davidlohr