* wake_up_process implied memory barrier clarification @ 2015-08-27 12:27 Michal Hocko 2015-08-27 12:43 ` Peter Zijlstra 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2015-08-27 12:27 UTC (permalink / raw) To: LKML; +Cc: Oleg Nesterov, Peter Zijlstra, David Howells, Linus Torvalds Hi, I have just stumbled over the comment above wake_up_process which claims: " * It may be assumed that this function implies a write memory barrier before * changing the task state if and only if any tasks are woken up. " but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb since 04e2f1741d235 unconditionally. The comment was added when the smp_wmb was in place already so I am wondering whether the comment is wrong/misleading. Could somebody clarify please? -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-27 12:27 wake_up_process implied memory barrier clarification Michal Hocko @ 2015-08-27 12:43 ` Peter Zijlstra 2015-08-27 13:14 ` Michal Hocko 0 siblings, 1 reply; 20+ messages in thread From: Peter Zijlstra @ 2015-08-27 12:43 UTC (permalink / raw) To: Michal Hocko; +Cc: LKML, Oleg Nesterov, David Howells, Linus Torvalds On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote: > Hi, > I have just stumbled over the comment above wake_up_process which > claims: > " > * It may be assumed that this function implies a write memory barrier before > * changing the task state if and only if any tasks are woken up. > " > > but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb > since 04e2f1741d235 unconditionally. The comment was added when the > smp_wmb was in place already so I am wondering whether the comment is > wrong/misleading. > > Could somebody clarify please? Its true for wake_up(), since that bails early if the waitqueue list is empty. I suspect there was no exception made for wake_up_process() to simplify the rules. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-27 12:43 ` Peter Zijlstra @ 2015-08-27 13:14 ` Michal Hocko 2015-08-27 18:26 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2015-08-27 13:14 UTC (permalink / raw) To: Peter Zijlstra; +Cc: LKML, Oleg Nesterov, David Howells, Linus Torvalds On Thu 27-08-15 14:43:34, Peter Zijlstra wrote: > On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote: > > Hi, > > I have just stumbled over the comment above wake_up_process which > > claims: > > " > > * It may be assumed that this function implies a write memory barrier before > > * changing the task state if and only if any tasks are woken up. > > " > > > > but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb > > since 04e2f1741d235 unconditionally. The comment was added when the > > smp_wmb was in place already so I am wondering whether the comment is > > wrong/misleading. > > > > Could somebody clarify please? > > Its true for wake_up(), since that bails early if the waitqueue list is > empty. > > I suspect there was no exception made for wake_up_process() to simplify > the rules. Thanks for the confirmation. Shouldn't we rather change the documentation because this is clearly misleading and confusing. --- >From b70d9a384cfd018e686c0aca06e830f564a34dd9 Mon Sep 17 00:00:00 2001 From: Michal Hocko <mhocko@suse.com> Date: Thu, 27 Aug 2015 15:10:55 +0200 Subject: [PATCH] sched: Clarigy wake_up_process memory barrier semantic wake_up_process unlike other wake up primitives based on __wake_up implies the write memory barrier unconditionally because it relies on try_to_wake_up directly. Clarify this in the function comment and memory-barriers.txt because the current doc is quite misleading. Signed-off-by: Michal Hocko <mhocko@suse.com> --- Documentation/memory-barriers.txt | 3 +++ kernel/sched/core.c | 3 +-- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index 13feb697271f..c4f180caf0ff 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits <general barrier> STORE current->state LOAD event_indicated +Please note that wake_up_process is an exception here because it implies +the write memory barrier unconditionally. + To repeat, this write memory barrier is present if and only if something is actually awakened. To see this, consider the following sequence of events, where X and Y are both initially zero: diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 78b4bad10081..39583b76ad2c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p) * * Return: 1 if the process was woken up, 0 if it was already running. * - * It may be assumed that this function implies a write memory barrier before - * changing the task state if and only if any tasks are woken up. + * It may be assumed that this function implies a write memory barrier. */ int wake_up_process(struct task_struct *p) { -- 2.5.0 -- Michal Hocko SUSE Labs ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-27 13:14 ` Michal Hocko @ 2015-08-27 18:26 ` Oleg Nesterov 2015-08-28 14:51 ` Michal Hocko 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-08-27 18:26 UTC (permalink / raw) To: Michal Hocko; +Cc: Peter Zijlstra, LKML, David Howells, Linus Torvalds On 08/27, Michal Hocko wrote: > > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits > <general barrier> STORE current->state > LOAD event_indicated > > +Please note that wake_up_process is an exception here because it implies > +the write memory barrier unconditionally. > + I simply can't understand (can't even parse) this part of memory-barriers.txt. > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p) > * > * Return: 1 if the process was woken up, 0 if it was already running. > * > - * It may be assumed that this function implies a write memory barrier before > - * changing the task state if and only if any tasks are woken up. > + * It may be assumed that this function implies a write memory barrier. > */ I won't argue, technically this is correct of course. And I agree that the old comment is misleading. But the new comment looks as if it is fine to avoid wmb() if you do wake_up_process(). Say, void w(void) { A = 1; wake_up_process(something_unrelated); // we know that it implies wmb(). B = 1; } void r(void) { int a, b; b = B; rmb(); a = A; BUG_ON(b && !a); } Perhaps this part of the comment should be simply removed, the unconditional wmb() in ttwu() is just implementation detail. And note that initially the documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD serialization. Then people noticed that it actually does wmb() and started to rely on this fact. To me, this comment should just explain that this function implies a barrier but only in a sense that you do not need another one after CONDITION = T and before wake_up_process(). Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-27 18:26 ` Oleg Nesterov @ 2015-08-28 14:51 ` Michal Hocko 2015-08-28 16:06 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Michal Hocko @ 2015-08-28 14:51 UTC (permalink / raw) To: Oleg Nesterov; +Cc: Peter Zijlstra, LKML, David Howells, Linus Torvalds On Thu 27-08-15 20:26:54, Oleg Nesterov wrote: > On 08/27, Michal Hocko wrote: > > > > --- a/Documentation/memory-barriers.txt > > +++ b/Documentation/memory-barriers.txt > > @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits > > <general barrier> STORE current->state > > LOAD event_indicated > > > > +Please note that wake_up_process is an exception here because it implies > > +the write memory barrier unconditionally. > > + > > I simply can't understand (can't even parse) this part of memory-barriers.txt. Do you mean the added text or the example above it? > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p) > > * > > * Return: 1 if the process was woken up, 0 if it was already running. > > * > > - * It may be assumed that this function implies a write memory barrier before > > - * changing the task state if and only if any tasks are woken up. > > + * It may be assumed that this function implies a write memory barrier. > > */ > > I won't argue, technically this is correct of course. And I agree that > the old comment is misleading. Well the reason I've noticed is the following race in the scsi code CPU0 CPU1 scsi_error_handler scsi_host_dev_release kthread_stop() while (!kthread_should_stop()) { set_bit(KTHREAD_SHOULD_STOP) wake_up_process() wait_for_completion() set_current_state(TASK_INTERRUPTIBLE) schedule() [...] } I have read the comment for wake_up_process and was wondering that moving set_current_state before kthread_should_stop wouldn't be enough because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process wouldn't wake up it and the missing write barrier could lead to a missed KTHREAD_SHOULD_STOP. A look into ttwu made my worry void. > But the new comment looks as if it is fine to avoid wmb() if you do > wake_up_process(). Say, > > void w(void) > { > A = 1; > wake_up_process(something_unrelated); > // we know that it implies wmb(). > B = 1; > } > > void r(void) > { > int a, b; > > b = B; > rmb(); > a = A; > > BUG_ON(b && !a); > } > > Perhaps this part of the comment should be simply removed, the unconditional > wmb() in ttwu() is just implementation detail. And note that initially the > documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD > serialization. Then people noticed that it actually does wmb() and started > to rely on this fact. > > To me, this comment should just explain that this function implies a barrier > but only in a sense that you do not need another one after CONDITION = T and > before wake_up_process(). I have no objection against more precise wording here but what we have is just misleading. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-28 14:51 ` Michal Hocko @ 2015-08-28 16:06 ` Oleg Nesterov 2015-08-29 9:25 ` Boqun Feng 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-08-28 16:06 UTC (permalink / raw) To: Michal Hocko; +Cc: Peter Zijlstra, LKML, David Howells, Linus Torvalds On 08/28, Michal Hocko wrote: > > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote: > > On 08/27, Michal Hocko wrote: > > > > > > --- a/Documentation/memory-barriers.txt > > > +++ b/Documentation/memory-barriers.txt > > > @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits > > > <general barrier> STORE current->state > > > LOAD event_indicated > > > > > > +Please note that wake_up_process is an exception here because it implies > > > +the write memory barrier unconditionally. > > > + > > > > I simply can't understand (can't even parse) this part of memory-barriers.txt. > > Do you mean the added text or the example above it? Both ;) but note that "load from X might see 0" is true of course, and in this sense wake_up_process() is not exception: X = 1; wmb(); // unless I am totally confused this just adds more confusion Y = 1; wake_up_process(TASK); vs TASK doing for (;;) { set_current_state(...); if (Y) break; schedule(); } BUG_ON(X == 0) is not correct, it can actually can hit the BUG_ON() above. However, if wake_up_process() actually wakes a sleeping TASK up, then it should also see X = 1. Even without wmb(), even if we do Y = 1; X = 1; wake_up_process(TASK); > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p) > > > * > > > * Return: 1 if the process was woken up, 0 if it was already running. > > > * > > > - * It may be assumed that this function implies a write memory barrier before > > > - * changing the task state if and only if any tasks are woken up. > > > + * It may be assumed that this function implies a write memory barrier. > > > */ > > > > I won't argue, technically this is correct of course. And I agree that > > the old comment is misleading. > > Well the reason I've noticed is the following race in the scsi code > CPU0 CPU1 > scsi_error_handler scsi_host_dev_release > kthread_stop() > while (!kthread_should_stop()) { > set_bit(KTHREAD_SHOULD_STOP) > wake_up_process() > wait_for_completion() > > set_current_state(TASK_INTERRUPTIBLE) > schedule() Heh. This looks like a common mistake. See fecdf8be2d91e04b0a9a4f79ff06499 ;) But I believe this is another thing. > I have read the comment for wake_up_process and was wondering that > moving set_current_state before kthread_should_stop wouldn't be enough > because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process > wouldn't wake up it and the missing write barrier could lead to a missed > KTHREAD_SHOULD_STOP. And that is why try_to_wake_up()->smp_mb__before_spinlock() needs to serialize STORE(CONDITION) and the subsequent LOAD(p->state). The fact that it actually does wmb() is just implementation detail, that is what I tried to say. > > To me, this comment should just explain that this function implies a barrier > > but only in a sense that you do not need another one after CONDITION = T and > > before wake_up_process(). > > I have no objection against more precise wording here but what we have is just > misleading. Yes, yes, I agree. Just I do not know what exactly it should document. Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-28 16:06 ` Oleg Nesterov @ 2015-08-29 9:25 ` Boqun Feng 2015-08-29 14:27 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Boqun Feng @ 2015-08-29 9:25 UTC (permalink / raw) To: Oleg Nesterov Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Paul E. McKenney, Jonathan Corbet Hi Oleg On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote: > On 08/28, Michal Hocko wrote: > > > > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote: > > > On 08/27, Michal Hocko wrote: > > > > > > > > --- a/Documentation/memory-barriers.txt > > > > +++ b/Documentation/memory-barriers.txt > > > > @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits > > > > <general barrier> STORE current->state > > > > LOAD event_indicated > > > > > > > > +Please note that wake_up_process is an exception here because it implies > > > > +the write memory barrier unconditionally. > > > > + > > > > > > I simply can't understand (can't even parse) this part of memory-barriers.txt. > > > > Do you mean the added text or the example above it? > > Both ;) > > but note that "load from X might see 0" is true of course, and in this By this, I think you actually means the example below the added text, i.e. the example for "to repeat..", right? I think that's not a good example, and actually that example explains that the barriers in -wait_event()- rather than in -wake_up()- are conditional. I wrote a patch to make things more clear, hope that helps. (Add Paul and Jonathan to CCed) Regards, Boqun > sense wake_up_process() is not exception: > > X = 1; > wmb(); // unless I am totally confused this just adds more confusion > Y = 1; > wake_up_process(TASK); > > vs TASK doing > > for (;;) { > set_current_state(...); > if (Y) > break; > schedule(); > } > > BUG_ON(X == 0) > > is not correct, it can actually can hit the BUG_ON() above. However, if > wake_up_process() actually wakes a sleeping TASK up, then it should also > see X = 1. Even without wmb(), even if we do > > Y = 1; > X = 1; > wake_up_process(TASK); > ----------->8 Subject: Documentation: call out conditional barriers of wait_*() and wake_up*() The memory barriers in some sleep and wakeup functions are conditional, there are several situations that there is no barriers: 1. If wait_event() and co. actually don't prepare to sleep, there may be no barrier in them. 2. No matter whether a sleep occurs or not, there may be no barrier between a successful wait-condition checking(the result of which is true) in wait_event() and the following instructions. 3. If wake_up() and co. actually wake up no one, there may be no write barrier in them. However, the current version of memory-barriers.txt doesn't call these out. Further more, the example which wanted to explain that write barriers in wake_up*() are conditional fails its job. To see that, consider a similar example: CPU 1 CPU 2 =============================== =============================== X = 1; smp_mb(); Y = 1; wait_event(wq, Y == 1); load from Y sees 1, no memory barrier smp_wmb(); wake_up(); load from X might see 0 CPU 2's load from X might still be 0 even if we add a write barrier explicitly, which means that even if wake_up() guarantees a write barrier, the original example still doesn't have the desired order guarantee. In fact, the original example can explains the conditionality of barriers in wait_*() rather than wake_up*(). This patch makes things clear, by calling out that the barriers in wait_*() and wake_up*() are conditional and giving exact examples to explain that. Signed-off-by: Boqun Feng <boqun.feng@gmail.com> --- Documentation/memory-barriers.txt | 81 +++++++++++++++++++++++++++++++-------- 1 file changed, 65 insertions(+), 16 deletions(-) diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt index eafa6a5..df69841 100644 --- a/Documentation/memory-barriers.txt +++ b/Documentation/memory-barriers.txt @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by: which therefore also imply a general memory barrier after setting the state. The whole sequence above is available in various canned forms, all of which -interpolate the memory barrier in the right place: +imply a general barrier if and only if a sleep is at least about to happen, +i.e. prepare_to_wait*() is called. wait_event(); wait_event_interruptible(); @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place: wait_on_bit(); wait_on_bit_lock(); +Further more, no barrier is guaranteed after the successful wait condition +checkings, whose results are true, in wait_*() and before the instructions +following wait_*(). Secondly, code that performs a wake up normally follows something like this: @@ -2009,21 +2013,6 @@ between the STORE to indicate the event and the STORE to set TASK_RUNNING: <general barrier> STORE current->state LOAD event_indicated -To repeat, this write memory barrier is present if and only if something -is actually awakened. To see this, consider the following sequence of -events, where X and Y are both initially zero: - - CPU 1 CPU 2 - =============================== =============================== - X = 1; STORE event_indicated - smp_mb(); wake_up(); - Y = 1; wait_event(wq, Y == 1); - wake_up(); load from Y sees 1, no memory barrier - load from X might see 0 - -In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed -to see 1. - The available waker functions include: complete(); @@ -2042,6 +2031,66 @@ The available waker functions include: wake_up_poll(); wake_up_process(); +To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning +they are present if and only if a sleep or a wakeup actually occurs. To see +this, consider the following three examples. + +The first example is for the conditional barriers in wait_*(), say X and Y +are both initially zero: + + CPU 1 CPU 2 + =============================== =============================== + X = 1; + smp_mb(); + wait_event(wq, Y == 1); Y = 1; + load from Y sees 1 wake_up(); + <no memory barrier> + load from X might see 0 + +And even if a sleep and a wakeup really occurs, there might be no barrier +between the last load from Y(which the makes wait condition checking succeed) +and load from X, so that load from X might still see 0. The second example shows +that, the code running on CPU 1 & 2 is the same with the first example, however +the sequence of events are different, say X, Y and Z are all initially zero, +and another task may be waiting in wait_event(wq, Z == 1) on CPU 4: + + CPU 1 CPU 2 CPU 3 + =============================== =============================== =============================== + wait_event(wq, Y == 1); Z = 1; + load from Y sees 0 + prepare_to_wait_event(); + <general barrier> + schedule(); + <general barrier> + wake_up(); + prepare_to_wait_event(); + <general barrier> + X = 1; + smp_mb(); + load from Y sees 1 Y = 1; + <no memory barrier> wake_up(); + load from X might see 0 + +So that, to guarantee that CPU 1's load from X is 1 in the two examples above, +a read barrier must be added after wait_event() in the code running on CPU 1. + +The third example is for the conditional write barriers in wake_up*(), say +X, Y and Z are all initially zero: + + CPU 1 CPU 2 + =============================== =============================== + X = 1; + wait_event(wq, Y == 1); Y = 1; + load from Y sees 1 wake_up(); + <no memory barrier> + Z = 1; + load from Z sees 1 + smp_rmb(); + load from X might see 0 + +In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed +to see 1. To guarantee that CPU 1's load from X is 1, a write barrier must be +added between store to X and store to Z in the code running on CPU 2. [!] Note that the memory barriers implied by the sleeper and the waker do _not_ order multiple stores before the wake-up with respect to loads of those stored ^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-29 9:25 ` Boqun Feng @ 2015-08-29 14:27 ` Oleg Nesterov 2015-08-31 0:37 ` Boqun Feng 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-08-29 14:27 UTC (permalink / raw) To: Boqun Feng Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Paul E. McKenney, Jonathan Corbet Hello Boqun, On 08/29, Boqun Feng wrote: > Hi Oleg > > On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote: > > On 08/28, Michal Hocko wrote: > > > > > > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote: > > > > On 08/27, Michal Hocko wrote: > > > > > > > > > > --- a/Documentation/memory-barriers.txt > > > > > +++ b/Documentation/memory-barriers.txt > > > > > @@ -2031,6 +2031,9 @@ something up. The barrier occurs before the task state is cleared, and so sits > > > > > <general barrier> STORE current->state > > > > > LOAD event_indicated > > > > > > > > > > +Please note that wake_up_process is an exception here because it implies > > > > > +the write memory barrier unconditionally. > > > > > + > > > > > > > > I simply can't understand (can't even parse) this part of memory-barriers.txt. > > > > > > Do you mean the added text or the example above it? > > > > Both ;) > > > > but note that "load from X might see 0" is true of course, and in this > > By this, I think you actually means the example below the added text, > i.e. the example for "to repeat..", right? And above. Even The barrier occurs before the task state is cleared is not actually right. This is misleading. What is really important is that we have a barrier before we _read_ the task state. And again, again, the fact that we actually have the write barrier is just the implementation detail. > Subject: Documentation: call out conditional barriers of wait_*() and wake_up*() > > The memory barriers in some sleep and wakeup functions are conditional, > there are several situations that there is no barriers: > > 1. If wait_event() and co. actually don't prepare to sleep, there > may be no barrier in them. And thus (imo) we should not even try to document this. I mean, a user should not make any assumptions about the barriers inside wait_event(). > 2. No matter whether a sleep occurs or not, there may be no barrier > between a successful wait-condition checking(the result of which > is true) in wait_event() and the following instructions. Yes, this is true in any case. Not sure this deserves addtionional documentation, but see below. > 3. If wake_up() and co. actually wake up no one, there may be no > write barrier in them. See above. > --- a/Documentation/memory-barriers.txt > +++ b/Documentation/memory-barriers.txt > @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by: > > which therefore also imply a general memory barrier after setting the state. > The whole sequence above is available in various canned forms, all of which > -interpolate the memory barrier in the right place: > +imply a general barrier if and only if a sleep is at least about to happen, > +i.e. prepare_to_wait*() is called. > > wait_event(); > wait_event_interruptible(); > @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place: > wait_on_bit(); > wait_on_bit_lock(); > > +Further more, no barrier is guaranteed after the successful wait condition > +checkings, whose results are true, in wait_*() and before the instructions > +following wait_*(). Yes, perhaps this makes sense, but (to me) only because the explanation above looks a bit confusing to me. I simply can't understand why memory-barriers.txt mentions that barrier implied by set_current_state(). You need to know this if you want to understand how wait_event() works. You do not need to know about this barrier if you want to use wait_event(). If nothing else, because you can never rely on this barrier. Anf your examples below shows this. > +To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning > +they are present if and only if a sleep or a wakeup actually occurs. To see > +this, consider the following three examples. > + > +The first example is for the conditional barriers in wait_*(), say X and Y > +are both initially zero: > + > + CPU 1 CPU 2 > + =============================== =============================== > + X = 1; > + smp_mb(); > + wait_event(wq, Y == 1); Y = 1; > + load from Y sees 1 wake_up(); > + <no memory barrier> > + load from X might see 0 > + > +And even if a sleep and a wakeup really occurs, there might be no barrier > +between the last load from Y(which the makes wait condition checking succeed) > +and load from X, so that load from X might still see 0. ... > +So that, to guarantee that CPU 1's load from X is 1 in the two examples above, > +a read barrier must be added after wait_event() in the code running on CPU 1. Yes. So why should we try to document that barrier in wait_event() ? > +The third example is for the conditional write barriers in wake_up*(), say > +X, Y and Z are all initially zero: > + > + CPU 1 CPU 2 > + =============================== =============================== > + X = 1; > + wait_event(wq, Y == 1); Y = 1; > + load from Y sees 1 wake_up(); > + <no memory barrier> > + Z = 1; > + load from Z sees 1 > + smp_rmb(); > + load from X might see 0 > + > +In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed > +to see 1. To guarantee that CPU 1's load from X is 1, a write barrier must be > +added between store to X and store to Z in the code running on CPU 2. The same. Why should we mention that barrier in wake_up() if we can't rely on it and CPU 2 needs a write barrier anyway? Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-29 14:27 ` Oleg Nesterov @ 2015-08-31 0:37 ` Boqun Feng 2015-08-31 18:33 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Boqun Feng @ 2015-08-31 0:37 UTC (permalink / raw) To: Oleg Nesterov Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Paul E. McKenney, Jonathan Corbet [-- Attachment #1: Type: text/plain, Size: 3384 bytes --] Hi Oleg, On Sat, Aug 29, 2015 at 04:27:07PM +0200, Oleg Nesterov wrote: > Hello Boqun, > ... > > By this, I think you actually means the example below the added text, > > i.e. the example for "to repeat..", right? > > And above. Even > > The barrier occurs before the task state is cleared > > is not actually right. This is misleading. What is really important is that > we have a barrier before we _read_ the task state. And again, again, the > fact that we actually have the write barrier is just the implementation > detail. > Yes, agree with you. However, I didn't realize this problem before, I will see what I can do ;-) > > Subject: Documentation: call out conditional barriers of wait_*() and wake_up*() > > > > The memory barriers in some sleep and wakeup functions are conditional, > > there are several situations that there is no barriers: > > > > 1. If wait_event() and co. actually don't prepare to sleep, there > > may be no barrier in them. > > And thus (imo) we should not even try to document this. I mean, a user > should not make any assumptions about the barriers inside wait_event(). > Yep. > > 2. No matter whether a sleep occurs or not, there may be no barrier > > between a successful wait-condition checking(the result of which > > is true) in wait_event() and the following instructions. > > Yes, this is true in any case. Not sure this deserves addtionional > documentation, but see below. > > > 3. If wake_up() and co. actually wake up no one, there may be no > > write barrier in them. > > See above. > > > --- a/Documentation/memory-barriers.txt > > +++ b/Documentation/memory-barriers.txt > > @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by: > > > > which therefore also imply a general memory barrier after setting the state. > > The whole sequence above is available in various canned forms, all of which > > -interpolate the memory barrier in the right place: > > +imply a general barrier if and only if a sleep is at least about to happen, > > +i.e. prepare_to_wait*() is called. > > > > wait_event(); > > wait_event_interruptible(); > > @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place: > > wait_on_bit(); > > wait_on_bit_lock(); > > > > +Further more, no barrier is guaranteed after the successful wait condition > > +checkings, whose results are true, in wait_*() and before the instructions > > +following wait_*(). > > Yes, perhaps this makes sense, but (to me) only because the explanation above > looks a bit confusing to me. I simply can't understand why memory-barriers.txt > mentions that barrier implied by set_current_state(). You need to know this > if you want to understand how wait_event() works. You do not need to know > about this barrier if you want to use wait_event(). If nothing else, because > you can never rely on this barrier. Anf your examples below shows this. > Fair enough, I went too far. How about just a single paragraph saying that: The wake_up(), wait_event() and their friends have proper barriers in them, but these implicity barriers are only for the correctness for sleep and wakeup. So don't rely on these barriers for things that are neither wait-conditons nor task states. Is that OK to you? Thank you for your comments ;-) Regards, Boqun [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-31 0:37 ` Boqun Feng @ 2015-08-31 18:33 ` Oleg Nesterov 2015-08-31 20:37 ` Paul E. McKenney 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-08-31 18:33 UTC (permalink / raw) To: Boqun Feng Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Paul E. McKenney, Jonathan Corbet On 08/31, Boqun Feng wrote: > > Fair enough, I went too far. How about just a single paragraph saying > that: > > The wake_up(), wait_event() and their friends have proper barriers in > them, but these implicity barriers are only for the correctness for > sleep and wakeup. So don't rely on these barriers for things that are > neither wait-conditons nor task states. > > Is that OK to you? Ask Paul ;) but personally I agree. To me, the only thing a user should know about wake_up/try_to_wake_up and barriers is that you do not need another barrier between setting condition and waking up. Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-31 18:33 ` Oleg Nesterov @ 2015-08-31 20:37 ` Paul E. McKenney 2015-09-01 3:40 ` Boqun Feng 2015-09-01 9:41 ` Oleg Nesterov 0 siblings, 2 replies; 20+ messages in thread From: Paul E. McKenney @ 2015-08-31 20:37 UTC (permalink / raw) To: Oleg Nesterov Cc: Boqun Feng, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote: > On 08/31, Boqun Feng wrote: > > > > Fair enough, I went too far. How about just a single paragraph saying > > that: > > > > The wake_up(), wait_event() and their friends have proper barriers in > > them, but these implicity barriers are only for the correctness for > > sleep and wakeup. So don't rely on these barriers for things that are > > neither wait-conditons nor task states. > > > > Is that OK to you? > > Ask Paul ;) but personally I agree. > > To me, the only thing a user should know about wake_up/try_to_wake_up > and barriers is that you do not need another barrier between setting > condition and waking up. Sounds like an excellent idea in general. But could you please show me a short code snippet illustrating where you don't need the additional barrier, even if the fastpaths are taken so that there is no sleep and no wakeup? Thanx, Paul ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-31 20:37 ` Paul E. McKenney @ 2015-09-01 3:40 ` Boqun Feng 2015-09-01 4:03 ` Paul E. McKenney 2015-09-01 9:59 ` Oleg Nesterov 2015-09-01 9:41 ` Oleg Nesterov 1 sibling, 2 replies; 20+ messages in thread From: Boqun Feng @ 2015-09-01 3:40 UTC (permalink / raw) To: Paul E. McKenney Cc: Oleg Nesterov, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet [-- Attachment #1: Type: text/plain, Size: 1810 bytes --] Hi Paul, On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote: > On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote: > > On 08/31, Boqun Feng wrote: > > > > > > Fair enough, I went too far. How about just a single paragraph saying > > > that: > > > > > > The wake_up(), wait_event() and their friends have proper barriers in > > > them, but these implicity barriers are only for the correctness for > > > sleep and wakeup. So don't rely on these barriers for things that are > > > neither wait-conditons nor task states. > > > > > > Is that OK to you? > > > > Ask Paul ;) but personally I agree. > > > > To me, the only thing a user should know about wake_up/try_to_wake_up > > and barriers is that you do not need another barrier between setting > > condition and waking up. > > Sounds like an excellent idea in general. But could you please show me > a short code snippet illustrating where you don't need the additional > barrier, even if the fastpaths are taken so that there is no sleep and > no wakeup? > If there is no sleep and no wakeup, it means only CONDITION changed. Either CONDITION is a single variable or it should maintains internal ordering guarantee itself. And there is no need for barriers, because there is only one shared resource we are talking about, right? But I'm still a little confused at Oleg's words: "What is really important is that we have a barrier before we _read_ the task state." I read is as "What is really important is that we have a barrier before we _read_ the task state and _after_ we write the CONDITION", if I don't misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC can't pair with anything. So, there might be some tricky barrier usage here? Regards, Boqun [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-01 3:40 ` Boqun Feng @ 2015-09-01 4:03 ` Paul E. McKenney 2015-09-01 9:59 ` Oleg Nesterov 1 sibling, 0 replies; 20+ messages in thread From: Paul E. McKenney @ 2015-09-01 4:03 UTC (permalink / raw) To: Boqun Feng Cc: Oleg Nesterov, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet On Tue, Sep 01, 2015 at 11:40:14AM +0800, Boqun Feng wrote: > Hi Paul, > > On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote: > > On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote: > > > On 08/31, Boqun Feng wrote: > > > > > > > > Fair enough, I went too far. How about just a single paragraph saying > > > > that: > > > > > > > > The wake_up(), wait_event() and their friends have proper barriers in > > > > them, but these implicity barriers are only for the correctness for > > > > sleep and wakeup. So don't rely on these barriers for things that are > > > > neither wait-conditons nor task states. > > > > > > > > Is that OK to you? > > > > > > Ask Paul ;) but personally I agree. > > > > > > To me, the only thing a user should know about wake_up/try_to_wake_up > > > and barriers is that you do not need another barrier between setting > > > condition and waking up. > > > > Sounds like an excellent idea in general. But could you please show me > > a short code snippet illustrating where you don't need the additional > > barrier, even if the fastpaths are taken so that there is no sleep and > > no wakeup? > > If there is no sleep and no wakeup, it means only CONDITION changed. > Either CONDITION is a single variable or it should maintains internal > ordering guarantee itself. And there is no need for barriers, because > there is only one shared resource we are talking about, right? I could imagine all sorts of combinations, which is why I would like to see a code snippet showing exactly what Oleg is talking about. ;-) Thanx, Paul > But I'm still a little confused at Oleg's words: > > "What is really important is that we have a barrier before we _read_ the > task state." > > I read is as "What is really important is that we have a barrier before > we _read_ the task state and _after_ we write the CONDITION", if I don't > misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC > can't pair with anything. > > So, there might be some tricky barrier usage here? > > Regards, > Boqun ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-01 3:40 ` Boqun Feng 2015-09-01 4:03 ` Paul E. McKenney @ 2015-09-01 9:59 ` Oleg Nesterov 2015-09-01 14:50 ` Boqun Feng 1 sibling, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-09-01 9:59 UTC (permalink / raw) To: Boqun Feng Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet On 09/01, Boqun Feng wrote: > > But I'm still a little confused at Oleg's words: > > "What is really important is that we have a barrier before we _read_ the > task state." > > I read is as "What is really important is that we have a barrier before > we _read_ the task state and _after_ we write the CONDITION", if I don't > misunderstand Oleg, this means a STORE-barrier-LOAD sequence, Yes, exactly. Let's look at this trivial code again, CONDITION = 1; wake_up_process(); note that try_to_wake_up() does if (!(p->state & state)) goto out; If this LOAD could be reordered with STORE(CONDITION) above we can obviously race with set_current_state(...); if (!CONDITION) schedule(); See the comment at the start of try_to_wake_up(). And again, again, please note that initially the only documented behaviour of smp_mb__before_spinlock() was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it doesn't actually need the write barrier after STORE(CONDITION). And just in case, wake_up() differs in a sense that it doesn't even need that STORE-LOAD barrier in try_to_wake_up(), we can rely on wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal" wait_event()-like code. > which IIUC > can't pair with anything. It pairs with the barrier implied by set_current_state(). Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-01 9:59 ` Oleg Nesterov @ 2015-09-01 14:50 ` Boqun Feng 2015-09-01 16:39 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Boqun Feng @ 2015-09-01 14:50 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet [-- Attachment #1: Type: text/plain, Size: 4321 bytes --] On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote: > On 09/01, Boqun Feng wrote: > > > > But I'm still a little confused at Oleg's words: > > > > "What is really important is that we have a barrier before we _read_ the > > task state." > > > > I read is as "What is really important is that we have a barrier before > > we _read_ the task state and _after_ we write the CONDITION", if I don't > > misunderstand Oleg, this means a STORE-barrier-LOAD sequence, > > Yes, exactly. > > Let's look at this trivial code again, > > CONDITION = 1; > wake_up_process(); > > note that try_to_wake_up() does > > if (!(p->state & state)) > goto out; > > If this LOAD could be reordered with STORE(CONDITION) above we can obviously > race with > > set_current_state(...); > if (!CONDITION) > schedule(); > > See the comment at the start of try_to_wake_up(). And again, again, please Thank you for your detailed explanation! I read the comment, but just couldn't understand how the pairing happens, I think I have a misunderstanding of pairing, please see below. > note that initially the only documented behaviour of smp_mb__before_spinlock() > was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it > doesn't actually need the write barrier after STORE(CONDITION). > > And just in case, wake_up() differs in a sense that it doesn't even need > that STORE-LOAD barrier in try_to_wake_up(), we can rely on > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal" > wait_event()-like code. > > > which IIUC > > can't pair with anything. > > It pairs with the barrier implied by set_current_state(). I think maybe I have a misunderstanding of barrier pairing. I used to think that a barrier pairing can only happen: 1. between a "MEM-barrier-STORE" and "LOAD-barrier-MEM", when the LOAD reads the value which the STORE writes where MEM is a memory operation(STORE or LOAD). However, wait and wakeup don't fit in the #1 barrier pairing, so I guess I was wrong, there is a kind of barrier pairings which also happen: 2. between a "MEM-barrier-LOAD" and "STORE-barrier-MEM", when the LOAD _fails_ to read the value which the STORE writes, #1 pairing is easy to understand and memory-barriers.txt already has some examples. I admit that I haven't seen any usage of #2 pairing other than wait_event() and wake_up(). Of course, this may be because I'm not an expert of parallel programming and don't know too much.. Have to ask Paul, when we are talking about barrier pairing, we are actually talking about the combination of #1 and #2, right? And Oleg, the barrier pairing we are talking here is the kind of #2, right? Add two examples, in case that I fail to make clear the difference of two barrier pairings. One example of #1 pairing is the following sequence of events: Initially X = 0, Y = 0 CPU 1 CPU 2 =========================== ================================ WRITE_ONCE(Y, 1); smp_mb(); WRITE_ONCE(X, 1); r1 = READ_ONCE(X); // r1 == 1 smp_mb(); r2 = READ_ONCE(Y); ---------------------------------------------------------------- { assert(!(r1 == 1 && r2 == 0)); // means if r1 == 1 then r2 == 1} If CPU 2 reads the value of X written by CPU 1, r2 is guaranteed to be 1, which means assert(!(r1 == 1 && r2 == 0)) afterwards wouldn't be triggered in any case. One example of #2 pairing is the following sequence of events: Initially X = 0, Y = 0 CPU 1 CPU 2 =========================== ================================ WRITE_ONCE(Y, 1); smp_mb(); r1 = READ_ONCE(X); // r1 == 0 WRITE_ONCE(X, 1); smp_mb(); r2 = READ_ONCE(Y); ---------------------------------------------------------------- { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1} If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards wouldn't be triggered in any case. And this is actually the case of wake_up/wait, assuming that prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, X is the condition and Y is the task state, and replace smp_mb() with really necessary barriers, right? Regards, Boqun [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-01 14:50 ` Boqun Feng @ 2015-09-01 16:39 ` Oleg Nesterov 2015-09-02 1:10 ` Boqun Feng 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-09-01 16:39 UTC (permalink / raw) To: Boqun Feng Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet On 09/01, Boqun Feng wrote: > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote: > > > > And just in case, wake_up() differs in a sense that it doesn't even need > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal" > > wait_event()-like code. Looks like, you have missed this part of my previous email. See below. > I think maybe I have a misunderstanding of barrier pairing. Or me. I can only say how it is supposed to work. > think that a barrier pairing can only happen: Well, no. See for example https://lkml.org/lkml/2014/7/16/310 Or, say, the comment in completion_done(). And please do not assume I can answer authoritatively the questions in this area. Fortunately we have paulmck/peterz in CC, they can correct me. Plus I didn't sleep today, not sure I can understand you correctly and/or answer your question ;) > One example of #2 pairing is the following sequence of events: > > Initially X = 0, Y = 0 > > CPU 1 CPU 2 > =========================== ================================ > WRITE_ONCE(Y, 1); > smp_mb(); > r1 = READ_ONCE(X); // r1 == 0 > WRITE_ONCE(X, 1); > smp_mb(); > r2 = READ_ONCE(Y); > > ---------------------------------------------------------------- > { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1} > > If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is > guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards > wouldn't be triggered in any case. > > And this is actually the case of wake_up/wait, assuming that > prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, See above, wake_up/wait_event are fine in any case because they use the same lock. But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD. Now let me quote another previous email, So in fact try_to_wake_up() needs mb() before it does if (!(p->state & state)) goto out; But mb() is heavy, we can use wmb() instead, but only in this particular case: before spin_lock(), which guarantees that LOAD(p->state) can't leak out of the critical section. And since spin_lock() itself is the STORE, this guarantees that STORE(CONDITION) can't leak _into_ the critical section and thus it can't be reordered with LOAD(p->state). And I think that smp_mb__before_spinlock() + spin_lock() should pair correctly with mb(). If not - we should redefine it. > X is the condition and Y is the task state, Yes, > and replace smp_mb() with really necessary barriers, right? Sorry, can't understand this part... Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-01 16:39 ` Oleg Nesterov @ 2015-09-02 1:10 ` Boqun Feng 2015-09-07 17:06 ` Oleg Nesterov 0 siblings, 1 reply; 20+ messages in thread From: Boqun Feng @ 2015-09-02 1:10 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet [-- Attachment #1: Type: text/plain, Size: 4034 bytes --] On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote: > On 09/01, Boqun Feng wrote: > > > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote: > > > > > > And just in case, wake_up() differs in a sense that it doesn't even need > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal" > > > wait_event()-like code. > > Looks like, you have missed this part of my previous email. See below. I guess I need to think through this, though I haven't found any problem in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up(). And I know that in wake_up(), try_to_wake_up() will be called with holding wait_queue_head_t->lock, however, only part of wait_event() holds the same lock, I can't figure out why the barrier is not needed because of the lock.. I will read the code carefully to see whether I miss something. > > > I think maybe I have a misunderstanding of barrier pairing. > > Or me. I can only say how it is supposed to work. > > > think that a barrier pairing can only happen: > > Well, no. See for example https://lkml.org/lkml/2014/7/16/310 This helps a lot, so does the LWN article it references: https://lwn.net/Articles/573436/ The barrier pairing here is scenario 10 in that article. I'm sure that is my misunderstanding of barrier pairing now, thank you! > Or, say, the comment in completion_done(). > > And please do not assume I can answer authoritatively the questions > in this area. Fortunately we have paulmck/peterz in CC, they can > correct me. > Your explanation and references help a lot, now I understand how the barrier works in try_to_wake_up() ;-) > Plus I didn't sleep today, not sure I can understand you correctly > and/or answer your question ;) > > > One example of #2 pairing is the following sequence of events: > > > > Initially X = 0, Y = 0 > > > > CPU 1 CPU 2 > > =========================== ================================ > > WRITE_ONCE(Y, 1); > > smp_mb(); > > r1 = READ_ONCE(X); // r1 == 0 > > WRITE_ONCE(X, 1); > > smp_mb(); > > r2 = READ_ONCE(Y); > > > > ---------------------------------------------------------------- > > { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1} > > > > If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is > > guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards > > wouldn't be triggered in any case. > > > > And this is actually the case of wake_up/wait, assuming that > > prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, > > See above, wake_up/wait_event are fine in any case because they use > the same lock. > I think I wanted to say wake_up_proccess() here, which calls try_to_wake_up() directly, sorry about that mistake.. > But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD. > Now let me quote another previous email, > > So in fact try_to_wake_up() needs mb() before it does > > if (!(p->state & state)) > goto out; > > But mb() is heavy, we can use wmb() instead, but only in this particular > case: before spin_lock(), which guarantees that LOAD(p->state) can't leak > out of the critical section. And since spin_lock() itself is the STORE, > this guarantees that STORE(CONDITION) can't leak _into_ the critical section > and thus it can't be reordered with LOAD(p->state). > This also helps a lot, thank you ;-) > And I think that smp_mb__before_spinlock() + spin_lock() should pair > correctly with mb(). If not - we should redefine it. > > > X is the condition and Y is the task state, > > Yes, > > > and replace smp_mb() with really necessary barriers, right? > > Sorry, can't understand this part... > I just want to be accurate by saying that, because the barrier in try_to_wake_up() is not a smp_mb(), is a smp_mb__before_spinlock() + spin_lock(). Regards, Boqun [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-02 1:10 ` Boqun Feng @ 2015-09-07 17:06 ` Oleg Nesterov 2015-09-08 0:22 ` Boqun Feng 0 siblings, 1 reply; 20+ messages in thread From: Oleg Nesterov @ 2015-09-07 17:06 UTC (permalink / raw) To: Boqun Feng Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet Sorry for delay, On 09/02, Boqun Feng wrote: > > On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote: > > On 09/01, Boqun Feng wrote: > > > > > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote: > > > > > > > > And just in case, wake_up() differs in a sense that it doesn't even need > > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on > > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal" > > > > wait_event()-like code. > > > > Looks like, you have missed this part of my previous email. See below. > > I guess I need to think through this, though I haven't found any problem > in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up(). > And I know that in wake_up(), try_to_wake_up() will be called with > holding wait_queue_head_t->lock, however, only part of wait_event() > holds the same lock, I can't figure out why the barrier is not needed > because of the lock.. This is very simple. __wait_event() does for (;;) { prepare_to_wait_event(WQ, ...); // takes WQ->lock if (CONDITION) break; schedule(); } and we have CONDITION = 1; wake_up(WQ); // takes WQ->lock on another side. Suppose that __wait_event() wins and takes WQ->lock first. It can block then. In this case wake_up() must see the result of set_current_state() and list_add() when it takes the same lock, otherwise spin_lock() would be simply buggy. So it will wake the waiter up. At the same time, if __wait_event() takes this lock after wake_up(), it can not miss CONDITION = 1. It must see it after it takes the lock, and of course after it drops the lock too. So I am not sure I understand your concerns in this case... Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-09-07 17:06 ` Oleg Nesterov @ 2015-09-08 0:22 ` Boqun Feng 0 siblings, 0 replies; 20+ messages in thread From: Boqun Feng @ 2015-09-08 0:22 UTC (permalink / raw) To: Oleg Nesterov Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet [-- Attachment #1: Type: text/plain, Size: 2128 bytes --] Hi Oleg, On Mon, Sep 07, 2015 at 07:06:52PM +0200, Oleg Nesterov wrote: > Sorry for delay, > > On 09/02, Boqun Feng wrote: > > > > On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote: > > > On 09/01, Boqun Feng wrote: > > > > > > > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote: > > > > > > > > > > And just in case, wake_up() differs in a sense that it doesn't even need > > > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on > > > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal" > > > > > wait_event()-like code. > > > > > > Looks like, you have missed this part of my previous email. See below. > > > > I guess I need to think through this, though I haven't found any problem > > in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up(). > > And I know that in wake_up(), try_to_wake_up() will be called with > > holding wait_queue_head_t->lock, however, only part of wait_event() > > holds the same lock, I can't figure out why the barrier is not needed > > because of the lock.. > > This is very simple. __wait_event() does > > for (;;) { > prepare_to_wait_event(WQ, ...); // takes WQ->lock > > if (CONDITION) > break; > > schedule(); > } > > and we have > > CONDITION = 1; > wake_up(WQ); // takes WQ->lock > > on another side. > > Suppose that __wait_event() wins and takes WQ->lock first. It can block > then. In this case wake_up() must see the result of set_current_state() > and list_add() when it takes the same lock, otherwise spin_lock() would > be simply buggy. So it will wake the waiter up. > > At the same time, if __wait_event() takes this lock after wake_up(), it > can not miss CONDITION = 1. It must see it after it takes the lock, and > of course after it drops the lock too. > Yes, you're right! I wasn't aware that in prepare_to_wait_event(), set_current_state() is called with the WQ->lock. > So I am not sure I understand your concerns in this case... > It's my mistake. Thank you for your explanation ;-) Regards, Boqun [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: wake_up_process implied memory barrier clarification 2015-08-31 20:37 ` Paul E. McKenney 2015-09-01 3:40 ` Boqun Feng @ 2015-09-01 9:41 ` Oleg Nesterov 1 sibling, 0 replies; 20+ messages in thread From: Oleg Nesterov @ 2015-09-01 9:41 UTC (permalink / raw) To: Paul E. McKenney Cc: Boqun Feng, Michal Hocko, Peter Zijlstra, LKML, David Howells, Linus Torvalds, Jonathan Corbet On 08/31, Paul E. McKenney wrote: > > On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote: > > On 08/31, Boqun Feng wrote: > > > > > > Fair enough, I went too far. How about just a single paragraph saying > > > that: > > > > > > The wake_up(), wait_event() and their friends have proper barriers in > > > them, but these implicity barriers are only for the correctness for > > > sleep and wakeup. So don't rely on these barriers for things that are > > > neither wait-conditons nor task states. > > > > > > Is that OK to you? > > > > Ask Paul ;) but personally I agree. > > > > To me, the only thing a user should know about wake_up/try_to_wake_up > > and barriers is that you do not need another barrier between setting > > condition and waking up. > > Sounds like an excellent idea in general. But could you please show me > a short code snippet illustrating where you don't need the additional > barrier, even if the fastpaths are taken so that there is no sleep and > no wakeup? I guess I wasn't clear... All I tried to say is that CONDITION = 1; wake_up_process(); does not need any _additional_ barrier in between. I mentioned this because afaics people are often unsure if this is true or not, and to some degree this question initiated this discussion. Oleg. ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2015-09-08 0:23 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-08-27 12:27 wake_up_process implied memory barrier clarification Michal Hocko 2015-08-27 12:43 ` Peter Zijlstra 2015-08-27 13:14 ` Michal Hocko 2015-08-27 18:26 ` Oleg Nesterov 2015-08-28 14:51 ` Michal Hocko 2015-08-28 16:06 ` Oleg Nesterov 2015-08-29 9:25 ` Boqun Feng 2015-08-29 14:27 ` Oleg Nesterov 2015-08-31 0:37 ` Boqun Feng 2015-08-31 18:33 ` Oleg Nesterov 2015-08-31 20:37 ` Paul E. McKenney 2015-09-01 3:40 ` Boqun Feng 2015-09-01 4:03 ` Paul E. McKenney 2015-09-01 9:59 ` Oleg Nesterov 2015-09-01 14:50 ` Boqun Feng 2015-09-01 16:39 ` Oleg Nesterov 2015-09-02 1:10 ` Boqun Feng 2015-09-07 17:06 ` Oleg Nesterov 2015-09-08 0:22 ` Boqun Feng 2015-09-01 9:41 ` Oleg Nesterov
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).