linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* wake_up_process implied memory barrier clarification
@ 2015-08-27 12:27 Michal Hocko
  2015-08-27 12:43 ` Peter Zijlstra
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2015-08-27 12:27 UTC (permalink / raw)
  To: LKML; +Cc: Oleg Nesterov, Peter Zijlstra, David Howells, Linus Torvalds

Hi,
I have just stumbled over the comment above wake_up_process which
claims:
"
 * It may be assumed that this function implies a write memory barrier before
 * changing the task state if and only if any tasks are woken up.
"

but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
since 04e2f1741d235 unconditionally. The comment was added when the
smp_wmb was in place already so I am wondering whether the comment is
wrong/misleading.

Could somebody clarify please?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-27 12:27 wake_up_process implied memory barrier clarification Michal Hocko
@ 2015-08-27 12:43 ` Peter Zijlstra
  2015-08-27 13:14   ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Peter Zijlstra @ 2015-08-27 12:43 UTC (permalink / raw)
  To: Michal Hocko; +Cc: LKML, Oleg Nesterov, David Howells, Linus Torvalds

On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote:
> Hi,
> I have just stumbled over the comment above wake_up_process which
> claims:
> "
>  * It may be assumed that this function implies a write memory barrier before
>  * changing the task state if and only if any tasks are woken up.
> "
> 
> but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
> since 04e2f1741d235 unconditionally. The comment was added when the
> smp_wmb was in place already so I am wondering whether the comment is
> wrong/misleading.
> 
> Could somebody clarify please?

Its true for wake_up(), since that bails early if the waitqueue list is
empty.

I suspect there was no exception made for wake_up_process() to simplify
the rules.

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-27 12:43 ` Peter Zijlstra
@ 2015-08-27 13:14   ` Michal Hocko
  2015-08-27 18:26     ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2015-08-27 13:14 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Oleg Nesterov, David Howells, Linus Torvalds

On Thu 27-08-15 14:43:34, Peter Zijlstra wrote:
> On Thu, Aug 27, 2015 at 02:27:27PM +0200, Michal Hocko wrote:
> > Hi,
> > I have just stumbled over the comment above wake_up_process which
> > claims:
> > "
> >  * It may be assumed that this function implies a write memory barrier before
> >  * changing the task state if and only if any tasks are woken up.
> > "
> > 
> > but try_to_wake_up does smp_mb__before_spinlock and did smp_wmb
> > since 04e2f1741d235 unconditionally. The comment was added when the
> > smp_wmb was in place already so I am wondering whether the comment is
> > wrong/misleading.
> > 
> > Could somebody clarify please?
> 
> Its true for wake_up(), since that bails early if the waitqueue list is
> empty.
> 
> I suspect there was no exception made for wake_up_process() to simplify
> the rules.

Thanks for the confirmation. Shouldn't we rather change the
documentation because this is clearly misleading and confusing.
--- 
>From b70d9a384cfd018e686c0aca06e830f564a34dd9 Mon Sep 17 00:00:00 2001
From: Michal Hocko <mhocko@suse.com>
Date: Thu, 27 Aug 2015 15:10:55 +0200
Subject: [PATCH] sched: Clarigy wake_up_process memory barrier semantic

wake_up_process unlike other wake up primitives based on __wake_up
implies the write memory barrier unconditionally because it relies
on try_to_wake_up directly.

Clarify this in the function comment and memory-barriers.txt because the
current doc is quite misleading.

Signed-off-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/memory-barriers.txt | 3 +++
 kernel/sched/core.c               | 3 +--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index 13feb697271f..c4f180caf0ff 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
 	    <general barrier>		  STORE current->state
 	LOAD event_indicated
 
+Please note that wake_up_process is an exception here because it implies
+the write memory barrier unconditionally.
+
 To repeat, this write memory barrier is present if and only if something
 is actually awakened.  To see this, consider the following sequence of
 events, where X and Y are both initially zero:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 78b4bad10081..39583b76ad2c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
  *
  * Return: 1 if the process was woken up, 0 if it was already running.
  *
- * It may be assumed that this function implies a write memory barrier before
- * changing the task state if and only if any tasks are woken up.
+ * It may be assumed that this function implies a write memory barrier.
  */
 int wake_up_process(struct task_struct *p)
 {
-- 
2.5.0


-- 
Michal Hocko
SUSE Labs

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-27 13:14   ` Michal Hocko
@ 2015-08-27 18:26     ` Oleg Nesterov
  2015-08-28 14:51       ` Michal Hocko
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-08-27 18:26 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Peter Zijlstra, LKML, David Howells, Linus Torvalds

On 08/27, Michal Hocko wrote:
>
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
>  	    <general barrier>		  STORE current->state
>  	LOAD event_indicated
>
> +Please note that wake_up_process is an exception here because it implies
> +the write memory barrier unconditionally.
> +

I simply can't understand (can't even parse) this part of memory-barriers.txt.

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
>   *
>   * Return: 1 if the process was woken up, 0 if it was already running.
>   *
> - * It may be assumed that this function implies a write memory barrier before
> - * changing the task state if and only if any tasks are woken up.
> + * It may be assumed that this function implies a write memory barrier.
>   */

I won't argue, technically this is correct of course. And I agree that
the old comment is misleading.



But the new comment looks as if it is fine to avoid wmb() if you do
wake_up_process(). Say,

	void w(void)
	{
		A = 1;
		wake_up_process(something_unrelated);
		// we know that it implies wmb().
		B = 1;
	}

	void r(void)
	{
		int a, b;

		b = B;
		rmb();
		a = A;

		BUG_ON(b && !a);
	}

Perhaps this part of the comment should be simply removed, the unconditional
wmb() in ttwu() is just implementation detail. And note that initially the
documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD
serialization. Then people noticed that it actually does wmb() and started
to rely on this fact.

To me, this comment should just explain that this function implies a barrier
but only in a sense that you do not need another one after CONDITION = T and
before wake_up_process().

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-27 18:26     ` Oleg Nesterov
@ 2015-08-28 14:51       ` Michal Hocko
  2015-08-28 16:06         ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Michal Hocko @ 2015-08-28 14:51 UTC (permalink / raw)
  To: Oleg Nesterov; +Cc: Peter Zijlstra, LKML, David Howells, Linus Torvalds

On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> On 08/27, Michal Hocko wrote:
> >
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
> >  	    <general barrier>		  STORE current->state
> >  	LOAD event_indicated
> >
> > +Please note that wake_up_process is an exception here because it implies
> > +the write memory barrier unconditionally.
> > +
> 
> I simply can't understand (can't even parse) this part of memory-barriers.txt.

Do you mean the added text or the example above it?

> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
> >   *
> >   * Return: 1 if the process was woken up, 0 if it was already running.
> >   *
> > - * It may be assumed that this function implies a write memory barrier before
> > - * changing the task state if and only if any tasks are woken up.
> > + * It may be assumed that this function implies a write memory barrier.
> >   */
> 
> I won't argue, technically this is correct of course. And I agree that
> the old comment is misleading.

Well the reason I've noticed is the following race in the scsi code
    CPU0                                        CPU1
scsi_error_handler                      scsi_host_dev_release
                                          kthread_stop()
  while (!kthread_should_stop()) {
                                            set_bit(KTHREAD_SHOULD_STOP)
                                            wake_up_process()
                                            wait_for_completion()

    set_current_state(TASK_INTERRUPTIBLE)
    schedule()
    [...]
  }

I have read the comment for wake_up_process and was wondering that
moving set_current_state before kthread_should_stop wouldn't be enough
because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process
wouldn't wake up it and the missing write barrier could lead to a missed
KTHREAD_SHOULD_STOP. A look into ttwu made my worry void.

> But the new comment looks as if it is fine to avoid wmb() if you do
> wake_up_process(). Say,
> 
> 	void w(void)
> 	{
> 		A = 1;
> 		wake_up_process(something_unrelated);
> 		// we know that it implies wmb().
> 		B = 1;
> 	}
> 
> 	void r(void)
> 	{
> 		int a, b;
> 
> 		b = B;
> 		rmb();
> 		a = A;
> 
> 		BUG_ON(b && !a);
> 	}
> 
> Perhaps this part of the comment should be simply removed, the unconditional
> wmb() in ttwu() is just implementation detail. And note that initially the
> documented behaviour of smp_mb__before_spinlock() was only the STORE - LOAD
> serialization. Then people noticed that it actually does wmb() and started
> to rely on this fact.
> 
> To me, this comment should just explain that this function implies a barrier
> but only in a sense that you do not need another one after CONDITION = T and
> before wake_up_process().

I have no objection against more precise wording here but what we have is just
misleading.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-28 14:51       ` Michal Hocko
@ 2015-08-28 16:06         ` Oleg Nesterov
  2015-08-29  9:25           ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-08-28 16:06 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Peter Zijlstra, LKML, David Howells, Linus Torvalds

On 08/28, Michal Hocko wrote:
>
> On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > On 08/27, Michal Hocko wrote:
> > >
> > > --- a/Documentation/memory-barriers.txt
> > > +++ b/Documentation/memory-barriers.txt
> > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
> > >  	    <general barrier>		  STORE current->state
> > >  	LOAD event_indicated
> > >
> > > +Please note that wake_up_process is an exception here because it implies
> > > +the write memory barrier unconditionally.
> > > +
> >
> > I simply can't understand (can't even parse) this part of memory-barriers.txt.
>
> Do you mean the added text or the example above it?

Both ;)

but note that "load from X might see 0" is true of course, and in this
sense wake_up_process() is not exception:

	X = 1;
	wmb();	// unless I am totally confused this just adds more confusion
	Y = 1;
	wake_up_process(TASK);

vs TASK doing

	for (;;) {
		set_current_state(...);
		if (Y)
			break;
		schedule();
	}

	BUG_ON(X == 0)

is not correct, it can actually can hit the BUG_ON() above. However, if
wake_up_process() actually wakes a sleeping TASK up, then it should also
see X = 1. Even without wmb(), even if we do

	Y = 1;
	X = 1;
	wake_up_process(TASK);

> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1967,8 +1967,7 @@ static void try_to_wake_up_local(struct task_struct *p)
> > >   *
> > >   * Return: 1 if the process was woken up, 0 if it was already running.
> > >   *
> > > - * It may be assumed that this function implies a write memory barrier before
> > > - * changing the task state if and only if any tasks are woken up.
> > > + * It may be assumed that this function implies a write memory barrier.
> > >   */
> >
> > I won't argue, technically this is correct of course. And I agree that
> > the old comment is misleading.
>
> Well the reason I've noticed is the following race in the scsi code
>     CPU0                                        CPU1
> scsi_error_handler                      scsi_host_dev_release
>                                           kthread_stop()
>   while (!kthread_should_stop()) {
>                                             set_bit(KTHREAD_SHOULD_STOP)
>                                             wake_up_process()
>                                             wait_for_completion()
>
>     set_current_state(TASK_INTERRUPTIBLE)
>     schedule()

Heh. This looks like a common mistake. See fecdf8be2d91e04b0a9a4f79ff06499 ;)

But I believe this is another thing.

> I have read the comment for wake_up_process and was wondering that
> moving set_current_state before kthread_should_stop wouldn't be enough
> because the the task at CPU0 might be TASK_RUNNIG and so wake_up_process
> wouldn't wake up it and the missing write barrier could lead to a missed
> KTHREAD_SHOULD_STOP.

And that is why try_to_wake_up()->smp_mb__before_spinlock() needs to
serialize STORE(CONDITION) and the subsequent LOAD(p->state). The fact
that it actually does wmb() is just implementation detail, that is what
I tried to say.

> > To me, this comment should just explain that this function implies a barrier
> > but only in a sense that you do not need another one after CONDITION = T and
> > before wake_up_process().
>
> I have no objection against more precise wording here but what we have is just
> misleading.

Yes, yes, I agree. Just I do not know what exactly it should document.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-28 16:06         ` Oleg Nesterov
@ 2015-08-29  9:25           ` Boqun Feng
  2015-08-29 14:27             ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2015-08-29  9:25 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Paul E. McKenney, Jonathan Corbet

Hi Oleg

On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> On 08/28, Michal Hocko wrote:
> >
> > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > On 08/27, Michal Hocko wrote:
> > > >
> > > > --- a/Documentation/memory-barriers.txt
> > > > +++ b/Documentation/memory-barriers.txt
> > > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
> > > >  	    <general barrier>		  STORE current->state
> > > >  	LOAD event_indicated
> > > >
> > > > +Please note that wake_up_process is an exception here because it implies
> > > > +the write memory barrier unconditionally.
> > > > +
> > >
> > > I simply can't understand (can't even parse) this part of memory-barriers.txt.
> >
> > Do you mean the added text or the example above it?
> 
> Both ;)
> 
> but note that "load from X might see 0" is true of course, and in this

By this, I think you actually means the example below the added text,
i.e. the example for "to repeat..", right?

I think that's not a good example, and actually that example explains
that the barriers in -wait_event()- rather than in -wake_up()- are
conditional.

I wrote a patch to make things more clear, hope that helps.

(Add Paul and Jonathan to CCed)

Regards,
Boqun

> sense wake_up_process() is not exception:
> 
> 	X = 1;
> 	wmb();	// unless I am totally confused this just adds more confusion
> 	Y = 1;
> 	wake_up_process(TASK);
> 
> vs TASK doing
> 
> 	for (;;) {
> 		set_current_state(...);
> 		if (Y)
> 			break;
> 		schedule();
> 	}
> 
> 	BUG_ON(X == 0)
> 
> is not correct, it can actually can hit the BUG_ON() above. However, if
> wake_up_process() actually wakes a sleeping TASK up, then it should also
> see X = 1. Even without wmb(), even if we do
> 
> 	Y = 1;
> 	X = 1;
> 	wake_up_process(TASK);
> 

----------->8
Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()

The memory barriers in some sleep and wakeup functions are conditional,
there are several situations that there is no barriers:

1.	If wait_event() and co. actually don't prepare to sleep, there
	may be no barrier in them.

2.	No matter whether a sleep occurs or not, there may be no barrier
	between a successful wait-condition checking(the result of which
	is true) in wait_event() and the following instructions.

3.	If wake_up() and co. actually wake up no one, there may be no
	write barrier in them.

However, the current version of memory-barriers.txt doesn't call these
out. Further more, the example which wanted to explain that write
barriers in wake_up*() are conditional fails its job. To see that,
consider a similar example:

	CPU 1				CPU 2
	===============================	===============================
	X = 1;
	smp_mb();
	Y = 1;				wait_event(wq, Y == 1);
					  load from Y sees 1, no memory barrier
	smp_wmb();
	wake_up();
					load from X might see 0

CPU 2's load from X might still be 0 even if we add a write barrier
explicitly, which means that even if wake_up() guarantees a write
barrier, the original example still doesn't have the desired order
guarantee. In fact, the original example can explains the conditionality
of barriers in wait_*() rather than wake_up*().

This patch makes things clear, by calling out that the barriers in
wait_*() and wake_up*() are conditional and giving exact examples to
explain that.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---

 Documentation/memory-barriers.txt | 81 +++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt
index eafa6a5..df69841 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
 
 which therefore also imply a general memory barrier after setting the state.
 The whole sequence above is available in various canned forms, all of which
-interpolate the memory barrier in the right place:
+imply a general barrier if and only if a sleep is at least about to happen,
+i.e. prepare_to_wait*() is called.
 
 	wait_event();
 	wait_event_interruptible();
@@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
 	wait_on_bit();
 	wait_on_bit_lock();
 
+Further more, no barrier is guaranteed after the successful wait condition
+checkings, whose results are true, in wait_*() and before the instructions
+following wait_*().
 
 Secondly, code that performs a wake up normally follows something like this:
 
@@ -2009,21 +2013,6 @@ between the STORE to indicate the event and the STORE to set TASK_RUNNING:
 	    <general barrier>		  STORE current->state
 	LOAD event_indicated
 
-To repeat, this write memory barrier is present if and only if something
-is actually awakened.  To see this, consider the following sequence of
-events, where X and Y are both initially zero:
-
-	CPU 1				CPU 2
-	===============================	===============================
-	X = 1;				STORE event_indicated
-	smp_mb();			wake_up();
-	Y = 1;				wait_event(wq, Y == 1);
-	wake_up();			  load from Y sees 1, no memory barrier
-					load from X might see 0
-
-In contrast, if a wakeup does occur, CPU 2's load from X would be guaranteed
-to see 1.
-
 The available waker functions include:
 
 	complete();
@@ -2042,6 +2031,66 @@ The available waker functions include:
 	wake_up_poll();
 	wake_up_process();
 
+To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning
+they are present if and only if a sleep or a wakeup actually occurs. To see
+this, consider the following three examples.
+
+The first example is for the conditional barriers in wait_*(), say X and Y
+are both initially zero:
+
+	CPU 1				CPU 2
+	===============================	===============================
+					X = 1;
+					smp_mb();
+	wait_event(wq, Y == 1);		Y = 1;
+	  load from Y sees 1		wake_up();
+	  <no memory barrier>
+	load from X might see 0
+
+And even if a sleep and a wakeup really occurs, there might be no barrier
+between the last load from Y(which the makes wait condition checking succeed)
+and load from X, so that load from X might still see 0. The second example shows
+that, the code running on CPU 1 & 2 is the same with the first example, however
+the sequence of events are different, say X, Y and Z are all initially zero,
+and another task may be waiting in wait_event(wq, Z == 1) on CPU 4:
+
+	CPU 1				CPU 2				CPU 3
+	===============================	=============================== ===============================
+	wait_event(wq, Y == 1);						Z = 1;
+	  load from Y sees 0
+	  prepare_to_wait_event();
+	    <general barrier>
+	  schedule();
+	    <general barrier>
+									wake_up();
+	  prepare_to_wait_event();
+	    <general barrier>
+					X = 1;
+					smp_mb();
+	  load from Y sees 1		Y = 1;
+	  <no memory barrier>		wake_up();
+	load from X might see 0
+
+So that, to guarantee that CPU 1's load from X is 1 in the two examples above,
+a read barrier must be added after wait_event() in the code running on CPU 1.
+
+The third example is for the conditional write barriers in wake_up*(), say
+X, Y and Z are all initially zero:
+
+	CPU 1				CPU 2
+	===============================	===============================
+					X = 1;
+	wait_event(wq, Y == 1);		Y = 1;
+	  load from Y sees 1		wake_up();
+					  <no memory barrier>
+					Z = 1;
+	load from Z sees 1
+	smp_rmb();
+	load from X might see 0
+
+In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed
+to see 1. To guarantee that CPU 1's load from X is 1, a write barrier must be
+added between store to X and store to Z in the code running on CPU 2.
 
 [!] Note that the memory barriers implied by the sleeper and the waker do _not_
 order multiple stores before the wake-up with respect to loads of those stored


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-29  9:25           ` Boqun Feng
@ 2015-08-29 14:27             ` Oleg Nesterov
  2015-08-31  0:37               ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-08-29 14:27 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Paul E. McKenney, Jonathan Corbet

Hello Boqun,

On 08/29, Boqun Feng wrote:
> Hi Oleg
>
> On Fri, Aug 28, 2015 at 06:06:37PM +0200, Oleg Nesterov wrote:
> > On 08/28, Michal Hocko wrote:
> > >
> > > On Thu 27-08-15 20:26:54, Oleg Nesterov wrote:
> > > > On 08/27, Michal Hocko wrote:
> > > > >
> > > > > --- a/Documentation/memory-barriers.txt
> > > > > +++ b/Documentation/memory-barriers.txt
> > > > > @@ -2031,6 +2031,9 @@ something up.  The barrier occurs before the task state is cleared, and so sits
> > > > >  	    <general barrier>		  STORE current->state
> > > > >  	LOAD event_indicated
> > > > >
> > > > > +Please note that wake_up_process is an exception here because it implies
> > > > > +the write memory barrier unconditionally.
> > > > > +
> > > >
> > > > I simply can't understand (can't even parse) this part of memory-barriers.txt.
> > >
> > > Do you mean the added text or the example above it?
> >
> > Both ;)
> >
> > but note that "load from X might see 0" is true of course, and in this
>
> By this, I think you actually means the example below the added text,
> i.e. the example for "to repeat..", right?

And above. Even

	The barrier occurs before the task state is cleared

is not actually right. This is misleading. What is really important is that
we have a barrier before we _read_ the task state. And again, again, the
fact that we actually have the write barrier is just the implementation
detail.

> Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()
>
> The memory barriers in some sleep and wakeup functions are conditional,
> there are several situations that there is no barriers:
>
> 1.	If wait_event() and co. actually don't prepare to sleep, there
> 	may be no barrier in them.

And thus (imo) we should not even try to document this. I mean, a user
should not make any assumptions about the barriers inside wait_event().

> 2.	No matter whether a sleep occurs or not, there may be no barrier
> 	between a successful wait-condition checking(the result of which
> 	is true) in wait_event() and the following instructions.

Yes, this is true in any case. Not sure this deserves addtionional
documentation, but see below.

> 3.	If wake_up() and co. actually wake up no one, there may be no
> 	write barrier in them.

See above.

> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
>
>  which therefore also imply a general memory barrier after setting the state.
>  The whole sequence above is available in various canned forms, all of which
> -interpolate the memory barrier in the right place:
> +imply a general barrier if and only if a sleep is at least about to happen,
> +i.e. prepare_to_wait*() is called.
>
>  	wait_event();
>  	wait_event_interruptible();
> @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
>  	wait_on_bit();
>  	wait_on_bit_lock();
>
> +Further more, no barrier is guaranteed after the successful wait condition
> +checkings, whose results are true, in wait_*() and before the instructions
> +following wait_*().

Yes, perhaps this makes sense, but (to me) only because the explanation above
looks a bit confusing to me. I simply can't understand why memory-barriers.txt
mentions that barrier implied by set_current_state(). You need to know this
if you want to understand how wait_event() works. You do not need to know
about this barrier if you want to use wait_event(). If nothing else, because
you can never rely on this barrier. Anf your examples below shows this.

> +To repeat, barriers in wait_event(), wake_up() and co. are conditional, meaning
> +they are present if and only if a sleep or a wakeup actually occurs. To see
> +this, consider the following three examples.
> +
> +The first example is for the conditional barriers in wait_*(), say X and Y
> +are both initially zero:
> +
> +	CPU 1				CPU 2
> +	===============================	===============================
> +					X = 1;
> +					smp_mb();
> +	wait_event(wq, Y == 1);		Y = 1;
> +	  load from Y sees 1		wake_up();
> +	  <no memory barrier>
> +	load from X might see 0
> +
> +And even if a sleep and a wakeup really occurs, there might be no barrier
> +between the last load from Y(which the makes wait condition checking succeed)
> +and load from X, so that load from X might still see 0.
...

> +So that, to guarantee that CPU 1's load from X is 1 in the two examples above,
> +a read barrier must be added after wait_event() in the code running on CPU 1.

Yes. So why should we try to document that barrier in wait_event() ?

> +The third example is for the conditional write barriers in wake_up*(), say
> +X, Y and Z are all initially zero:
> +
> +	CPU 1				CPU 2
> +	===============================	===============================
> +					X = 1;
> +	wait_event(wq, Y == 1);		Y = 1;
> +	  load from Y sees 1		wake_up();
> +					  <no memory barrier>
> +					Z = 1;
> +	load from Z sees 1
> +	smp_rmb();
> +	load from X might see 0
> +
> +In contrast, if a wakeup does occur, CPU 1's load from X would be guaranteed
> +to see 1. To guarantee that CPU 1's load from X is 1, a write barrier must be
> +added between store to X and store to Z in the code running on CPU 2.

The same. Why should we mention that barrier in wake_up() if we can't
rely on it and CPU 2 needs a write barrier anyway?

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-29 14:27             ` Oleg Nesterov
@ 2015-08-31  0:37               ` Boqun Feng
  2015-08-31 18:33                 ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2015-08-31  0:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Paul E. McKenney, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 3384 bytes --]

Hi Oleg,

On Sat, Aug 29, 2015 at 04:27:07PM +0200, Oleg Nesterov wrote:
> Hello Boqun,
> 
...
> > By this, I think you actually means the example below the added text,
> > i.e. the example for "to repeat..", right?
> 
> And above. Even
> 
> 	The barrier occurs before the task state is cleared
> 
> is not actually right. This is misleading. What is really important is that
> we have a barrier before we _read_ the task state. And again, again, the
> fact that we actually have the write barrier is just the implementation
> detail.
> 

Yes, agree with you. However, I didn't realize this problem before, I
will see what I can do ;-)

> > Subject: Documentation: call out conditional barriers of wait_*() and wake_up*()
> >
> > The memory barriers in some sleep and wakeup functions are conditional,
> > there are several situations that there is no barriers:
> >
> > 1.	If wait_event() and co. actually don't prepare to sleep, there
> > 	may be no barrier in them.
> 
> And thus (imo) we should not even try to document this. I mean, a user
> should not make any assumptions about the barriers inside wait_event().
> 

Yep.

> > 2.	No matter whether a sleep occurs or not, there may be no barrier
> > 	between a successful wait-condition checking(the result of which
> > 	is true) in wait_event() and the following instructions.
> 
> Yes, this is true in any case. Not sure this deserves addtionional
> documentation, but see below.
> 
> > 3.	If wake_up() and co. actually wake up no one, there may be no
> > 	write barrier in them.
> 
> See above.
> 
> > --- a/Documentation/memory-barriers.txt
> > +++ b/Documentation/memory-barriers.txt
> > @@ -1975,7 +1975,8 @@ set_current_state() may be wrapped by:
> >
> >  which therefore also imply a general memory barrier after setting the state.
> >  The whole sequence above is available in various canned forms, all of which
> > -interpolate the memory barrier in the right place:
> > +imply a general barrier if and only if a sleep is at least about to happen,
> > +i.e. prepare_to_wait*() is called.
> >
> >  	wait_event();
> >  	wait_event_interruptible();
> > @@ -1986,6 +1987,9 @@ interpolate the memory barrier in the right place:
> >  	wait_on_bit();
> >  	wait_on_bit_lock();
> >
> > +Further more, no barrier is guaranteed after the successful wait condition
> > +checkings, whose results are true, in wait_*() and before the instructions
> > +following wait_*().
> 
> Yes, perhaps this makes sense, but (to me) only because the explanation above
> looks a bit confusing to me. I simply can't understand why memory-barriers.txt
> mentions that barrier implied by set_current_state(). You need to know this
> if you want to understand how wait_event() works. You do not need to know
> about this barrier if you want to use wait_event(). If nothing else, because
> you can never rely on this barrier. Anf your examples below shows this.
> 

Fair enough, I went too far. How about just a single paragraph saying
that:

The wake_up(), wait_event() and their friends have proper barriers in
them, but these implicity barriers are only for the correctness for
sleep and wakeup. So don't rely on these barriers for things that are
neither wait-conditons nor task states.

Is that OK to you?

Thank you for your comments ;-)

Regards,
Boqun


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-31  0:37               ` Boqun Feng
@ 2015-08-31 18:33                 ` Oleg Nesterov
  2015-08-31 20:37                   ` Paul E. McKenney
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-08-31 18:33 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Paul E. McKenney, Jonathan Corbet

On 08/31, Boqun Feng wrote:
>
> Fair enough, I went too far. How about just a single paragraph saying
> that:
>
> The wake_up(), wait_event() and their friends have proper barriers in
> them, but these implicity barriers are only for the correctness for
> sleep and wakeup. So don't rely on these barriers for things that are
> neither wait-conditons nor task states.
>
> Is that OK to you?

Ask Paul ;) but personally I agree.

To me, the only thing a user should know about wake_up/try_to_wake_up
and barriers is that you do not need another barrier between setting
condition and waking up.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-31 18:33                 ` Oleg Nesterov
@ 2015-08-31 20:37                   ` Paul E. McKenney
  2015-09-01  3:40                     ` Boqun Feng
  2015-09-01  9:41                     ` Oleg Nesterov
  0 siblings, 2 replies; 20+ messages in thread
From: Paul E. McKenney @ 2015-08-31 20:37 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Boqun Feng, Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Jonathan Corbet

On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> On 08/31, Boqun Feng wrote:
> >
> > Fair enough, I went too far. How about just a single paragraph saying
> > that:
> >
> > The wake_up(), wait_event() and their friends have proper barriers in
> > them, but these implicity barriers are only for the correctness for
> > sleep and wakeup. So don't rely on these barriers for things that are
> > neither wait-conditons nor task states.
> >
> > Is that OK to you?
> 
> Ask Paul ;) but personally I agree.
> 
> To me, the only thing a user should know about wake_up/try_to_wake_up
> and barriers is that you do not need another barrier between setting
> condition and waking up.

Sounds like an excellent idea in general.  But could you please show me
a short code snippet illustrating where you don't need the additional
barrier, even if the fastpaths are taken so that there is no sleep and
no wakeup?

							Thanx, Paul


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-31 20:37                   ` Paul E. McKenney
@ 2015-09-01  3:40                     ` Boqun Feng
  2015-09-01  4:03                       ` Paul E. McKenney
  2015-09-01  9:59                       ` Oleg Nesterov
  2015-09-01  9:41                     ` Oleg Nesterov
  1 sibling, 2 replies; 20+ messages in thread
From: Boqun Feng @ 2015-09-01  3:40 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Oleg Nesterov, Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 1810 bytes --]

Hi Paul,

On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > On 08/31, Boqun Feng wrote:
> > >
> > > Fair enough, I went too far. How about just a single paragraph saying
> > > that:
> > >
> > > The wake_up(), wait_event() and their friends have proper barriers in
> > > them, but these implicity barriers are only for the correctness for
> > > sleep and wakeup. So don't rely on these barriers for things that are
> > > neither wait-conditons nor task states.
> > >
> > > Is that OK to you?
> > 
> > Ask Paul ;) but personally I agree.
> > 
> > To me, the only thing a user should know about wake_up/try_to_wake_up
> > and barriers is that you do not need another barrier between setting
> > condition and waking up.
> 
> Sounds like an excellent idea in general.  But could you please show me
> a short code snippet illustrating where you don't need the additional
> barrier, even if the fastpaths are taken so that there is no sleep and
> no wakeup?
> 

If there is no sleep and no wakeup, it means only CONDITION changed.
Either CONDITION is a single variable or it should maintains internal
ordering guarantee itself. And there is no need for barriers, because
there is only one shared resource we are talking about, right?

But I'm still a little confused at Oleg's words:

"What is really important is that we have a barrier before we _read_ the
task state."

I read is as "What is really important is that we have a barrier before
we _read_ the task state and _after_ we write the CONDITION", if I don't
misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC
can't pair with anything.

So, there might be some tricky barrier usage here?

Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-01  3:40                     ` Boqun Feng
@ 2015-09-01  4:03                       ` Paul E. McKenney
  2015-09-01  9:59                       ` Oleg Nesterov
  1 sibling, 0 replies; 20+ messages in thread
From: Paul E. McKenney @ 2015-09-01  4:03 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Oleg Nesterov, Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Jonathan Corbet

On Tue, Sep 01, 2015 at 11:40:14AM +0800, Boqun Feng wrote:
> Hi Paul,
> 
> On Mon, Aug 31, 2015 at 01:37:39PM -0700, Paul E. McKenney wrote:
> > On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > > On 08/31, Boqun Feng wrote:
> > > >
> > > > Fair enough, I went too far. How about just a single paragraph saying
> > > > that:
> > > >
> > > > The wake_up(), wait_event() and their friends have proper barriers in
> > > > them, but these implicity barriers are only for the correctness for
> > > > sleep and wakeup. So don't rely on these barriers for things that are
> > > > neither wait-conditons nor task states.
> > > >
> > > > Is that OK to you?
> > > 
> > > Ask Paul ;) but personally I agree.
> > > 
> > > To me, the only thing a user should know about wake_up/try_to_wake_up
> > > and barriers is that you do not need another barrier between setting
> > > condition and waking up.
> > 
> > Sounds like an excellent idea in general.  But could you please show me
> > a short code snippet illustrating where you don't need the additional
> > barrier, even if the fastpaths are taken so that there is no sleep and
> > no wakeup?
> 
> If there is no sleep and no wakeup, it means only CONDITION changed.
> Either CONDITION is a single variable or it should maintains internal
> ordering guarantee itself. And there is no need for barriers, because
> there is only one shared resource we are talking about, right?

I could imagine all sorts of combinations, which is why I would like
to see a code snippet showing exactly what Oleg is talking about.  ;-)

							Thanx, Paul

> But I'm still a little confused at Oleg's words:
> 
> "What is really important is that we have a barrier before we _read_ the
> task state."
> 
> I read is as "What is really important is that we have a barrier before
> we _read_ the task state and _after_ we write the CONDITION", if I don't
> misunderstand Oleg, this means a STORE-barrier-LOAD sequence, which IIUC
> can't pair with anything.
> 
> So, there might be some tricky barrier usage here?
> 
> Regards,
> Boqun



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-08-31 20:37                   ` Paul E. McKenney
  2015-09-01  3:40                     ` Boqun Feng
@ 2015-09-01  9:41                     ` Oleg Nesterov
  1 sibling, 0 replies; 20+ messages in thread
From: Oleg Nesterov @ 2015-09-01  9:41 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Boqun Feng, Michal Hocko, Peter Zijlstra, LKML, David Howells,
	Linus Torvalds, Jonathan Corbet

On 08/31, Paul E. McKenney wrote:
>
> On Mon, Aug 31, 2015 at 08:33:35PM +0200, Oleg Nesterov wrote:
> > On 08/31, Boqun Feng wrote:
> > >
> > > Fair enough, I went too far. How about just a single paragraph saying
> > > that:
> > >
> > > The wake_up(), wait_event() and their friends have proper barriers in
> > > them, but these implicity barriers are only for the correctness for
> > > sleep and wakeup. So don't rely on these barriers for things that are
> > > neither wait-conditons nor task states.
> > >
> > > Is that OK to you?
> >
> > Ask Paul ;) but personally I agree.
> >
> > To me, the only thing a user should know about wake_up/try_to_wake_up
> > and barriers is that you do not need another barrier between setting
> > condition and waking up.
>
> Sounds like an excellent idea in general.  But could you please show me
> a short code snippet illustrating where you don't need the additional
> barrier, even if the fastpaths are taken so that there is no sleep and
> no wakeup?

I guess I wasn't clear... All I tried to say is that

	CONDITION = 1;
	wake_up_process();

does not need any _additional_ barrier in between.

I mentioned this because afaics people are often unsure if this is true
or not, and to some degree this question initiated this discussion.

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-01  3:40                     ` Boqun Feng
  2015-09-01  4:03                       ` Paul E. McKenney
@ 2015-09-01  9:59                       ` Oleg Nesterov
  2015-09-01 14:50                         ` Boqun Feng
  1 sibling, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-09-01  9:59 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML,
	David Howells, Linus Torvalds, Jonathan Corbet

On 09/01, Boqun Feng wrote:
>
> But I'm still a little confused at Oleg's words:
>
> "What is really important is that we have a barrier before we _read_ the
> task state."
>
> I read is as "What is really important is that we have a barrier before
> we _read_ the task state and _after_ we write the CONDITION", if I don't
> misunderstand Oleg, this means a STORE-barrier-LOAD sequence,

Yes, exactly.

Let's look at this trivial code again,

	CONDITION = 1;
	wake_up_process();

note that try_to_wake_up() does

	if (!(p->state & state))
		goto out;

If this LOAD could be reordered with STORE(CONDITION) above we can obviously
race with

	set_current_state(...);
	if (!CONDITION)
		schedule();

See the comment at the start of try_to_wake_up(). And again, again, please
note that initially the only documented behaviour of smp_mb__before_spinlock()
was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
doesn't actually need the write barrier after STORE(CONDITION).

And just in case, wake_up() differs in a sense that it doesn't even need
that STORE-LOAD barrier in try_to_wake_up(), we can rely on
wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
wait_event()-like code.

> which IIUC
> can't pair with anything.

It pairs with the barrier implied by set_current_state().

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-01  9:59                       ` Oleg Nesterov
@ 2015-09-01 14:50                         ` Boqun Feng
  2015-09-01 16:39                           ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2015-09-01 14:50 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML,
	David Howells, Linus Torvalds, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 4321 bytes --]

On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > But I'm still a little confused at Oleg's words:
> >
> > "What is really important is that we have a barrier before we _read_ the
> > task state."
> >
> > I read is as "What is really important is that we have a barrier before
> > we _read_ the task state and _after_ we write the CONDITION", if I don't
> > misunderstand Oleg, this means a STORE-barrier-LOAD sequence,
> 
> Yes, exactly.
> 
> Let's look at this trivial code again,
> 
> 	CONDITION = 1;
> 	wake_up_process();
> 
> note that try_to_wake_up() does
> 
> 	if (!(p->state & state))
> 		goto out;
> 
> If this LOAD could be reordered with STORE(CONDITION) above we can obviously
> race with
> 
> 	set_current_state(...);
> 	if (!CONDITION)
> 		schedule();
> 
> See the comment at the start of try_to_wake_up(). And again, again, please

Thank you for your detailed explanation!

I read the comment, but just couldn't understand how the pairing
happens, I think I have a misunderstanding of pairing, please see below.

> note that initially the only documented behaviour of smp_mb__before_spinlock()
> was the STORE - LOAD serialization. This is what try_to_wake_up() needs, it
> doesn't actually need the write barrier after STORE(CONDITION).
> 
> And just in case, wake_up() differs in a sense that it doesn't even need
> that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> wait_event()-like code.
> 
> > which IIUC
> > can't pair with anything.
> 
> It pairs with the barrier implied by set_current_state().

I think maybe I have a misunderstanding of barrier pairing. I used to
think that a barrier pairing can only happen:

1.	between a "MEM-barrier-STORE" and "LOAD-barrier-MEM", when the
	LOAD reads the value which the STORE writes

where MEM is a memory operation(STORE or LOAD).

However, wait and wakeup don't fit in the #1 barrier pairing, so I guess
I was wrong, there is a kind of barrier pairings which also happen:

2.	between a "MEM-barrier-LOAD" and "STORE-barrier-MEM", when the
	LOAD _fails_ to read the value which the STORE writes,

#1 pairing is easy to understand and memory-barriers.txt already has
some examples. I admit that I haven't seen any usage of #2 pairing
other than wait_event() and wake_up(). Of course, this may be because
I'm not an expert of parallel programming and don't know too much..

Have to ask Paul, when we are talking about barrier pairing, we are
actually talking about the combination of #1 and #2, right?

And Oleg, the barrier pairing we are talking here is the kind of #2,
right?



Add two examples, in case that I fail to make clear the difference of
two barrier pairings.

One example of #1 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1				CPU 2
===========================	================================
WRITE_ONCE(Y, 1);
smp_mb();
WRITE_ONCE(X, 1);
				r1 = READ_ONCE(X); // r1 == 1
				smp_mb();
				r2 = READ_ONCE(Y);
----------------------------------------------------------------
{ assert(!(r1 == 1 && r2 == 0)); // means if r1 == 1 then r2 == 1}

If CPU 2 reads the value of X written by CPU 1, r2 is guaranteed to be
1, which means assert(!(r1 == 1 && r2 == 0)) afterwards wouldn't be
triggered in any case.


One example of #2 pairing is the following sequence of events:

Initially X = 0, Y = 0

CPU 1				CPU 2
===========================	================================
WRITE_ONCE(Y, 1);
smp_mb();
r1 = READ_ONCE(X); // r1 == 0
				WRITE_ONCE(X, 1);
				smp_mb();
				r2 = READ_ONCE(Y);

----------------------------------------------------------------
{ assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}

If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
wouldn't be triggered in any case.

And this is actually the case of wake_up/wait, assuming that
prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2, X
is the condition and Y is the task state, and replace smp_mb() with
really necessary barriers, right?

Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-01 14:50                         ` Boqun Feng
@ 2015-09-01 16:39                           ` Oleg Nesterov
  2015-09-02  1:10                             ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-09-01 16:39 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML,
	David Howells, Linus Torvalds, Jonathan Corbet

On 09/01, Boqun Feng wrote:
>
> On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> >
> > And just in case, wake_up() differs in a sense that it doesn't even need
> > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > wait_event()-like code.

Looks like, you have missed this part of my previous email. See below.

> I think maybe I have a misunderstanding of barrier pairing.

Or me. I can only say how it is supposed to work.

> think that a barrier pairing can only happen:

Well, no. See for example https://lkml.org/lkml/2014/7/16/310
Or, say, the comment in completion_done().

And please do not assume I can answer authoritatively the questions
in this area. Fortunately we have paulmck/peterz in CC, they can
correct me.

Plus I didn't sleep today, not sure I can understand you correctly
and/or answer your question ;)

> One example of #2 pairing is the following sequence of events:
>
> Initially X = 0, Y = 0
>
> CPU 1				CPU 2
> ===========================	================================
> WRITE_ONCE(Y, 1);
> smp_mb();
> r1 = READ_ONCE(X); // r1 == 0
> 				WRITE_ONCE(X, 1);
> 				smp_mb();
> 				r2 = READ_ONCE(Y);
>
> ----------------------------------------------------------------
> { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}
>
> If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
> guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
> wouldn't be triggered in any case.
>
> And this is actually the case of wake_up/wait, assuming that
> prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2,

See above, wake_up/wait_event are fine in any case because they use
the same lock.

But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD.
Now let me quote another previous email,

	So in fact try_to_wake_up() needs mb() before it does

		if (!(p->state & state))
			goto out;

	But mb() is heavy, we can use wmb() instead, but only in this particular
	case: before spin_lock(), which guarantees that LOAD(p->state) can't leak
	out of the critical section. And since spin_lock() itself is the STORE,
	this guarantees that STORE(CONDITION) can't leak _into_ the critical section
	and thus it can't be reordered with LOAD(p->state).

And I think that smp_mb__before_spinlock() + spin_lock() should pair
correctly with mb(). If not - we should redefine it.

> X is the condition and Y is the task state,

Yes,

> and replace smp_mb() with really necessary barriers, right?

Sorry, can't understand this part...

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-01 16:39                           ` Oleg Nesterov
@ 2015-09-02  1:10                             ` Boqun Feng
  2015-09-07 17:06                               ` Oleg Nesterov
  0 siblings, 1 reply; 20+ messages in thread
From: Boqun Feng @ 2015-09-02  1:10 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML,
	David Howells, Linus Torvalds, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 4034 bytes --]

On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> On 09/01, Boqun Feng wrote:
> >
> > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > >
> > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > wait_event()-like code.
> 
> Looks like, you have missed this part of my previous email. See below.

I guess I need to think through this, though I haven't found any problem
in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
And I know that in wake_up(), try_to_wake_up() will be called with
holding wait_queue_head_t->lock, however, only part of wait_event()
holds the same lock, I can't figure out why the barrier is not needed
because of the lock.. I will read the code carefully to see whether I
miss something.

> 
> > I think maybe I have a misunderstanding of barrier pairing.
> 
> Or me. I can only say how it is supposed to work.
> 
> > think that a barrier pairing can only happen:
> 
> Well, no. See for example https://lkml.org/lkml/2014/7/16/310

This helps a lot, so does the LWN article it references:

https://lwn.net/Articles/573436/

The barrier pairing here is scenario 10 in that article. I'm sure that
is my misunderstanding of barrier pairing now, thank you!

> Or, say, the comment in completion_done().
> 
> And please do not assume I can answer authoritatively the questions
> in this area. Fortunately we have paulmck/peterz in CC, they can
> correct me.
> 

Your explanation and references help a lot, now I understand how the
barrier works in try_to_wake_up() ;-)

> Plus I didn't sleep today, not sure I can understand you correctly
> and/or answer your question ;)
> 
> > One example of #2 pairing is the following sequence of events:
> >
> > Initially X = 0, Y = 0
> >
> > CPU 1				CPU 2
> > ===========================	================================
> > WRITE_ONCE(Y, 1);
> > smp_mb();
> > r1 = READ_ONCE(X); // r1 == 0
> > 				WRITE_ONCE(X, 1);
> > 				smp_mb();
> > 				r2 = READ_ONCE(Y);
> >
> > ----------------------------------------------------------------
> > { assert(!(r1 == 0 && r2 == 0)); // means if r1 == 0 then r2 == 1}
> >
> > If CPU 1 _fails_ to read the value of X written by CPU 1, r2 is
> > guaranteed to 1, which means assert(!(r1 == 0 && r2 == 0)) afterwards
> > wouldn't be triggered in any case.
> >
> > And this is actually the case of wake_up/wait, assuming that
> > prepare_to_wait() is called on CPU 1 and wake_up() is called on CPU 2,
> 
> See above, wake_up/wait_event are fine in any case because they use
> the same lock.
> 

I think I wanted to say wake_up_proccess() here, which calls
try_to_wake_up() directly, sorry about that mistake..

> But as for try_to_wake_up() you are right, we rely on STORE-MB-LOAD.
> Now let me quote another previous email,
> 
> 	So in fact try_to_wake_up() needs mb() before it does
> 
> 		if (!(p->state & state))
> 			goto out;
> 
> 	But mb() is heavy, we can use wmb() instead, but only in this particular
> 	case: before spin_lock(), which guarantees that LOAD(p->state) can't leak
> 	out of the critical section. And since spin_lock() itself is the STORE,
> 	this guarantees that STORE(CONDITION) can't leak _into_ the critical section
> 	and thus it can't be reordered with LOAD(p->state).
> 

This also helps a lot, thank you ;-)

> And I think that smp_mb__before_spinlock() + spin_lock() should pair
> correctly with mb(). If not - we should redefine it.
> 
> > X is the condition and Y is the task state,
> 
> Yes,
> 
> > and replace smp_mb() with really necessary barriers, right?
> 
> Sorry, can't understand this part...
> 

I just want to be accurate by saying that, because the barrier in
try_to_wake_up() is not a smp_mb(), is a smp_mb__before_spinlock() +
spin_lock().

Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-02  1:10                             ` Boqun Feng
@ 2015-09-07 17:06                               ` Oleg Nesterov
  2015-09-08  0:22                                 ` Boqun Feng
  0 siblings, 1 reply; 20+ messages in thread
From: Oleg Nesterov @ 2015-09-07 17:06 UTC (permalink / raw)
  To: Boqun Feng
  Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML,
	David Howells, Linus Torvalds, Jonathan Corbet

Sorry for delay,

On 09/02, Boqun Feng wrote:
>
> On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> > On 09/01, Boqun Feng wrote:
> > >
> > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > > >
> > > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > > wait_event()-like code.
> >
> > Looks like, you have missed this part of my previous email. See below.
>
> I guess I need to think through this, though I haven't found any problem
> in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
> And I know that in wake_up(), try_to_wake_up() will be called with
> holding wait_queue_head_t->lock, however, only part of wait_event()
> holds the same lock, I can't figure out why the barrier is not needed
> because of the lock..

This is very simple. __wait_event() does

	for (;;) {
		prepare_to_wait_event(WQ, ...);	// takes WQ->lock

		if (CONDITION)
			break;

		schedule();
	}

and we have

	CONDITION = 1;
	wake_up(WQ);				// takes WQ->lock

on another side.

Suppose that __wait_event() wins and takes WQ->lock first. It can block
then. In this case wake_up() must see the result of set_current_state()
and list_add() when it takes the same lock, otherwise spin_lock() would
be simply buggy. So it will wake the waiter up.

At the same time, if __wait_event() takes this lock after wake_up(), it
can not miss CONDITION = 1. It must see it after it takes the lock, and
of course after it drops the lock too.

So I am not sure I understand your concerns in this case...

Oleg.


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: wake_up_process implied memory barrier clarification
  2015-09-07 17:06                               ` Oleg Nesterov
@ 2015-09-08  0:22                                 ` Boqun Feng
  0 siblings, 0 replies; 20+ messages in thread
From: Boqun Feng @ 2015-09-08  0:22 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Paul E. McKenney, Michal Hocko, Peter Zijlstra, LKML,
	David Howells, Linus Torvalds, Jonathan Corbet

[-- Attachment #1: Type: text/plain, Size: 2128 bytes --]

Hi Oleg,

On Mon, Sep 07, 2015 at 07:06:52PM +0200, Oleg Nesterov wrote:
> Sorry for delay,
> 
> On 09/02, Boqun Feng wrote:
> >
> > On Tue, Sep 01, 2015 at 06:39:23PM +0200, Oleg Nesterov wrote:
> > > On 09/01, Boqun Feng wrote:
> > > >
> > > > On Tue, Sep 01, 2015 at 11:59:23AM +0200, Oleg Nesterov wrote:
> > > > >
> > > > > And just in case, wake_up() differs in a sense that it doesn't even need
> > > > > that STORE-LOAD barrier in try_to_wake_up(), we can rely on
> > > > > wait_queue_head_t->lock. Assuming that wake_up() pairs with the "normal"
> > > > > wait_event()-like code.
> > >
> > > Looks like, you have missed this part of my previous email. See below.
> >
> > I guess I need to think through this, though I haven't found any problem
> > in wake_up() if we remove the STORE-LOAD barrier in try_to_wake_up().
> > And I know that in wake_up(), try_to_wake_up() will be called with
> > holding wait_queue_head_t->lock, however, only part of wait_event()
> > holds the same lock, I can't figure out why the barrier is not needed
> > because of the lock..
> 
> This is very simple. __wait_event() does
> 
> 	for (;;) {
> 		prepare_to_wait_event(WQ, ...);	// takes WQ->lock
> 
> 		if (CONDITION)
> 			break;
> 
> 		schedule();
> 	}
> 
> and we have
> 
> 	CONDITION = 1;
> 	wake_up(WQ);				// takes WQ->lock
> 
> on another side.
> 
> Suppose that __wait_event() wins and takes WQ->lock first. It can block
> then. In this case wake_up() must see the result of set_current_state()
> and list_add() when it takes the same lock, otherwise spin_lock() would
> be simply buggy. So it will wake the waiter up.
> 
> At the same time, if __wait_event() takes this lock after wake_up(), it
> can not miss CONDITION = 1. It must see it after it takes the lock, and
> of course after it drops the lock too.
> 

Yes, you're right! I wasn't aware that in prepare_to_wait_event(),
set_current_state() is called with the WQ->lock.

> So I am not sure I understand your concerns in this case...
> 

It's my mistake. Thank you for your explanation ;-)

Regards,
Boqun

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-09-08  0:23 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-27 12:27 wake_up_process implied memory barrier clarification Michal Hocko
2015-08-27 12:43 ` Peter Zijlstra
2015-08-27 13:14   ` Michal Hocko
2015-08-27 18:26     ` Oleg Nesterov
2015-08-28 14:51       ` Michal Hocko
2015-08-28 16:06         ` Oleg Nesterov
2015-08-29  9:25           ` Boqun Feng
2015-08-29 14:27             ` Oleg Nesterov
2015-08-31  0:37               ` Boqun Feng
2015-08-31 18:33                 ` Oleg Nesterov
2015-08-31 20:37                   ` Paul E. McKenney
2015-09-01  3:40                     ` Boqun Feng
2015-09-01  4:03                       ` Paul E. McKenney
2015-09-01  9:59                       ` Oleg Nesterov
2015-09-01 14:50                         ` Boqun Feng
2015-09-01 16:39                           ` Oleg Nesterov
2015-09-02  1:10                             ` Boqun Feng
2015-09-07 17:06                               ` Oleg Nesterov
2015-09-08  0:22                                 ` Boqun Feng
2015-09-01  9:41                     ` Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).