linux-kernel.vger.kernel.org archive mirror
* [v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
@ 2016-08-31 10:08 Balbir Singh
  2016-09-05  3:16 ` [RESEND][v2][PATCH] " Balbir Singh
  0 siblings, 1 reply; 8+ messages in thread
From: Balbir Singh @ 2016-08-31 10:08 UTC (permalink / raw)
  To: Peter Zijlstra, LKML, Oleg Nesterov
  Cc: Benjamin Herrenschmidt, Nicholas Piggin, Alexey Kardashevskiy



The origin of the issue I've seen is a missing memory barrier
between the check for task->state and the check for task->on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

do {
	schedule();
	set_current_state(TASK_(UN)INTERRUPTIBLE);
} while (!cond);

The waker actually gets stuck doing the following in
try_to_wake_up():

while (p->on_cpu)
	cpu_relax();

Analysis:

The instance I've seen involves the following race:

CPU1					CPU2

while () {
  if (cond)
    break;
  do {
    schedule();
    set_current_state(TASK_UN..)
  } while (!cond);
					wakeup_routine()
					  spin_lock_irqsave(wait_lock)
  raw_spin_lock_irqsave(wait_lock)	  wake_up_process()
}					  try_to_wake_up()
set_current_state(TASK_RUNNING);	  ..
list_del(&waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs

CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
  wake_up_process()
  try_to_wake_up()
  raw_spin_lock_irqsave(p->pi_lock)
  ..
  if (p->on_rq && ttwu_wakeup())
  ..
  while (p->on_cpu)
    cpu_relax()
  ..

CPU3 tries to wake up the task on CPU1 again, since it finds it on
the wait_queue. CPU1 is spinning on the wait_lock, but CPU3 got it
immediately after CPU2.

CPU3 checks the state of p on CPU1; it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly, even though
p->on_rq is checked under pi_lock, I've noticed that try_to_wake_up()
finds p->on_rq to be 0. This was the most confusing bit of the
analysis, but since p->on_rq is changed under the runqueue lock
(rq_lock), the p->on_rq check is not reliable without this fix IMHO.
The race is visible (based on the analysis) only when ttwu_queue()
does a remote wakeup via ttwu_queue_remote(), in which case the
p->on_rq change is not done under the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_lock_irqsave(wait_lock) and the holder spins indefinitely.

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads, after tweaking available memory down very low and running
the stress-ng mmapfork memory test. It usually takes a long time to
reproduce. I am working on a test case that can reproduce the issue
faster, but that's work in progress. I am still testing the changes
on my system in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 -v2: Make the description more generic

 kernel/sched/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..582c684 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	success = 1; /* we're going to change ->state */
 	cpu = task_cpu(p);
 
+	/*
+	 * Ensure we see on_rq and p_state consistently
+	 *
+	 *    [S] ->on_rq = 1				[L] ->state
+	 *    MB					 RMB
+	 *    [S] ->state = TASK_UNINTERRUPTIBLE	[L] ->on_rq
+	 * In the absence of the RMB p->on_rq can be observed to be 0
+	 * and we end up spinning indefinitely in while (p->on_cpu)
+	 */
+	smp_rmb();
 	if (p->on_rq && ttwu_remote(p, wake_flags))
 		goto stat;
 
-- 
2.5.5

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
  2016-08-31 10:08 [v2][PATCH] Fix a race between try_to_wake_up() and a woken up task Balbir Singh
@ 2016-09-05  3:16 ` Balbir Singh
  2016-09-05  7:14   ` Benjamin Herrenschmidt
  2016-09-05 11:14   ` [tip:sched/urgent] sched/core: " tip-bot for Balbir Singh
  0 siblings, 2 replies; 8+ messages in thread
From: Balbir Singh @ 2016-09-05  3:16 UTC (permalink / raw)
  To: Peter Zijlstra, LKML, Oleg Nesterov
  Cc: Benjamin Herrenschmidt, Nicholas Piggin, Alexey Kardashevskiy



The origin of the issue I've seen is a missing memory barrier
between the check for task->state and the check for task->on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

do {
	schedule();
	set_current_state(TASK_(UN)INTERRUPTIBLE);
} while (!cond);

The waker actually gets stuck doing the following in
try_to_wake_up():

while (p->on_cpu)
	cpu_relax();

Analysis:

The instance I've seen involves the following race:

CPU1					CPU2

while () {
  if (cond)
    break;
  do {
    schedule();
    set_current_state(TASK_UN..)
  } while (!cond);
					wakeup_routine()
					  spin_lock_irqsave(wait_lock)
  raw_spin_lock_irqsave(wait_lock)	  wake_up_process()
}					  try_to_wake_up()
set_current_state(TASK_RUNNING);	  ..
list_del(&waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs

CPU3
wakeup_routine()
raw_spin_lock_irqsave(wait_lock)
if (!list_empty)
  wake_up_process()
  try_to_wake_up()
  raw_spin_lock_irqsave(p->pi_lock)
  ..
  if (p->on_rq && ttwu_wakeup())
  ..
  while (p->on_cpu)
    cpu_relax()
  ..

CPU3 tries to wake up the task on CPU1 again, since it finds it on
the wait_queue. CPU1 is spinning on the wait_lock, but CPU3 got it
immediately after CPU2.

CPU3 checks the state of p on CPU1; it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly, even though
p->on_rq is checked under pi_lock, I've noticed that try_to_wake_up()
finds p->on_rq to be 0. This was the most confusing bit of the
analysis, but since p->on_rq is changed under the runqueue lock
(rq_lock), the p->on_rq check is not reliable without this fix IMHO.
The race is visible (based on the analysis) only when ttwu_queue()
does a remote wakeup via ttwu_queue_remote(), in which case the
p->on_rq change is not done under the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_lock_irqsave(wait_lock) and the holder spins indefinitely.

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads, after tweaking available memory down very low and running
the stress-ng mmapfork memory test. It usually takes a long time to
reproduce. I am working on a test case that can reproduce the issue
faster, but that's work in progress. I am still testing the changes
on my system in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
---
 kernel/sched/core.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..582c684 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	success = 1; /* we're going to change ->state */
 	cpu = task_cpu(p);
 
+	/*
+	 * Ensure we see on_rq and p_state consistently
+	 *
+	 * For example in __rwsem_down_write_failed(), we have
+	 *    [S] ->on_rq = 1				[L] ->state
+	 *    MB					 RMB
+	 *    [S] ->state = TASK_UNINTERRUPTIBLE	[L] ->on_rq
+	 * In the absence of the RMB p->on_rq can be observed to be 0
+	 * and we end up spinning indefinitely in while (p->on_cpu)
+	 */
+	smp_rmb();
 	if (p->on_rq && ttwu_remote(p, wake_flags))
 		goto stat;
 
-- 
2.5.5


* Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
  2016-09-05  3:16 ` [RESEND][v2][PATCH] " Balbir Singh
@ 2016-09-05  7:14   ` Benjamin Herrenschmidt
  2016-09-05  7:48     ` Peter Zijlstra
  2016-09-05 11:14   ` [tip:sched/urgent] sched/core: " tip-bot for Balbir Singh
  1 sibling, 1 reply; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-05  7:14 UTC (permalink / raw)
  To: Balbir Singh, Peter Zijlstra, LKML, Oleg Nesterov
  Cc: Nicholas Piggin, Alexey Kardashevskiy

On Mon, 2016-09-05 at 13:16 +1000, Balbir Singh wrote:

 .../...
> 
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Nicholas Piggin <npiggin@gmail.com>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

> Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> ---
>  kernel/sched/core.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2a906f2..582c684 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned
> int state, int wake_flags)
>  	success = 1; /* we're going to change ->state */
>  	cpu = task_cpu(p);
>  
> +	/*
> +	 * Ensure we see on_rq and p_state consistently
> +	 *
> +	 * For example in __rwsem_down_write_failed(), we have
> +	 *    [S] ->on_rq = 1				[L]
> ->state
> +	 *    MB					 RMB
> +	 *    [S] ->state = TASK_UNINTERRUPTIBLE	[L] ->on_rq
> +	 * In the absence of the RMB p->on_rq can be observed to be
> 0
> +	 * and we end up spinning indefinitely in while (p->on_cpu)
> +	 */
> +	smp_rmb();
>  	if (p->on_rq && ttwu_remote(p, wake_flags))
>  		goto stat;
>  


* Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
  2016-09-05  7:14   ` Benjamin Herrenschmidt
@ 2016-09-05  7:48     ` Peter Zijlstra
  2016-09-05  8:24       ` Mike Galbraith
  2016-09-05  8:37       ` Balbir Singh
  0 siblings, 2 replies; 8+ messages in thread
From: Peter Zijlstra @ 2016-09-05  7:48 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Balbir Singh, LKML, Oleg Nesterov, Nicholas Piggin, Alexey Kardashevskiy

On Mon, Sep 05, 2016 at 05:14:19PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2016-09-05 at 13:16 +1000, Balbir Singh wrote:
> 
>  .../...
> > 
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Nicholas Piggin <npiggin@gmail.com>
> 
> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> 
> > Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> > ---
> >  kernel/sched/core.c | 11 +++++++++++
> >  1 file changed, 11 insertions(+)
> > 
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2a906f2..582c684 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned
> > int state, int wake_flags)
> >  	success = 1; /* we're going to change ->state */
> >  	cpu = task_cpu(p);
> >  
> > +	/*
> > +	 * Ensure we see on_rq and p_state consistently
> > +	 *
> > +	 * For example in __rwsem_down_write_failed(), we have
> > +	 *    [S] ->on_rq = 1				[L] ->state
> > +	 *    MB					 RMB
> > +	 *    [S] ->state = TASK_UNINTERRUPTIBLE	[L] ->on_rq
> > +	 * In the absence of the RMB p->on_rq can be observed to be 0
> > +	 * and we end up spinning indefinitely in while (p->on_cpu)
> > +	 */

So I did replace that comment with the one I proposed earlier. I checked
a fair number of architectures and many did not have an obvious barrier
in switch_to(). So that is not something we can rely on, nor do we need
to I think.

> > +	smp_rmb();
> >  	if (p->on_rq && ttwu_remote(p, wake_flags))
> >  		goto stat;
> >  


* Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
  2016-09-05  7:48     ` Peter Zijlstra
@ 2016-09-05  8:24       ` Mike Galbraith
  2016-09-05  8:31         ` Benjamin Herrenschmidt
  2016-09-05  8:37       ` Balbir Singh
  1 sibling, 1 reply; 8+ messages in thread
From: Mike Galbraith @ 2016-09-05  8:24 UTC (permalink / raw)
  To: Peter Zijlstra, Benjamin Herrenschmidt
  Cc: Balbir Singh, LKML, Oleg Nesterov, Nicholas Piggin, Alexey Kardashevskiy

On Mon, 2016-09-05 at 09:48 +0200, Peter Zijlstra wrote:
> On Mon, Sep 05, 2016 at 05:14:19PM +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2016-09-05 at 13:16 +1000, Balbir Singh wrote:
> > 
> >  .../...
> > > 
> > > Cc: Peter Zijlstra <peterz@infradead.org>
> > > Cc: Nicholas Piggin <npiggin@gmail.com>
> > 
> > Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > 
> > > Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> > > ---

... no Cc: @stable?  Sounds plenty nasty enough to be worthy.

	-Mike


* Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
  2016-09-05  8:24       ` Mike Galbraith
@ 2016-09-05  8:31         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 8+ messages in thread
From: Benjamin Herrenschmidt @ 2016-09-05  8:31 UTC (permalink / raw)
  To: Mike Galbraith, Peter Zijlstra
  Cc: Balbir Singh, LKML, Oleg Nesterov, Nicholas Piggin, Alexey Kardashevskiy

On Mon, 2016-09-05 at 10:24 +0200, Mike Galbraith wrote:
> > > Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > > 
> > > > Signed-off-by: Balbir Singh <bsingharora@gmail.com>
> > > > ---
> 
> ... no Cc: @stable?  Sounds plenty nasty enough to be worthy.

Right, we should have it in stable too.

Peter, can you still stick that in ? Otherwise we'll have to fwd it to
stable manually.

Cheers
Ben.


* Re: [RESEND][v2][PATCH] Fix a race between try_to_wake_up() and a woken up task
  2016-09-05  7:48     ` Peter Zijlstra
  2016-09-05  8:24       ` Mike Galbraith
@ 2016-09-05  8:37       ` Balbir Singh
  1 sibling, 0 replies; 8+ messages in thread
From: Balbir Singh @ 2016-09-05  8:37 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Benjamin Herrenschmidt, LKML, Oleg Nesterov, Nicholas Piggin,
	Alexey Kardashevskiy

On Mon, Sep 5, 2016 at 5:48 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Mon, Sep 05, 2016 at 05:14:19PM +1000, Benjamin Herrenschmidt wrote:
>> On Mon, 2016-09-05 at 13:16 +1000, Balbir Singh wrote:
>>
>>  .../...
>> >
>> > Cc: Peter Zijlstra <peterz@infradead.org>
>> > Cc: Nicholas Piggin <npiggin@gmail.com>
>>
>> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>
>> > Signed-off-by: Balbir Singh <bsingharora@gmail.com>
>> > ---
>> >  kernel/sched/core.c | 11 +++++++++++
>> >  1 file changed, 11 insertions(+)
>> >
>> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> > index 2a906f2..582c684 100644
>> > --- a/kernel/sched/core.c
>> > +++ b/kernel/sched/core.c
>> > @@ -2016,6 +2016,17 @@ try_to_wake_up(struct task_struct *p, unsigned
>> > int state, int wake_flags)
>> >     success = 1; /* we're going to change ->state */
>> >     cpu = task_cpu(p);
>> >
>> > +   /*
>> > +    * Ensure we see on_rq and p_state consistently
>> > +    *
>> > +    * For example in __rwsem_down_write_failed(), we have
>> > +    *    [S] ->on_rq = 1                           [L] ->state
>> > +    *    MB                                         RMB
>> > +    *    [S] ->state = TASK_UNINTERRUPTIBLE        [L] ->on_rq
>> > +    * In the absence of the RMB p->on_rq can be observed to be 0
>> > +    * and we end up spinning indefinitely in while (p->on_cpu)
>> > +    */
>
> So I did replace that comment with the one I proposed earlier. I checked
> a fair number of architectures and many did not have an obvious barrier
> in switch_to(). So that is not something we can rely on, nor do we need
> to I think.
>

Thanks for the comment edit and thanks for letting us know.

Balbir Singh


* [tip:sched/urgent] sched/core: Fix a race between try_to_wake_up() and a woken up task
  2016-09-05  3:16 ` [RESEND][v2][PATCH] " Balbir Singh
  2016-09-05  7:14   ` Benjamin Herrenschmidt
@ 2016-09-05 11:14   ` tip-bot for Balbir Singh
  1 sibling, 0 replies; 8+ messages in thread
From: tip-bot for Balbir Singh @ 2016-09-05 11:14 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: oleg, linux-kernel, hpa, tglx, npiggin, bsingharora, mingo,
	stable, peterz, aik, nicholas.piggin, torvalds, benh

Commit-ID:  135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf
Gitweb:     http://git.kernel.org/tip/135e8c9250dd5c8c9aae5984fde6f230d0cbfeaf
Author:     Balbir Singh <bsingharora@gmail.com>
AuthorDate: Mon, 5 Sep 2016 13:16:40 +1000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 5 Sep 2016 11:57:53 +0200

sched/core: Fix a race between try_to_wake_up() and a woken up task

The origin of the issue I've seen is a missing memory barrier
between the check for task->state and the check for task->on_rq.

The task being woken up is already awake from a schedule()
and is doing the following:

	do {
		schedule();
		set_current_state(TASK_(UN)INTERRUPTIBLE);
	} while (!cond);

The waker actually gets stuck doing the following in
try_to_wake_up():

	while (p->on_cpu)
		cpu_relax();

Analysis:

The instance I've seen involves the following race:

 CPU1					CPU2

 while () {
   if (cond)
     break;
   do {
     schedule();
     set_current_state(TASK_UN..)
   } while (!cond);
					wakeup_routine()
					  spin_lock_irqsave(wait_lock)
   raw_spin_lock_irqsave(wait_lock)	  wake_up_process()
 }					  try_to_wake_up()
 set_current_state(TASK_RUNNING);	  ..
 list_del(&waiter.list);

CPU2 wakes up CPU1, but before it can get the wait_lock and set
current state to TASK_RUNNING the following occurs:

 CPU3
 wakeup_routine()
 raw_spin_lock_irqsave(wait_lock)
 if (!list_empty)
   wake_up_process()
   try_to_wake_up()
   raw_spin_lock_irqsave(p->pi_lock)
   ..
   if (p->on_rq && ttwu_wakeup())
   ..
   while (p->on_cpu)
     cpu_relax()
   ..

CPU3 tries to wake up the task on CPU1 again, since it finds it on
the wait_queue. CPU1 is spinning on the wait_lock, but CPU3 got it
immediately after CPU2.

CPU3 checks the state of p on CPU1; it is TASK_UNINTERRUPTIBLE and
the task is spinning on the wait_lock. Interestingly, even though
p->on_rq is checked under pi_lock, I've noticed that try_to_wake_up()
finds p->on_rq to be 0. This was the most confusing bit of the
analysis, but since p->on_rq is changed under the runqueue lock
(rq_lock), the p->on_rq check is not reliable without this fix IMHO.
The race is visible (based on the analysis) only when ttwu_queue()
does a remote wakeup via ttwu_queue_remote(), in which case the
p->on_rq change is not done under the pi_lock.

The result is that after a while the entire system locks up on
the raw_spin_lock_irqsave(wait_lock) and the holder spins indefinitely.

Reproduction of the issue:

The issue can be reproduced after a long run on my system with 80
threads, after tweaking available memory down very low and running
the stress-ng mmapfork memory test. It usually takes a long time to
reproduce. I am working on a test case that can reproduce the issue
faster, but that's work in progress. I am still testing the changes
on my system in a loop and the tests seem OK thus far.

Big thanks to Benjamin and Nick for helping debug this as well.
Ben helped catch the missing barrier, Nick caught every missing
bit in my theory.

Signed-off-by: Balbir Singh <bsingharora@gmail.com>
[ Updated comment to clarify matching barriers. Many
  architectures do not have a full barrier in switch_to()
  so that cannot be relied upon. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nicholas Piggin <nicholas.piggin@gmail.com>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/e02cce7b-d9ca-1ad0-7a61-ea97c7582b37@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2a906f2..44817c6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2016,6 +2016,28 @@ try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
 	success = 1; /* we're going to change ->state */
 	cpu = task_cpu(p);
 
+	/*
+	 * Ensure we load p->on_rq _after_ p->state, otherwise it would
+	 * be possible to, falsely, observe p->on_rq == 0 and get stuck
+	 * in smp_cond_load_acquire() below.
+	 *
+	 * sched_ttwu_pending()                 try_to_wake_up()
+	 *   [S] p->on_rq = 1;                  [L] P->state
+	 *       UNLOCK rq->lock  -----.
+	 *                              \
+	 *				 +---   RMB
+	 * schedule()                   /
+	 *       LOCK rq->lock    -----'
+	 *       UNLOCK rq->lock
+	 *
+	 * [task p]
+	 *   [S] p->state = UNINTERRUPTIBLE     [L] p->on_rq
+	 *
+	 * Pairs with the UNLOCK+LOCK on rq->lock from the
+	 * last wakeup of our task and the schedule that got our task
+	 * current.
+	 */
+	smp_rmb();
 	if (p->on_rq && ttwu_remote(p, wake_flags))
 		goto stat;
 

^ permalink raw reply related	[flat|nested] 8+ messages in thread

