* [PATCH 0/2] sched: Fix a race between CPU onlining & user task scheduling
@ 2018-05-26 15:46 Paul Burton
  2018-05-26 15:46 ` [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks Paul Burton
  2018-05-26 15:46 ` [PATCH 2/2] sched: Warn if we fail to migrate a task Paul Burton
  0 siblings, 2 replies; 11+ messages in thread
From: Paul Burton @ 2018-05-26 15:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Paul Burton

This short series fixes a race condition I've observed in which user
tasks can be scheduled on CPUs that are still in the process of being
brought online (or rather being made active) by hotplug.

I've created a small test case for the observed problem here:

  https://gist.github.com/paulburton/25187c4f537263a6be9c8aac67bd33bf

The issue was first observed on a MIPS system running a test program
that was sensitive to CPU affinity, but has been reproduced on x86_64
using the above program too. On my laptop with an Intel i7-5600U CPU the
test tends to fail within 10 minutes, but with this series applied runs
cleanly overnight.

Thanks,
    Paul

Paul Burton (2):
  sched: Make select_task_rq() require cpu_active() for user tasks
  sched: Warn if we fail to migrate a task

 kernel/sched/core.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

-- 
2.17.0

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks
  2018-05-26 15:46 [PATCH 0/2] sched: Fix a race between CPU onlining & user task scheduling Paul Burton
@ 2018-05-26 15:46 ` Paul Burton
  2018-05-28 14:49   ` Peter Zijlstra
  2018-05-31 12:28   ` [tip:sched/urgent] sched/core: Require cpu_active() in select_task_rq(), " tip-bot for Paul Burton
  2018-05-26 15:46 ` [PATCH 2/2] sched: Warn if we fail to migrate a task Paul Burton
  1 sibling, 2 replies; 11+ messages in thread
From: Paul Burton @ 2018-05-26 15:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Paul Burton

select_task_rq() is used in a few paths to select the CPU upon which a
thread should be run - for example it is used by try_to_wake_up() & by
fork or exec balancing. As-is it allows use of any online CPU that is
present in the task's cpus_allowed mask.

This presents a problem because there is a period whilst CPUs are
brought online where a CPU is marked online, but is not yet fully
initialized - ie. the period where CPUHP_AP_ONLINE_IDLE <= state <
CPUHP_ONLINE. Usually we don't run any user tasks during this window,
but there are corner cases where this can happen. An example observed
is:

  - Some user task A, running on CPU X, forks to create task B.

  - sched_fork() calls __set_task_cpu() with cpu=X, setting task B's
    task_struct::cpu field to X.

  - CPU X is offlined.

  - Task A, currently somewhere between the __set_task_cpu() in
    copy_process() and the call to wake_up_new_task(), is migrated to
    CPU Y by migrate_tasks() when CPU X is offlined.

  - CPU X is onlined, but still in the CPUHP_AP_ONLINE_IDLE state. The
    scheduler is now active on CPU X, but there are no user tasks on
    the runqueue.

  - Task A runs on CPU Y & reaches wake_up_new_task(). This calls
    select_task_rq() with cpu=X, taken from task B's struct task_struct,
    and select_task_rq() allows CPU X to be returned.

  - Task A enqueues task B on CPU X's runqueue, via activate_task() &
    enqueue_task().

  - CPU X now has a user task on its runqueue before it has reached the
    CPUHP_ONLINE state.

In most cases, the user tasks that schedule on the newly onlined CPU
have no idea that anything went wrong, but one case observed to be
problematic is if the task goes on to invoke the sched_setaffinity
syscall. The newly onlined CPU reaches the CPUHP_AP_ONLINE_IDLE state
before the CPU that brought it online calls stop_machine_unpark(). This
means that for a portion of the window of time between
CPUHP_AP_ONLINE_IDLE & CPUHP_ONLINE the newly onlined CPU's struct
cpu_stopper has its enabled field set to false. If a user thread is
executed on the CPU during this window and it invokes sched_setaffinity
with a CPU mask that does not include the CPU it's running on, then when
__set_cpus_allowed_ptr() calls stop_one_cpu() intending to invoke
migration_cpu_stop() and perform the actual migration away from the CPU
it will simply return -ENOENT rather than calling migration_cpu_stop().
We then return from the sched_setaffinity syscall back to the user task
that is now running on a CPU which it just asked not to run on, and
which is not present in its cpus_allowed mask.

This patch resolves the problem by having select_task_rq() enforce that
user tasks run on CPUs that are active - the same requirement that
select_fallback_rq() already enforces. This should ensure that newly
onlined CPUs reach the CPUHP_AP_ACTIVE state before being able to
schedule user tasks, and also implies that bringup_wait_for_ap() will
have called stop_machine_unpark() which resolves the sched_setaffinity
issue above.

I haven't yet investigated them, but it may be of interest to review
whether any of the actions performed by hotplug states between
CPUHP_AP_ONLINE_IDLE & CPUHP_AP_ACTIVE could have similar unintended
effects on user tasks that might schedule before they are reached, which
might widen the scope of the problem from just affecting the behaviour
of sched_setaffinity.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org

---
As described in the series' cover letter I've created a small test case
for the observed problem here:

  https://gist.github.com/paulburton/25187c4f537263a6be9c8aac67bd33bf

The issue was first observed on a MIPS system running a test program
that was sensitive to CPU affinity, but has been reproduced on x86_64
using the above program too. On my laptop with an Intel i7-5600U CPU the
test tends to fail within 10 minutes, but with this series applied runs
cleanly overnight.

 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a7bf32aabfda..2380bc228dd0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1565,7 +1565,8 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	 *   not worry about this generic constraint ]
 	 */
 	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_online(cpu)))
+		     !cpu_online(cpu) ||
+		     (!cpu_active(cpu) && !(p->flags & PF_KTHREAD))))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;
-- 
2.17.0


* [PATCH 2/2] sched: Warn if we fail to migrate a task
  2018-05-26 15:46 [PATCH 0/2] sched: Fix a race between CPU onlining & user task scheduling Paul Burton
  2018-05-26 15:46 ` [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks Paul Burton
@ 2018-05-26 15:46 ` Paul Burton
  2018-05-28 15:06   ` Peter Zijlstra
  1 sibling, 1 reply; 11+ messages in thread
From: Paul Burton @ 2018-05-26 15:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Peter Zijlstra, Thomas Gleixner, Ingo Molnar, Paul Burton

__set_cpus_allowed_ptr() makes use of stop_one_cpu() to call
migration_cpu_stop() in order to perform migration of a task away from
the CPU it's currently running on. If all is as expected then this
shouldn't fail, but as the preceding patch shows it's possible for this
assumption to be broken fairly subtly.

Add a warning to ensure that if stop_one_cpu() returns an error
(-ENOENT is the only one it can currently return) then we warn about it
in the kernel log, since this isn't expected to happen.

I considered propagating the error upwards, but this would require a
change to the return values allowed from the sched_setaffinity() syscall
and would require that user programs handle errors other than those
caused by their own bad input to the syscall. So for now this patch
simply warns, which is an improvement over the silent error & incorrect
scheduling we had before.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org

---

 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2380bc228dd0..cda3affd45b7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1127,7 +1127,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
 		struct migration_arg arg = { p, dest_cpu };
 		/* Need help from migration thread: drop lock and wait. */
 		task_rq_unlock(rq, p, &rf);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+		WARN_ON(ret);
 		tlb_migrate_finish(p->mm);
 		return 0;
 	} else if (task_on_rq_queued(p)) {
-- 
2.17.0


* Re: [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks
  2018-05-26 15:46 ` [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks Paul Burton
@ 2018-05-28 14:49   ` Peter Zijlstra
  2018-05-28 15:45     ` Paul Burton
  2018-05-31 12:28   ` [tip:sched/urgent] sched/core: Require cpu_active() in select_task_rq(), " tip-bot for Paul Burton
  1 sibling, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2018-05-28 14:49 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Paul McKenney, Tejun Heo

On Sat, May 26, 2018 at 08:46:47AM -0700, Paul Burton wrote:

> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1565,7 +1565,8 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
>  	 *   not worry about this generic constraint ]
>  	 */
>  	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
> -		     !cpu_online(cpu)))
> +		     !cpu_online(cpu) ||
> +		     (!cpu_active(cpu) && !(p->flags & PF_KTHREAD))))
>  		cpu = select_fallback_rq(task_cpu(p), p);

That is not quite right.. and I find that the wrong patch:

  955dbdf4ce87 ("sched: Allow migrating kthreads into online but inactive CPUs")

got merged over my suggested alternative :-(

  http://lkml.kernel.org/r/20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net

So, lets first fix that, and then your patch becomes something like the
below I think.


--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1562,7 +1562,7 @@ int select_task_rq(struct task_struct *p
 	 *   not worry about this generic constraint ]
 	 */
 	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_online(cpu)))
+		     (is_per_cpu_kthread(p) ? !cpu_online(cpu) : !cpu_active(cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;


* Re: [PATCH 2/2] sched: Warn if we fail to migrate a task
  2018-05-26 15:46 ` [PATCH 2/2] sched: Warn if we fail to migrate a task Paul Burton
@ 2018-05-28 15:06   ` Peter Zijlstra
  2018-05-28 15:59     ` Paul Burton
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2018-05-28 15:06 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar

On Sat, May 26, 2018 at 08:46:48AM -0700, Paul Burton wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 2380bc228dd0..cda3affd45b7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1127,7 +1127,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
>  		struct migration_arg arg = { p, dest_cpu };
>  		/* Need help from migration thread: drop lock and wait. */
>  		task_rq_unlock(rq, p, &rf);
> -		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> +		ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> +		WARN_ON(ret);
>  		tlb_migrate_finish(p->mm);
>  		return 0;
>  	} else if (task_on_rq_queued(p)) {

I think we can trigger this at will.. Set affinity to the CPU you're
going to take offline and offline concurrently.

It is possible for the offline to happen between task_rq_unlock() and
stop_one_cpu(), at which point the WARN will then trigger.

The point is; and maybe this should be a comment somewhere; that if this
fails, there is nothing we can do about it, and it should be fixed up by
migrate_tasks()/select_task_rq().

There is no point in propagating the error to userspace, since if we'd
have slightly different timing and completed the stop_one_cpu() before
the hot-un-plug, migrate_tasks()/select_task_rq() would've had to fix up
anyway.


* Re: [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks
  2018-05-28 14:49   ` Peter Zijlstra
@ 2018-05-28 15:45     ` Paul Burton
  2018-05-28 15:53       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Burton @ 2018-05-28 15:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Paul McKenney, Tejun Heo

Hi Peter,

On Mon, May 28, 2018 at 04:49:24PM +0200, Peter Zijlstra wrote:
> On Sat, May 26, 2018 at 08:46:47AM -0700, Paul Burton wrote:
> 
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1565,7 +1565,8 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
> >  	 *   not worry about this generic constraint ]
> >  	 */
> >  	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
> > -		     !cpu_online(cpu)))
> > +		     !cpu_online(cpu) ||
> > +		     (!cpu_active(cpu) && !(p->flags & PF_KTHREAD))))
> >  		cpu = select_fallback_rq(task_cpu(p), p);
> 
> That is not quite right.. and I find that the wrong patch:
> 
>   955dbdf4ce87 ("sched: Allow migrating kthreads into online but inactive CPUs")
> 
> got merged over my suggested alternative :-(
> 
>   http://lkml.kernel.org/r/20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net
> 
> So, lets first fix that

Thanks for the link - just knowing that the intention is that only
per-CPU kthreads are allowed on !active CPUs is useful.

> , and then your patch becomes something like the below I think.
> 
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1562,7 +1562,7 @@ int select_task_rq(struct task_struct *p
>  	 *   not worry about this generic constraint ]
>  	 */
>  	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
> -		     !cpu_online(cpu)))
> +		     (is_per_cpu_kthread(p) ? !cpu_online(cpu) : !cpu_active(cpu)))
>  		cpu = select_fallback_rq(task_cpu(p), p);
>  
>  	return cpu;

Yes this looks good to me.

Are you planning to submit your change to introduce
is_per_cpu_kthread(), or shall I?

Thanks,
    Paul


* Re: [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks
  2018-05-28 15:45     ` Paul Burton
@ 2018-05-28 15:53       ` Peter Zijlstra
  2018-05-28 16:10         ` Paul Burton
  0 siblings, 1 reply; 11+ messages in thread
From: Peter Zijlstra @ 2018-05-28 15:53 UTC (permalink / raw)
  To: Paul Burton
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Paul McKenney, Tejun Heo

On Mon, May 28, 2018 at 08:45:16AM -0700, Paul Burton wrote:
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1562,7 +1562,7 @@ int select_task_rq(struct task_struct *p
> >  	 *   not worry about this generic constraint ]
> >  	 */
> >  	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
> > -		     !cpu_online(cpu)))
> > +		     (is_per_cpu_kthread(p) ? !cpu_online(cpu) : !cpu_active(cpu)))
> >  		cpu = select_fallback_rq(task_cpu(p), p);
> >  
> >  	return cpu;
> 
> Yes this looks good to me.
> 
> Are you planning to submit your change to introduce
> is_per_cpu_kthread(), or shall I?

I've got the lot; I just need to write a better changelog for my old
(now rebased) patch and make a small note in your patch, but will feed
them to Ingo when done.


* Re: [PATCH 2/2] sched: Warn if we fail to migrate a task
  2018-05-28 15:06   ` Peter Zijlstra
@ 2018-05-28 15:59     ` Paul Burton
  2018-05-29 10:40       ` Peter Zijlstra
  0 siblings, 1 reply; 11+ messages in thread
From: Paul Burton @ 2018-05-28 15:59 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar

Hi Peter,

On Mon, May 28, 2018 at 05:06:56PM +0200, Peter Zijlstra wrote:
> On Sat, May 26, 2018 at 08:46:48AM -0700, Paul Burton wrote:
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 2380bc228dd0..cda3affd45b7 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -1127,7 +1127,8 @@ static int __set_cpus_allowed_ptr(struct task_struct *p,
> >  		struct migration_arg arg = { p, dest_cpu };
> >  		/* Need help from migration thread: drop lock and wait. */
> >  		task_rq_unlock(rq, p, &rf);
> > -		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> > +		ret = stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
> > +		WARN_ON(ret);
> >  		tlb_migrate_finish(p->mm);
> >  		return 0;
> >  	} else if (task_on_rq_queued(p)) {
> 
> I think we can trigger this at will.. Set affinity to the CPU you're
> going to take offline and offline concurrently.
> 
> It is possible for the offline to happen between task_rq_unlock() and
> stop_one_cpu(), at which point the WARN will then trigger.

Right, good point. The problem is only if stop_one_cpu() returns an
error whilst cpu_of(rq) is being brought online, not whilst it's being
offlined.

> The point is; and maybe this should be a comment somewhere; that if this
> fails, there is nothing we can do about it, and it should be fixed up by
> migrate_tasks()/select_task_rq().
> 
> There is no point in propagating the error to userspace, since if we'd
> have slightly different timing and completed the stop_one_cpu() before
> the hot-un-plug, migrate_tasks()/select_task_rq() would've had to fix up
> anyway.

I agree userspace shouldn't need to care about this but in my case
(using the test program I linked from the previous patch) this triggers
whilst the CPU is being brought online, not taken offline. That means
migrate_tasks() is not involved, and we actually just return from here
back out from a sched_setaffinity syscall & continue running the user
task on a CPU that is no longer present in the task's cpus_allowed.

I can't think of a good qualifier to limit the warning to only trigger
in that scenario though, so in reality perhaps we're best to just trust
that with patch 1 applied the problem will go away.

Thanks,
    Paul


* Re: [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks
  2018-05-28 15:53       ` Peter Zijlstra
@ 2018-05-28 16:10         ` Paul Burton
  0 siblings, 0 replies; 11+ messages in thread
From: Paul Burton @ 2018-05-28 16:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Paul McKenney, Tejun Heo

Hi Peter,

On Mon, May 28, 2018 at 05:53:06PM +0200, Peter Zijlstra wrote:
> On Mon, May 28, 2018 at 08:45:16AM -0700, Paul Burton wrote:
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -1562,7 +1562,7 @@ int select_task_rq(struct task_struct *p
> > >  	 *   not worry about this generic constraint ]
> > >  	 */
> > >  	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
> > > -		     !cpu_online(cpu)))
> > > +		     (is_per_cpu_kthread(p) ? !cpu_online(cpu) : !cpu_active(cpu)))
> > >  		cpu = select_fallback_rq(task_cpu(p), p);
> > >  
> > >  	return cpu;
> > 
> > Yes this looks good to me.
> > 
> > Are you planning to submit your change to introduce
> > is_per_cpu_kthread(), or shall I?
> 
> I've got the lot; I just need to write a better changelog for my old
> (now rebased) patch and make a small note in your patch, but will feed
> them to Ingo when done.

Thanks a bunch!

Paul


* Re: [PATCH 2/2] sched: Warn if we fail to migrate a task
  2018-05-28 15:59     ` Paul Burton
@ 2018-05-29 10:40       ` Peter Zijlstra
  0 siblings, 0 replies; 11+ messages in thread
From: Peter Zijlstra @ 2018-05-29 10:40 UTC (permalink / raw)
  To: Paul Burton; +Cc: linux-kernel, Thomas Gleixner, Ingo Molnar

On Mon, May 28, 2018 at 08:59:32AM -0700, Paul Burton wrote:

> I agree userspace shouldn't need to care about this but in my case
> (using the test program I linked from the previous patch) this triggers
> whilst the CPU is being brought online, not taken offline. That means
> migrate_tasks() is not involved, and we actually just return from here
> back out from a sched_setaffinity syscall & continue running the user
> task on a CPU that is no longer present in the task's cpus_allowed.
> 
> I can't think of a good qualifier to limit the warning to only trigger
> in that scenario though, so in reality perhaps we're best to just trust
> that with patch 1 applied the problem will go away.

Yeah, I'm struggling too.. re-taking task_rq_lock and testing if
task_cpu(p) is inside its own cpus_allowed (which at that time might be
different from new_mask) might be the best we can do, but it is fairly
expensive for a sanity check.


* [tip:sched/urgent] sched/core: Require cpu_active() in select_task_rq(), for user tasks
  2018-05-26 15:46 ` [PATCH 1/2] sched: Make select_task_rq() require cpu_active() for user tasks Paul Burton
  2018-05-28 14:49   ` Peter Zijlstra
@ 2018-05-31 12:28   ` tip-bot for Paul Burton
  1 sibling, 0 replies; 11+ messages in thread
From: tip-bot for Paul Burton @ 2018-05-31 12:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: peterz, hpa, torvalds, paul.burton, tglx, linux-kernel, mingo

Commit-ID:  7af443ee1697607541c6346c87385adab2214743
Gitweb:     https://git.kernel.org/tip/7af443ee1697607541c6346c87385adab2214743
Author:     Paul Burton <paul.burton@mips.com>
AuthorDate: Sat, 26 May 2018 08:46:47 -0700
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 31 May 2018 12:24:25 +0200

sched/core: Require cpu_active() in select_task_rq(), for user tasks

select_task_rq() is used in a few paths to select the CPU upon which a
thread should be run - for example it is used by try_to_wake_up() & by
fork or exec balancing. As-is it allows use of any online CPU that is
present in the task's cpus_allowed mask.

This presents a problem because there is a period whilst CPUs are
brought online where a CPU is marked online, but is not yet fully
initialized - ie. the period where CPUHP_AP_ONLINE_IDLE <= state <
CPUHP_ONLINE. Usually we don't run any user tasks during this window,
but there are corner cases where this can happen. An example observed
is:

  - Some user task A, running on CPU X, forks to create task B.

  - sched_fork() calls __set_task_cpu() with cpu=X, setting task B's
    task_struct::cpu field to X.

  - CPU X is offlined.

  - Task A, currently somewhere between the __set_task_cpu() in
    copy_process() and the call to wake_up_new_task(), is migrated to
    CPU Y by migrate_tasks() when CPU X is offlined.

  - CPU X is onlined, but still in the CPUHP_AP_ONLINE_IDLE state. The
    scheduler is now active on CPU X, but there are no user tasks on
    the runqueue.

  - Task A runs on CPU Y & reaches wake_up_new_task(). This calls
    select_task_rq() with cpu=X, taken from task B's task_struct,
    and select_task_rq() allows CPU X to be returned.

  - Task A enqueues task B on CPU X's runqueue, via activate_task() &
    enqueue_task().

  - CPU X now has a user task on its runqueue before it has reached the
    CPUHP_ONLINE state.

In most cases, the user tasks that schedule on the newly onlined CPU
have no idea that anything went wrong, but one case observed to be
problematic is if the task goes on to invoke the sched_setaffinity
syscall. The newly onlined CPU reaches the CPUHP_AP_ONLINE_IDLE state
before the CPU that brought it online calls stop_machine_unpark(). This
means that for a portion of the window of time between
CPUHP_AP_ONLINE_IDLE & CPUHP_ONLINE the newly onlined CPU's struct
cpu_stopper has its enabled field set to false. If a user thread is
executed on the CPU during this window and it invokes sched_setaffinity
with a CPU mask that does not include the CPU it's running on, then when
__set_cpus_allowed_ptr() calls stop_one_cpu() intending to invoke
migration_cpu_stop() and perform the actual migration away from the CPU
it will simply return -ENOENT rather than calling migration_cpu_stop().
We then return from the sched_setaffinity syscall back to the user task
that is now running on a CPU which it just asked not to run on, and
which is not present in its cpus_allowed mask.

This patch resolves the problem by having select_task_rq() enforce that
user tasks run on CPUs that are active - the same requirement that
select_fallback_rq() already enforces. This should ensure that newly
onlined CPUs reach the CPUHP_AP_ACTIVE state before being able to
schedule user tasks, and also implies that bringup_wait_for_ap() will
have called stop_machine_unpark() which resolves the sched_setaffinity
issue above.

I haven't yet investigated them, but it may be of interest to review
whether any of the actions performed by hotplug states between
CPUHP_AP_ONLINE_IDLE & CPUHP_AP_ACTIVE could have similar unintended
effects on user tasks that might schedule before they are reached, which
might widen the scope of the problem from just affecting the behaviour
of sched_setaffinity.

Signed-off-by: Paul Burton <paul.burton@mips.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180526154648.11635-2-paul.burton@mips.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1c58f54b9114..211890edf37e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1560,8 +1560,7 @@ int select_task_rq(struct task_struct *p, int cpu, int sd_flags, int wake_flags)
 	 * [ this allows ->select_task() to simply return task_cpu(p) and
 	 *   not worry about this generic constraint ]
 	 */
-	if (unlikely(!cpumask_test_cpu(cpu, &p->cpus_allowed) ||
-		     !cpu_online(cpu)))
+	if (unlikely(!is_cpu_allowed(p, cpu)))
 		cpu = select_fallback_rq(task_cpu(p), p);
 
 	return cpu;

