* [PATCH 0/6] sched: Fix affine_move_task() wreckage
@ 2021-02-24 12:24 Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing Peter Zijlstra
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

Hi!

The long and short of it is that commit 6d337eab041d ("sched: Fix
migrate_disable() vs set_cpus_allowed_ptr()") is utterly wrecked and it is a
miracle it doesn't insta explode for anybody (else).

The longer story is that after some initial confusion and tracing I found the
first problem and sent (patch #1):

  https://lkml.kernel.org/r/YCfLHxpL+L0BYEyG@hirez.programming.kicks-ass.net

and was hoping that was the end of it (ha!). Obviously the one machine that did
manage to trigger this instantly found the next problem, now addressed in patch
#5.

The even longer story is that Monday last I sat down with a large piece of
(virtual) paper, basically threw the entire affine_move_task() /
migration_cpu_stop() logic out and while doodling re-implemented it all.

The difficult machine was happy on the second try after that.

Of course, at that point I had a single huge rewrite of commit 6d337eab041d, and
I pondered sending it like that. However, I figured that for review and
posterity it might be easier/better to do smaller steps. So today I reverse
engineered a possible logical path between the two states.

I'm hoping nothing got wrecked while doing the cleanups :-)

Patches also in:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/urgent




* [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing
  2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
@ 2021-02-24 12:24 ` Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 2/6] sched: Simplify migration_cpu_stop() Peter Zijlstra
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

When affine_move_task(p) is called on a running task @p, which is not
otherwise already changing affinity, we'll first set
p->migration_pending and then do:

	 stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);

This then gets us to migration_cpu_stop() running on the CPU that was
previously running our victim task @p.

If we find that our task is no longer on that runqueue (this can
happen because of a concurrent migration due to load-balance etc.),
then we'll end up at the:

	} else if (dest_cpu < 0 || pending) {

branch, which we'll take because we set pending earlier. Here we first
check if the task @p has already satisfied the affinity constraints,
if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop()
onto the CPU that is now hosting our task @p:

	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &pending->arg, &pending->stop_work);

Except, we've never initialized pending->arg, which will be all 0s.

This then results in running migration_cpu_stop() on the next CPU with
arg->task == NULL, which gives the by now obvious result of fireworks.

The cure is to change affine_move_task() to always use pending->arg;
furthermore, we can use the exact same pattern as the
SCA_MIGRATE_ENABLE case, since we'll block on the pending->done
completion anyway, no point in adding yet another completion in
stop_one_cpu().

This then gives a clear distinction between the two
migration_cpu_stop() use cases:

  - sched_exec() / migrate_task_to() : arg->pending == NULL
  - affine_move_task() : arg->pending != NULL

And we can have it ignore p->migration_pending when !arg->pending. Any
stop work from sched_exec() / migrate_task_to() is in addition to stop
works from affine_move_task(), which will be sufficient to issue the
completion.
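
To make the distinction concrete, a minimal sketch of the two invocation
patterns after this patch (condensed from the diff below; target_cpu is
just a stand-in name):

	/* sched_exec() / migrate_task_to(): one-shot migration, no pending */
	struct migration_arg arg = {
		.task		= p,
		.dest_cpu	= target_cpu,
		.pending	= NULL,	/* stopper ignores p->migration_pending */
	};
	stop_one_cpu(task_cpu(p), migration_cpu_stop, &arg);

	/* affine_move_task(): the arg lives inside the pending it completes */
	my_pending.arg = (struct migration_arg) {
		.task		= p,
		.dest_cpu	= -1,		/* any allowed CPU */
		.pending	= &my_pending,
	};
	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &my_pending.arg, &my_pending.stop_work);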

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1922,6 +1922,24 @@ static int migration_cpu_stop(void *data
 	rq_lock(rq, &rf);
 
 	pending = p->migration_pending;
+	if (pending && !arg->pending) {
+		/*
+		 * This happens from sched_exec() and migrate_task_to(),
+		 * neither of them care about pending and just want a task to
+		 * maybe move about.
+		 *
+		 * Even if there is a pending, we can ignore it, since
+		 * affine_move_task() will have its own stop_work in flight
+		 * which will manage the completion.
+		 *
+		 * Notably, pending doesn't need to match arg->pending. This can
+		 * happen when triple concurrent affine_move_task() first sets
+		 * pending, then clears pending and eventually sets another
+		 * pending.
+		 */
+		pending = NULL;
+	}
+
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -2194,10 +2212,6 @@ static int affine_move_task(struct rq *r
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	struct migration_arg arg = {
-		.task = p,
-		.dest_cpu = dest_cpu,
-	};
 	bool complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
@@ -2235,6 +2249,12 @@ static int affine_move_task(struct rq *r
 			/* Install the request */
 			refcount_set(&my_pending.refs, 1);
 			init_completion(&my_pending.done);
+			my_pending.arg = (struct migration_arg) {
+				.task = p,
+				.dest_cpu = -1,		/* any */
+				.pending = &my_pending,
+			};
+
 			p->migration_pending = &my_pending;
 		} else {
 			pending = p->migration_pending;
@@ -2265,12 +2285,6 @@ static int affine_move_task(struct rq *r
 		p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		pending->arg = (struct migration_arg) {
-			.task = p,
-			.dest_cpu = -1,
-			.pending = pending,
-		};
-
 		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 
@@ -2283,8 +2297,11 @@ static int affine_move_task(struct rq *r
 		 * is_migration_disabled(p) checks to the stopper, which will
 		 * run on the same CPU as said p.
 		 */
+		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		task_rq_unlock(rq, p, rf);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+
+		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+				    &pending->arg, &pending->stop_work);
 
 	} else {
 




* [PATCH 2/6] sched: Simplify migration_cpu_stop()
  2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing Peter Zijlstra
@ 2021-02-24 12:24 ` Peter Zijlstra
  2021-02-24 15:34   ` Valentin Schneider
                     ` (2 more replies)
  2021-02-24 12:24 ` [PATCH 3/6] sched: Collate affine_move_task() stoppers Peter Zijlstra
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

When affine_move_task() issues a migration_cpu_stop(), the purpose of
that function is to complete that @pending, not any random other
p->migration_pending that might have gotten installed since.

This realization much simplifies migration_cpu_stop() and allows
further necessary steps to fix all this as it provides the guarantee
that @pending's stopper will complete @pending (and not some random
other @pending).
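
One interleaving that makes p->migration_pending differ from arg->pending
(the 'triple' case from the comment this patch removes), sketched:

  SCA#1: installs pending1, queues a stopper with arg->pending == pending1
  SCA#2: finds the task already fits its new mask, uninstalls pending1
         and completes it
  SCA#3: installs pending2
  the stopper queued by SCA#1 finally runs: arg->pending == pending1,
         while p->migration_pending == pending2

Hence the new p->migration_pending == pending check: the stopper only
uninstalls the @pending it was queued for.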

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   56 +++++++---------------------------------------------
 1 file changed, 8 insertions(+), 48 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1898,8 +1898,8 @@ static struct rq *__migrate_task(struct
  */
 static int migration_cpu_stop(void *data)
 {
-	struct set_affinity_pending *pending;
 	struct migration_arg *arg = data;
+	struct set_affinity_pending *pending = arg->pending;
 	struct task_struct *p = arg->task;
 	int dest_cpu = arg->dest_cpu;
 	struct rq *rq = this_rq();
@@ -1921,25 +1921,6 @@ static int migration_cpu_stop(void *data
 	raw_spin_lock(&p->pi_lock);
 	rq_lock(rq, &rf);
 
-	pending = p->migration_pending;
-	if (pending && !arg->pending) {
-		/*
-		 * This happens from sched_exec() and migrate_task_to(),
-		 * neither of them care about pending and just want a task to
-		 * maybe move about.
-		 *
-		 * Even if there is a pending, we can ignore it, since
-		 * affine_move_task() will have its own stop_work in flight
-		 * which will manage the completion.
-		 *
-		 * Notably, pending doesn't need to match arg->pending. This can
-		 * happen when triple concurrent affine_move_task() first sets
-		 * pending, then clears pending and eventually sets another
-		 * pending.
-		 */
-		pending = NULL;
-	}
-
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data
 			goto out;
 
 		if (pending) {
-			p->migration_pending = NULL;
+			if (p->migration_pending == pending)
+				p->migration_pending = NULL;
 			complete = true;
 		}
 
-		/* migrate_enable() --  we must not race against SCA */
-		if (dest_cpu < 0) {
-			/*
-			 * When this was migrate_enable() but we no longer
-			 * have a @pending, a concurrent SCA 'fixed' things
-			 * and we should be valid again. Nothing to do.
-			 */
-			if (!pending) {
-				WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
-				goto out;
-			}
-
+		if (dest_cpu < 0)
 			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
-		}
 
 		if (task_on_rq_queued(p))
 			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
 			p->wake_cpu = dest_cpu;
 
-	} else if (dest_cpu < 0 || pending) {
+	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s
 		 * preempt_enable() and scheduling the stopper task. At that
@@ -1989,23 +1959,14 @@ static int migration_cpu_stop(void *data
 		 * ->pi_lock, so the allowed mask is stable - if it got
 		 * somewhere allowed, we're done.
 		 */
-		if (pending && cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
-			p->migration_pending = NULL;
+		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
+			if (p->migration_pending == pending)
+				p->migration_pending = NULL;
 			complete = true;
 			goto out;
 		}
 
 		/*
-		 * When this was migrate_enable() but we no longer have an
-		 * @pending, a concurrent SCA 'fixed' things and we should be
-		 * valid again. Nothing to do.
-		 */
-		if (!pending) {
-			WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
-			goto out;
-		}
-
-		/*
 		 * When migrate_enable() hits a rq mis-match we can't reliably
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
@@ -2022,7 +1983,6 @@ static int migration_cpu_stop(void *data
 		complete_all(&pending->done);
 
 	/* For pending->{arg,stop_work} */
-	pending = arg->pending;
 	if (pending && refcount_dec_and_test(&pending->refs))
 		wake_up_var(&pending->refs);
 




* [PATCH 3/6] sched: Collate affine_move_task() stoppers
  2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 2/6] sched: Simplify migration_cpu_stop() Peter Zijlstra
@ 2021-02-24 12:24 ` Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 4/6] sched: Optimize migration_cpu_stop() Peter Zijlstra
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

The SCA_MIGRATE_ENABLE and task_running() cases are almost identical,
collapse them to avoid further duplication.
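
For orientation, the collapsed branch ends up shaped like this (a sketch
of the result, assembled from the diff below):

	if (task_running(rq, p) || p->state == TASK_WAKING) {
		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
		if (flags & SCA_MIGRATE_ENABLE)
			p->migration_flags &= ~MDF_PUSH;
		task_rq_unlock(rq, p, rf);

		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
				    &pending->arg, &pending->stop_work);

		if (flags & SCA_MIGRATE_ENABLE)
			return 0;	/* migrate_enable() does not wait */
	}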

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2239,30 +2239,23 @@ static int affine_move_task(struct rq *r
 		return -EINVAL;
 	}
 
-	if (flags & SCA_MIGRATE_ENABLE) {
-
-		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
-		p->migration_flags &= ~MDF_PUSH;
-		task_rq_unlock(rq, p, rf);
-
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
-
-		return 0;
-	}
-
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		/*
-		 * Lessen races (and headaches) by delegating
-		 * is_migration_disabled(p) checks to the stopper, which will
-		 * run on the same CPU as said p.
+		 * MIGRATE_ENABLE gets here because 'p == current', but for
+		 * anything else we cannot do is_migration_disabled(), punt
+		 * and have the stopper function handle it all race-free.
 		 */
+
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
+		if (flags & SCA_MIGRATE_ENABLE)
+			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
 		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 
+		if (flags & SCA_MIGRATE_ENABLE)
+			return 0;
 	} else {
 
 		if (!is_migration_disabled(p)) {




* [PATCH 4/6] sched: Optimize migration_cpu_stop()
  2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
                   ` (2 preceding siblings ...)
  2021-02-24 12:24 ` [PATCH 3/6] sched: Collate affine_move_task() stoppers Peter Zijlstra
@ 2021-02-24 12:24 ` Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 5/6] sched: Fix affine_move_task() self-concurrency Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

When the purpose of migration_cpu_stop() is to migrate the task to
'any' valid CPU, don't migrate the task when it's already running on a
valid CPU.
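
For reference, the 'any' valid CPU request is the dest_cpu == -1 case
that affine_move_task() (as of patch #1) sets up for its stopper:

	my_pending.arg = (struct migration_arg) {
		.task		= p,
		.dest_cpu	= -1,		/* any */
		.pending	= &my_pending,
	};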

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1936,14 +1936,25 @@ static int migration_cpu_stop(void *data
 			complete = true;
 		}
 
-		if (dest_cpu < 0)
+		if (dest_cpu < 0) {
+			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
+				goto out;
+
 			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
+		}
 
 		if (task_on_rq_queued(p))
 			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
 			p->wake_cpu = dest_cpu;
 
+		/*
+		 * XXX __migrate_task() can fail, at which point we might end
+		 * up running on a dodgy CPU, AFAICT this can only happen
+		 * during CPU hotplug, at which point we'll get pushed out
+		 * anyway, so it's probably not a big deal.
+		 */
+
 	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s




* [PATCH 5/6] sched: Fix affine_move_task() self-concurrency
  2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
                   ` (3 preceding siblings ...)
  2021-02-24 12:24 ` [PATCH 4/6] sched: Optimize migration_cpu_stop() Peter Zijlstra
@ 2021-02-24 12:24 ` Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
  5 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

Consider:

   sched_setaffinity(p, X);		sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and
issue stop_one_cpu_nowait(pending); and the second one will read
p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
the _SAME_ @pending.

This causes stopper list corruption.
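
A minimal userspace sketch of that pattern (illustrative only; whether it
actually trips the race further depends on timing and on the victim
staying busy on a CPU):

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>
	#include <unistd.h>

	static pid_t victim;

	static void *flip(void *which)
	{
		cpu_set_t set;

		for (;;) {
			CPU_ZERO(&set);
			CPU_SET(which ? 1 : 0, &set);
			/* concurrent SCA calls on the same running task */
			sched_setaffinity(victim, sizeof(set), &set);
		}
		return NULL;
	}

	int main(void)
	{
		pthread_t a, b;

		victim = fork();
		if (!victim)
			for (;;)
				;	/* busy victim, stays task_running() */

		pthread_create(&a, NULL, flip, (void *)0);
		pthread_create(&b, NULL, flip, (void *)1);
		pause();
		return 0;
	}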

Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1864,6 +1864,7 @@ struct migration_arg {
 
 struct set_affinity_pending {
 	refcount_t		refs;
+	unsigned int		stop_pending;
 	struct completion	done;
 	struct cpu_stop_work	stop_work;
 	struct migration_arg	arg;
@@ -1982,12 +1983,15 @@ static int migration_cpu_stop(void *data
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
 		 */
+		WARN_ON_ONCE(!pending->stop_pending);
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 		return 0;
 	}
 out:
+	if (pending)
+		pending->stop_pending = false;
 	task_rq_unlock(rq, p, &rf);
 
 	if (complete)
@@ -2183,7 +2187,7 @@ static int affine_move_task(struct rq *r
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	bool complete = false;
+	bool stop_pending, complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
@@ -2256,14 +2260,19 @@ static int affine_move_task(struct rq *r
 		 * anything else we cannot do is_migration_disabled(), punt
 		 * and have the stopper function handle it all race-free.
 		 */
+		stop_pending = pending->stop_pending;
+		if (!stop_pending)
+			pending->stop_pending = true;
 
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
+		if (!stop_pending) {
+			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work);
+		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;




* [PATCH 6/6] sched: Simplify set_affinity_pending refcounts
  2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
                   ` (4 preceding siblings ...)
  2021-02-24 12:24 ` [PATCH 5/6] sched: Fix affine_move_task() self-concurrency Peter Zijlstra
@ 2021-02-24 12:24 ` Peter Zijlstra
  2021-02-24 15:34   ` Valentin Schneider
                     ` (3 more replies)
  5 siblings, 4 replies; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 12:24 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, peterz, Andi Kleen

Now that we have set_affinity_pending::stop_pending to indicate if a
stopper is in progress, and we have the guarantee that if that stopper
exists, it will (eventually) complete our @pending, we can simplify the
refcount scheme by no longer counting the stopper thread.
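
The resulting lifetime rules, sketched (not a diff):

	/*
	 * Waiters (affine_move_task() callers):
	 *	refcount_inc(&pending->refs);		// one ref per waiter
	 *	...
	 *	wait_for_completion(&pending->done);
	 *	if (refcount_dec_and_test(&pending->refs))
	 *		wake_up_var(&pending->refs);	// just an address, no UaF
	 *
	 * Stopper: holds no reference; ->stop_pending marks the embedded
	 * stop_work as in use, and the owner of the on-stack my_pending
	 * blocks in wait_var_event() until all waiter refs have dropped.
	 */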

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1862,6 +1862,10 @@ struct migration_arg {
 	struct set_affinity_pending	*pending;
 };
 
+/*
+ * @refs: number of wait_for_completion()
+ * @stop_pending: is @stop_work in use
+ */
 struct set_affinity_pending {
 	refcount_t		refs;
 	unsigned int		stop_pending;
@@ -1997,10 +2001,6 @@ static int migration_cpu_stop(void *data
 	if (complete)
 		complete_all(&pending->done);
 
-	/* For pending->{arg,stop_work} */
-	if (pending && refcount_dec_and_test(&pending->refs))
-		wake_up_var(&pending->refs);
-
 	return 0;
 }
 
@@ -2199,12 +2199,16 @@ static int affine_move_task(struct rq *r
 			push_task = get_task_struct(p);
 		}
 
+		/*
+		 * If there are pending waiters, but no pending stop_work,
+		 * then complete now.
+		 */
 		pending = p->migration_pending;
-		if (pending) {
-			refcount_inc(&pending->refs);
+		if (pending && !pending->stop_pending) {
 			p->migration_pending = NULL;
 			complete = true;
 		}
+
 		task_rq_unlock(rq, p, rf);
 
 		if (push_task) {
@@ -2213,7 +2217,7 @@ static int affine_move_task(struct rq *r
 		}
 
 		if (complete)
-			goto do_complete;
+			complete_all(&pending->done);
 
 		return 0;
 	}
@@ -2264,9 +2268,9 @@ static int affine_move_task(struct rq *r
 		if (!stop_pending)
 			pending->stop_pending = true;
 
-		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
+
 		task_rq_unlock(rq, p, rf);
 
 		if (!stop_pending) {
@@ -2282,12 +2286,13 @@ static int affine_move_task(struct rq *r
 			if (task_on_rq_queued(p))
 				rq = move_queued_task(rq, rf, p, dest_cpu);
 
-			p->migration_pending = NULL;
-			complete = true;
+			if (!pending->stop_pending) {
+				p->migration_pending = NULL;
+				complete = true;
+			}
 		}
 		task_rq_unlock(rq, p, rf);
 
-do_complete:
 		if (complete)
 			complete_all(&pending->done);
 	}
@@ -2295,7 +2300,7 @@ static int affine_move_task(struct rq *r
 	wait_for_completion(&pending->done);
 
 	if (refcount_dec_and_test(&pending->refs))
-		wake_up_var(&pending->refs);
+		wake_up_var(&pending->refs); /* No UaF, just an address */
 
 	/*
 	 * Block the original owner of &pending until all subsequent callers
@@ -2303,6 +2308,9 @@ static int affine_move_task(struct rq *r
 	 */
 	wait_var_event(&my_pending.refs, !refcount_read(&my_pending.refs));
 
+	/* ARGH */
+	WARN_ON_ONCE(my_pending.stop_pending);
+
 	return 0;
 }
 




* Re: [PATCH 2/6] sched: Simplify migration_cpu_stop()
  2021-02-24 12:24 ` [PATCH 2/6] sched: Simplify migration_cpu_stop() Peter Zijlstra
@ 2021-02-24 15:34   ` Valentin Schneider
  2021-02-25  8:45     ` Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2021-02-24 15:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner
  Cc: Vincent Guittot, Mel Gorman, Dietmar Eggemann, linux-kernel,
	peterz, Andi Kleen

On 24/02/21 13:24, Peter Zijlstra wrote:
> @@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data
>                       goto out;
>
>               if (pending) {
> -			p->migration_pending = NULL;
> +			if (p->migration_pending == pending)
> +				p->migration_pending = NULL;
>                       complete = true;
>               }
>
> -		/* migrate_enable() --  we must not race against SCA */
> -		if (dest_cpu < 0) {
> -			/*
> -			 * When this was migrate_enable() but we no longer
> -			 * have a @pending, a concurrent SCA 'fixed' things
> -			 * and we should be valid again. Nothing to do.
> -			 */
> -			if (!pending) {
> -				WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
> -				goto out;
> -			}
> -

This is fixed by 5+6, but at this patch I think you can have double
completions - I thought this was an issue, but briefly looking at
completion stuff it might not. In any case, consider:

  task_cpu(p) == Y

  SCA(p, X);
                 SCA(p, Y);


SCA(p, Y) will uninstall SCA(p, X)'s pending and complete.

migration/Y kicked by SCA(p, X) will grab arg->pending, which is still
SCA(p, X)'s pending and also complete.

> +		if (dest_cpu < 0)
>                       dest_cpu = cpumask_any_distribute(&p->cpus_mask);
> -		}
>
>               if (task_on_rq_queued(p))
>                       rq = __migrate_task(rq, &rf, p, dest_cpu);
>               else
>                       p->wake_cpu = dest_cpu;
>
> -	} else if (dest_cpu < 0 || pending) {
> +	} else if (pending) {
>               /*
>                * This happens when we get migrated between migrate_enable()'s
>                * preempt_enable() and scheduling the stopper task. At that


* Re: [PATCH 6/6] sched: Simplify set_affinity_pending refcounts
  2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
@ 2021-02-24 15:34   ` Valentin Schneider
  2021-02-24 15:34   ` Peter Zijlstra
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2021-02-24 15:34 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner
  Cc: Vincent Guittot, Mel Gorman, Dietmar Eggemann, linux-kernel,
	peterz, Andi Kleen

On 24/02/21 13:24, Peter Zijlstra wrote:
> Now that we have set_affinity_pending::stop_pending to indicate if a
> stopper is in progress, and we have the guarantee that if that stopper
> exists, it will (eventually) complete our @pending we can simplify the
> refcount scheme by no longer counting the stopper thread.
>
> Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> @@ -2199,12 +2199,16 @@ static int affine_move_task(struct rq *r
>                       push_task = get_task_struct(p);
>               }
>
> +		/*
> +		 * If there are pending waiters, but no pending stop_work,
> +		 * then complete now.
> +		 */
>               pending = p->migration_pending;
> -		if (pending) {
> -			refcount_inc(&pending->refs);
> +		if (pending && !pending->stop_pending) {
>                       p->migration_pending = NULL;
>                       complete = true;
>               }
> +
>               task_rq_unlock(rq, p, rf);
>
>               if (push_task) {
> @@ -2213,7 +2217,7 @@ static int affine_move_task(struct rq *r
>               }
>
>               if (complete)
> -			goto do_complete;
> +			complete_all(&pending->done);

We could've done this in the first place, right? I don't think this path
actually needed to deal with the refcounts (at least not since we started
counting the stoppers).

Musings aside, I believe the above means, for migration_cpu_stop():

  (pending != NULL) => (pending == p->migration_pending)

Since, when ->stop_pending is set, only the stopper can uninstall
p->migration_pending. This could simplify a few ifs.

Also, the fatty comment above affine_move_task() probably needs a bit of
gardening:

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9492f8eb242a..6f649aa2795c 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2165,16 +2165,21 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
  *
  * (1) In the cases covered above. There is one more where the completion is
  * signaled within affine_move_task() itself: when a subsequent affinity request
- * cancels the need for an active migration. Consider:
+ * occurs after the stopper bailed out due to the targeted task still being
+ * Migrate-Disable. Consider:
  *
  *     Initial conditions: P0->cpus_mask = [0, 1]
  *
- *     P0@CPU0            P1                             P2
- *
- *     migrate_disable();
- *     <preempted>
+ *     CPU0               P1                             P2
+ *     <P0>
+ *       migrate_disable();
+ *       <preempted>
  *                        set_cpus_allowed_ptr(P0, [1]);
  *                          <blocks>
+ *     <migration/0>
+ *       migration_cpu_stop()
+ *         is_migration_disabled()
+ *           <bails>
  *                                                       set_cpus_allowed_ptr(P0, [0, 1]);
  *                                                         <signal completion>
  *                          <awakes>


* Re: [PATCH 6/6] sched: Simplify set_affinity_pending refcounts
  2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
  2021-02-24 15:34   ` Valentin Schneider
@ 2021-02-24 15:34   ` Peter Zijlstra
  2021-02-24 17:59     ` Valentin Schneider
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  3 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-24 15:34 UTC (permalink / raw)
  To: Ingo Molnar, Thomas Gleixner
  Cc: Valentin Schneider, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, Andi Kleen

On Wed, Feb 24, 2021 at 01:24:45PM +0100, Peter Zijlstra wrote:
> @@ -2199,12 +2199,16 @@ static int affine_move_task(struct rq *r
>  			push_task = get_task_struct(p);
>  		}
>  
> +		/*
> +		 * If there are pending waiters, but no pending stop_work,
> +		 * then complete now.
> +		 */
>  		pending = p->migration_pending;
> +		if (pending && !pending->stop_pending) {
>  			p->migration_pending = NULL;
>  			complete = true;
>  		}

> @@ -2282,12 +2286,13 @@ static int affine_move_task(struct rq *r
>  			if (task_on_rq_queued(p))
>  				rq = move_queued_task(rq, rf, p, dest_cpu);
>  
> +			if (!pending->stop_pending) {
> +				p->migration_pending = NULL;
> +				complete = true;
> +			}
>  		}
>  		task_rq_unlock(rq, p, rf);

Elsewhere Valentin argued something like the below ought to be possible.
I've not drawn diagrams yet, but if I understood his argument right it
should be possible.

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 1c56ac4df2c9..3ffbd1b76f3e 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2204,9 +2204,10 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * then complete now.
 		 */
 		pending = p->migration_pending;
-		if (pending && !pending->stop_pending) {
+		if (pending) {
 			p->migration_pending = NULL;
-			complete = true;
+			if (!pending->stop_pending)
+				complete = true;
 		}
 
 		task_rq_unlock(rq, p, rf);
@@ -2286,10 +2287,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			if (task_on_rq_queued(p))
 				rq = move_queued_task(rq, rf, p, dest_cpu);
 
-			if (!pending->stop_pending) {
-				p->migration_pending = NULL;
+			p->migration_pending = NULL;
+			if (!pending->stop_pending)
 				complete = true;
-			}
 		}
 		task_rq_unlock(rq, p, rf);
 


* Re: [PATCH 6/6] sched: Simplify set_affinity_pending refcounts
  2021-02-24 15:34   ` Peter Zijlstra
@ 2021-02-24 17:59     ` Valentin Schneider
  2021-02-25  9:27       ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Valentin Schneider @ 2021-02-24 17:59 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Thomas Gleixner
  Cc: Vincent Guittot, Mel Gorman, Dietmar Eggemann, linux-kernel, Andi Kleen

On 24/02/21 16:34, Peter Zijlstra wrote:
> Elsewhere Valentin argued something like the below ought to be possible.
> I've not drawn diagrams yet, but if I understood his argument right it
> should be possible.
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 1c56ac4df2c9..3ffbd1b76f3e 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -2204,9 +2204,10 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  		 * then complete now.
>  		 */
>  		pending = p->migration_pending;
> -		if (pending && !pending->stop_pending) {
> +		if (pending) {
>  			p->migration_pending = NULL;
> -			complete = true;
> +			if (!pending->stop_pending)
> +				complete = true;
>  		}
>  
>  		task_rq_unlock(rq, p, rf);
> @@ -2286,10 +2287,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
>  			if (task_on_rq_queued(p))
>  				rq = move_queued_task(rq, rf, p, dest_cpu);
>  
> -			if (!pending->stop_pending) {
> -				p->migration_pending = NULL;
> +			p->migration_pending = NULL;
> +			if (!pending->stop_pending)
>  				complete = true;
> -			}
>  		}
>  		task_rq_unlock(rq, p, rf);
>  

I was thinking of the "other way around"; i.e. modify migration_cpu_stop()
into:

---
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9492f8eb242a..9546f0263970 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1926,6 +1926,11 @@ static int migration_cpu_stop(void *data)
 	raw_spin_lock(&p->pi_lock);
 	rq_lock(rq, &rf);
 
+	/*
+	 * If we were passed a pending, then ->stop_pending was set, thus
+	 * p->migration_pending must have remained stable.
+	 */
+	WARN_ON_ONCE(pending && pending != p->migration_pending);
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -1936,8 +1941,7 @@ static int migration_cpu_stop(void *data)
 			goto out;
 
 		if (pending) {
-			if (p->migration_pending == pending)
-				p->migration_pending = NULL;
+			p->migration_pending = NULL;
 			complete = true;
 		}
 
@@ -1976,8 +1980,7 @@ static int migration_cpu_stop(void *data)
 		 * somewhere allowed, we're done.
 		 */
 		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
-			if (p->migration_pending == pending)
-				p->migration_pending = NULL;
+			p->migration_pending = NULL;
 			complete = true;
 			goto out;
 		}
---

Your change reinstates the "triple SCA" pattern, where a stopper can run
with arg->pending && arg->pending != p->migration_pending, which I was
kinda happy to see go away...


* Re: [PATCH 2/6] sched: Simplify migration_cpu_stop()
  2021-02-24 15:34   ` Valentin Schneider
@ 2021-02-25  8:45     ` Peter Zijlstra
  2021-02-25 11:10       ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-25  8:45 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Thomas Gleixner, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, Andi Kleen

On Wed, Feb 24, 2021 at 03:34:36PM +0000, Valentin Schneider wrote:
> On 24/02/21 13:24, Peter Zijlstra wrote:
> > @@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data
> >                       goto out;
> >
> >               if (pending) {
> > -			p->migration_pending = NULL;
> > +			if (p->migration_pending == pending)
> > +				p->migration_pending = NULL;
> >                       complete = true;
> >               }
> >
> > -		/* migrate_enable() --  we must not race against SCA */
> > -		if (dest_cpu < 0) {
> > -			/*
> > -			 * When this was migrate_enable() but we no longer
> > -			 * have a @pending, a concurrent SCA 'fixed' things
> > -			 * and we should be valid again. Nothing to do.
> > -			 */
> > -			if (!pending) {
> > -				WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
> > -				goto out;
> > -			}
> > -
> 
> This is fixed by 5+6, but at this patch I think you can have double
> completions - I thought this was an issue, but briefly looking at
> completion stuff it might not. In any case, consider:
> 
>   task_cpu(p) == Y
> 
>   SCA(p, X);
>                  SCA(p, Y);
> 
> 
> SCA(p, Y) will uninstall SCA(p, X)'s pending and complete.
> 
> migration/Y kicked by SCA(p, X) will grab arg->pending, which is still
> SCA(p, X)'s pending and also complete.

Right, so I didn't really think too hard about the intermediate states,
given it's all pretty buggered until at least 5. But yeah, double
complete is harmless.

Specifically, the refcount the stopper has should avoid the stack from
getting released.
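
For reference, complete_all() is effectively idempotent as long as the
completion object itself stays alive (simplified from
kernel/sched/completion.c):

	void complete_all(struct completion *x)
	{
		unsigned long flags;

		raw_spin_lock_irqsave(&x->wait.lock, flags);
		x->done = UINT_MAX;	/* a second call just rewrites this */
		swake_up_all_locked(&x->wait);
		raw_spin_unlock_irqrestore(&x->wait.lock, flags);
	}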


* Re: [PATCH 6/6] sched: Simplify set_affinity_pending refcounts
  2021-02-24 17:59     ` Valentin Schneider
@ 2021-02-25  9:27       ` Peter Zijlstra
  2021-02-25 11:11         ` Valentin Schneider
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2021-02-25  9:27 UTC (permalink / raw)
  To: Valentin Schneider
  Cc: Ingo Molnar, Thomas Gleixner, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, Andi Kleen

On Wed, Feb 24, 2021 at 05:59:01PM +0000, Valentin Schneider wrote:

> Your change reinstates the "triple SCA" pattern, where a stopper can run
> with arg->pending && arg->pending != p->migration_pending, which I was
> kinda happy to see go away...

Right, fair enough. Any workload that can tell the difference is doing
it wrong anyway :-)

OK, I've munged your two patches together into the below.

---
Subject: sched: Simplify migration_cpu_stop()
From: Valentin Schneider <valentin.schneider@arm.com>
Date: Thu Feb 25 10:22:30 CET 2021

Since, when ->stop_pending is set, only the stopper can uninstall
p->migration_pending, we have:

  (pending != NULL) => (pending == p->migration_pending)

which allows simplifying a few ifs.

Also, the fatty comment above affine_move_task() probably needs a bit
of gardening.

Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/sched/core.c |   27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1927,6 +1927,12 @@ static int migration_cpu_stop(void *data
 	rq_lock(rq, &rf);
 
 	/*
+	 * If we were passed a pending, then ->stop_pending was set, thus
+	 * p->migration_pending must have remained stable.
+	 */
+	WARN_ON_ONCE(pending && pending != p->migration_pending);
+
+	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
 	 * we're holding p->pi_lock.
@@ -1936,8 +1942,7 @@ static int migration_cpu_stop(void *data
 			goto out;
 
 		if (pending) {
-			if (p->migration_pending == pending)
-				p->migration_pending = NULL;
+			p->migration_pending = NULL;
 			complete = true;
 		}
 
@@ -1976,8 +1981,7 @@ static int migration_cpu_stop(void *data
 		 * somewhere allowed, we're done.
 		 */
 		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
-			if (p->migration_pending == pending)
-				p->migration_pending = NULL;
+			p->migration_pending = NULL;
 			complete = true;
 			goto out;
 		}
@@ -2165,16 +2169,21 @@ void do_set_cpus_allowed(struct task_str
  *
  * (1) In the cases covered above. There is one more where the completion is
  * signaled within affine_move_task() itself: when a subsequent affinity request
- * cancels the need for an active migration. Consider:
+ * occurs after the stopper bailed out due to the targeted task still being
+ * Migrate-Disable. Consider:
  *
  *     Initial conditions: P0->cpus_mask = [0, 1]
  *
- *     P0@CPU0            P1                             P2
- *
- *     migrate_disable();
- *     <preempted>
+ *     CPU0		  P1				P2
+ *     <P0>
+ *       migrate_disable();
+ *       <preempted>
  *                        set_cpus_allowed_ptr(P0, [1]);
  *                          <blocks>
+ *     <migration/0>
+ *       migration_cpu_stop()
+ *         is_migration_disabled()
+ *           <bails>
  *                                                       set_cpus_allowed_ptr(P0, [0, 1]);
  *                                                         <signal completion>
  *                          <awakes>


* Re: [PATCH 2/6] sched: Simplify migration_cpu_stop()
  2021-02-25  8:45     ` Peter Zijlstra
@ 2021-02-25 11:10       ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2021-02-25 11:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, Andi Kleen

On 25/02/21 09:45, Peter Zijlstra wrote:
> On Wed, Feb 24, 2021 at 03:34:36PM +0000, Valentin Schneider wrote:
>> On 24/02/21 13:24, Peter Zijlstra wrote:
>> > @@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data
>> >                       goto out;
>> >
>> >               if (pending) {
>> > -			p->migration_pending = NULL;
>> > +			if (p->migration_pending == pending)
>> > +				p->migration_pending = NULL;
>> >                       complete = true;
>> >               }
>> >
>> > -		/* migrate_enable() --  we must not race against SCA */
>> > -		if (dest_cpu < 0) {
>> > -			/*
>> > -			 * When this was migrate_enable() but we no longer
>> > -			 * have a @pending, a concurrent SCA 'fixed' things
>> > -			 * and we should be valid again. Nothing to do.
>> > -			 */
>> > -			if (!pending) {
>> > -				WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
>> > -				goto out;
>> > -			}
>> > -
>> 
>> This is fixed by 5+6, but at this patch I think you can have double
>> completions - I thought this was an issue, but briefly looking at
>> completion stuff it might not. In any case, consider:
>> 
>>   task_cpu(p) == Y
>> 
>>   SCA(p, X);
>>                  SCA(p, Y);
>> 
>> 
>> SCA(p, Y) will uninstall SCA(p, X)'s pending and complete.
>> 
>> migration/Y kicked by SCA(p, X) will grab arg->pending, which is still
>> SCA(p, X)'s pending and also complete.
>
> Right, so I didn't really think too hard about the intermediate states,
> given it's all pretty buggered until at least 5. But yeah, double
> complete is harmless.
>
> Specifically, the refcount the stopper has should avoid the stack from
> getting released.

Aye that should be fine, it really was just the double complete which I
was unsure about.


* Re: [PATCH 6/6] sched: Simplify set_affinity_pending refcounts
  2021-02-25  9:27       ` Peter Zijlstra
@ 2021-02-25 11:11         ` Valentin Schneider
  0 siblings, 0 replies; 27+ messages in thread
From: Valentin Schneider @ 2021-02-25 11:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Thomas Gleixner, Vincent Guittot, Mel Gorman,
	Dietmar Eggemann, linux-kernel, Andi Kleen

On 25/02/21 10:27, Peter Zijlstra wrote:
> On Wed, Feb 24, 2021 at 05:59:01PM +0000, Valentin Schneider wrote:
>
>> Your change reinstates the "triple SCA" pattern, where a stopper can run
>> with arg->pending && arg->pending != p->migration_pending, which I was
>> kinda happy to see go away...
>
> Right, fair enough. Any workload that can tell the difference is doing
> it wrong anyway :-)
>
> OK, I've munged your two patches together into the below.
>

Thanks!

I haven't found much else to say on the series after having slept on it, so
feel free to add:

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

to the rest. I'll go see about testing it in some way.


* [tip: sched/urgent] sched: Simplify set_affinity_pending refcounts
  2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
  2021-02-24 15:34   ` Valentin Schneider
  2021-02-24 15:34   ` Peter Zijlstra
@ 2021-03-01 10:16   ` tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  3 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-01 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     a4c2579076dc6951709a8e425df8369ab6eb2f24
Gitweb:        https://git.kernel.org/tip/a4c2579076dc6951709a8e425df8369ab6eb2f24
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:42:08 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 11:02:15 +01:00

sched: Simplify set_affinity_pending refcounts

Now that we have set_affinity_pending::stop_pending to indicate if a
stopper is in progress, and we have the guarantee that if that stopper
exists, it will (eventually) complete our @pending, we can simplify the
refcount scheme by no longer counting the stopper thread.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
---
 kernel/sched/core.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e4d100..9819121 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1862,6 +1862,10 @@ struct migration_arg {
 	struct set_affinity_pending	*pending;
 };
 
+/*
+ * @refs: number of wait_for_completion()
+ * @stop_pending: is @stop_work in use
+ */
 struct set_affinity_pending {
 	refcount_t		refs;
 	unsigned int		stop_pending;
@@ -1997,10 +2001,6 @@ out:
 	if (complete)
 		complete_all(&pending->done);
 
-	/* For pending->{arg,stop_work} */
-	if (pending && refcount_dec_and_test(&pending->refs))
-		wake_up_var(&pending->refs);
-
 	return 0;
 }
 
@@ -2199,12 +2199,16 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			push_task = get_task_struct(p);
 		}
 
+		/*
+		 * If there are pending waiters, but no pending stop_work,
+		 * then complete now.
+		 */
 		pending = p->migration_pending;
-		if (pending) {
-			refcount_inc(&pending->refs);
+		if (pending && !pending->stop_pending) {
 			p->migration_pending = NULL;
 			complete = true;
 		}
+
 		task_rq_unlock(rq, p, rf);
 
 		if (push_task) {
@@ -2213,7 +2217,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		}
 
 		if (complete)
-			goto do_complete;
+			complete_all(&pending->done);
 
 		return 0;
 	}
@@ -2264,9 +2268,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		if (!stop_pending)
 			pending->stop_pending = true;
 
-		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
+
 		task_rq_unlock(rq, p, rf);
 
 		if (!stop_pending) {
@@ -2282,12 +2286,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			if (task_on_rq_queued(p))
 				rq = move_queued_task(rq, rf, p, dest_cpu);
 
-			p->migration_pending = NULL;
-			complete = true;
+			if (!pending->stop_pending) {
+				p->migration_pending = NULL;
+				complete = true;
+			}
 		}
 		task_rq_unlock(rq, p, rf);
 
-do_complete:
 		if (complete)
 			complete_all(&pending->done);
 	}
@@ -2295,7 +2300,7 @@ do_complete:
 	wait_for_completion(&pending->done);
 
 	if (refcount_dec_and_test(&pending->refs))
-		wake_up_var(&pending->refs);
+		wake_up_var(&pending->refs); /* No UaF, just an address */
 
 	/*
 	 * Block the original owner of &pending until all subsequent callers
@@ -2303,6 +2308,9 @@ do_complete:
 	 */
 	wait_var_event(&my_pending.refs, !refcount_read(&my_pending.refs));
 
+	/* ARGH */
+	WARN_ON_ONCE(my_pending.stop_pending);
+
 	return 0;
 }
 


* [tip: sched/urgent] sched: Fix affine_move_task() self-concurrency
  2021-02-24 12:24 ` [PATCH 5/6] sched: Fix affine_move_task() self-concurrency Peter Zijlstra
@ 2021-03-01 10:16   ` tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-01 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     de8115ef5c83ef2c9941684019d59f4c2e5d16ce
Gitweb:        https://git.kernel.org/tip/de8115ef5c83ef2c9941684019d59f4c2e5d16ce
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:31:09 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 11:02:14 +01:00

sched: Fix affine_move_task() self-concurrency

Consider:

   sched_setaffinity(p, X);		sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending; and
issue stop_one_cpu_nowait(pending); and the second one will read
p->migration_pending and _also_ issue: stop_one_cpu_nowait(pending),
the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
---
 kernel/sched/core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ac05afb..4e4d100 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1864,6 +1864,7 @@ struct migration_arg {
 
 struct set_affinity_pending {
 	refcount_t		refs;
+	unsigned int		stop_pending;
 	struct completion	done;
 	struct cpu_stop_work	stop_work;
 	struct migration_arg	arg;
@@ -1982,12 +1983,15 @@ static int migration_cpu_stop(void *data)
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
 		 */
+		WARN_ON_ONCE(!pending->stop_pending);
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 		return 0;
 	}
 out:
+	if (pending)
+		pending->stop_pending = false;
 	task_rq_unlock(rq, p, &rf);
 
 	if (complete)
@@ -2183,7 +2187,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	bool complete = false;
+	bool stop_pending, complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
@@ -2256,14 +2260,19 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * anything else we cannot do is_migration_disabled(), punt
 		 * and have the stopper function handle it all race-free.
 		 */
+		stop_pending = pending->stop_pending;
+		if (!stop_pending)
+			pending->stop_pending = true;
 
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
+		if (!stop_pending) {
+			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work);
+		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;


* [tip: sched/urgent] sched: Optimize migration_cpu_stop()
  2021-02-24 12:24 ` [PATCH 4/6] sched: Optimize migration_cpu_stop() Peter Zijlstra
@ 2021-03-01 10:16   ` tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-01 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     9eca0f53b1c2f5acb85e84673e263bf996817a24
Gitweb:        https://git.kernel.org/tip/9eca0f53b1c2f5acb85e84673e263bf996817a24
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:21:35 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 11:02:14 +01:00

sched: Optimize migration_cpu_stop()

When the purpose of migration_cpu_stop() is to migrate the task to
'any' valid CPU, don't migrate the task when it's already running on a
valid CPU.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
---
 kernel/sched/core.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84b657f..ac05afb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1936,14 +1936,25 @@ static int migration_cpu_stop(void *data)
 			complete = true;
 		}
 
-		if (dest_cpu < 0)
+		if (dest_cpu < 0) {
+			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
+				goto out;
+
 			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
+		}
 
 		if (task_on_rq_queued(p))
 			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
 			p->wake_cpu = dest_cpu;
 
+		/*
+		 * XXX __migrate_task() can fail, at which point we might end
+		 * up running on a dodgy CPU, AFAICT this can only happen
+		 * during CPU hotplug, at which point we'll get pushed out
+		 * anyway, so it's probably not a big deal.
+		 */
+
 	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s


* [tip: sched/urgent] sched: Simplify migration_cpu_stop()
  2021-02-24 12:24 ` [PATCH 2/6] sched: Simplify migration_cpu_stop() Peter Zijlstra
  2021-02-24 15:34   ` Valentin Schneider
@ 2021-03-01 10:16   ` tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-01 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     6430eb536a97036b1d529cbf383cfe36e41a2f97
Gitweb:        https://git.kernel.org/tip/6430eb536a97036b1d529cbf383cfe36e41a2f97
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:50:39 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 11:02:13 +01:00

sched: Simplify migration_cpu_stop()

When affine_move_task() issues a migration_cpu_stop(), the purpose of
that function is to complete that @pending, not any random other
p->migration_pending that might have gotten installed since.

This realization much simplifies migration_cpu_stop() and allows
further necessary steps to fix all this as it provides the guarantee
that @pending's stopper will complete @pending (and not some random
other @pending).

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
---
 kernel/sched/core.c | 56 ++++++--------------------------------------
 1 file changed, 8 insertions(+), 48 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 79ddba5..088e8f4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1898,8 +1898,8 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
  */
 static int migration_cpu_stop(void *data)
 {
-	struct set_affinity_pending *pending;
 	struct migration_arg *arg = data;
+	struct set_affinity_pending *pending = arg->pending;
 	struct task_struct *p = arg->task;
 	int dest_cpu = arg->dest_cpu;
 	struct rq *rq = this_rq();
@@ -1921,25 +1921,6 @@ static int migration_cpu_stop(void *data)
 	raw_spin_lock(&p->pi_lock);
 	rq_lock(rq, &rf);
 
-	pending = p->migration_pending;
-	if (pending && !arg->pending) {
-		/*
-		 * This happens from sched_exec() and migrate_task_to(),
-		 * neither of them cares about pending and just wants a task to
-		 * maybe move about.
-		 *
-		 * Even if there is a pending, we can ignore it, since
-		 * affine_move_task() will have its own stop_works in flight
-		 * which will manage the completion.
-		 *
-		 * Notably, pending doesn't need to match arg->pending. This can
-		 * happen when triple concurrent affine_move_task() first sets
-		 * pending, then clears pending and eventually sets another
-		 * pending.
-		 */
-		pending = NULL;
-	}
-
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data)
 			goto out;
 
 		if (pending) {
-			p->migration_pending = NULL;
+			if (p->migration_pending == pending)
+				p->migration_pending = NULL;
 			complete = true;
 		}
 
-		/* migrate_enable() --  we must not race against SCA */
-		if (dest_cpu < 0) {
-			/*
-			 * When this was migrate_enable() but we no longer
-			 * have a @pending, a concurrent SCA 'fixed' things
-			 * and we should be valid again. Nothing to do.
-			 */
-			if (!pending) {
-				WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
-				goto out;
-			}
-
+		if (dest_cpu < 0)
 			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
-		}
 
 		if (task_on_rq_queued(p))
 			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
 			p->wake_cpu = dest_cpu;
 
-	} else if (dest_cpu < 0 || pending) {
+	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s
 		 * preempt_enable() and scheduling the stopper task. At that
@@ -1989,23 +1959,14 @@ static int migration_cpu_stop(void *data)
 		 * ->pi_lock, so the allowed mask is stable - if it got
 		 * somewhere allowed, we're done.
 		 */
-		if (pending && cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
-			p->migration_pending = NULL;
+		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
+			if (p->migration_pending == pending)
+				p->migration_pending = NULL;
 			complete = true;
 			goto out;
 		}
 
 		/*
-		 * When this was migrate_enable() but we no longer have an
-		 * @pending, a concurrent SCA 'fixed' things and we should be
-		 * valid again. Nothing to do.
-		 */
-		if (!pending) {
-			WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
-			goto out;
-		}
-
-		/*
 		 * When migrate_enable() hits a rq mis-match we can't reliably
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
@@ -2022,7 +1983,6 @@ out:
 		complete_all(&pending->done);
 
 	/* For pending->{arg,stop_work} */
-	pending = arg->pending;
 	if (pending && refcount_dec_and_test(&pending->refs))
 		wake_up_var(&pending->refs);
 


* [tip: sched/urgent] sched: Collate affine_move_task() stoppers
  2021-02-24 12:24 ` [PATCH 3/6] sched: Collate affine_move_task() stoppers Peter Zijlstra
@ 2021-03-01 10:16   ` tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-01 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     dbf983c0a5c37da2d476564792bd84e0e8f067fc
Gitweb:        https://git.kernel.org/tip/dbf983c0a5c37da2d476564792bd84e0e8f067fc
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:15:23 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 11:02:14 +01:00

sched: Collate affine_move_task() stoppers

The SCA_MIGRATE_ENABLE and task_running() cases are almost identical;
collapse them to avoid further duplication.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
---
 kernel/sched/core.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 088e8f4..84b657f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2239,30 +2239,23 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		return -EINVAL;
 	}
 
-	if (flags & SCA_MIGRATE_ENABLE) {
-
-		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
-		p->migration_flags &= ~MDF_PUSH;
-		task_rq_unlock(rq, p, rf);
-
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
-
-		return 0;
-	}
-
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		/*
-		 * Lessen races (and headaches) by delegating
-		 * is_migration_disabled(p) checks to the stopper, which will
-		 * run on the same CPU as said p.
+		 * MIGRATE_ENABLE gets here because 'p == current', but for
+		 * anything else we cannot do is_migration_disabled(), punt
+		 * and have the stopper function handle it all race-free.
 		 */
+
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
+		if (flags & SCA_MIGRATE_ENABLE)
+			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
 		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 
+		if (flags & SCA_MIGRATE_ENABLE)
+			return 0;
 	} else {
 
 		if (!is_migration_disabled(p)) {


* [tip: sched/urgent] sched: Fix migration_cpu_stop() requeueing
  2021-02-24 12:24 ` [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing Peter Zijlstra
@ 2021-03-01 10:16   ` tip-bot2 for Peter Zijlstra
  2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-01 10:16 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel), Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/urgent branch of tip:

Commit-ID:     b8e45e2a14bab684713f5dfc70c9e578c333dcdd
Gitweb:        https://git.kernel.org/tip/b8e45e2a14bab684713f5dfc70c9e578c333dcdd
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Sat, 13 Feb 2021 13:10:35 +01:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Mon, 01 Mar 2021 11:02:13 +01:00

sched: Fix migration_cpu_stop() requeueing

When affine_move_task(p) is called on a running task @p, which is not
otherwise already changing affinity, we'll first set
p->migration_pending and then do:

	 stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg);

This then gets us to migration_cpu_stop() running on the CPU that was
previously running our victim task @p.

If we find that our task is no longer on that runqueue (this can
happen because of a concurrent migration due to load-balance etc.),
then we'll end up at the:

	} else if (dest_cpu < 0 || pending) {

branch. Which we'll take because we set pending earlier. Here we first
check if the task @p has already satisfied the affinity constraints,
if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop()
onto the CPU that is now hosting our task @p:

	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &pending->arg, &pending->stop_work);

Except, we've never initialized pending->arg, which will be all 0s.

This then results in running migration_cpu_stop() on the next CPU with
arg->task == NULL, which gives the by now obvious result of fireworks.

The cure is to change affine_move_task() to always use pending->arg;
furthermore we can use the exact same pattern as the
SCA_MIGRATE_ENABLE case, since we'll block on the pending->done
completion anyway, so there is no point in adding yet another
completion in stop_one_cpu().

This then gives a clear distinction between the two
migration_cpu_stop() use cases:

  - sched_exec() / migrate_task_to() : arg->pending == NULL
  - affine_move_task() : arg->pending != NULL;

And we can have it ignore p->migration_pending when !arg->pending. Any
stop work from sched_exec() / migrate_task_to() is in addition to stop
works from affine_move_task(), which will be sufficient to issue the
completion.
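
Schematically, the two callers now set up their migration_arg as
follows (a sketch distilled from the diff below, not literal kernel
code):

	/* sched_exec() / migrate_task_to(): move once, no completion */
	struct migration_arg arg = {
		.task		= p,
		.dest_cpu	= dest_cpu,
		/* .pending == NULL: ignore p->migration_pending */
	};

	/* affine_move_task(): fully initialized, tied to pending->done */
	my_pending.arg = (struct migration_arg) {
		.task		= p,
		.dest_cpu	= -1,		/* any allowed CPU */
		.pending	= &my_pending,
	};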

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
---
 kernel/sched/core.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ca2bb62..79ddba5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1922,6 +1922,24 @@ static int migration_cpu_stop(void *data)
 	rq_lock(rq, &rf);
 
 	pending = p->migration_pending;
+	if (pending && !arg->pending) {
+		/*
+		 * This happens from sched_exec() and migrate_task_to(),
+		 * neither of them cares about pending and just wants a task to
+		 * maybe move about.
+		 *
+		 * Even if there is a pending, we can ignore it, since
+		 * affine_move_task() will have its own stop_works in flight
+		 * which will manage the completion.
+		 *
+		 * Notably, pending doesn't need to match arg->pending. This can
+		 * happen when triple concurrent affine_move_task() first sets
+		 * pending, then clears pending and eventually sets another
+		 * pending.
+		 */
+		pending = NULL;
+	}
+
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -2194,10 +2212,6 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	struct migration_arg arg = {
-		.task = p,
-		.dest_cpu = dest_cpu,
-	};
 	bool complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
@@ -2235,6 +2249,12 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			/* Install the request */
 			refcount_set(&my_pending.refs, 1);
 			init_completion(&my_pending.done);
+			my_pending.arg = (struct migration_arg) {
+				.task = p,
+				.dest_cpu = -1,		/* any */
+				.pending = &my_pending,
+			};
+
 			p->migration_pending = &my_pending;
 		} else {
 			pending = p->migration_pending;
@@ -2265,12 +2285,6 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		pending->arg = (struct migration_arg) {
-			.task = p,
-			.dest_cpu = -1,
-			.pending = pending,
-		};
-
 		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 
@@ -2283,8 +2297,11 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * is_migration_disabled(p) checks to the stopper, which will
 		 * run on the same CPU as said p.
 		 */
+		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		task_rq_unlock(rq, p, rf);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+
+		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+				    &pending->arg, &pending->stop_work);
 
 	} else {
 


* [tip: sched/core] sched: Simplify set_affinity_pending refcounts
  2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
                     ` (2 preceding siblings ...)
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2021-03-06 11:42   ` tip-bot2 for Peter Zijlstra
  3 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     50caf9c14b1498c90cf808dbba2ca29bd32ccba4
Gitweb:        https://git.kernel.org/tip/50caf9c14b1498c90cf808dbba2ca29bd32ccba4
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:42:08 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched: Simplify set_affinity_pending refcounts

Now that we have set_affinity_pending::stop_pending to indicate if a
stopper is in progress, and we have the guarantee that if that stopper
exists, it will (eventually) complete our @pending, we can simplify
the refcount scheme by no longer counting the stopper thread.
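
The resulting lifetime rules, sketched (as implied by the diff below;
comments mine):

	/*
	 * pending->refs now counts only completion waiters: the owner
	 * starts with 1 and each later affine_move_task() that joins
	 * the same pending takes another. The stopper holds none;
	 * pending->stop_pending alone marks pending->{arg,stop_work}
	 * as in use.
	 */
	wait_for_completion(&pending->done);
	if (refcount_dec_and_test(&pending->refs))
		wake_up_var(&pending->refs);	/* No UaF, just an address */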

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.724130207@infradead.org
---
 kernel/sched/core.c | 32 ++++++++++++++++++++------------
 1 file changed, 20 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4e4d100..9819121 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1862,6 +1862,10 @@ struct migration_arg {
 	struct set_affinity_pending	*pending;
 };
 
+/*
+ * @refs: number of wait_for_completion()
+ * @stop_pending: is @stop_work in use
+ */
 struct set_affinity_pending {
 	refcount_t		refs;
 	unsigned int		stop_pending;
@@ -1997,10 +2001,6 @@ out:
 	if (complete)
 		complete_all(&pending->done);
 
-	/* For pending->{arg,stop_work} */
-	if (pending && refcount_dec_and_test(&pending->refs))
-		wake_up_var(&pending->refs);
-
 	return 0;
 }
 
@@ -2199,12 +2199,16 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			push_task = get_task_struct(p);
 		}
 
+		/*
+		 * If there are pending waiters, but no pending stop_work,
+		 * then complete now.
+		 */
 		pending = p->migration_pending;
-		if (pending) {
-			refcount_inc(&pending->refs);
+		if (pending && !pending->stop_pending) {
 			p->migration_pending = NULL;
 			complete = true;
 		}
+
 		task_rq_unlock(rq, p, rf);
 
 		if (push_task) {
@@ -2213,7 +2217,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		}
 
 		if (complete)
-			goto do_complete;
+			complete_all(&pending->done);
 
 		return 0;
 	}
@@ -2264,9 +2268,9 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		if (!stop_pending)
 			pending->stop_pending = true;
 
-		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
+
 		task_rq_unlock(rq, p, rf);
 
 		if (!stop_pending) {
@@ -2282,12 +2286,13 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			if (task_on_rq_queued(p))
 				rq = move_queued_task(rq, rf, p, dest_cpu);
 
-			p->migration_pending = NULL;
-			complete = true;
+			if (!pending->stop_pending) {
+				p->migration_pending = NULL;
+				complete = true;
+			}
 		}
 		task_rq_unlock(rq, p, rf);
 
-do_complete:
 		if (complete)
 			complete_all(&pending->done);
 	}
@@ -2295,7 +2300,7 @@ do_complete:
 	wait_for_completion(&pending->done);
 
 	if (refcount_dec_and_test(&pending->refs))
-		wake_up_var(&pending->refs);
+		wake_up_var(&pending->refs); /* No UaF, just an address */
 
 	/*
 	 * Block the original owner of &pending until all subsequent callers
@@ -2303,6 +2308,9 @@ do_complete:
 	 */
 	wait_var_event(&my_pending.refs, !refcount_read(&my_pending.refs));
 
+	/* ARGH */
+	WARN_ON_ONCE(my_pending.stop_pending);
+
 	return 0;
 }
 


* [tip: sched/core] sched: Optimize migration_cpu_stop()
  2021-02-24 12:24 ` [PATCH 4/6] sched: Optimize migration_cpu_stop() Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2021-03-06 11:42   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     3f1bc119cd7fc987c8ed25ffb717f99403bb308c
Gitweb:        https://git.kernel.org/tip/3f1bc119cd7fc987c8ed25ffb717f99403bb308c
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:21:35 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched: Optimize migration_cpu_stop()

When the purpose of migration_cpu_stop() is to migrate the task to
'any' valid CPU, don't migrate the task when it's already running on a
valid CPU.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.569238629@infradead.org
---
 kernel/sched/core.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 84b657f..ac05afb 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1936,14 +1936,25 @@ static int migration_cpu_stop(void *data)
 			complete = true;
 		}
 
-		if (dest_cpu < 0)
+		if (dest_cpu < 0) {
+			if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask))
+				goto out;
+
 			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
+		}
 
 		if (task_on_rq_queued(p))
 			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
 			p->wake_cpu = dest_cpu;
 
+		/*
+		 * XXX __migrate_task() can fail, at which point we might end
+		 * up running on a dodgy CPU, AFAICT this can only happen
+		 * during CPU hotplug, at which point we'll get pushed out
+		 * anyway, so it's probably not a big deal.
+		 */
+
 	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s


* [tip: sched/core] sched: Fix affine_move_task() self-concurrency
  2021-02-24 12:24 ` [PATCH 5/6] sched: Fix affine_move_task() self-concurrency Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2021-03-06 11:42   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     9e81889c7648d48dd5fe13f41cbc99f3c362484a
Gitweb:        https://git.kernel.org/tip/9e81889c7648d48dd5fe13f41cbc99f3c362484a
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:31:09 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched: Fix affine_move_task() self-concurrency

Consider:

   sched_setaffinity(p, X);		sched_setaffinity(p, Y);

Then the first will install p->migration_pending = &my_pending and
issue stop_one_cpu_nowait(pending); the second will read
p->migration_pending and _also_ issue stop_one_cpu_nowait(pending),
on the _SAME_ @pending.

This causes stopper list corruption.

Add set_affinity_pending::stop_pending, to indicate if a stopper is in
progress.
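
For illustration only, the triggering pattern from userspace looks
roughly like this; a hedged sketch (build with cc -pthread), not a
guaranteed reproducer, since corrupting the stopper list also needs
the task_running()/TASK_WAKING path to race:

	#define _GNU_SOURCE
	#include <pthread.h>
	#include <sched.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	static pid_t victim_tid;

	static void *victim(void *arg)
	{
		victim_tid = syscall(SYS_gettid);
		for (;;)
			;	/* stay running so the stopper path is taken */
	}

	static void *hammer(void *arg)
	{
		cpu_set_t set;
		long cpu = (long)arg;

		for (;;) {
			CPU_ZERO(&set);
			CPU_SET(cpu, &set);
			/* concurrent sched_setaffinity(p, X) / (p, Y) */
			sched_setaffinity(victim_tid, sizeof(set), &set);
		}
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, victim, NULL);
		sleep(1);	/* crude: wait for victim_tid to be set */
		pthread_create(&t, NULL, hammer, (void *)0L);
		pthread_create(&t, NULL, hammer, (void *)1L);
		pause();
	}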

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.649146419@infradead.org
---
 kernel/sched/core.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ac05afb..4e4d100 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1864,6 +1864,7 @@ struct migration_arg {
 
 struct set_affinity_pending {
 	refcount_t		refs;
+	unsigned int		stop_pending;
 	struct completion	done;
 	struct cpu_stop_work	stop_work;
 	struct migration_arg	arg;
@@ -1982,12 +1983,15 @@ static int migration_cpu_stop(void *data)
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
 		 */
+		WARN_ON_ONCE(!pending->stop_pending);
 		task_rq_unlock(rq, p, &rf);
 		stop_one_cpu_nowait(task_cpu(p), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 		return 0;
 	}
 out:
+	if (pending)
+		pending->stop_pending = false;
 	task_rq_unlock(rq, p, &rf);
 
 	if (complete)
@@ -2183,7 +2187,7 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	bool complete = false;
+	bool stop_pending, complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
 	if (cpumask_test_cpu(task_cpu(p), &p->cpus_mask)) {
@@ -2256,14 +2260,19 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * anything else we cannot do is_migration_disabled(), punt
 		 * and have the stopper function handle it all race-free.
 		 */
+		stop_pending = pending->stop_pending;
+		if (!stop_pending)
+			pending->stop_pending = true;
 
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		if (flags & SCA_MIGRATE_ENABLE)
 			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
+		if (!stop_pending) {
+			stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+					    &pending->arg, &pending->stop_work);
+		}
 
 		if (flags & SCA_MIGRATE_ENABLE)
 			return 0;


* [tip: sched/core] sched: Simplify migration_cpu_stop()
  2021-02-24 12:24 ` [PATCH 2/6] sched: Simplify migration_cpu_stop() Peter Zijlstra
  2021-02-24 15:34   ` Valentin Schneider
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2021-03-06 11:42   ` tip-bot2 for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     c20cf065d4a619d394d23290093b1002e27dff86
Gitweb:        https://git.kernel.org/tip/c20cf065d4a619d394d23290093b1002e27dff86
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:50:39 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:20 +01:00

sched: Simplify migration_cpu_stop()

When affine_move_task() issues a migration_cpu_stop(), the purpose of
that function is to complete that @pending, not any random other
p->migration_pending that might have gotten installed since.

This realization greatly simplifies migration_cpu_stop() and allows
further necessary steps to fix all this as it provides the guarantee
that @pending's stopper will complete @pending (and not some random
other @pending).

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.430014682@infradead.org
---
 kernel/sched/core.c | 56 ++++++--------------------------------------
 1 file changed, 8 insertions(+), 48 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 79ddba5..088e8f4 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1898,8 +1898,8 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
  */
 static int migration_cpu_stop(void *data)
 {
-	struct set_affinity_pending *pending;
 	struct migration_arg *arg = data;
+	struct set_affinity_pending *pending = arg->pending;
 	struct task_struct *p = arg->task;
 	int dest_cpu = arg->dest_cpu;
 	struct rq *rq = this_rq();
@@ -1921,25 +1921,6 @@ static int migration_cpu_stop(void *data)
 	raw_spin_lock(&p->pi_lock);
 	rq_lock(rq, &rf);
 
-	pending = p->migration_pending;
-	if (pending && !arg->pending) {
-		/*
-		 * This happens from sched_exec() and migrate_task_to(),
-		 * neither of them cares about pending and just wants a task to
-		 * maybe move about.
-		 *
-		 * Even if there is a pending, we can ignore it, since
-		 * affine_move_task() will have its own stop_works in flight
-		 * which will manage the completion.
-		 *
-		 * Notably, pending doesn't need to match arg->pending. This can
-		 * happen when triple concurrent affine_move_task() first sets
-		 * pending, then clears pending and eventually sets another
-		 * pending.
-		 */
-		pending = NULL;
-	}
-
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -1950,31 +1931,20 @@ static int migration_cpu_stop(void *data)
 			goto out;
 
 		if (pending) {
-			p->migration_pending = NULL;
+			if (p->migration_pending == pending)
+				p->migration_pending = NULL;
 			complete = true;
 		}
 
-		/* migrate_enable() --  we must not race against SCA */
-		if (dest_cpu < 0) {
-			/*
-			 * When this was migrate_enable() but we no longer
-			 * have a @pending, a concurrent SCA 'fixed' things
-			 * and we should be valid again. Nothing to do.
-			 */
-			if (!pending) {
-				WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
-				goto out;
-			}
-
+		if (dest_cpu < 0)
 			dest_cpu = cpumask_any_distribute(&p->cpus_mask);
-		}
 
 		if (task_on_rq_queued(p))
 			rq = __migrate_task(rq, &rf, p, dest_cpu);
 		else
 			p->wake_cpu = dest_cpu;
 
-	} else if (dest_cpu < 0 || pending) {
+	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s
 		 * preempt_enable() and scheduling the stopper task. At that
@@ -1989,23 +1959,14 @@ static int migration_cpu_stop(void *data)
 		 * ->pi_lock, so the allowed mask is stable - if it got
 		 * somewhere allowed, we're done.
 		 */
-		if (pending && cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
-			p->migration_pending = NULL;
+		if (cpumask_test_cpu(task_cpu(p), p->cpus_ptr)) {
+			if (p->migration_pending == pending)
+				p->migration_pending = NULL;
 			complete = true;
 			goto out;
 		}
 
 		/*
-		 * When this was migrate_enable() but we no longer have an
-		 * @pending, a concurrent SCA 'fixed' things and we should be
-		 * valid again. Nothing to do.
-		 */
-		if (!pending) {
-			WARN_ON_ONCE(!cpumask_test_cpu(task_cpu(p), &p->cpus_mask));
-			goto out;
-		}
-
-		/*
 		 * When migrate_enable() hits a rq mis-match we can't reliably
 		 * determine is_migration_disabled() and so have to chase after
 		 * it.
@@ -2022,7 +1983,6 @@ out:
 		complete_all(&pending->done);
 
 	/* For pending->{arg,stop_work} */
-	pending = arg->pending;
 	if (pending && refcount_dec_and_test(&pending->refs))
 		wake_up_var(&pending->refs);
 


* [tip: sched/core] sched: Collate affine_move_task() stoppers
  2021-02-24 12:24 ` [PATCH 3/6] sched: Collate affine_move_task() stoppers Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2021-03-06 11:42   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     58b1a45086b5f80f2b2842aa7ed0da51a64a302b
Gitweb:        https://git.kernel.org/tip/58b1a45086b5f80f2b2842aa7ed0da51a64a302b
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Wed, 24 Feb 2021 11:15:23 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:21 +01:00

sched: Collate affine_move_task() stoppers

The SCA_MIGRATE_ENABLE and task_running() cases are almost identical;
collapse them to avoid further duplication.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.500108964@infradead.org
---
 kernel/sched/core.c | 23 ++++++++---------------
 1 file changed, 8 insertions(+), 15 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 088e8f4..84b657f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2239,30 +2239,23 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		return -EINVAL;
 	}
 
-	if (flags & SCA_MIGRATE_ENABLE) {
-
-		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
-		p->migration_flags &= ~MDF_PUSH;
-		task_rq_unlock(rq, p, rf);
-
-		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
-				    &pending->arg, &pending->stop_work);
-
-		return 0;
-	}
-
 	if (task_running(rq, p) || p->state == TASK_WAKING) {
 		/*
-		 * Lessen races (and headaches) by delegating
-		 * is_migration_disabled(p) checks to the stopper, which will
-		 * run on the same CPU as said p.
+		 * MIGRATE_ENABLE gets here because 'p == current', but for
+		 * anything else we cannot do is_migration_disabled(), punt
+		 * and have the stopper function handle it all race-free.
 		 */
+
 		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
+		if (flags & SCA_MIGRATE_ENABLE)
+			p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
 		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 
+		if (flags & SCA_MIGRATE_ENABLE)
+			return 0;
 	} else {
 
 		if (!is_migration_disabled(p)) {


* [tip: sched/core] sched: Fix migration_cpu_stop() requeueing
  2021-02-24 12:24 ` [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing Peter Zijlstra
  2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
@ 2021-03-06 11:42   ` tip-bot2 for Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2021-03-06 11:42 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: stable, Peter Zijlstra (Intel),
	Ingo Molnar, Valentin Schneider, x86, linux-kernel

The following commit has been merged into the sched/core branch of tip:

Commit-ID:     8a6edb5257e2a84720fe78cb179eca58ba76126f
Gitweb:        https://git.kernel.org/tip/8a6edb5257e2a84720fe78cb179eca58ba76126f
Author:        Peter Zijlstra <peterz@infradead.org>
AuthorDate:    Sat, 13 Feb 2021 13:10:35 +01:00
Committer:     Ingo Molnar <mingo@kernel.org>
CommitterDate: Sat, 06 Mar 2021 12:40:20 +01:00

sched: Fix migration_cpu_stop() requeueing

When affine_move_task(p) is called on a running task @p, which is not
otherwise already changing affinity, we'll first set
p->migration_pending and then do:

	 stop_one_cpu(cpu_of_rq(rq), migration_cpu_stop, &arg);

This then gets us to migration_cpu_stop() running on the CPU that was
previously running our victim task @p.

If we find that our task is no longer on that runqueue (this can
happen because of a concurrent migration due to load-balance etc.),
then we'll end up at the:

	} else if (dest_cpu < 0 || pending) {

branch. Which we'll take because we set pending earlier. Here we first
check if the task @p has already satisfied the affinity constraints,
if so we bail early [A]. Otherwise we'll reissue migration_cpu_stop()
onto the CPU that is now hosting our task @p:

	stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
			    &pending->arg, &pending->stop_work);

Except, we've never initialized pending->arg, which will be all 0s.

This then results in running migration_cpu_stop() on the next CPU with
arg->task == NULL, which gives the by now obvious result of fireworks.

The cure is to change affine_move_task() to always use pending->arg;
furthermore we can use the exact same pattern as the
SCA_MIGRATE_ENABLE case, since we'll block on the pending->done
completion anyway, so there is no point in adding yet another
completion in stop_one_cpu().

This then gives a clear distinction between the two
migration_cpu_stop() use cases:

  - sched_exec() / migrate_task_to() : arg->pending == NULL
  - affine_move_task() : arg->pending != NULL;

And we can have it ignore p->migration_pending when !arg->pending. Any
stop work from sched_exec() / migrate_task_to() is in addition to stop
works from affine_move_task(), which will be sufficient to issue the
completion.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Cc: stable@kernel.org
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210224131355.357743989@infradead.org
---
 kernel/sched/core.c | 39 ++++++++++++++++++++++++++++-----------
 1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ca2bb62..79ddba5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1922,6 +1922,24 @@ static int migration_cpu_stop(void *data)
 	rq_lock(rq, &rf);
 
 	pending = p->migration_pending;
+	if (pending && !arg->pending) {
+		/*
+		 * This happens from sched_exec() and migrate_task_to(),
+		 * neither of them cares about pending and just wants a task to
+		 * maybe move about.
+		 *
+		 * Even if there is a pending, we can ignore it, since
+		 * affine_move_task() will have its own stop_works in flight
+		 * which will manage the completion.
+		 *
+		 * Notably, pending doesn't need to match arg->pending. This can
+		 * happen when triple concurrent affine_move_task() first sets
+		 * pending, then clears pending and eventually sets another
+		 * pending.
+		 */
+		pending = NULL;
+	}
+
 	/*
 	 * If task_rq(p) != rq, it cannot be migrated here, because we're
 	 * holding rq->lock, if p->on_rq == 0 it cannot get enqueued because
@@ -2194,10 +2212,6 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			    int dest_cpu, unsigned int flags)
 {
 	struct set_affinity_pending my_pending = { }, *pending = NULL;
-	struct migration_arg arg = {
-		.task = p,
-		.dest_cpu = dest_cpu,
-	};
 	bool complete = false;
 
 	/* Can the task run on the task's current CPU? If so, we're done */
@@ -2235,6 +2249,12 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 			/* Install the request */
 			refcount_set(&my_pending.refs, 1);
 			init_completion(&my_pending.done);
+			my_pending.arg = (struct migration_arg) {
+				.task = p,
+				.dest_cpu = -1,		/* any */
+				.pending = &my_pending,
+			};
+
 			p->migration_pending = &my_pending;
 		} else {
 			pending = p->migration_pending;
@@ -2265,12 +2285,6 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		p->migration_flags &= ~MDF_PUSH;
 		task_rq_unlock(rq, p, rf);
 
-		pending->arg = (struct migration_arg) {
-			.task = p,
-			.dest_cpu = -1,
-			.pending = pending,
-		};
-
 		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
 				    &pending->arg, &pending->stop_work);
 
@@ -2283,8 +2297,11 @@ static int affine_move_task(struct rq *rq, struct task_struct *p, struct rq_flag
 		 * is_migration_disabled(p) checks to the stopper, which will
 		 * run on the same CPU as said p.
 		 */
+		refcount_inc(&pending->refs); /* pending->{arg,stop_work} */
 		task_rq_unlock(rq, p, rf);
-		stop_one_cpu(cpu_of(rq), migration_cpu_stop, &arg);
+
+		stop_one_cpu_nowait(cpu_of(rq), migration_cpu_stop,
+				    &pending->arg, &pending->stop_work);
 
 	} else {
 



Thread overview: 27+ messages
2021-02-24 12:24 [PATCH 0/6] sched: Fix affine_move_task() wreckage Peter Zijlstra
2021-02-24 12:24 ` [PATCH 1/6] sched: Fix migration_cpu_stop() requeueing Peter Zijlstra
2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-02-24 12:24 ` [PATCH 2/6] sched: Simplify migration_cpu_stop() Peter Zijlstra
2021-02-24 15:34   ` Valentin Schneider
2021-02-25  8:45     ` Peter Zijlstra
2021-02-25 11:10       ` Valentin Schneider
2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-02-24 12:24 ` [PATCH 3/6] sched: Collate affine_move_task() stoppers Peter Zijlstra
2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-02-24 12:24 ` [PATCH 4/6] sched: Optimize migration_cpu_stop() Peter Zijlstra
2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-02-24 12:24 ` [PATCH 5/6] sched: Fix affine_move_task() self-concurrency Peter Zijlstra
2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2021-02-24 12:24 ` [PATCH 6/6] sched: Simplify set_affinity_pending refcounts Peter Zijlstra
2021-02-24 15:34   ` Valentin Schneider
2021-02-24 15:34   ` Peter Zijlstra
2021-02-24 17:59     ` Valentin Schneider
2021-02-25  9:27       ` Peter Zijlstra
2021-02-25 11:11         ` Valentin Schneider
2021-03-01 10:16   ` [tip: sched/urgent] " tip-bot2 for Peter Zijlstra
2021-03-06 11:42   ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
