All of lore.kernel.org
 help / color / mirror / Atom feed
From: Valentin Schneider <valentin.schneider@arm.com>
To: linux-kernel@vger.kernel.org
Cc: Will Deacon <will@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Qais Yousef <qais.yousef@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Quentin Perret <qperret@google.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	kernel-team@android.com
Subject: [PATCH 2/2] sched: Plug race between SCA, hotplug and migration_cpu_stop()
Date: Wed, 26 May 2021 21:57:51 +0100	[thread overview]
Message-ID: <20210526205751.842360-3-valentin.schneider@arm.com> (raw)
In-Reply-To: <20210526205751.842360-1-valentin.schneider@arm.com>

migration_cpu_stop() is handed a CPU that was a valid pick at the time
set_cpus_allowed_ptr() was invoked. However, hotplug may have happened
between then and the stopper work being executed, meaning
__move_queued_task() could fail when passed arg.dest_cpu. This could mean
leaving a task on its current CPU (despite it being outside the task's
affinity) up until its next wakeup, which is a big no-no.

Verify the validity of the pick in the stopper callback, and invoke
select_fallback_rq() there if need be.

Fixes: 6d337eab041d ("sched: Fix migrate_disable() vs set_cpus_allowed_ptr()")
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
---
 kernel/sched/core.c | 36 ++++++++++++++++++++++++------------
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d40511c55e80..0679efb6c22d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2263,6 +2263,8 @@ static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
 	return rq;
 }
 
+static int select_fallback_rq(int cpu, struct task_struct *p);
+
 /*
  * migration_cpu_stop - this will be executed by a highprio stopper thread
  * and performs thread migration by bumping thread off CPU then
@@ -2276,6 +2278,7 @@ static int migration_cpu_stop(void *data)
 	struct rq *rq = this_rq();
 	bool complete = false;
 	struct rq_flags rf;
+	int dest_cpu;
 
 	/*
 	 * The original target CPU might have gone down and we might
@@ -2315,18 +2318,27 @@ static int migration_cpu_stop(void *data)
 				goto out;
 		}
 
-		if (task_on_rq_queued(p))
-			rq = __migrate_task(rq, &rf, p, arg->dest_cpu);
-		else
-			p->wake_cpu = arg->dest_cpu;
-
-		/*
-		 * XXX __migrate_task() can fail, at which point we might end
-		 * up running on a dodgy CPU, AFAICT this can only happen
-		 * during CPU hotplug, at which point we'll get pushed out
-		 * anyway, so it's probably not a big deal.
-		 */
-
+		dest_cpu = arg->dest_cpu;
+		if (task_on_rq_queued(p)) {
+			/*
+			 * A hotplug operation could have happened between
+			 * set_cpus_allowed_ptr() and here, making dest_cpu no
+			 * longer allowed.
+			 */
+			if (!is_cpu_allowed(p, dest_cpu))
+				dest_cpu = select_fallback_rq(cpu_of(rq), p);
+			/*
+			 * dest_cpu can be victim of hotplug between is_cpu_allowed()
+			 * and here. However, per the synchronize_rcu() in
+			 * sched_cpu_deactivate(), it can't have gone lower than
+			 * CPUHP_AP_ACTIVE, so it's safe to punt it over and let
+			 * balance_push() route it elsewhere.
+			 */
+			update_rq_clock(rq);
+			rq = move_queued_task(rq, &rf, p, dest_cpu);
+		} else {
+			p->wake_cpu = dest_cpu;
+		}
 	} else if (pending) {
 		/*
 		 * This happens when we get migrated between migrate_enable()'s
-- 
2.25.1


  parent reply	other threads:[~2021-05-26 20:58 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-05-26 20:57 [PATCH 0/2] sched: SCA vs hotplug vs stopper races fixes Valentin Schneider
2021-05-26 20:57 ` [PATCH 1/2] sched: Don't defer CPU pick to migration_cpu_stop() Valentin Schneider
2021-06-01 14:04   ` [tip: sched/core] " tip-bot2 for Valentin Schneider
2021-05-26 20:57 ` Valentin Schneider [this message]
2021-05-27  9:12   ` [PATCH 2/2] sched: Plug race between SCA, hotplug and migration_cpu_stop() Peter Zijlstra
2021-06-01 16:59   ` Valentin Schneider
2021-06-02 15:26     ` Will Deacon
2021-06-02 19:43       ` Valentin Schneider
2021-06-22 13:51         ` Will Deacon
2021-06-24  9:33           ` Valentin Schneider

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210526205751.842360-3-valentin.schneider@arm.com \
    --to=valentin.schneider@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.