From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751586AbdGYQ6b (ORCPT ); Tue, 25 Jul 2017 12:58:31 -0400 Received: from merlin.infradead.org ([205.233.59.134]:38248 "EHLO merlin.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1750732AbdGYQ6a (ORCPT ); Tue, 25 Jul 2017 12:58:30 -0400 Date: Tue, 25 Jul 2017 18:58:21 +0200 From: Peter Zijlstra To: Tejun Heo Cc: Ingo Molnar , "Paul E. McKenney" , Steven Rostedt , linux-kernel@vger.kernel.org, Lai Jiangshan , kernel-team@fb.com Subject: Re: [PATCH RFC] sched: Allow migrating kthreads into online but inactive CPUs Message-ID: <20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net> References: <20170617121008.GB20758@htj.duckdns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20170617121008.GB20758@htj.duckdns.org> User-Agent: NeoMutt/20170609 (1.8.3) Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi, On Sat, Jun 17, 2017 at 08:10:08AM -0400, Tejun Heo wrote: > Per-cpu workqueues have been tripping CPU affinity sanity checks while > a CPU is being offlined. A per-cpu kworker ends up running on a CPU > which isn't its target CPU while the CPU is online but inactive. > > While the scheduler allows kthreads to wake up on an online but > inactive CPU, it doesn't allow a running kthread to be migrated to > such a CPU, which leads to an odd situation where setting affinity on > a sleeping and running kthread leads to different results. > > Each mem-reclaim workqueue has one rescuer which guarantees forward > progress and the rescuer needs to bind itself to the CPU which needs > help in making forward progress; however, due to the above issue, > while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on > the correct CPU if the CPU is in the process of going offline, > tripping the sanity check and executing the work item on the wrong > CPU. > > This patch updates __migrate_task() so that kthreads can be migrated > into an inactive but online CPU. > > Signed-off-by: Tejun Heo > Reported-by: "Paul E. McKenney" > Reported-by: Steven Rostedt Hmm.. so the rules for running on !active && online are slightly stricter than just being a kthread, how about the below, does that work too? kernel/sched/core.c | 36 ++++++++++++++++++++++++++++++------ 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d3d39a283beb..59b667c16826 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -894,6 +894,22 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags) } #ifdef CONFIG_SMP + +/* + * Per-CPU kthreads are allowed to run on !actie && online CPUs, see + * __set_cpus_allowed_ptr() and select_fallback_rq(). + */ +static inline bool is_per_cpu_kthread(struct task_struct *p) +{ + if (!(p->flags & PF_KTHREAD)) + return false; + + if (p->nr_cpus_allowed != 1) + return false; + + return true; +} + /* * This is how migration works: * @@ -951,8 +967,13 @@ struct migration_arg { static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf, struct task_struct *p, int dest_cpu) { - if (unlikely(!cpu_active(dest_cpu))) - return rq; + if (is_per_cpu_kthread(p)) { + if (unlikely(!cpu_online(dest_cpu))) + return rq; + } else { + if (unlikely(!cpu_active(dest_cpu))) + return rq; + } /* Affinity changed (again). */ if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed)) @@ -1482,10 +1503,13 @@ static int select_fallback_rq(int cpu, struct task_struct *p) for (;;) { /* Any allowed, online CPU? */ for_each_cpu(dest_cpu, &p->cpus_allowed) { - if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu)) - continue; - if (!cpu_online(dest_cpu)) - continue; + if (is_per_cpu_kthread(p)) { + if (!cpu_online(dest_cpu)) + continue; + } else { + if (!cpu_active(dest_cpu)) + continue; + } goto out; }