All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>, Ingo Molnar <mingo@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	linux-kernel@vger.kernel.org,
	Lai Jiangshan <jiangshanlai@gmail.com>,
	kernel-team@fb.com
Subject: Re: [PATCH RFC] sched: Allow migrating kthreads into online but inactive CPUs
Date: Tue, 25 Jul 2017 10:52:25 -0700	[thread overview]
Message-ID: <20170725175225.GT3730@linux.vnet.ibm.com> (raw)
In-Reply-To: <20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net>

On Tue, Jul 25, 2017 at 06:58:21PM +0200, Peter Zijlstra wrote:
> Hi,
> 
> On Sat, Jun 17, 2017 at 08:10:08AM -0400, Tejun Heo wrote:
> > Per-cpu workqueues have been tripping CPU affinity sanity checks while
> > a CPU is being offlined.  A per-cpu kworker ends up running on a CPU
> > which isn't its target CPU while the CPU is online but inactive.
> > 
> > While the scheduler allows kthreads to wake up on an online but
> > inactive CPU, it doesn't allow a running kthread to be migrated to
> > such a CPU, which leads to an odd situation where setting affinity on
> > a sleeping and running kthread leads to different results.
> > 
> > Each mem-reclaim workqueue has one rescuer which guarantees forward
> > progress and the rescuer needs to bind itself to the CPU which needs
> > help in making forward progress; however, due to the above issue,
> > while set_cpus_allowed_ptr() succeeds, the rescuer doesn't end up on
> > the correct CPU if the CPU is in the process of going offline,
> > tripping the sanity check and executing the work item on the wrong
> > CPU.
> > 
> > This patch updates __migrate_task() so that kthreads can be migrated
> > into an inactive but online CPU.
> > 
> > Signed-off-by: Tejun Heo <tj@kernel.org>
> > Reported-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> > Reported-by: Steven Rostedt <rostedt@goodmis.org>
> 
> Hmm.. so the rules for running on !active && online are slightly
> stricter than just being a kthread, how about the below, does that work
> too?

I will give this a shot over night, Pacific Time, but the bug occurs
with such low probability that a pass won't mean much.  :-(

							Thanx, Paul

>  kernel/sched/core.c | 36 ++++++++++++++++++++++++++++++------
>  1 file changed, 30 insertions(+), 6 deletions(-)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index d3d39a283beb..59b667c16826 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -894,6 +894,22 @@ void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
>  }
> 
>  #ifdef CONFIG_SMP
> +
> +/*
> + * Per-CPU kthreads are allowed to run on !actie && online CPUs, see
> + * __set_cpus_allowed_ptr() and select_fallback_rq().
> + */
> +static inline bool is_per_cpu_kthread(struct task_struct *p)
> +{
> +	if (!(p->flags & PF_KTHREAD))
> +		return false;
> +
> +	if (p->nr_cpus_allowed != 1)
> +		return false;
> +
> +	return true;
> +}
> +
>  /*
>   * This is how migration works:
>   *
> @@ -951,8 +967,13 @@ struct migration_arg {
>  static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>  				 struct task_struct *p, int dest_cpu)
>  {
> -	if (unlikely(!cpu_active(dest_cpu)))
> -		return rq;
> +	if (is_per_cpu_kthread(p)) {
> +		if (unlikely(!cpu_online(dest_cpu)))
> +			return rq;
> +	} else {
> +		if (unlikely(!cpu_active(dest_cpu)))
> +			return rq;
> +	}
> 
>  	/* Affinity changed (again). */
>  	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
> @@ -1482,10 +1503,13 @@ static int select_fallback_rq(int cpu, struct task_struct *p)
>  	for (;;) {
>  		/* Any allowed, online CPU? */
>  		for_each_cpu(dest_cpu, &p->cpus_allowed) {
> -			if (!(p->flags & PF_KTHREAD) && !cpu_active(dest_cpu))
> -				continue;
> -			if (!cpu_online(dest_cpu))
> -				continue;
> +			if (is_per_cpu_kthread(p)) {
> +				if (!cpu_online(dest_cpu))
> +					continue;
> +			} else {
> +				if (!cpu_active(dest_cpu))
> +					continue;
> +			}
>  			goto out;
>  		}
> 
> 

  reply	other threads:[~2017-07-25 17:52 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-17 12:10 [PATCH RFC] sched: Allow migrating kthreads into online but inactive CPUs Tejun Heo
2017-06-17 12:11 ` simple repro case Tejun Heo
2017-06-21 14:24   ` Steven Rostedt
2017-06-21 17:59     ` Tejun Heo
2017-07-25 16:58 ` [PATCH RFC] sched: Allow migrating kthreads into online but inactive CPUs Peter Zijlstra
2017-07-25 17:52   ` Paul E. McKenney [this message]
2017-07-26 12:57   ` Paul E. McKenney
2018-05-31 12:28   ` [tip:sched/urgent] sched/core: Fix rules for running on online && !active CPUs tip-bot for Peter Zijlstra

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170725175225.GT3730@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=jiangshanlai@gmail.com \
    --cc=kernel-team@fb.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.