linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vineeth Remanan Pillai <vpillai@digitalocean.com>
To: Joel Fernandes <joel@joelfernandes.org>
Cc: "Peter Zijlstra" <peterz@infradead.org>,
	"Tim Chen" <tim.c.chen@linux.intel.com>,
	"Nishanth Aravamudan" <naravamudan@digitalocean.com>,
	"Julien Desfossez" <jdesfossez@digitalocean.com>,
	"Thomas Gleixner" <tglx@linutronix.de>,
	"Paul Turner" <pjt@google.com>,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	"Linux List Kernel Mailing" <linux-kernel@vger.kernel.org>,
	"Subhra Mazumdar" <subhra.mazumdar@oracle.com>,
	"Ingo Molnar" <mingo@kernel.org>,
	"Frédéric Weisbecker" <fweisbec@gmail.com>,
	"Kees Cook" <keescook@chromium.org>,
	"Greg Kerr" <kerrnel@google.com>, "Phil Auld" <pauld@redhat.com>,
	"Aaron Lu" <aaron.lwe@gmail.com>,
	"Aubrey Li" <aubrey.intel@gmail.com>,
	"Valentin Schneider" <valentin.schneider@arm.com>,
	"Mel Gorman" <mgorman@techsingularity.net>,
	"Pawan Gupta" <pawan.kumar.gupta@linux.intel.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Vineeth Pillai" <vineethrp@gmail.com>,
	"Chen Yu" <yu.c.chen@intel.com>,
	"Christian Brauner" <christian.brauner@ubuntu.com>,
	"Aaron Lu" <aaron.lu@linux.alibaba.com>,
	"Paul E . McKenney" <paulmck@kernel.org>
Subject: Re: [RFC PATCH 06/16] sched: Add core wide task selection and scheduling.
Date: Fri, 3 Jul 2020 16:21:46 -0400	[thread overview]
Message-ID: <CANaguZDtZrXbjmot2crLM0ComgY=NfqxWYs7GzUEY8aLeaUVrg@mail.gmail.com> (raw)
In-Reply-To: <20200701232847.GA439212@google.com>

On Wed, Jul 1, 2020 at 7:28 PM Joel Fernandes <joel@joelfernandes.org> wrote:
>
> From: "Joel Fernandes (Google)" <joel@joelfernandes.org>
> Subject: [PATCH] sched: Fix CPU hotplug causing crashes in task selection logic
>
> The selection logic does not run correctly if the current CPU is not in the
> cpu_smt_mask (which it is not because the CPU is offlined when the stopper
> finishes running and needs to switch to idle).  There are also other issues
> fixed by the patch I think such as: if some other sibling set core_pick to
> something, however the selection logic on current cpu resets it before
> selecting. In this case, we need to run the task selection logic again to
> make sure it picks something if there is something to run. It might end up
> picking the wrong task.
>
I am not sure if this can happen. If the other sibling sets core_pick, it
will be under the core wide lock and it should set the core_sched_seq also
before releasing the lock. So when this cpu tries, it would see the core_pick
before resetting it. Is this the same case you were mentioning? Sorry if I
misunderstood the case you mentioned..

> Yet another issue was, if the stopper thread is an
> unconstrained pick, then rq->core_pick is set. The next time task selection
> logic runs when stopper needs to switch to idle, the current CPU is not in
> the smt_mask. This causes the previous ->core_pick to be picked again which
> happens to be the unconstrained task! so the stopper keeps getting selected
> forever.
>
I did not clearly understand this. During an unconstrained pick, current
cpu's core_pick is not set and tasks are not picked for siblings as well.
If it is observed being set in the v6 code, I think it should be a bug.

> That and there are a few more safe guards and checks around checking/setting
> rq->core_pick. To test it, I ran rcutorture and made it tag all torture
> threads. Then ran it in hotplug mode (hotplugging every 200ms) and it hit the
> issue. Now it runs for an hour or so without issue. (Torture testing debug
> changes: https://bit.ly/38htfqK ).
>
> Various fixes were tried causing varying degrees of crashes.  Finally I found
> that it is easiest to just add current CPU to the smt_mask's copy always.
> This is so that task selection logic always runs on the current CPU which
> called schedule().
>
> [...]
>         cpu = cpu_of(rq);
> -       smt_mask = cpu_smt_mask(cpu);
> +       /* Make a copy of cpu_smt_mask as we should not set that. */
> +       cpumask_copy(&select_mask, cpu_smt_mask(cpu));
> +
> +       /*
> +        * Always make sure current CPU is added to smt_mask so that below
> +        * selection logic runs on it.
> +        */
> +       cpumask_set_cpu(cpu, &select_mask);
>
I like this idea. Probably we can optimize it a bit. We get here with cpu
not in smt_mask only during an offline and online(including the boot time
online) phase. So we could probably wrap it in an "if (unlikely())". Also,
during this time, it would be idle thread or some hotplug online thread that
would be runnable and no other tasks should be runnable on this cpu. So, I
think it makes sense to do an unconstrained pick rather than a costly sync
of all siblings. Probably something like:

cpumask_copy(&select_mask, cpu_smt_mask(cpu));
if (unlikely(cpumask_test_cpu(cpu, &select_mask))) {
    cpumask_set_cpu(cpu, &select_mask);
    need_sync = false;
}

By setting need_sync to false, we will do an unconstrained pick and will
not sync with other siblings. I guess we need to reset need_sync after
or in the following for_each_cpu loop, because the loop may set it.

>         /*
>          * core->core_task_seq, core->core_pick_seq, rq->core_sched_seq
> @@ -4351,7 +4358,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf)

>                         if (i == cpu && !need_sync && !p->core_cookie) {
>                                 next = p;
> +                               rq_i->core_pick = next;
> +                               rq_i->core_sched_seq = rq_i->core->core_pick_seq;
>
I think we would not need these here. core_pick needs to be set only
for siblings if we are picking a task for them. For unconstrained pick,
we pick only for ourselves. Also, core_sched_seq need not be synced here.
We might already be synced with the existing core->core_pick_seq. Even
if it is not synced, I don't think it will cause an issue in subsequent
schedule events.


>  done:
> +       /*
> +        * If we reset a sibling's core_pick, make sure that we picked a task
> +        * for it, this is because we might have reset it though it was set to
> +        * something by another selector. In this case we cannot leave it as
> +        * NULL and should have found something for it.
> +        */
> +       for_each_cpu(i, &select_mask) {
> +               WARN_ON_ONCE(!cpu_rq(i)->core_pick);
> +       }
> +
I think this check will not be true always. For unconstrained pick, we
do not pick tasks for siblings and hence do not set core_pick for them.
So this WARN_ON will fire for unconstrained pick. Easily reproducible
by creating an empty cgroup and tagging it. Then only unconstrained
picks will happen and this WARN_ON fires. I guess this check after the
done label does not hold and could be removed.

Thanks,
Vineeth

  parent reply	other threads:[~2020-07-03 20:22 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-30 21:32 [RFC PATCH 00/16] Core scheduling v6 Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 01/16] sched: Wrap rq::lock access Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 02/16] sched: Introduce sched_class::pick_task() Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 03/16] sched: Core-wide rq->lock Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 04/16] sched/fair: Add a few assertions Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 05/16] sched: Basic tracking of matching tasks Vineeth Remanan Pillai
2020-07-21 14:02   ` [RFC PATCH 05/16] sched: Basic tracking of matching tasks(Internet mail) benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 06/16] sched: Add core wide task selection and scheduling Vineeth Remanan Pillai
2020-07-01 23:28   ` Joel Fernandes
2020-07-02  0:54     ` Tim Chen
2020-07-02 12:57       ` Joel Fernandes
2020-07-02 13:23         ` Joel Fernandes
2020-07-05 23:44         ` Tim Chen
2020-07-03 20:21     ` Vineeth Remanan Pillai [this message]
2020-07-06 14:09       ` Joel Fernandes
2020-07-06 14:38         ` Vineeth Remanan Pillai
2020-07-06 17:37           ` Joel Fernandes
2020-06-30 21:32 ` [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case Vineeth Remanan Pillai
2020-07-21  7:35   ` [RFC PATCH 07/16] sched/fair: Fix forced idle sibling starvation corner case(Internet mail) benbjiang(蒋彪)
2020-07-22  7:20   ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 08/16] sched/fair: wrapper for cfs_rq->min_vruntime Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison Vineeth Remanan Pillai
2020-07-22  0:23   ` [RFC PATCH 09/16] sched/fair: core wide cfs task priority comparison(Internet mail) benbjiang(蒋彪)
2020-07-24  7:14     ` Aaron Lu
2020-07-24 12:08       ` Jiang Biao
2020-06-30 21:32 ` [RFC PATCH 10/16] sched: Trivial forced-newidle balancer Vineeth Remanan Pillai
2020-07-20  4:06   ` [RFC PATCH 10/16] sched: Trivial forced-newidle balancer(Internet mail) benbjiang(蒋彪)
2020-07-20  6:06     ` Li, Aubrey
     [not found]       ` <8082F052-2F52-42D3-B396-18A35A94F26F@tencent.com>
2020-07-20  8:03         ` Li, Aubrey
2020-07-20  8:22           ` benbjiang(蒋彪)
2020-07-20 14:34   ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 11/16] sched: migration changes for core scheduling Vineeth Remanan Pillai
2020-07-22  8:54   ` [RFC PATCH 11/16] sched: migration changes for core scheduling(Internet mail) benbjiang(蒋彪)
2020-07-22 12:13     ` Li, Aubrey
2020-07-22 14:32       ` benbjiang(蒋彪)
2020-07-23  1:57         ` Li, Aubrey
2020-07-23  2:42           ` benbjiang(蒋彪)
2020-07-23  3:35             ` Li, Aubrey
2020-07-23  4:23               ` benbjiang(蒋彪)
2020-07-23  5:39                 ` Li, Aubrey
2020-07-23  7:47                   ` benbjiang(蒋彪)
2020-07-23  8:06                     ` Li, Aubrey
2020-07-23  8:28                       ` benbjiang(蒋彪)
2020-07-23 23:43                         ` Aubrey Li
2020-07-24  1:26                           ` benbjiang(蒋彪)
2020-07-24  2:05                             ` Li, Aubrey
2020-07-24  2:29                               ` benbjiang(蒋彪)
2020-06-30 21:32 ` [RFC PATCH 12/16] sched: cgroup tagging interface for core scheduling Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 13/16] sched: Fix pick_next_task() race condition in " Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 14/16] irq: Add support for core-wide protection of IRQ and softirq Vineeth Remanan Pillai
2020-07-10 12:19   ` Li, Aubrey
2020-07-10 13:21     ` Joel Fernandes
2020-07-13  2:23       ` Li, Aubrey
2020-07-13 15:58         ` Joel Fernandes
2020-07-10 13:36     ` Vineeth Remanan Pillai
2020-07-11  1:33       ` Aubrey Li
2020-07-17 23:37     ` Thomas Gleixner
2020-07-18 17:05       ` Joel Fernandes
2020-07-17 23:36   ` Thomas Gleixner
2020-07-20  3:53     ` Joel Fernandes
2020-07-20  8:20       ` Thomas Gleixner
2020-07-20 11:09       ` Vineeth Pillai
2020-06-30 21:32 ` [RFC PATCH 15/16] Documentation: Add documentation on core scheduling Vineeth Remanan Pillai
2020-06-30 21:32 ` [RFC PATCH 16/16] sched: Debug bits Vineeth Remanan Pillai
2020-07-31 16:41 ` [RFC PATCH 00/16] Core scheduling v6 Vineeth Pillai
2020-08-03  8:23 ` Li, Aubrey
2020-08-03 16:53   ` Joel Fernandes
2020-08-05  3:57     ` Li, Aubrey
2020-08-05  6:16       ` [RFC PATCH 00/16] Core scheduling v6(Internet mail) benbjiang(蒋彪)
2020-08-09 16:44       ` [RFC PATCH 00/16] Core scheduling v6 Joel Fernandes
2020-08-12  2:01         ` Li, Aubrey
2020-08-12 23:08           ` Joel Fernandes
2020-08-13  4:28             ` Li, Aubrey
2020-08-14  0:26               ` [RFC PATCH 00/16] Core scheduling v6(Internet mail) benbjiang(蒋彪)
2020-08-14  1:36                 ` Li, Aubrey
2020-08-14  4:04                   ` benbjiang(蒋彪)
2020-08-14  5:18                     ` Li, Aubrey
2020-08-14  7:54                       ` benbjiang(蒋彪)
2020-08-20 22:37               ` [RFC PATCH 00/16] Core scheduling v6 Joel Fernandes
2020-08-27  0:30 ` Alexander Graf
2020-08-27  1:20   ` Vineeth Pillai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CANaguZDtZrXbjmot2crLM0ComgY=NfqxWYs7GzUEY8aLeaUVrg@mail.gmail.com' \
    --to=vpillai@digitalocean.com \
    --cc=aaron.lu@linux.alibaba.com \
    --cc=aaron.lwe@gmail.com \
    --cc=aubrey.intel@gmail.com \
    --cc=christian.brauner@ubuntu.com \
    --cc=fweisbec@gmail.com \
    --cc=jdesfossez@digitalocean.com \
    --cc=joel@joelfernandes.org \
    --cc=keescook@chromium.org \
    --cc=kerrnel@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=naravamudan@digitalocean.com \
    --cc=pauld@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=subhra.mazumdar@oracle.com \
    --cc=tglx@linutronix.de \
    --cc=tim.c.chen@linux.intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=valentin.schneider@arm.com \
    --cc=vineethrp@gmail.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).