From: Josh Don <joshdon@google.com>
To: Abel Wu <wuyun.abel@bytedance.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Mel Gorman <mgorman@suse.de>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	linux-kernel <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v3] sched/fair: filter out overloaded cpus in SIS
Date: Mon, 9 May 2022 18:14:22 -0700
Message-ID: <CABk29Ns3KBwLXBSwiSe7Pv2YE9iMg+A1kPpPESWG=KNJu9dz0w@mail.gmail.com>
In-Reply-To: <20220505122331.42696-1-wuyun.abel@bytedance.com>

Hi Abel,

Overall this looks good, just a couple of comments.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index d4bd299d67ab..79b4ff24faee 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -6323,7 +6323,9 @@ static inline int select_idle_smt(struct task_struct *p, struct sched_domain *sd
>  static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool has_idle_core, int target)
>  {
>         struct cpumask *cpus = this_cpu_cpumask_var_ptr(select_idle_mask);
> -       int i, cpu, idle_cpu = -1, nr = INT_MAX;
> +       struct sched_domain_shared *sds = sd->shared;
> +       int nr, nro, weight = sd->span_weight;
> +       int i, cpu, idle_cpu = -1;
>         struct rq *this_rq = this_rq();
>         int this = smp_processor_id();
>         struct sched_domain *this_sd;
> @@ -6333,7 +6335,23 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>         if (!this_sd)
>                 return -1;
>
> +       nro = atomic_read(&sds->nr_overloaded_cpus);
> +       if (nro == weight)
> +               goto out;

This assumes that the sd we're operating on here is the LLC domain
(true for current use). Perhaps, to catch future bugs if that
assumption ever changes, we could add a WARN_ON_ONCE(nro > weight).
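Something like this, just to illustrate where the check would sit
(untested):

	nro = atomic_read(&sds->nr_overloaded_cpus);

	/* nr_overloaded_cpus is tracked per LLC; we assume sd is the LLC domain */
	WARN_ON_ONCE(nro > weight);

	if (nro == weight)
		goto out;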

> +
> +       nr = min_t(int, weight, p->nr_cpus_allowed);
> +
> +       /*
> +        * It's unlikely to find an idle cpu if the system is under
> +        * heavy pressure, so skip searching to save a few cycles
> +        * and relieve cache traffic.
> +        */
> +       if (weight - nro < (nr >> 4) && !has_idle_core)
> +               return -1;

nit: nr / 16 is easier to read and the compiler will do the shifting for you.

Was < intentional, rather than <=? With <= you'd also be able to skip
the search in the case where both sides evaluate to 0 (which can happen
frequently if we have no idle cpus and a task with a small affinity mask).
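Combining both nits, the condition would read something like (untested):

	if (weight - nro <= nr / 16 && !has_idle_core)
		return -1;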

This check also gets a bit confused in the case where the task has many
cpus allowed, but almost all of them are on a different LLC than the one
we're considering here. Short of caching a per-LLC nr_cpus_allowed, we
could instead use cpumask_weight(cpus) below (and only do this in the
!has_idle_core case, to limit the extra calls to cpumask_weight()).
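For example, roughly (untested, just sketching the idea):

	cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);

	if (!has_idle_core) {
		/* only count allowed cpus that are actually in this LLC */
		nr = cpumask_weight(cpus);
		if (weight - nro <= nr / 16)
			return -1;
	}

	if (nro)
		cpumask_andnot(cpus, cpus, sdo_mask(sds));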

> +
>         cpumask_and(cpus, sched_domain_span(sd), p->cpus_ptr);
> +       if (nro > 1)
> +               cpumask_andnot(cpus, cpus, sdo_mask(sds));

Just
if (nro)
?

> @@ -6392,6 +6407,9 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, bool
>
>                 update_avg(&this_sd->avg_scan_cost, time);
>         }
> +out:
> +       if (has_idle_core)
> +               WRITE_ONCE(sds->has_idle_cores, 0);

nit: use set_idle_cores() instead (or, if you really want to avoid the
extra sds dereference, add a __set_idle_cores(sds, val) helper you can
call directly).
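For reference, one possible shape for that helper, refactored out of
the existing set_idle_cores() (untested):

static inline void __set_idle_cores(struct sched_domain_shared *sds, int val)
{
	WRITE_ONCE(sds->has_idle_cores, val);
}

static inline void set_idle_cores(int cpu, int val)
{
	struct sched_domain_shared *sds;

	sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
	if (sds)
		__set_idle_cores(sds, val);
}

Then the out: path above could just do __set_idle_cores(sds, 0).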

> @@ -7904,6 +7922,7 @@ static struct task_struct *detach_one_task(struct lb_env *env)
>                         continue;
>
>                 detach_task(p, env);
> +               update_overloaded_rq(env->src_rq);
>
>                 /*
>                  * Right now, this is only the second place where
> @@ -8047,6 +8066,9 @@ static int detach_tasks(struct lb_env *env)
>                 list_move(&p->se.group_node, tasks);
>         }
>
> +       if (detached)
> +               update_overloaded_rq(env->src_rq);
> +

Thinking about this more, I don't see an issue with moving the
update_overloaded_rq() calls to enqueue/dequeue_task, rather than making
them here in the attach/detach_task paths. Overloaded state only changes
when we cross the boundary of 2 runnable non-idle tasks, so thrashing of
the overloaded mask is much less worrisome than it would be if the state
were updated on the boundary of 1 runnable task. And the
attach/detach_task paths run as part of load balancing, which can be on
a millisecond time scale.
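Roughly, just to show the placement (untested):

	/* at the tail of enqueue_task_fair(), once h_nr_running has settled */
	update_overloaded_rq(rq);

	/* and likewise at the tail of dequeue_task_fair() */
	update_overloaded_rq(rq);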

Best,
Josh
