linux-kernel.vger.kernel.org archive mirror
* Re: [RFC PATCH] sched/fair: correct llc shared domain's number of busy CPUs
       [not found] <20200504015704.6952-1-hdanton@sina.com>
@ 2020-05-04  7:53 ` Vincent Guittot
       [not found]   ` <20200504114125.10180-1-hdanton@sina.com>
  0 siblings, 1 reply; 2+ messages in thread
From: Vincent Guittot @ 2020-05-04  7:53 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Peter Zijlstra, Ingo Molnar, lkml, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann

On Mon, 4 May 2020 at 03:57, Hillf Danton <hdanton@sina.com> wrote:
>
>
> The comment says, if there is an imbalance between LLC domains (IOW we
> could increase the overall cache use),  we need some less-loaded LLC
> domain to pull some load.
>
> To show that imbalance, record busy CPUs as they come and go by doing
> a minor cleanup for sd::nohz_idle.

Your comment failed to explain why we can get rid of sd->nohz_idle

>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Hillf Danton <hdanton@sina.com>
> ---
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10138,13 +10138,8 @@ static void set_cpu_sd_state_busy(int cp
>
>         rcu_read_lock();
>         sd = rcu_dereference(per_cpu(sd_llc, cpu));
> -
> -       if (!sd || !sd->nohz_idle)
> -               goto unlock;
> -       sd->nohz_idle = 0;

you remove the use of sd->nohz_idle but you don't remove it from
struct sched_domain

> -
> -       atomic_inc(&sd->shared->nr_busy_cpus);
> -unlock:
> +       if (sd)
> +               atomic_inc(&sd->shared->nr_busy_cpus);
>         rcu_read_unlock();
>  }
>
> @@ -10168,13 +10163,8 @@ static void set_cpu_sd_state_idle(int cp
>
>         rcu_read_lock();
>         sd = rcu_dereference(per_cpu(sd_llc, cpu));
> -
> -       if (!sd || sd->nohz_idle)
> -               goto unlock;
> -       sd->nohz_idle = 1;
> -
> -       atomic_dec(&sd->shared->nr_busy_cpus);
> -unlock:
> +       if (sd)
> +               atomic_dec(&sd->shared->nr_busy_cpus);
>         rcu_read_unlock();
>  }
>
>


* Re: [RFC PATCH] sched/fair: correct llc shared domain's number of busy CPUs
       [not found]   ` <20200504114125.10180-1-hdanton@sina.com>
@ 2020-05-04 13:29     ` Vincent Guittot
  0 siblings, 0 replies; 2+ messages in thread
From: Vincent Guittot @ 2020-05-04 13:29 UTC (permalink / raw)
  To: Hillf Danton
  Cc: Peter Zijlstra, Ingo Molnar, lkml, Mel Gorman,
	Valentin Schneider, Dietmar Eggemann

On Mon, 4 May 2020 at 13:41, Hillf Danton <hdanton@sina.com> wrote:
>
>
> On Mon, 4 May 2020 09:53:36 Vincent Guittot wrote:
> >
> > On Mon, 4 May 2020 at 03:57, Hillf Danton wrote:
> > >
> > > The comment says, if there is an imbalance between LLC domains (IOW we
> > > could increase the overall cache use),  we need some less-loaded LLC
> > > domain to pull some load.
> > >
> > > To show that imbalance, record busy CPUs as they come and go by doing
> > > a minor cleanup for sd::nohz_idle.
> >
> > Your comment failed to explain why we can get rid of sd->nohz_idle
> >
> The serialization added in 25f55d9d01ad ("sched: Fix init NOHZ_IDLE flag") to
> updating nr_busy_cpus is no longer needed after 0e369d757578 ("sched/core:
> Replace sd_busy/nr_busy_cpus with sched_domain_shared") AFAICT because a

I don't see the link between commit 0e369d757578 and the fact that we
can remove the nohz_idle field.

> recorded idle/busy CPU does not mean the current CPU could not become idle or
> busy. The right thing is to update the counter if we have a valid sd under rcu.

No, that's not the root cause. The sd is per CPU, so each CPU has its
own sd->nohz_idle: if CPU A sets its sd->nohz_idle, CPU B is not
affected and still has to set its own.

We must ensure that nr_busy_cpus is inc/dec only once when
transitioning from/to idle/busy state in order to keep the shared
nr_busy_cpus correct. But set_cpu_sd_state_busy() is called from
scheduler_tick() which means potentially every tick:

scheduler_tick() -> trigger_load_balance() -> nohz_balancer_kick() ->
nohz_balance_exit_idle() -> set_cpu_sd_state_busy()

The nohz_idle field is there to prevent incrementing nr_busy_cpus at
every tick. But set_cpu_sd_state_busy() has been called from
nohz_balance_exit_idle() since 00357f5ec5d6 ("sched/nohz: Clean up
nohz enter/exit"), and the latter has a similar mechanism with
rq->nohz_tick_stopped, so sd_llc->nohz_idle is useless.
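
To illustrate the point, here is a minimal user-space sketch, not kernel
code. All model_* names, NR_CPUS and the guard parameter are hypothetical;
tick_stopped[] stands in for rq->nohz_tick_stopped and the atomic counter
for sd->shared->nr_busy_cpus. The idle-entry side is assumed to be guarded
symmetrically to the exit side. With the caller-side guard, a call made on
every tick changes the counter only on a real idle/busy transition:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for the model */

static atomic_int nr_busy_cpus;		/* models sd->shared->nr_busy_cpus */
static bool tick_stopped[NR_CPUS];	/* models rq->nohz_tick_stopped */

/* Models the exit-idle path, which may be reached on every tick. */
static void model_exit_idle(int cpu, bool guard)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (guard) {
		if (!tick_stopped[cpu])
			return;		/* already busy: nothing to count */
		tick_stopped[cpu] = false;
	}
	atomic_fetch_add(&nr_busy_cpus, 1);
}

/* Models the enter-idle path (assumed symmetric to the exit path). */
static void model_enter_idle(int cpu, bool guard)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (guard) {
		if (tick_stopped[cpu])
			return;		/* already idle: nothing to count */
		tick_stopped[cpu] = true;
	}
	atomic_fetch_sub(&nr_busy_cpus, 1);
}

/* Runs one scenario and returns the final busy-CPU count. */
static int model_run(bool guard)
{
	int cpu, tick;

	atomic_store(&nr_busy_cpus, NR_CPUS);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		tick_stopped[cpu] = false;

	model_enter_idle(1, guard);	/* CPU 1 goes idle ... */
	model_enter_idle(1, guard);	/* ... and is reported idle again */
	for (tick = 0; tick < 10; tick++)
		model_exit_idle(1, guard);	/* CPU 1 busy again, 10 ticks */
	for (tick = 0; tick < 10; tick++)
		model_exit_idle(0, guard);	/* CPU 0 was never idle */

	return atomic_load(&nr_busy_cpus);
}
```

In this model, model_run(true) ends with every CPU counted busy exactly
once, while model_run(false) drifts away from NR_CPUS because each tick
bumps the counter. That is why some once-per-transition guard is needed,
and why a second one in the sd is redundant when the caller already has it.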

>
> > you remove the use of sd->nohz_idle but you don't remove it from
> > struct sched_domain
>
> A separate cleanup for it is needed if it's no longer used somewhere else.

Please remove it in the same patch

>
> Hillf
>
