* Re: [RFC PATCH] sched/fair: correct llc shared domain's number of busy CPUs
From: Vincent Guittot @ 2020-05-04 7:53 UTC
To: Hillf Danton
Cc: Peter Zijlstra, Ingo Molnar, lkml, Mel Gorman,
Valentin Schneider, Dietmar Eggemann
On Mon, 4 May 2020 at 03:57, Hillf Danton <hdanton@sina.com> wrote:
>
>
> The comment says, if there is an imbalance between LLC domains (IOW we
> could increase the overall cache use), we need some less-loaded LLC
> domain to pull some load.
>
> To show that imbalance, record busy CPUs as they come and go by doing
> a minor cleanup for sd::nohz_idle.
Your comment failed to explain why we can get rid of sd->nohz_idle
>
> Cc: Mel Gorman <mgorman@techsingularity.net>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Valentin Schneider <valentin.schneider@arm.com>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Signed-off-by: Hillf Danton <hdanton@sina.com>
> ---
>
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10138,13 +10138,8 @@ static void set_cpu_sd_state_busy(int cp
>
> rcu_read_lock();
> sd = rcu_dereference(per_cpu(sd_llc, cpu));
> -
> - if (!sd || !sd->nohz_idle)
> - goto unlock;
> - sd->nohz_idle = 0;
you remove the use of sd->nohz_idle but you don't remove it from
struct sched_domain
> -
> - atomic_inc(&sd->shared->nr_busy_cpus);
> -unlock:
> + if (sd)
> + atomic_inc(&sd->shared->nr_busy_cpus);
> rcu_read_unlock();
> }
>
> @@ -10168,13 +10163,8 @@ static void set_cpu_sd_state_idle(int cp
>
> rcu_read_lock();
> sd = rcu_dereference(per_cpu(sd_llc, cpu));
> -
> - if (!sd || sd->nohz_idle)
> - goto unlock;
> - sd->nohz_idle = 1;
> -
> - atomic_dec(&sd->shared->nr_busy_cpus);
> -unlock:
> + if (sd)
> + atomic_dec(&sd->shared->nr_busy_cpus);
> rcu_read_unlock();
> }
>
>
* Re: [RFC PATCH] sched/fair: correct llc shared domain's number of busy CPUs
From: Vincent Guittot @ 2020-05-04 13:29 UTC
To: Hillf Danton
Cc: Peter Zijlstra, Ingo Molnar, lkml, Mel Gorman,
Valentin Schneider, Dietmar Eggemann
On Mon, 4 May 2020 at 13:41, Hillf Danton <hdanton@sina.com> wrote:
>
>
> On Mon, 4 May 2020 09:53:36 Vincent Guittot wrote:
> >
> > On Mon, 4 May 2020 at 03:57, Hillf Danton wrote:
> > >
> > > The comment says, if there is an imbalance between LLC domains (IOW we
> > > could increase the overall cache use), we need some less-loaded LLC
> > > domain to pull some load.
> > >
> > > To show that imbalance, record busy CPUs as they come and go by doing
> > > a minor cleanup for sd::nohz_idle.
> >
> > Your comment failed to explain why we can get rid of sd->nohz_idle
> >
> The serialization added in 25f55d9d01ad ("sched: Fix init NOHZ_IDLE flag") to
> updating nr_busy_cpus is no longer needed after 0e369d757578 ("sched/core:
> Replace sd_busy/nr_busy_cpus with sched_domain_shared") AFAICT because a
I don't see the link between commit 0e369d757578 and the fact that we
can remove the nohz_idle field.
> recorded idle/busy CPU does not mean the current CPU could not become idle or
> busy. The right thing is to update the counter if we have a valid sd under rcu.
No, that's not the root cause. The sd is per CPU, so each CPU has its
own sd->nohz_idle: if CPU A sets sd->nohz_idle, CPU B is not impacted
and has to set its own.
We must ensure that nr_busy_cpus is inc/dec only once when
transitioning from/to idle/busy state in order to keep the shared
nr_busy_cpus correct. But set_cpu_sd_state_busy() is called from
scheduler_tick() which means potentially every tick:
scheduler_tick() -> trigger_load_balance() -> nohz_balancer_kick() ->
nohz_balance_exit_idle() -> set_cpu_sd_state_busy()
The nohz_idle field is there to prevent incrementing nr_busy_cpus at
every tick. But since 00357f5ec5d6 ("sched/nohz: Clean up nohz
enter/exit"), set_cpu_sd_state_busy() is called from
nohz_balance_exit_idle(), and the latter already has a similar guard
with rq->nohz_tick_stopped, so sd_llc->nohz_idle is useless.
>
> > you remove the use of sd->nohz_idle but you don't remove it from
> > struct sched_domain
>
> A separate cleanup is needed for it if it's no longer used anywhere else.
Please remove it in the same patch
>
> Hillf
>