From: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Rik van Riel <riel@surriel.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Gautham R Shenoy <ego@linux.vnet.ibm.com>,
	Parth Shah <parth@linux.ibm.com>,
	Aubrey Li <aubrey.li@linux.intel.com>
Subject: Re: [PATCH v3 6/8] sched/idle: Move busy_cpu accounting to idle callback
Date: Fri, 21 May 2021 18:51:23 +0530
Message-ID: <20210521132123.GH2633526@linux.vnet.ibm.com>
In-Reply-To: <CAKfTPtB05dxcXPX_hZOFXHYaW98sdcykxVYnWdNdMOBHqLMBow@mail.gmail.com>

* Vincent Guittot <vincent.guittot@linaro.org> [2021-05-21 14:37:51]:

> On Thu, 13 May 2021 at 09:41, Srikar Dronamraju
> <srikar@linux.vnet.ibm.com> wrote:
> >
> > Currently we account nr_busy_cpus in no_hz idle functions.
> > There is no reason why nr_busy_cpus should be updated in NO_HZ_COMMON
> > configs only. Also, the scheduler can mark a CPU as non-busy as soon as an
> > idle class task starts to run. The scheduler can then mark a CPU as busy
> > as soon as it's woken up from idle or a new task is placed on its
> > runqueue.
> >
> > Cc: LKML <linux-kernel@vger.kernel.org>
> > Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> > Cc: Parth Shah <parth@linux.ibm.com>
> > Cc: Ingo Molnar <mingo@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Valentin Schneider <valentin.schneider@arm.com>
> > Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> > Cc: Mel Gorman <mgorman@techsingularity.net>
> > Cc: Vincent Guittot <vincent.guittot@linaro.org>
> > Cc: Rik van Riel <riel@surriel.com>
> > Cc: Aubrey Li <aubrey.li@linux.intel.com>
> > Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
> > ---
> >  kernel/sched/fair.c     |  6 ++++--
> >  kernel/sched/idle.c     | 29 +++++++++++++++++++++++++++--
> >  kernel/sched/sched.h    |  1 +
> >  kernel/sched/topology.c |  2 ++
> >  4 files changed, 34 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> > index 0dfe01de22d6..8f86359efdbd 100644
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -10410,7 +10410,10 @@ static void set_cpu_sd_state_busy(int cpu)
> >                 goto unlock;
> >         sd->nohz_idle = 0;
> >
> > -       atomic_inc(&sd->shared->nr_busy_cpus);
> > +       if (sd && per_cpu(is_idle, cpu)) {
> > +               atomic_add_unless(&sd->shared->nr_busy_cpus, 1, per_cpu(sd_llc_size, cpu));
> > +               per_cpu(is_idle, cpu) = 0;
> > +       }
> >  unlock:
> >         rcu_read_unlock();
> >  }
> > @@ -10440,7 +10443,6 @@ static void set_cpu_sd_state_idle(int cpu)
> >                 goto unlock;
> >         sd->nohz_idle = 1;
> >
> > -       atomic_dec(&sd->shared->nr_busy_cpus);
> >  unlock:
> >         rcu_read_unlock();
> >  }
> > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> > index a9f5a8ace59e..c13105fe06b3 100644
> > --- a/kernel/sched/idle.c
> > +++ b/kernel/sched/idle.c
> > @@ -431,12 +431,25 @@ static void check_preempt_curr_idle(struct rq *rq, struct task_struct *p, int fl
> >
> >  static void put_prev_task_idle(struct rq *rq, struct task_struct *prev)
> >  {
> > -#ifdef CONFIG_SCHED_SMT
> > +#ifdef CONFIG_SMP
> > +       struct sched_domain_shared *sds;
> >         int cpu = rq->cpu;
> >
> > +#ifdef CONFIG_SCHED_SMT
> >         if (static_branch_likely(&sched_smt_present))
> >                 set_core_busy(cpu);
> >  #endif
> > +
> > +       rcu_read_lock();
> > +       sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > +       if (sds) {
> > +               if (per_cpu(is_idle, cpu)) {
> > +                       atomic_inc(&sds->nr_busy_cpus);
> > +                       per_cpu(is_idle, cpu) = 0;
> > +               }
> > +       }
> > +       rcu_read_unlock();
> > +#endif
> >  }
> >
> >  static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool first)
> > @@ -448,9 +461,21 @@ static void set_next_task_idle(struct rq *rq, struct task_struct *next, bool fir
> >  struct task_struct *pick_next_task_idle(struct rq *rq)
> >  {
> >         struct task_struct *next = rq->idle;
> > +#ifdef CONFIG_SMP
> > +       struct sched_domain_shared *sds;
> > +       int cpu = rq->cpu;
> >
> > -       set_next_task_idle(rq, next, true);
> > +       rcu_read_lock();
> > +       sds = rcu_dereference(per_cpu(sd_llc_shared, cpu));
> > +       if (sds) {
> > +               atomic_add_unless(&sds->nr_busy_cpus, -1, 0);
> > +               per_cpu(is_idle, cpu) = 1;
> > +       }
> 
> One reason to update nr_busy_cpus only during the tick, and not at each
> and every sleep/wakeup, is to limit the number of atomic_inc/dec in
> case of a storm of short-running tasks. Otherwise, in the end, you waste
> more time trying to accurately track the current state of the CPU
> than doing work.
> 

Yes, I understand that for short-running tasks, or when CPUs enter idle
for very short intervals, we are unnecessarily tracking the number of
busy CPUs.
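To make the cost Vincent describes concrete, here is a minimal userspace sketch, assuming C11 `<stdatomic.h>` in place of the kernel's atomic_t and a single shared counter standing in for sds->nr_busy_cpus. It counts the atomic read-modify-write operations each policy issues for a trace of busy/idle states; the tick-driven variant only touches the counter when the state sampled at a tick differs from the last one it recorded. The helper names `account_per_transition()` and `account_at_tick()` and the trace encoding are hypothetical, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for sds->nr_busy_cpus, shared by all CPUs of one LLC. */
static atomic_int nr_busy_cpus;

/*
 * Per-transition policy: every idle entry/exit updates the shared counter.
 * trace[i] is true when the CPU is busy at step i.
 * Returns the number of atomic RMW operations issued.
 */
static size_t account_per_transition(const bool *trace, size_t len)
{
	bool busy = false;	/* the CPU starts out idle */
	size_t ops = 0;

	for (size_t i = 0; i < len; i++) {
		if (trace[i] != busy) {
			atomic_fetch_add(&nr_busy_cpus, trace[i] ? 1 : -1);
			busy = trace[i];
			ops++;
		}
	}
	return ops;
}

/*
 * Tick-driven policy: the state is sampled only every `tick` steps, so a
 * busy/idle flip that is undone before the next tick never reaches the
 * shared counter.
 */
static size_t account_at_tick(const bool *trace, size_t len, size_t tick)
{
	bool busy = false;
	size_t ops = 0;

	assert(tick > 0);	/* a zero tick would never advance */
	for (size_t i = 0; i < len; i += tick) {
		if (trace[i] != busy) {
			atomic_fetch_add(&nr_busy_cpus, trace[i] ? 1 : -1);
			busy = trace[i];
			ops++;
		}
	}
	return ops;
}
```

For a trace that flips between busy and idle at every step (a storm of very short tasks), the per-transition policy issues one atomic operation per step, while the tick-driven one issues at most one per sampled tick, at the price of a counter that can lag the real state by up to a tick.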

However, let's assume we have to compare two LLCs and choose the better
one for a wakeup.
1. We can look at nr_busy_cpus, which may not have been updated at every
CPU idle transition.
2. We can look at nr_busy_cpus, which has been updated at every CPU idle
transition.
3. We can aggregate the load of all the CPUs in the LLC.
4. We can use the current method, which only compares the load on the
previous CPU and the current CPU. However, that doesn't give much
indication of whether the other CPUs in those LLCs are free.

Or possibly some other method.

I thought option 2 would be better, but I am okay with option 1 too.
Please let me know which option you would prefer.
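Options 1 and 2 differ only in how fresh nr_busy_cpus is; the comparison itself could look like the sketch below. It assumes each LLC is summarised by its size (as in sd_llc_size) and its busy count (as read from sds->nr_busy_cpus), prefers the LLC with more idle CPUs, and keeps the current LLC on a tie. `struct llc_stat`, `llc_nr_idle()` and `pick_idler_llc()` are hypothetical names, not the helpers used in this series, and the tie-break is an assumption.

```c
#include <assert.h>

/* Hypothetical snapshot of one LLC: its size and its busy-CPU count. */
struct llc_stat {
	int size;	/* like per_cpu(sd_llc_size, cpu) */
	int nr_busy;	/* like atomic_read(&sds->nr_busy_cpus) */
};

/* Idle CPUs in an LLC; a stale busy count is clamped to [0, size]. */
static int llc_nr_idle(const struct llc_stat *llc)
{
	int busy = llc->nr_busy;

	if (busy < 0)
		busy = 0;
	if (busy > llc->size)
		busy = llc->size;
	return llc->size - busy;
}

/*
 * Return 1 to choose prev_llc, 0 to stay on this_llc.
 * Ties keep this_llc (assumption: it is the waker's, cache-hot LLC).
 */
static int pick_idler_llc(const struct llc_stat *this_llc,
			  const struct llc_stat *prev_llc)
{
	assert(this_llc->size > 0 && prev_llc->size > 0);
	return llc_nr_idle(prev_llc) > llc_nr_idle(this_llc);
}
```

This compares absolute idle counts; with LLCs of different sizes, comparing the idle fraction instead would be an equally plausible choice.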

> >
> > +       rcu_read_unlock();
> > +#endif
> > +
> > +       set_next_task_idle(rq, next, true);
> >         return next;
> >  }
> >
> > diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> > index 98c3cfbc5d26..b66c4dad5fd2 100644
> > --- a/kernel/sched/sched.h
> > +++ b/kernel/sched/sched.h
> > @@ -1496,6 +1496,7 @@ DECLARE_PER_CPU(int, sd_llc_id);
> >  #ifdef CONFIG_SCHED_SMT
> >  DECLARE_PER_CPU(int, smt_id);
> >  #endif
> > +DECLARE_PER_CPU(int, is_idle);
> >  DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> > diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c
> > index 232fb261dfc2..730252937712 100644
> > --- a/kernel/sched/topology.c
> > +++ b/kernel/sched/topology.c
> > @@ -647,6 +647,7 @@ DEFINE_PER_CPU(int, sd_llc_id);
> >  #ifdef CONFIG_SCHED_SMT
> >  DEFINE_PER_CPU(int, smt_id);
> >  #endif
> > +DEFINE_PER_CPU(int, is_idle);
> >  DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa);
> >  DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing);
> > @@ -673,6 +674,7 @@ static void update_top_cache_domain(int cpu)
> >  #ifdef CONFIG_SCHED_SMT
> >         per_cpu(smt_id, cpu) = cpumask_first(cpu_smt_mask(cpu));
> >  #endif
> > +       per_cpu(is_idle, cpu) = 1;
> >         rcu_assign_pointer(per_cpu(sd_llc_shared, cpu), sds);
> >
> >         sd = lowest_flag_domain(cpu, SD_NUMA);
> > --
> > 2.18.2
> >
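The diff bounds nr_busy_cpus with atomic_add_unless(), which adds to an atomic_t unless it already holds a given value and reports whether the add happened, and uses the per-CPU is_idle flag so a CPU is counted once per idle/busy transition. A minimal userspace sketch of that pattern, assuming C11 atomics and a compare-and-swap loop in place of the kernel's atomic_add_unless(); `add_unless()`, `cpu_exit_idle()`, `cpu_enter_idle()` and the fixed-size arrays are hypothetical stand-ins for the per-CPU variables. Unlike pick_next_task_idle() in the diff, which relies only on the floor at 0, this sketch checks is_idle in both directions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical: a single LLC of four CPUs */

static atomic_int nr_busy_cpus;			/* stand-in for sds->nr_busy_cpus */
static const int llc_size = NR_CPUS;		/* stand-in for sd_llc_size */
/* All CPUs start idle, as update_top_cache_domain() sets is_idle = 1. */
static bool is_idle[NR_CPUS] = { true, true, true, true };

/* Add @a to @v unless @v == @u; returns true if the add happened. */
static bool add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	do {
		if (old == u)
			return false;
		/* On failure, old is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(v, &old, old + a));
	return true;
}

/* Idle -> busy: count the CPU once, never above the LLC size. */
static void cpu_exit_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (is_idle[cpu]) {
		add_unless(&nr_busy_cpus, 1, llc_size);
		is_idle[cpu] = false;
	}
}

/* Busy -> idle: uncount the CPU once, never below zero. */
static void cpu_enter_idle(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!is_idle[cpu]) {
		add_unless(&nr_busy_cpus, -1, 0);
		is_idle[cpu] = true;
	}
}
```

Repeated wakeups or idle entries on the same CPU are no-ops thanks to the flag, so the counter can only drift if the flag and the counter are updated from different contexts without ordering.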

-- 
Thanks and Regards
Srikar Dronamraju

Thread overview: 18+ messages
2021-05-13  7:40 [PATCH v3 0/8] sched/fair: wake_affine improvements Srikar Dronamraju
2021-05-13  7:40 ` [PATCH v3 1/8] sched/fair: Update affine statistics when needed Srikar Dronamraju
2021-05-13  7:40 ` [PATCH v3 2/8] sched/fair: Maintain the identity of idle-core Srikar Dronamraju
2021-05-21 12:36   ` Vincent Guittot
2021-05-21 13:31     ` Srikar Dronamraju
2021-05-22 12:42       ` Vincent Guittot
2021-05-22 14:10         ` Srikar Dronamraju
2021-05-25  7:11           ` Vincent Guittot
2021-05-13  7:40 ` [PATCH v3 3/8] sched/fair: Update idle-core more often Srikar Dronamraju
2021-05-13  7:40 ` [PATCH v3 4/8] sched/fair: Prefer idle CPU to cache affinity Srikar Dronamraju
2021-05-13  7:40 ` [PATCH v3 5/8] sched/fair: Use affine_idler_llc for wakeups across LLC Srikar Dronamraju
2021-05-13  7:40 ` [PATCH v3 6/8] sched/idle: Move busy_cpu accounting to idle callback Srikar Dronamraju
2021-05-21 12:37   ` Vincent Guittot
2021-05-21 13:21     ` Srikar Dronamraju [this message]
2021-05-13  7:40 ` [PATCH v3 7/8] sched/fair: Remove ifdefs in waker_affine_idler_llc Srikar Dronamraju
2021-05-13  7:40 ` [PATCH v3 8/8] sched/fair: Dont iterate if no idle CPUs Srikar Dronamraju
2021-05-19  9:36 ` [PATCH v3 0/8] sched/fair: wake_affine improvements Mel Gorman
2021-05-19 16:55   ` Srikar Dronamraju
