From: Vincent Guittot <vincent.guittot@linaro.org>
To: Parth Shah <parth@linux.ibm.com>
Cc: linux-kernel <linux-kernel@vger.kernel.org>,
	"open list:THERMAL" <linux-pm@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Patrick Bellasi <patrick.bellasi@matbug.net>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Pavel Machek <pavel@ucw.cz>, Doug Smythies <dsmythies@telus.net>,
	Quentin Perret <quentin.perret@arm.com>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Tim Chen <tim.c.chen@linux.intel.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>
Subject: Re: [RFC v5 4/6] sched/fair: Tune task wake-up logic to pack small background tasks on fewer cores
Date: Tue, 8 Oct 2019 18:20:24 +0200	[thread overview]
Message-ID: <CAKfTPtDEp7s-1F2mOHd0sYqD=TzGcufKS0orddMLaMePUp8T2g@mail.gmail.com> (raw)
In-Reply-To: <80bb34ec-6358-f4dc-d20d-cde6c9d7e197@linux.ibm.com>

On Mon, 7 Oct 2019 at 18:54, Parth Shah <parth@linux.ibm.com> wrote:
>
>
>
> On 10/7/19 5:49 PM, Vincent Guittot wrote:
> > On Mon, 7 Oct 2019 at 10:31, Parth Shah <parth@linux.ibm.com> wrote:
> >>
> >> The algorithm finds the first non-idle core in the system and tries to
> >> place the task on an idle CPU of the chosen core. To maintain cache
> >> hotness, the search for a non-idle core starts from prev_cpu, which also
> >> reduces task ping-pong behaviour within the core.
> >>
> >> Define a new method, select_non_idle_core(), which keeps track of the
> >> idle and non-idle CPUs in each core and, based on heuristics, determines
> >> whether the core is sufficiently busy to place the incoming background
> >> task on. The heuristic further classifies a non-idle CPU as either busy
> >> (>12.5% util) or overutilized (>80% util).
> >> - A core with more idle CPUs than non-idle CPUs and no busy CPUs is not
> >>   selected for packing
> >> - A core with more than one overutilized CPU is exempted from
> >>   task packing
> >> - Pack if there is at least one busy CPU and the overutilized CPU count
> >>   is < 2
> >>
> >> A utilization of 12.5% for a busy CPU is a sufficient heuristic that the
> >> CPU is doing enough work and will not become idle in the near future.
> >>
> >> Signed-off-by: Parth Shah <parth@linux.ibm.com>
> >> ---
> >>  kernel/sched/core.c |  3 ++
> >>  kernel/sched/fair.c | 95 ++++++++++++++++++++++++++++++++++++++++++++-
> >>  2 files changed, 97 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> >> index 6e1ae8046fe0..7e3aff59540a 100644
> >> --- a/kernel/sched/core.c
> >> +++ b/kernel/sched/core.c
> >> @@ -6402,6 +6402,7 @@ static struct kmem_cache *task_group_cache __read_mostly;
> >>
> >>  DECLARE_PER_CPU(cpumask_var_t, load_balance_mask);
> >>  DECLARE_PER_CPU(cpumask_var_t, select_idle_mask);
> >> +DECLARE_PER_CPU(cpumask_var_t, turbo_sched_mask);
> >>
> >>  void __init sched_init(void)
> >>  {
> >> @@ -6442,6 +6443,8 @@ void __init sched_init(void)
> >>                         cpumask_size(), GFP_KERNEL, cpu_to_node(i));
> >>                 per_cpu(select_idle_mask, i) = (cpumask_var_t)kzalloc_node(
> >>                         cpumask_size(), GFP_KERNEL, cpu_to_node(i));
> >> +               per_cpu(turbo_sched_mask, i) = (cpumask_var_t)kzalloc_node(
> >> +                       cpumask_size(), GFP_KERNEL, cpu_to_node(i));
> >>         }
> >>  #endif /* CONFIG_CPUMASK_OFFSTACK */
> >>
> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> >> index b798fe7ff7cd..d4a1b6474338 100644
> >> --- a/kernel/sched/fair.c
> >> +++ b/kernel/sched/fair.c
> >> @@ -5353,6 +5353,8 @@ static void dequeue_task_fair(struct rq *rq, struct task_struct *p, int flags)
> >>  /* Working cpumask for: load_balance, load_balance_newidle. */
> >>  DEFINE_PER_CPU(cpumask_var_t, load_balance_mask);
> >>  DEFINE_PER_CPU(cpumask_var_t, select_idle_mask);
> >> +/* A cpumask to find active cores in the system. */
> >> +DEFINE_PER_CPU(cpumask_var_t, turbo_sched_mask);
> >>
> >>  #ifdef CONFIG_NO_HZ_COMMON
> >>
> >> @@ -5964,6 +5966,76 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
> >>         return cpu;
> >>  }
> >>
> >> +#ifdef CONFIG_SCHED_SMT
> >> +static inline bool is_background_task(struct task_struct *p)
> >> +{
> >> +       if (p->flags & PF_CAN_BE_PACKED)
> >> +               return true;
> >> +
> >> +       return false;
> >> +}
> >> +
> >> +#define busyness_threshold     (100 >> 3)
> >> +#define is_cpu_busy(util) ((util) > busyness_threshold)
> >> +
> >> +/*
> >> + * Try to find a non-idle core in the system based on a few heuristics:
> >> + * - Keep track of overutilized (>80% util) and busy (>12.5% util) CPUs
> >> + * - If no CPUs are busy then do not select the core for task packing
> >> + * - If at least one CPU is busy then do task packing unless overutilized CPUs
> >> + *   count is < busy/2 CPU count
> >> + * - Always select idle CPU for task packing
> >> + */
> >> +static int select_non_idle_core(struct task_struct *p, int prev_cpu, int target)
> >> +{
> >> +       struct cpumask *cpus = this_cpu_cpumask_var_ptr(turbo_sched_mask);
> >> +       int iter_cpu, sibling;
> >> +
> >> +       cpumask_and(cpus, cpu_online_mask, p->cpus_ptr);
> >> +
> >> +       for_each_cpu_wrap(iter_cpu, cpus, prev_cpu) {
> >> +               int idle_cpu_count = 0, non_idle_cpu_count = 0;
> >> +               int overutil_cpu_count = 0;
> >> +               int busy_cpu_count = 0;
> >> +               int best_cpu = iter_cpu;
> >> +
> >> +               for_each_cpu(sibling, cpu_smt_mask(iter_cpu)) {
> >> +                       __cpumask_clear_cpu(sibling, cpus);
> >> +                       if (idle_cpu(iter_cpu)) {
> >> +                               idle_cpu_count++;
> >> +                               best_cpu = iter_cpu;
> >> +                       } else {
> >> +                               non_idle_cpu_count++;
> >> +                               if (cpu_overutilized(iter_cpu))
> >> +                                       overutil_cpu_count++;
> >> +                               if (is_cpu_busy(cpu_util(iter_cpu)))
> >> +                                       busy_cpu_count++;
> >> +                       }
> >> +               }
> >> +
> >> +               /*
> >> +                * Pack tasks to this core if
> >> +                * 1. Idle CPU count is higher and at least one is busy
> >> +                * 2. If idle_cpu_count < non_idle_cpu_count then ideally do
> >> +                * packing but if there are more CPUs overutilized then don't
> >> +                * overload it.
> >
> > Could you give details about the rationale behind these conditions ?
>
> Sure, but first maybe some background is required for busy_cpu.
> Task packing needs to be done across cores, minimizing the number of busy
> cores in the chip. Hence, when picking a core for packing a background
> task, it is better not to select a core which is in a deeper idle state.

Makes sense.
find_idlest_group_cpu() does something similar with the help of cpuidle.
Don't you have such information available in your cpuidle driver?

>
> Deeper idle states usually have target_residency >= 10ms; they are real
> power-saving states, and the saved power can be channeled to the active
> cores. A CPU with 12.5% utilization is most probably not going into those
> deeper idle states, so picking a CPU with >= 12.5% util seems to be a good
> approximation.

You would be better off using idle_get_state(rq).
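
As a rough, untested sketch of what I mean (the helper name and the 2us
cutoff are made up for illustration, not existing API), the wakeup path
could ask cpuidle how deep a CPU's idle state is instead of guessing from
its utilization:

static bool cpu_in_shallow_idle(int cpu)
{
	struct rq *rq = cpu_rq(cpu);
	struct cpuidle_state *state;
	bool shallow;

	rcu_read_lock();
	state = idle_get_state(rq);
	/*
	 * No state recorded means the CPU is not idle (or cpuidle is not
	 * in use), so it is certainly not in a deep power-saving state.
	 */
	shallow = !state || state->exit_latency <= 2;
	rcu_read_unlock();

	return shallow;
}

find_idlest_group_cpu() compares exit_latency in a similar way when it
looks for the least costly idle CPU.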

>
> Now, coming to the _main point_, task packing needs to take care of the
> following scenarios:
> 1. Do not select a core where all the CPUs are idle or <= 12.5% util
> 2. Do not select a core with 2 or more CPUs overloaded (>=80% util)

Why is it always 2 CPUs? It seems that you can have 1/2/4/8 CPUs per core,
but you keep using 2 CPUs as the threshold.
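
Just to illustrate the kind of thing I mean (only a sketch, the /4 divisor
is arbitrary), the limit could be derived from the SMT width of the core
instead of being hard-coded:

	int smt_width = cpumask_weight(cpu_smt_mask(iter_cpu));
	int overutil_limit = max(1, smt_width / 4);

	/* pack only while few enough siblings are overutilized */
	if (overutil_cpu_count < overutil_limit)
		return best_cpu;

A fixed "< 2" means something quite different on an SMT-8 core than on an
SMT-2 core.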

> 3. Select a core even if 1 CPU is overloaded, as background tasks are
> usually short-running and spending time selecting a better alternative is
> not worth the investment here
> 4. Select a core if at least one CPU is busy (>=12.5% util)
> 5. On selecting a core, pick an idle CPU in it.
>
> Hence, to satisfy these scenarios for SMT-1/2/4 (POWER9) or SMT-8 (POWER8
> has 8 threads per core; POWER9 has a feature to fake SMT-8), the approach
> keeps track of the idle, non-idle, busy and overloaded CPU counts in the
> core and uses the above rules to find a _sufficiently_ non-idle core,
> which seems to be a good heuristic for task packing without much
> regression on CPU-intensive threads.
>
> So, as per the comments in this patch, the first point covers scenarios 1
> and 4 (the if branch in the code), and the second point covers scenarios 2
> and 3 (the else branch in the code).
>
> >> +                */
> >> +               if (idle_cpu_count > non_idle_cpu_count) {
> >> +                       if (busy_cpu_count)
> >> +                               return best_cpu;
> >> +               } else {
> >> +                       /*
> >> +                        * Pack tasks if at max 1 CPU is overutilized
> >> +                        */
> >> +                       if (overutil_cpu_count < 2)
> >> +                               return best_cpu;
> >> +               }
> >> +       }
> >> +
> >> +       return select_idle_sibling(p, prev_cpu, target);
> >> +}
> >> +#endif /* CONFIG_SCHED_SMT */
> >> +
> >>  /*
> >>   * Try and locate an idle core/thread in the LLC cache domain.
> >>   */
> >> @@ -6418,6 +6490,23 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu)
> >>         return -1;
> >>  }
> >>
> >> +#ifdef CONFIG_SCHED_SMT
> >> +/*
> >> + * Select all classified background tasks for task packing
> >> + */
> >> +static inline int turbosched_select_non_idle_core(struct task_struct *p,
> >> +                                                 int prev_cpu, int target)
> >> +{
> >> +       return select_non_idle_core(p, prev_cpu, target);
> >> +}
> >> +#else
> >> +static inline int turbosched_select_non_idle_core(struct task_struct *p,
> >> +                                                 int prev_cpu, int target)
> >> +{
> >> +       return select_idle_sibling(p, prev_cpu, target);
> >
> > It would be better to make turbosched_select_non_idle_core() empty and
> > make sure that __turbo_sched_enabled is never enabled if
> > CONFIG_SCHED_SMT is disabled.
> >
>
> Totally agreed. I kept it like this so as not to have any "#def.."
> in the select_task_rq_fair() method.
> So can I do this by adding a new method like __select_idle_sibling() which
> will call turbosched_select_non_idle_core() when SCHED_SMT is present
> and otherwise call the regular select_idle_sibling()?
>
> >> +}
> >> +#endif
> >> +
> >>  /*
> >>   * select_task_rq_fair: Select target runqueue for the waking task in domains
> >>   * that have the 'sd_flag' flag set. In practice, this is SD_BALANCE_WAKE,
> >> @@ -6483,7 +6572,11 @@ select_task_rq_fair(struct task_struct *p, int prev_cpu, int sd_flag, int wake_f
> >>         } else if (sd_flag & SD_BALANCE_WAKE) { /* XXX always ? */
> >>                 /* Fast path */
> >>
> >> -               new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
> >> +               if (is_turbosched_enabled() && unlikely(is_background_task(p)))
> >> +                       new_cpu = turbosched_select_non_idle_core(p, prev_cpu,
> >> +                                                                 new_cpu);
> >
> > Could you add turbosched_select_non_idle_core() similarly to
> > find_energy_efficient_cpu()?
> > Add it at the beginning of select_task_rq_fair().
> > Return immediately with the CPU if you have found one,
> > or let the normal path select a CPU if
> > turbosched_select_non_idle_core() has not been able to find a suitable
> > CPU for packing.
> >
>
> Of course, I can do that.
> I was just not aware of the effect of wake_affine and so was waiting for
> such comments to be sure. Thanks for this.
> Maybe I can add it just below the sched_energy_present(){...} construct,
> giving precedence to EAS? I'm asking because I remember Patrick telling me
> to leverage task packing for Android as well.

After sched_energy_present(){...} seems to be a good place.

Leveraging task packing for Android means that task packing should
collaborate with EAS and find_energy_efficient_cpu().
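
Something along these lines, as an untested sketch (it assumes
turbosched_select_non_idle_core() is reworked to return -1 when it cannot
find a core suitable for packing, instead of calling select_idle_sibling()
itself):

	/* in select_task_rq_fair(), just after the EAS block */
	if ((sd_flag & SD_BALANCE_WAKE) && is_turbosched_enabled() &&
	    unlikely(is_background_task(p))) {
		new_cpu = turbosched_select_non_idle_core(p, prev_cpu, new_cpu);
		if (new_cpu >= 0)
			return new_cpu;

		/* no suitable core found, let the normal wakeup path decide */
		new_cpu = prev_cpu;
	}

That way EAS keeps precedence and packing only short-circuits the normal
wakeup path when it has actually found a CPU.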

>
>
> Thank you very much for looking at the patches.
> Parth
>
> >
> >> +               else
> >> +                       new_cpu = select_idle_sibling(p, prev_cpu, new_cpu);
> >>
> >>                 if (want_affine)
> >>                         current->recent_used_cpu = cpu;
> >> --
> >> 2.17.1
> >>
>

Thread overview: 19+ messages
2019-10-07  8:30 [RFC v5 0/6] TurboSched: A scheduler for sustaining Turbo Frequencies for longer durations Parth Shah
2019-10-07  8:30 ` [RFC v5 1/6] sched/core: Add manual background task classification using sched_setattr syscall Parth Shah
2019-10-07  8:30 ` [RFC v5 2/6] sched: Introduce switch to enable TurboSched for task packing Parth Shah
2019-10-07  8:30 ` [RFC v5 3/6] sched/core: Update turbo_sched count only when required Parth Shah
2019-10-07  8:30 ` [RFC v5 4/6] sched/fair: Tune task wake-up logic to pack small background tasks on fewer cores Parth Shah
2019-10-07 12:19   ` Vincent Guittot
2019-10-07 16:53     ` Parth Shah
2019-10-08 16:20       ` Vincent Guittot [this message]
2019-10-09  8:46         ` Parth Shah
2019-10-08 16:52       ` Dietmar Eggemann
2019-10-09  8:57         ` Parth Shah
2019-10-09 14:26           ` Dietmar Eggemann
2019-10-09 17:02             ` Parth Shah
2019-10-10 14:53               ` Dietmar Eggemann
2019-10-07  8:30 ` [RFC v5 5/6] sched/fair: Provide arch hook to find domain for non idle core search scan Parth Shah
2019-10-07  8:30 ` [RFC v5 6/6] powerpc: Set turbo domain to NUMA node for task packing Parth Shah
     [not found] ` <20191008132842.6612-1-hdanton@sina.com>
2019-10-09  9:22   ` [RFC v5 4/6] sched/fair: Tune task wake-up logic to pack small background tasks on fewer cores Parth Shah
2019-10-09 11:34     ` Vincent Guittot
2019-10-09 16:55       ` Parth Shah
