From: Catalin Marinas <catalin.marinas@arm.com>
To: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Paul Turner <pjt@google.com>,
	Morten Rasmussen <Morten.Rasmussen@arm.com>,
	Chris Metcalf <cmetcalf@tilera.com>,
	Tony Luck <tony.luck@intel.com>,
	"alex.shi@intel.com" <alex.shi@intel.com>,
	Preeti U Murthy <preeti@linux.vnet.ibm.com>,
	linaro-kernel <linaro-kernel@lists.linaro.org>,
	"len.brown@intel.com" <len.brown@intel.com>,
	l.majewski@samsung.com, Jonathan Corbet <corbet@lwn.net>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Paul McKenney <paulmck@linux.vnet.ibm.com>,
	Arjan van de Ven <arjan@linux.intel.com>,
	linux-pm@vger.kernel.org
Subject: Re: [RFC][PATCH v5 00/14] sched: packing tasks
Date: Mon, 11 Nov 2013 11:33:45 +0000	[thread overview]
Message-ID: <CAHkRjk69GNYtLGBSWCNcsCzkBHywKrD0qQQbNkJRpMbcdsCPyw@mail.gmail.com> (raw)
In-Reply-To: <1382097147-30088-1-git-send-email-vincent.guittot@linaro.org>

Hi Vincent,

(cross-posting to linux-pm as it was agreed to follow up on this list)

On 18 October 2013 12:52, Vincent Guittot <vincent.guittot@linaro.org> wrote:
> This is the 5th version of the previously named "packing small tasks" patchset.
> "small" has been removed because the patchset doesn't only target small tasks
> anymore.
>
> This patchset takes advantage of the new per-task load tracking that is
> available in the scheduler to pack the tasks in a minimum number of
> CPU/Cluster/Core. The packing mechanism takes into account the power gating
> topology of the CPUs to minimize the number of power domains that need to be
> powered on simultaneously.

As a general comment, it's not clear how this set of patches addresses
the bigger problem of energy-aware scheduling, mainly because we
haven't yet defined _what_ we want from the scheduler: what the
scenarios and constraints are, and whether (and by how much) we are
prepared to give up performance (speed, latency) for power.

This packing heuristic may work for certain SoCs and workloads but,
for example, there are modern ARM SoCs where the P-state has a much
bigger effect on power and it's more energy-efficient to keep two CPUs
in a lower P-state than to pack all tasks onto one, even though the
CPUs may be gated independently. In such cases _small_ task packing
(for some definition of 'small') would be more useful than general
packing, but even this is just a heuristic that saves power for
particular workloads without fully defining/addressing the problem.
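
To make this trade-off concrete, here is a back-of-the-envelope sketch
(the numbers are made up and dynamic power is crudely approximated as
C*f*V^2, ignoring leakage and idle power):

  #include <stdio.h>

  int main(void)
  {
          /* made-up operating points */
          double f_hi = 1e9,   v_hi = 1.2;   /* 1 GHz   @ 1.2 V */
          double f_lo = 0.5e9, v_lo = 0.9;   /* 500 MHz @ 0.9 V */
          double c = 1e-9;                   /* switched capacitance */
          double work = 1e9;                 /* cycles to execute */

          /* all work packed on one CPU at the high P-state */
          double e_packed = c * f_hi * v_hi * v_hi * (work / f_hi);

          /* work split across two CPUs at the low P-state */
          double e_spread = 2 * c * f_lo * v_lo * v_lo
                            * ((work / 2) / f_lo);

          printf("packed: %.2f J, spread: %.2f J\n", e_packed, e_spread);
          return 0;
  }

With these invented numbers the two slower CPUs finish the same work
for ~0.81J against ~1.44J in the packed case, which is the effect
described above.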

I would rather start by defining the main goal and working backwards
to an algorithm. We may well find that task packing based on this
patch set is sufficient, but we may also get packing-like behaviour as
a side effect of a broader approach (better energy cost awareness). An
important aspect, even in the mobile space, is keeping performance as
close as possible to the standard scheduler while saving a bit more
power. Just trying to reduce the number of non-idle CPUs may not meet
this requirement.


So, IMO, defining the power topology is a good starting point and I
think it's better to separate those patches from energy-saving
algorithms like packing. We need to agree on what information we have
(C-state details, coupling, power gating) and what we can/need to
expose to the scheduler. This can be revisited once we start
implementing/refining the energy awareness.
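
As a strawman, the kind of per-power-domain information I have in mind
could look like the sketch below (the structure and field names are
invented for illustration; this is not an existing kernel interface):

  /* Hypothetical sketch only -- not an existing kernel structure. */
  struct sched_power_domain_info {
          cpumask_t    cpus;               /* CPUs in this power domain */
          bool         independent_gating; /* each CPU gated on its own? */
          unsigned int nr_cstates;         /* number of idle states */
          struct {
                  unsigned int exit_latency_us;     /* wake-up latency */
                  unsigned int target_residency_us; /* break-even time */
                  bool         coupled;    /* needs coordinated entry */
          } cstate[CPUIDLE_STATE_MAX];     /* limit from <linux/cpuidle.h> */
  };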

The 2nd step is how the _current_ scheduler could use such information
while keeping the current overall system behaviour (i.e. how much of
cpuidle we should move into the scheduler).

Question for Peter/Ingo: do you want the scheduler to decide on which
C-state a CPU should be in, or should we still leave this to a cpuidle
layer/driver?

My understanding from the recent discussions is that the scheduler
should decide directly on the C-state (or rather the deepest C-state
possible, since we don't want to duplicate the backend logic for
synchronising CPUs going up or down). This means that the scheduler
needs to know about C-state target residency and wake-up latency (I
think we can leave coupled C-states to the backend; there is some
complex synchronisation there which I wouldn't duplicate).
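
The per-CPU decision would then look much like what the cpuidle
governors already do: pick the deepest state that fits. A simplified
sketch, reusing the hypothetical per-state fields from the structure
above (not actual kernel code):

  /*
   * Pick the deepest C-state whose break-even residency fits the
   * predicted idle time and whose exit latency stays within the
   * current latency constraint. States are assumed to be ordered
   * from shallowest to deepest.
   */
  static int pick_cstate(const struct sched_power_domain_info *pd,
                         unsigned int predicted_idle_us,
                         unsigned int latency_req_us)
  {
          int i, best = 0;        /* state 0: shallowest, always allowed */

          for (i = 1; i < pd->nr_cstates; i++) {
                  if (pd->cstate[i].target_residency_us > predicted_idle_us ||
                      pd->cstate[i].exit_latency_us > latency_req_us)
                          break;
                  best = i;
          }
          return best;
  }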

Alternatively (my preferred approach), we get the scheduler to predict
and pass the expected residency and latency requirements down to a
power driver and read back the actual C-states for making task
placement decisions. Some of the menu governor prediction logic could
be turned into a library and used by the scheduler. Basically what
this tries to achieve is better scheduler awareness of the current
C-states decided by a cpuidle/power driver based on the scheduler
constraints.
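
As a strawman, the interface between the scheduler and such a power
driver could be as thin as the two calls below (names invented purely
for illustration):

  /* scheduler -> power driver: per-CPU prediction and constraint */
  void power_driver_set_idle_hints(int cpu,
                                   unsigned int expected_residency_us,
                                   unsigned int wakeup_latency_req_us);

  /*
   * power driver -> scheduler: wake-up cost of the C-state actually
   * entered, so task placement can prefer CPUs that are cheap to wake
   */
  unsigned int power_driver_get_exit_latency(int cpu);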

The 3rd step is optimising the scheduler for energy saving, taking
into account the information added by the previous steps and possibly
adding some more. This stage, however, has several sub-steps (which
can be worked on in parallel with the steps above):

a) Define use-cases, typical workloads, acceptance criteria
(performance, latency requirements).

b) A set of benchmarks simulating the scenarios above. I wouldn't
bother with linsched since a power model is never realistic enough.
It's better to run those benchmarks on real hardware and either
estimate the energy based on the C/P states (a rough sketch of such an
estimate follows this list) or, depending on the SoC, read some
sensors or energy probes. If the scheduler maintainers want to
reproduce the numbers, I'm pretty sure we can ship some boards.

c) Start defining/implementing a scheduler algorithm to do optimal
task placement.

d) Assess the implementation against the benchmarks at (b) *and* other
typical performance benchmarks (whether for servers, mobile, Android
etc.). At this point we'll most likely go back and refine the previous
steps.
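
To illustrate the energy estimate mentioned at (b), energy over a
benchmark run could be approximated off-line from per-state residency
counters; a rough sketch, where the per-state power numbers would come
from vendor data or probe measurements (all names are hypothetical):

  /*
   * Approximate energy from the time spent in each P-state while
   * running and in each C-state while idle, e.g. using the residency
   * statistics exported by cpufreq/cpuidle. mW * s = mJ.
   */
  double estimate_energy_mj(const double *pstate_power_mw,
                            const double *pstate_time_s, int nr_pstates,
                            const double *cstate_power_mw,
                            const double *cstate_time_s, int nr_cstates)
  {
          double mj = 0.0;
          int i;

          for (i = 0; i < nr_pstates; i++)
                  mj += pstate_power_mw[i] * pstate_time_s[i];
          for (i = 0; i < nr_cstates; i++)
                  mj += cstate_power_mw[i] * cstate_time_s[i];

          return mj;
  }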

So far we've jumped directly to (c) because we had some scenarios in
mind that needed optimising, but those haven't been written down and
we don't have a clear way to assess the impact. There is more to this
than simply maximising idle time. Ideally the scheduler should have an
estimate of the overall energy cost, the cost per task and per
run-queue, and the energy implications of moving a task to another
run-queue, possibly taking the P-state into account (but not 'picking'
a P-state).
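
For instance (a hand-wavy sketch; everything below is hypothetical,
including the estimate_cpu_energy() and cpu_utilisation() helpers), a
migration decision could compare the estimated energy before and after
the move:

  /*
   * Move a task only if the estimated energy of the two CPUs goes
   * down. estimate_cpu_energy(cpu, util) would combine busy power at
   * the CPU's current P-state, scaled by the expected utilisation,
   * with idle power for the remaining time.
   */
  static bool migration_saves_energy(int src_cpu, int dst_cpu,
                                     unsigned long task_util)
  {
          unsigned long src_util = cpu_utilisation(src_cpu);
          unsigned long dst_util = cpu_utilisation(dst_cpu);
          unsigned long before, after;

          before = estimate_cpu_energy(src_cpu, src_util) +
                   estimate_cpu_energy(dst_cpu, dst_util);

          after  = estimate_cpu_energy(src_cpu, src_util - task_util) +
                   estimate_cpu_energy(dst_cpu, dst_util + task_util);

          return after < before;
  }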

Anyway, I think we need to address the first steps and think about the
algorithm once we have the bigger picture of what we are trying to
solve.

Thanks.

-- 
Catalin

Thread overview: 101+ messages
2013-10-18 11:52 [RFC][PATCH v5 00/14] sched: packing tasks Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 01/14] sched: add a new arch_sd_local_flags for sched_domain init Vincent Guittot
2013-11-05 14:06   ` Peter Zijlstra
2013-11-05 14:57     ` Vincent Guittot
2013-11-05 22:27       ` Peter Zijlstra
2013-11-06 10:10         ` Vincent Guittot
2013-11-06 13:53         ` Martin Schwidefsky
2013-11-06 14:08           ` Peter Zijlstra
2013-11-12 17:43             ` Dietmar Eggemann
2013-11-12 18:08               ` Peter Zijlstra
2013-11-13 15:47                 ` Dietmar Eggemann
2013-11-13 16:29                   ` Peter Zijlstra
2013-11-14 10:49                     ` Morten Rasmussen
2013-11-14 12:07                       ` Peter Zijlstra
2013-12-18 13:13         ` [RFC] sched: CPU topology try Vincent Guittot
2013-12-23 17:22           ` Dietmar Eggemann
2014-01-06 13:41             ` Vincent Guittot
2014-01-06 16:31               ` Peter Zijlstra
2014-01-07  8:32                 ` Vincent Guittot
2014-01-07 13:22                   ` Peter Zijlstra
2014-01-07 14:10                     ` Peter Zijlstra
2014-01-07 15:41                       ` Morten Rasmussen
2014-01-07 20:49                         ` Peter Zijlstra
2014-01-08  8:32                           ` Alex Shi
2014-01-08  8:37                             ` Peter Zijlstra
2014-01-08 12:52                               ` Morten Rasmussen
2014-01-08 13:04                                 ` Peter Zijlstra
2014-01-08 13:33                                   ` Morten Rasmussen
2014-01-08 12:35                           ` Morten Rasmussen
2014-01-08 12:42                             ` Peter Zijlstra
2014-01-08 12:45                             ` Peter Zijlstra
2014-01-08 13:27                               ` Morten Rasmussen
2014-01-08 13:32                                 ` Peter Zijlstra
2014-01-08 13:45                                   ` Morten Rasmussen
2014-01-07 14:11                     ` Vincent Guittot
2014-01-07 15:37                       ` Morten Rasmussen
2014-01-08  8:37                         ` Alex Shi
2014-01-06 16:28             ` Peter Zijlstra
2014-01-06 17:15               ` Morten Rasmussen
2014-01-07  9:57                 ` Peter Zijlstra
2014-01-01  5:00           ` Preeti U Murthy
2014-01-06 16:33             ` Peter Zijlstra
2014-01-06 16:37               ` Arjan van de Ven
2014-01-06 16:48                 ` Peter Zijlstra
2014-01-06 16:54                   ` Peter Zijlstra
2014-01-06 17:13                     ` Arjan van de Ven
2014-01-07 12:40             ` Vincent Guittot
2014-01-06 16:21           ` Peter Zijlstra
2014-01-07  8:22             ` Vincent Guittot
2014-01-07  9:40           ` Preeti U Murthy
2014-01-07  9:50             ` Peter Zijlstra
2014-01-07 10:39               ` Preeti U Murthy
2014-01-07 11:13                 ` Peter Zijlstra
2014-01-07 16:31                   ` Preeti U Murthy
2014-01-07 11:20                 ` Morten Rasmussen
2014-01-07 12:31                 ` Vincent Guittot
2014-01-07 16:51                   ` Preeti U Murthy
2013-10-18 11:52 ` [RFC][PATCH v5 03/14] sched: define pack buddy CPUs Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 04/14] sched: do load balance only with packing cpus Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 05/14] sched: add a packing level knob Vincent Guittot
2013-11-12 10:32   ` Peter Zijlstra
2013-11-12 10:44     ` Vincent Guittot
2013-11-12 10:55       ` Peter Zijlstra
2013-11-12 10:57         ` Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 06/14] sched: create a new field with available capacity Vincent Guittot
2013-11-12 10:34   ` Peter Zijlstra
2013-11-12 11:05     ` Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 07/14] sched: get CPU's activity statistic Vincent Guittot
2013-11-12 10:36   ` Peter Zijlstra
2013-11-12 10:41   ` Peter Zijlstra
2013-10-18 11:52 ` [RFC][PATCH v5 08/14] sched: move load idx selection in find_idlest_group Vincent Guittot
2013-11-12 10:49   ` Peter Zijlstra
2013-11-27 14:10   ` [tip:sched/core] sched/fair: Move " tip-bot for Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 09/14] sched: update the packing cpu list Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 10/14] sched: init this_load to max in find_idlest_group Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 11/14] sched: add a SCHED_PACKING_TASKS config Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 12/14] sched: create a statistic structure Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 13/14] sched: differantiate idle cpu Vincent Guittot
2013-10-18 11:52 ` [RFC][PATCH v5 14/14] cpuidle: set the current wake up latency Vincent Guittot
2013-11-11 11:33 ` Catalin Marinas [this message]
2013-11-11 16:36   ` [RFC][PATCH v5 00/14] sched: packing tasks Peter Zijlstra
2013-11-11 16:39     ` Arjan van de Ven
2013-11-11 18:18       ` Catalin Marinas
2013-11-11 18:20         ` Arjan van de Ven
2013-11-12 12:06         ` Morten Rasmussen
2013-11-12 16:48         ` Arjan van de Ven
2013-11-12 23:14           ` Catalin Marinas
2013-11-13 16:13             ` Arjan van de Ven
2013-11-13 16:45               ` Catalin Marinas
2013-11-13 17:56                 ` Arjan van de Ven
2013-11-12 17:40     ` Catalin Marinas
2013-11-25 18:55     ` Daniel Lezcano
2013-11-11 16:38   ` Peter Zijlstra
2013-11-11 16:40     ` Arjan van de Ven
2013-11-12 10:36     ` Vincent Guittot
2013-11-11 16:54   ` Morten Rasmussen
2013-11-11 18:31     ` Catalin Marinas
2013-11-11 19:26       ` Arjan van de Ven
2013-11-11 22:43         ` Nicolas Pitre
2013-11-11 23:43         ` Catalin Marinas
2013-11-12 12:35   ` Vincent Guittot
