All of lore.kernel.org
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"rjw@rjwysocki.net" <rjw@rjwysocki.net>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"daniel.lezcano@linaro.org" <daniel.lezcano@linaro.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	Dietmar Eggemann <Dietmar.Eggemann@arm.com>
Subject: Re: [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler
Date: Tue, 3 Jun 2014 13:41:45 +0200	[thread overview]
Message-ID: <20140603114145.GX11096@twins.programming.kicks-ass.net> (raw)
In-Reply-To: <20140602141536.GL19967@e103034-lin>

[-- Attachment #1: Type: text/plain, Size: 3303 bytes --]

On Mon, Jun 02, 2014 at 03:15:36PM +0100, Morten Rasmussen wrote:
> > 
> > Talk to me about this core vs cluster thing.
> > 
> > Why would an architecture have multiple energy domains like this?

> The reason is that power domains are often organized in a hierarchy
> where you may be able to power down just a cpu or the entire cluster
> along with cluster wide shared resources. This is quite typical for ARM
> systems. Frequency domains (P-states) typically cover the same hardware
> as one of the power domain levels. That is, there might be several
> smaller power domains sharing the same frequency (P-state) or there
> might be a power domain spanning multiple frequency domains.
> 
> The main reason why we need to worry about all this is that it typically
> cost a lot more energy to use the first cpu in a cluster since you
> also need to power up all the shared hardware resources than the energy
> cost of waking and using additional cpus in the same cluster.
> 
> IMHO, the most natural way to model the energy is therefore something
> like:
> 
>     energy = energy_cluster + n * energy_cpu
> 
> Where 'n' is the number of cpus powered up and energy_cluster is the
> cost paid as soon as any cpu in the cluster is powered up.

OK, that makes sense, thanks! Maybe expand the doc/changelogs with this
because it wasn't immediately clear to me.

> > Also, in general, why would we need to walk the domain tree all the way
> > up, typically I would expect to stop walking once we've covered the two
> > cpu's we're interested in, because above that nothing changes.
> 
> True. In some cases we don't have to go all the way up. There is a
> condition in energy_diff_load() that bails out if the energy doesn't
> change further up the hierarchy. There might be scope for improving that
> condition though.
> 
> We can basically stop going up if the utilization of the domain is
> unchanged by the change we want to do. For example, we can ignore the
> next level above if a third cpu is keeping the domain up all the time
> anyway. In the 100% + 50% case above, putting another 50% task on the
> 50% cpu wouldn't affect the cluster according the proposed model, so it
> can be ignored. However, if we did the same on any of the two cpus in
> the 50% + 25% example we affect the cluster utilization and have to do
> the cluster level maths.
> 
> So we do sometimes have to go all the way up even if we are balancing
> two sibling cpus to determine the energy implications. At least if we
> want an energy score like energy_diff_load() produces. However, we might
> be able to take some other shortcuts if we are balancing load between
> two specific cpus (not wakeup/fork/exec balancing) as you point out. But
> there are cases where we need to continue up until the domain
> utilization is unchanged.

Right.. so my worry with this is scalability. We typically want to avoid
having to scan the entire machine, even for power aware balancing.

That said, I don't think we have a 'sane' model for really big hardware
(yet). Intel still hasn't really said anything much on that iirc, as
long as a single core is up, all the memory controllers in the numa
fabric need to be awake, not to mention to cost of keeping the dram
alive.



[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

  reply	other threads:[~2014-06-03 11:59 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-05-23 18:16 [RFC PATCH 00/16] sched: Energy cost model for energy-aware scheduling Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 01/16] sched: Documentation for scheduler energy cost model Morten Rasmussen
2014-06-05  8:49   ` Vincent Guittot
2014-06-05 11:35     ` Morten Rasmussen
2014-06-05 15:02       ` Vincent Guittot
2014-05-23 18:16 ` [RFC PATCH 02/16] sched: Introduce CONFIG_SCHED_ENERGY Morten Rasmussen
2014-06-08  6:03   ` Henrik Austad
2014-06-09 10:20     ` Morten Rasmussen
2014-06-10  9:39       ` Peter Zijlstra
2014-06-10 10:06         ` Morten Rasmussen
2014-06-10 10:23           ` Peter Zijlstra
2014-06-10 11:17             ` Henrik Austad
2014-06-10 12:19               ` Peter Zijlstra
2014-06-10 11:24             ` Morten Rasmussen
2014-06-10 12:24               ` Peter Zijlstra
2014-06-10 14:41                 ` Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 03/16] sched: Introduce sd energy data structures Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 04/16] sched: Allocate and initialize sched energy Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 05/16] sched: Add sd energy procfs interface Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 06/16] arm: topology: Define TC2 sched energy and provide it to scheduler Morten Rasmussen
2014-05-30 12:04   ` Peter Zijlstra
2014-06-02 14:15     ` Morten Rasmussen
2014-06-03 11:41       ` Peter Zijlstra [this message]
2014-06-04 13:49         ` Morten Rasmussen
2014-06-03 11:44   ` Peter Zijlstra
2014-06-04 15:42     ` Morten Rasmussen
2014-06-04 16:16       ` Peter Zijlstra
2014-06-06 13:15         ` Morten Rasmussen
2014-06-06 13:43           ` Peter Zijlstra
2014-06-06 14:29             ` Morten Rasmussen
2014-06-12 15:05               ` Vince Weaver
2014-06-03 11:50   ` Peter Zijlstra
2014-06-04 16:02     ` Morten Rasmussen
2014-06-04 17:27       ` Peter Zijlstra
2014-06-04 21:56         ` Rafael J. Wysocki
2014-06-05  6:52           ` Peter Zijlstra
2014-06-05 15:03             ` Dirk Brandewie
2014-06-05 20:29               ` Yuyang Du
2014-06-06  8:05                 ` Peter Zijlstra
2014-06-06  0:35                   ` Yuyang Du
2014-06-06 10:50                     ` Peter Zijlstra
2014-06-06 12:13                       ` Ingo Molnar
2014-06-06 12:27                         ` Ingo Molnar
2014-06-06 14:11                           ` Morten Rasmussen
2014-06-07  2:33                           ` Nicolas Pitre
2014-06-09  8:27                             ` Morten Rasmussen
2014-06-09 13:22                               ` Nicolas Pitre
2014-06-11 11:02                                 ` Eduardo Valentin
2014-06-11 11:42                                   ` Morten Rasmussen
2014-06-11 11:43                                     ` Eduardo Valentin
2014-06-11 13:37                                       ` Morten Rasmussen
2014-06-07 23:53                         ` Yuyang Du
2014-06-07 23:26                       ` Yuyang Du
2014-06-09  8:59                         ` Morten Rasmussen
2014-06-09  2:15                           ` Yuyang Du
2014-06-10 10:16                         ` Peter Zijlstra
2014-06-10 17:01                           ` Nicolas Pitre
2014-06-10 18:35                           ` Yuyang Du
2014-06-06 16:27                     ` Jacob Pan
2014-06-06 13:03         ` Morten Rasmussen
2014-06-07  2:52         ` Nicolas Pitre
2014-05-23 18:16 ` [RFC PATCH 07/16] sched: Introduce system-wide sched_energy Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 08/16] sched: Introduce SD_SHARE_CAP_STATES sched_domain flag Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 09/16] sched, cpufreq: Introduce current cpu compute capacity into scheduler Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 10/16] sched, cpufreq: Current compute capacity hack for ARM TC2 Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 11/16] sched: Energy model functions Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 12/16] sched: Task wakeup tracking Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 13/16] sched: Take task wakeups into account in energy estimates Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 14/16] sched: Use energy model in select_idle_sibling Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 15/16] sched: Use energy to guide wakeup task placement Morten Rasmussen
2014-05-23 18:16 ` [RFC PATCH 16/16] sched: Disable wake_affine to broaden the scope of wakeup target cpus Morten Rasmussen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20140603114145.GX11096@twins.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=Dietmar.Eggemann@arm.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=morten.rasmussen@arm.com \
    --cc=preeti@linux.vnet.ibm.com \
    --cc=rjw@rjwysocki.net \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.