linux-kernel.vger.kernel.org archive mirror
From: Preeti U Murthy <preeti@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>,
	alex.shi@intel.com, peterz@infradead.org,
	vincent.guittot@linaro.org, efault@gmx.de, pjt@google.com,
	linux-kernel@vger.kernel.org, linaro-kernel@lists.linaro.org,
	arjan@linux.intel.com, len.brown@intel.com, corbet@lwn.net,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	tglx@linutronix.de
Subject: Re: power-efficient scheduling design
Date: Fri, 07 Jun 2013 11:33:22 +0530
Message-ID: <51B177AA.1000600@linux.vnet.ibm.com>
In-Reply-To: <20130531105204.GE30394@gmail.com>

Hi,

On 05/31/2013 04:22 PM, Ingo Molnar wrote:
> PeterZ and me tried to point out the design requirements previously, but 
> it still does not appear to be clear enough to people, so let me spell it 
> out again, in a hopefully clearer fashion.
> 
> The scheduler has valuable power saving information available:
> 
>  - when a CPU is busy: about how long the current task expects to run
> 
>  - when a CPU is idle: how long the current CPU expects _not_ to run
> 
>  - topology: it knows how the CPUs and caches interrelate and already 
>    optimizes based on that
> 
>  - various high level and low level load averages and other metrics about 
>    the recent past that show how busy a particular CPU is, how busy the 
>    whole system is, and what the runtime properties of individual tasks is 
>    (how often it sleeps, etc.)
> 
> so the scheduler is in an _ideal_ position to do a judgement call about 
> the near future and estimate how deep an idle state a CPU core should 
> enter into and what frequency it should run at.

I don't think the problem is that the scheduler is not making the
decisions about which idle state a CPU should enter or which frequency
it should run at.

IIUC, the problem is that although the
*cpuidle and cpufreq governors co-operate with the scheduler,
the scheduler does not do the same in return.*

Let me elaborate with respect to the cpuidle subsystem. When the
scheduler chooses the CPUs to run tasks on, it leaves certain other CPUs
idle. The cpuidle governor then evaluates, among other things, the load
average of those CPUs before deciding which idle state suits each of
them. With PJT's metric, an idle CPU's load average decays over time, and
the cpuidle governor will perhaps decide to put such CPUs into deep idle
states.
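
To make that concrete, here is a rough userspace sketch (not kernel
code). Everything in it is an assumption made for illustration: the
toy_* names, the three idle states, their load thresholds, and the
"halve once per idle period" decay rate. It only shows the shape of the
behaviour: the longer a CPU sits idle, the lower its tracked load, and
the deeper the state a load-driven governor would pick.

```c
#include <stddef.h>

/* Hypothetical idle states, shallowest first. Not a real cpuidle table. */
struct toy_idle_state {
	const char *name;
	unsigned int max_load;	/* state is eligible only if load <= max_load */
};

static const struct toy_idle_state toy_states[] = {
	{ "shallow", 1024 },
	{ "medium",   256 },
	{ "deep",      32 },
};
#define TOY_NR_STATES (sizeof(toy_states) / sizeof(toy_states[0]))

/* Halve the tracked load once per idle period (assumed decay rate). */
static unsigned int toy_decay_load(unsigned int load, unsigned int idle_periods)
{
	if (idle_periods >= 32)	/* avoid an out-of-range shift */
		return 0;
	return load >> idle_periods;
}

/* Return the index of the deepest state whose load threshold is met. */
static size_t toy_select_idle_state(unsigned int load)
{
	size_t i, best = 0;	/* fall back to the shallowest state */

	for (i = 0; i < TOY_NR_STATES; i++)
		if (load <= toy_states[i].max_load)
			best = i;
	return best;
}
```

Starting from a load of 1024, this toy governor stays in "shallow",
moves to "medium" after two idle periods and reaches "deep" after five.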

But the problem surfaces when the scheduler gets to choose a CPU to run
new or newly woken tasks on. It picks the *idlest_cpu* to run the task on
without considering how deep an idle state that CPU is in, if it is in
an idle state at all. It can end up waking a CPU from a deep idle state,
which will *hinder power savings*.
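
The difference can be sketched in a few lines of userspace C. This is
not the scheduler's code: struct toy_cpu, its fields and both toy_*
functions are hypothetical names invented for this example, and the
tie-break rule is just one possible policy. The first function picks
purely by load, as described above; the second also looks at idle depth
when loads are equal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-CPU snapshot; these are not kernel field names. */
struct toy_cpu {
	unsigned int load;	/* tracked load; lower means idler */
	int idle_depth;		/* -1 = running, 0 = shallowest idle, higher = deeper */
};

/* Load-only choice: lowest load wins, first CPU wins on ties. */
static size_t toy_find_idlest_cpu(const struct toy_cpu *cpus, size_t nr_cpus)
{
	size_t i, best = 0;

	assert(nr_cpus > 0);
	for (i = 1; i < nr_cpus; i++)
		if (cpus[i].load < cpus[best].load)
			best = i;
	return best;
}

/* Idle-aware choice: on equal load, prefer the shallower idle depth. */
static size_t toy_find_idlest_cpu_shallow(const struct toy_cpu *cpus,
					  size_t nr_cpus)
{
	size_t i, best = 0;

	assert(nr_cpus > 0);
	for (i = 1; i < nr_cpus; i++) {
		if (cpus[i].load < cpus[best].load ||
		    (cpus[i].load == cpus[best].load &&
		     cpus[i].idle_depth < cpus[best].idle_depth))
			best = i;
	}
	return best;
}
```

With CPU 0 at load 0 in a deep state, CPU 1 at load 0 in the shallowest
state and CPU 2 busy, the load-only version returns CPU 0 and wakes the
deeply sleeping CPU, while the idle-aware version returns CPU 1.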

I think this is where we need to focus. Currently, there is no
*two-way co-operation between the scheduler and the cpuidle/cpufreq*
subsystems, which makes no sense. In the case above, for instance, the
scheduler prompts the cpuidle governor to put a CPU into an idle state
and then comes back and undoes that move.

> 
> The scheduler is also at a high enough level to host a "I want maximum 
> performance, power does not matter to me" user policy override switch and 
> similar user policy details.
> 
> No ifs and whens about that.
> 
> Today the power saving landscape is fragmented and sad: we just randomly 
> interface scheduler task packing changes with some idle policy (and 
> cpufreq policy), which might or might not combine correctly.

I would repeat here that today we interface the cpuidle/cpufreq
policies with the scheduler, but not the other way around. They do their
bit when a CPU is busy/idle. However, the scheduler does not see that
somebody else is acting on its instructions, and it comes back to give
different instructions!

Therefore I think that, among other things, this is one fundamental
issue we need to resolve on the way towards better power savings through
the scheduler.

Regards
Preeti U Murthy
