From: Ingo Molnar <mingo@kernel.org>
To: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>,
	"alex.shi@intel.com" <alex.shi@intel.com>,
	"peterz@infradead.org" <peterz@infradead.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"efault@gmx.de" <efault@gmx.de>,
	"pjt@google.com" <pjt@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linaro-kernel@lists.linaro.org" <linaro-kernel@lists.linaro.org>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	Catalin Marinas <Catalin.Marinas@arm.com>
Subject: Re: power-efficient scheduling design
Date: Sun, 23 Jun 2013 12:55:05 +0200
Message-ID: <20130623105505.GC20084@gmail.com>
In-Reply-To: <20130621150656.GK5460@e103034-lin>


* Morten Rasmussen <morten.rasmussen@arm.com> wrote:

> On Tue, Jun 18, 2013 at 04:20:28PM +0100, Arjan van de Ven wrote:
> > On 6/14/2013 9:05 AM, Morten Rasmussen wrote:
> > 
> > > Looking at the discussion it seems that people have slightly different
> > > views, but most agree that the goal is an integrated scheduling,
> > > frequency, and idle policy like you pointed out from the beginning.
> > 
> > 
> > ... except that such a solution does not really work for Intel hardware.
> > 
> > The OS does not get to really pick the CPU "frequency" (never mind 
> > that frequency is not what gets controlled), the hardware picks the 
> > frequency. The OS can do some level of requests (best to think of this 
> > as a percentage more than frequency) but what you actually get is, more 
> > often than not, not what you asked for.
> > 
> > You can look in hindsight what kind of performance you got (from some 
> > basic counters in MSRs), and the scheduler can use that to account 
> > backwards to what some process got. But to predict what you will get 
> > in the future...... that's near impossible on any realistic system 
> > nowadays (and even more so in the future).
> 
> The proposed power scheduler doesn't have to drive p-state selection if 
> it doesn't make sense for the particular platform. The aim of the power 
> scheduler is integration of power policies in general.

Exactly.

> > Treating "frequency" (well, "performance") and idle separately is also a 
> > false thing to do (yes I know in 3.9/3.10 we still do that for Intel 
> > hw, but we're working on fixing that). They are by no means separate 
> > things. One guy's idle state is the other guy's power budget (and thus 
> > performance)!
> 
> I agree.
> 
> Based on our discussions so far, where it has become clearer where 
> Intel is heading, and on Ingo's reply, I think we have three ways to go 
> ahead with the power-aware scheduling work, each with its advantages 
> and disadvantages:
> 
> 1. We work on a generic power scheduler with appropriate abstractions 
> that will work for all of us. Current and future Intel p-state policies 
> will be implemented through the power scheduler.
> 
> Pros: We can arrive at a fairly standard solution with standard tunables. 
> There will be one interface to the scheduler.

This is what we prefer really, made available under CONFIG_SCHED_POWER=y.

With CONFIG_SCHED_POWER=n, or if low level facilities are not (yet) 
available, the kernel falls back to legacy (current) behavior.
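
A minimal sketch of that fallback rule, under stated assumptions: none of
the names below (select_power_mode(), struct power_driver, the power_mode
enum) exist in the kernel; they are hypothetical and only illustrate that
the new path is taken when the option is built in and a low level driver
is present, and that everything else gets legacy behavior.

```c
#include <stddef.h>

/* Stand-in for Kconfig: remove this line to model CONFIG_SCHED_POWER=n. */
#define CONFIG_SCHED_POWER 1

/* Hypothetical: which power management path the scheduler would use. */
enum power_mode { POWER_MODE_LEGACY, POWER_MODE_INTEGRATED };

/* Hypothetical handle for a platform's low level power driver. */
struct power_driver {
	const char *name;
};

/*
 * Pick the integrated path only if it is compiled in AND the platform
 * registered a low level driver (drv != NULL). Otherwise fall back to
 * the legacy (current) behavior.
 */
static enum power_mode select_power_mode(const struct power_driver *drv)
{
#ifdef CONFIG_SCHED_POWER
	if (drv != NULL)
		return POWER_MODE_INTEGRATED;
#else
	(void)drv;	/* option disabled: the driver is never consulted */
#endif
	return POWER_MODE_LEGACY;
}
```

Calling select_power_mode(NULL) models a platform whose low level
facilities are not available yet: it gets the legacy mode even with the
option enabled.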

> Cons: Finding a suitable platform abstraction for the power scheduler.

Just do it incrementally. Start from the dumbest possible state: all CPUs 
are powered up fully, there's no idle state selection essentially. Then go 
for the biggest effect first and add the ability to idle in a lower power 
state (with new functions and a low level driver that implements this for 
the platform with no policy embedded into it - just p-state switching 
logic), and combine that with task packing.
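
As a rough illustration of that first step, assuming a toy model rather
than real scheduler code: the driver below exposes only a mechanism
(enter_low_power), while the packing policy lives in the caller. All
names (struct power_ops, pack_tasks(), NR_CPUS as defined here, the
per-CPU arrays) are hypothetical and not taken from any existing patch
set.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4	/* toy value for this sketch */

/* Hypothetical low level driver interface: mechanism only, no policy. */
struct power_ops {
	void (*enter_low_power)(int cpu);
};

/* 1 if the CPU was asked to idle in a lower power state. */
static int low_power[NR_CPUS];

static void demo_enter_low_power(int cpu)
{
	low_power[cpu] = 1;
}

static const struct power_ops demo_ops = {
	.enter_low_power = demo_enter_low_power,
};

/*
 * Policy side: pack nr_tasks onto the lowest-numbered CPUs, at most
 * `capacity` tasks per CPU, and ask the driver to put every CPU that
 * ends up empty into a lower power state. Returns the number of CPUs
 * left running. Tasks beyond NR_CPUS * capacity are not placed; a real
 * implementation would have to handle that case.
 */
static int pack_tasks(int nr_tasks, int capacity, int load[NR_CPUS],
		      const struct power_ops *ops)
{
	int busy = 0;

	assert(capacity > 0 && ops != NULL && ops->enter_low_power != NULL);

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int take = nr_tasks < capacity ? nr_tasks : capacity;

		load[cpu] = take;
		nr_tasks -= take;
		if (take > 0) {
			low_power[cpu] = 0;	/* CPU stays fully powered */
			busy++;
		} else {
			ops->enter_low_power(cpu);
		}
	}
	return busy;
}
```

With three tasks and a capacity of two, the tasks land on CPUs 0 and 1
and the driver is asked to idle CPUs 2 and 3. Swapping demo_ops for
another platform's ops would not change the policy code, which is the
point of keeping the driver policy-free.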

Then do small, measured steps to integrate more and more facilities, the 
ability to turn off more and more hardware, etc. The more basic steps you 
can figure out to iterate this, the better.

Important: it's not a problem that the initial code won't outperform the 
current kernel's performance. It should outperform the _initial_ 'dumb'
code in the first step. Then the next step should outperform the previous 
step, etc.

The quality of this iterative approach will eventually surpass the 
combined effect of currently available but non-integrated facilities.

Since this can be done without touching all the other existing facilities 
it's fundamentally non-intrusive.

An initial implementation should probably cover just two platforms, a 
modern ARM platform and Intel - those two are far enough from each other 
so that if a generic approach helps both we are reasonably certain that 
the generalization makes sense.

The new code could live in a new file, kernel/sched/power.c, to 
separate it out in a tidy fashion, and to make it easy to understand.

> 2. Like 1, but we introduce a CONFIG_SCHED_POWER as suggested by Ingo, 
> that makes it all go away.

That's not really what CONFIG_SCHED_POWER should do: its purpose is to 
allow a 'legacy power saving mode' that makes any new logic go away.

> Pros: Intel can keep intel_pstate.c; others can use the power scheduler 
> or their own driver.
> 
> Cons: Different platform specific drivers may need different interfaces 
> to the scheduler. Harder to define cross-platform tunables.
> 
> 3. We go for independent platform-specific power policy drivers that may 
> or may not use existing frameworks, like intel_pstate.c.

And that's a NAK from the scheduler maintainers.

Thanks,

	Ingo
