From: Catalin Marinas <catalin.marinas@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Morten Rasmussen <Morten.Rasmussen@arm.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"arjan@linux.intel.com" <arjan@linux.intel.com>,
	"vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
	"preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
	"alex.shi@intel.com" <alex.shi@intel.com>,
	"efault@gmx.de" <efault@gmx.de>,
	"pjt@google.com" <pjt@google.com>,
	"len.brown@intel.com" <len.brown@intel.com>,
	"corbet@lwn.net" <corbet@lwn.net>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"tglx@linutronix.de" <tglx@linutronix.de>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linaro-kernel@lists.linaro.org" <linaro-kernel@lists.linaro.org>
Subject: Re: [RFC][PATCH 0/9] sched: Power scheduler design proposal
Date: Tue, 16 Jul 2013 13:42:48 +0100	[thread overview]
Message-ID: <20130716124248.GB10036@arm.com> (raw)
In-Reply-To: <20130715203922.GD23818@dyad.programming.kicks-ass.net>

On Mon, Jul 15, 2013 at 09:39:22PM +0100, Peter Zijlstra wrote:
> On Sat, Jul 13, 2013 at 11:23:51AM +0100, Catalin Marinas wrote:
> > > This looks like a userspace hotplug deamon approach lifted to kernel space :/
> > 
> > The difference is that this is faster. We even had hotplug in mind some
> > years ago for big.LITTLE but it wouldn't give the performance we need
> > (hotplug is incredibly slow even if driven from the kernel).
> 
> faster, slower, still horrid :-)

Hotplug for power management is horrid, I agree, but it depends on how
you look at the problem. What we need (at least for ARM) is to leave a
socket/cluster idle when the number of tasks is low enough to run on the
other. The old power-saving scheduling used to have a hierarchy with
different balancing policies per level. IIRC this was too complex, with
9 possible states and some chance of growing to 27. For a simpler
replacement, just left-packing of tasks does not work either, so you
need some power topology information in the scheduler.

I can see two approaches with regards to task placement:

1. Get the load balancer to pack tasks in a way to optimise performance
   within a socket but let other sockets idle.
2. Have another entity (power scheduler as per Morten's patches) decide
   which sockets to be used and let the main scheduler do its best
   within those constraints.

With (2) you have few changes to the main load balancer and a reduced
state space (basically it only cares about CPU capacities rather than
balancing policies at different levels). We then keep the power
topology, and the feedback from the low-level driver (e.g. what
can/cannot be done), in the separate power scheduler entity. I would say
the load balancer state space, from a power awareness perspective, is
linearised.

> > That's what we've been pushing for. From a big.LITTLE perspective, I
> > would probably vote for Vincent's patches but I guess we could probably
> > adapt any of the other options.
> > 
> > But then we got Ingo NAK'ing all these approaches. Taking the best bits
> > from the current load balancing patches would create yet another set of
> > patches which don't fall under Ingo's requirements (at least as I
> > understand them).
> 
> Right, so Ingo is currently away as well -- should be back 'today' or tomorrow.
> But I suspect he mostly fell over the presentation. 
> 
> I've never known Ingo to object to doing incremental development; in fact he
> often suggests doing so.
> 
> So don't present the packing thing as a power aware scheduler; that
> presentation suggests its the complete deal. Give instead a complete
> description of the problem; and tell how the current patch set fits into that
> and which aspect it solves; and that further patches will follow to sort the
> other issues.

Thanks for the clarification ;).

> > > Then worry about power thingies.
> > 
> > To quote Ingo: "To create a new low level idle driver mechanism the
> > scheduler could use and integrate proper power saving / idle policy into
> > the scheduler."
> > 
> > That's unless we all agree (including Ingo) that the above requirement
> > is orthogonal to task packing and, as a *separate* project, we look at
> > better integrating the cpufreq/cpuidle with the scheduler, possibly with
> > a new driver model and governors as libraries used by such drivers. In
> > which case the current packing patches shouldn't be NAK'ed but reviewed
> > so that they can be improved further or rewritten.
> 
> Right, so first thing would be to list all the thing that need doing:
> 
>  - integrate idle guestimator
>  - integrate cpufreq stats
>  - fix per entity runtime vs cpufreq
>  - integrate/redo cpufreq
>  - add packing features
>  - {all the stuff I forgot}
> 
> Then see what is orthogonal and what is most important and get people to agree
> to an order. Then go..

That sounds fine, and it's not different from what we've been thinking.
One problem is that task packing on its own doesn't give any clear view
of what the overall solution will look like, so I assume you/Ingo would
like to see the bigger picture (probably not the complete implementation
but close enough).

Morten's power scheduler tries to address the above and it will grow
into controlling a new model of power driver (and taking into account
Arjan's and others' comments regarding the API). At the same time, we
need some form of task packing. The power scheduler can drive this
(currently via cpu_power) or can simply turn a knob if there are better
options that will be accepted in the scheduler.

> > I agree in general but there is the intel_pstate.c driver which has its
> > own separate statistics that the scheduler does not track.
> 
> Right, question is how much of that will survive Arjan next-gen effort.

I think all Arjan cares about is a simple go_fastest() API ;).

> > We could move
> > to invariant task load tracking which uses aperf/mperf (and could do
> > similar things with perf counters on ARM). As I understand from Arjan,
> > the new pstate driver will be different, so we don't know exactly what
> > it requires.
> 
> Right, so part of the effort should be understanding what the various parties
> want/need. As far as I understand the Intel stuff, P states are basically
> useless and the only useful state to ever program is the max one -- although
> I'm sure Arjan will eventually explain how that is wrong :-)
> 
> We could do optional things; I'm not much for 'requiring' stuff that other
> arch simply cannot support, or only support at great effort/cost.
> 
> Stealing PMU counters for sched work would be crossing the line for me, that
> must be optional.

I agree, it should be optional.

-- 
Catalin

Thread overview: 64+ messages
2013-07-09 15:55 [RFC][PATCH 0/9] sched: Power scheduler design proposal Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 1/9] sched: Introduce power scheduler Morten Rasmussen
2013-07-09 16:48   ` Arjan van de Ven
2013-07-10  2:10   ` Arjan van de Ven
2013-07-10 11:11     ` Morten Rasmussen
2013-07-10 11:19       ` Vincent Guittot
2013-07-09 15:55 ` [RFC][PATCH 2/9] sched: Redirect update_cpu_power to sched/power.c Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 3/9] sched: Make select_idle_sibling() skip cpu with a cpu_power of 1 Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 4/9] sched: Make periodic load-balance disregard cpus " Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 5/9] sched: Make idle_balance() skip " Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 6/9] sched: power: add power_domain data structure Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 7/9] sched: power: Add power driver interface Morten Rasmussen
2013-07-09 15:55 ` [RFC][PATCH 8/9] sched: power: Add initial frequency scaling support to power scheduler Morten Rasmussen
2013-07-10 13:10   ` Arjan van de Ven
2013-07-12 12:51     ` Morten Rasmussen
2013-07-12 13:06       ` Catalin Marinas
2013-07-12 15:37       ` Arjan van de Ven
2013-07-09 15:55 ` [RFC][PATCH 9/9] sched: power: cpufreq: Initial schedpower cpufreq governor Morten Rasmussen
2013-07-09 16:58 ` [RFC][PATCH 0/9] sched: Power scheduler design proposal Arjan van de Ven
2013-07-10 11:16   ` Morten Rasmussen
2013-07-10 13:05     ` Arjan van de Ven
2013-07-12 12:46       ` Morten Rasmussen
2013-07-12 15:35         ` Arjan van de Ven
2013-07-12 13:00       ` Catalin Marinas
2013-07-12 15:44         ` Arjan van de Ven
2013-07-11 11:34   ` Preeti U Murthy
2013-07-12 13:48     ` Morten Rasmussen
2013-07-15  3:43       ` Preeti U Murthy
2013-07-15  9:55         ` Catalin Marinas
2013-07-15 15:24           ` Arjan van de Ven
2013-07-12 13:31   ` Catalin Marinas
2013-07-13  6:49 ` Peter Zijlstra
2013-07-13 10:23   ` Catalin Marinas
2013-07-15  7:53     ` Vincent Guittot
2013-07-15 20:39     ` Peter Zijlstra
2013-07-16 12:42       ` Catalin Marinas [this message]
2013-07-16 15:23         ` Arjan van de Ven
2013-07-17 14:14           ` Catalin Marinas
2013-07-24 13:50             ` Morten Rasmussen
2013-07-24 15:16               ` Arjan van de Ven
2013-07-24 16:46                 ` Morten Rasmussen
2013-07-24 16:48                   ` Arjan van de Ven
2013-07-25  8:00                     ` Morten Rasmussen
2013-07-13 14:40   ` Arjan van de Ven
2013-07-15 19:59     ` Peter Zijlstra
2013-07-15 20:37       ` Arjan van de Ven
2013-07-15 21:03         ` Peter Zijlstra
2013-07-15 22:46           ` Arjan van de Ven
2013-07-16 20:45             ` David Lang
2013-07-15 20:41       ` Arjan van de Ven
2013-07-15 21:06         ` Peter Zijlstra
2013-07-15 21:12           ` Peter Zijlstra
2013-07-15 22:52             ` Arjan van de Ven
2013-07-16 17:38               ` Peter Zijlstra
2013-07-16 18:44                 ` Arjan van de Ven
2013-07-16 19:21                   ` Peter Zijlstra
2013-07-16 19:57                     ` Arjan van de Ven
2013-07-16 20:17                       ` Peter Zijlstra
2013-07-16 20:21                         ` Arjan van de Ven
2013-07-16 20:32                         ` Arjan van de Ven
2013-07-15 22:46           ` Arjan van de Ven
2013-07-13 16:14   ` Arjan van de Ven
2013-07-15  2:05     ` Alex Shi
2013-07-24 13:16   ` Morten Rasmussen
