From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1030194AbbD1NGt (ORCPT <rfc822;w@1wt.eu>);
	Tue, 28 Apr 2015 09:06:49 -0400
Received: from casper.infradead.org ([85.118.1.10]:52267 "EHLO
	casper.infradead.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S965502AbbD1NGr (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Tue, 28 Apr 2015 09:06:47 -0400
Date: Tue, 28 Apr 2015 15:06:32 +0200
From: Peter Zijlstra <peterz@infradead.org>
To: Michael Turquette <mturquette@linaro.org>
Cc: Juri Lelli <juri.lelli@arm.com>,
        Morten Rasmussen <Morten.Rasmussen@arm.com>,
        "mingo@redhat.com" <mingo@redhat.com>,
        "vincent.guittot@linaro.org" <vincent.guittot@linaro.org>,
        Dietmar Eggemann <Dietmar.Eggemann@arm.com>,
        "yuyang.du@intel.com" <yuyang.du@intel.com>,
        "preeti@linux.vnet.ibm.com" <preeti@linux.vnet.ibm.com>,
        "nico@linaro.org" <nico@linaro.org>,
        "rjw@rjwysocki.net" <rjw@rjwysocki.net>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [RFCv3 PATCH 33/48] sched: Energy-aware wake-up task placement
Message-ID: <20150428130632.GA23123@twins.programming.kicks-ass.net>
References: <1423074685-6336-1-git-send-email-morten.rasmussen@arm.com>
 <1423074685-6336-34-git-send-email-morten.rasmussen@arm.com>
 <20150324163503.GZ23123@twins.programming.kicks-ass.net>
 <5512F7F2.2010705@arm.com>
 <20150325181413.GT21418@twins.programming.kicks-ass.net>
 <5513DDA4.10802@arm.com>
 <20150326104150.GW21418@twins.programming.kicks-ass.net>
 <20150427160113.16410.10935@quantum>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20150427160113.16410.10935@quantum>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Apr 27, 2015 at 09:01:13AM -0700, Michael Turquette wrote:
> Quoting Peter Zijlstra (2015-03-26 03:41:50)
> > On Thu, Mar 26, 2015 at 10:21:24AM +0000, Juri Lelli wrote:
> > >  - what about other sched classes? I know that this is very premature,
> > >    but I can't help but think that we'll need to do some sort of
> > >    aggregation of requests, and if we put triggers in very specialized
> > >    points we might lose some of the sched classes separation
> > 
> > So for deadline we can do P state selection (as you're well aware) based
> > on the requested utilization. Not sure what to do for fifo/rr though,
> > they lack much useful information (as always).
> > 
> > Now if we also look ahead to things like the ACPI CPPC stuff we'll see
> > that CFS and DL place different requirements on the hints. Whereas CFS
> > would like to hint a max perf (the hardware going slower due to the code
> > consisting of mostly stalls is always fine from a best effort energy
> > pov), the DL stuff would like to hint a min perf, seeing how it 'needs'
> > to provide a QoS.
> > 
> > So we either need to carry this information along in a 'generic' way
> > between the various classes or put the hinting in every class.
> > 
> > But yes, food for thought for sure.
> 
> I am a fan of putting the hints in every class. One idea I've been
> considering is that each sched class could have a small, simple cpufreq
> governor that expresses its constraints (max for cfs, min qos for dl)
> and then the cpufreq core Does The Right Thing.
> 
> This would be a multi-governor approach, which requires some surgery to
> cpufreq core code, but I like the modularity and maintainability of it
> more than having one big super governor that has to satisfy every need.
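To make the max-versus-min distinction above concrete, here is a minimal sketch of how per-class hints might be combined. All names are hypothetical and none exist in the kernel or cpufreq: PerfHint, dl_hint, cfs_hint, combine, and an abstract 0..1024 performance scale. Deriving the CFS cap directly from utilization is also an assumption made only for illustration. DL contributes a floor taken from its requested utilization and CFS contributes a ceiling. Floors combine with max, ceilings combine with min, and the ceiling is never allowed to drop below the floor, so the DL QoS requirement wins any conflict.

```python
from dataclasses import dataclass

PERF_MAX = 1024  # hypothetical abstract scale: 1024 == highest performance level


@dataclass(frozen=True)
class PerfHint:
    """A performance range one scheduling class asks for (hypothetical type)."""
    min_perf: int = 0         # floor: performance the class must be guaranteed (DL)
    max_perf: int = PERF_MAX  # ceiling: going faster buys nothing (CFS)


def dl_hint(dl_util: int) -> PerfHint:
    # Deadline knows its requested utilization, so it can demand a floor.
    return PerfHint(min_perf=min(dl_util, PERF_MAX))


def cfs_hint(cfs_util: int) -> PerfHint:
    # Best effort (assumed mapping): cap performance, since slower is always fine.
    return PerfHint(max_perf=min(cfs_util, PERF_MAX))


def combine(*hints: PerfHint) -> PerfHint:
    floor = max(h.min_perf for h in hints)    # every floor must be met
    ceiling = min(h.max_perf for h in hints)  # the tightest cap applies...
    return PerfHint(floor, max(ceiling, floor))  # ...but never below the floor


# DL needs 300 and CFS would be happy capped at 600: the range is [300, 600].
print(combine(dl_hint(300), cfs_hint(600)))
# DL needs 700, which overrides the CFS cap of 600: the range is [700, 700].
print(combine(dl_hint(700), cfs_hint(600)))
```

A range rather than a single frequency is used here because a CPPC-style interface can accept a min/max performance window. Whether a "small governor per class" or a single aggregation point performs the combine step is exactly the design question under discussion.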

Well, at that point we really don't need cpufreq anymore do we? All
you need is the hardware driver (ACPI P-state, ACPI CPPC etc.).

Because as I understand it, cpufreq currently is mostly the governor
thing (which we'll replace) and some infra for dealing with these head
cases that require scheduling for changing P states (which we can leave
on cpufreq proper for the time being).

Would it not be easier to just start from scratch and convert the (few)
drivers we need to prototype this? Instead of trying to drag the
entirety of cpufreq along just to keep all the drivers?
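The "just the hardware driver" idea could look roughly like the following sketch. The names PerfDriver, SchedPerf, set_range, can_sleep and run_worker are hypothetical illustrations, not kernel or cpufreq APIs. The scheduler programs a driver directly when it can switch P-states from any context. When a driver needs to sleep to change P-states (the cases that require scheduling), the request is queued instead, and a worker, standing in for the infrastructure left in cpufreq proper, later applies only the most recent request.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class PerfDriver:
    """Hypothetical minimal hardware driver: no governor, just a setter."""
    name: str
    set_range: Callable[[int, int], None]  # program a (min, max) perf range
    can_sleep: bool = False  # True if changing P-states needs process context


@dataclass
class SchedPerf:
    """Hypothetical scheduler-side glue that talks to the driver directly."""
    driver: PerfDriver
    deferred: List[Tuple[int, int]] = field(default_factory=list)

    def request(self, min_perf: int, max_perf: int) -> None:
        if self.driver.can_sleep:
            # Cannot program it from scheduler context; hand it to a worker.
            self.deferred.append((min_perf, max_perf))
        else:
            self.driver.set_range(min_perf, max_perf)

    def run_worker(self) -> None:
        # Stand-in for a kthread/workqueue: apply only the latest request.
        if self.deferred:
            self.driver.set_range(*self.deferred[-1])
            self.deferred.clear()


applied: List[Tuple[int, int]] = []
record = lambda lo, hi: applied.append((lo, hi))

fast = SchedPerf(PerfDriver("cppc-like", record))
fast.request(300, 600)  # programmed immediately

slow = SchedPerf(PerfDriver("sleeping-driver", record, can_sleep=True))
slow.request(100, 200)  # queued
slow.request(400, 800)  # queued, supersedes the previous request
slow.run_worker()       # only (400, 800) reaches the hardware

print(applied)
```

Collapsing the queue to the last request is one possible policy. It reflects the view that stale P-state requests have no value once a newer one exists.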