LKML Archive on lore.kernel.org
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Todd Kjos <tkjos@google.com>, Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default
Date: Mon, 17 Sep 2018 13:27:23 +0100
Message-ID: <20180917122723.GS1413@e110439-lin>
In-Reply-To: <20180914142813.GM24124@hirez.programming.kicks-ass.net>

On 14-Sep 16:28, Peter Zijlstra wrote:
> Just a quick reply because I have to run..
> 
> On Fri, Sep 14, 2018 at 03:07:32PM +0100, Patrick Bellasi wrote:
> > On 14-Sep 13:10, Peter Zijlstra wrote:
> 
> > > I think the problem here is that the two are conflated in the very same
> > > interface.
> > > 
> > > Would it make sense to move the available clamp values out to some sysfs
> > > interface like thing and guard that with a capability, while keeping the
> > > task interface unprivilidged?
> > 
> > You mean something like:
> > 
> >    $ cat /proc/sys/kernel/sched_uclamp_min_utils
> >    0 10 20 ... 100
> > 
> > to notify users about the set of clamp values which are available ?
> > 
> > > Another thing that has me 'worried' about this interface is the direct
> > > tie to CPU capacity (not that I have a better suggestion). But it does
> > > raise the point of how userspace is going to discover the relevant
> > > values of the platform.
> > 
> > This point worries me too, and that's what I think is addressed in a
> > sane way in:
> > 
> >    [PATCH v4 13/16] sched/core: uclamp: use percentage clamp values
> >    https://lore.kernel.org/lkml/20180828135324.21976-14-patrick.bellasi@arm.com/
> > 
> > IMHO percentages are a reasonably safe and generic API to expose to
> > user-space. Don't you think this should address your concern above ?
> 
> Not at all what I meant, and no, percentages don't help.
> 
> The thing is, the values you'd want to use are for example the capacity
> of the little CPUs. or the capacity of the most energy efficient OPP
> (the knee).

I don't think so.

On the knee topic, we had some thinking and on most platforms it seems
to be a rather arbitrary decision.

On sane platforms, the Energy Efficiency (EE) is monotonically
decreasing with frequency increase.  Maybe we can define a threshold
for an "EE derivative ratio", but it will still be quite arbitrary.
Moreover, it could be that in certain use-cases we want to push for
higher energy efficiency (i.e. lower derivatives) than in others.

> Similarly for boosting, how are we 'easily' going to find the values
> that correspond to the various available OPPs.

In our experience with SchedTune on Android, we found that we
generally focus on a small set of representative use-cases and then
run an exploration, by tuning the percentage of boost, to identify the
optimal trade-off between Performance and Energy.
The value you get could be something which does not exactly match an
OPP but still, since we (will) bias not only OPP selection but also
task placement, it's the one which makes most sense.

Thus, the capacity of little CPUs, or the exact capacity of an OPP, is
something we don't care to specify exactly, since:

 - schedutil will round the util request up to the next frequency anyway

 - capacity by itself is a loosely defined metric, since it's usually
   measured considering a specific kind of instructions mix, which
   can be very different from the actual instruction mix (e.g. integer
   vs floating point)

 - certain platforms don't even expose OPPs, but just "performance
   levels"... which ultimately are a "percentage"

 - there are so many rounding errors in utilization tracking and its
   aggregation that being exact on an OPP is of "relative"
   importance
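
To make the first point concrete, here is a minimal sketch (plain
Python, not kernel code) of how a percentage clamp ends up on an OPP.
The OPP table and the helper names are made up for illustration; the
1.25 headroom factor and the "lowest frequency at or above the target"
selection are assumptions modelled on how schedutil shapes its request.

```python
from bisect import bisect_left

SCHED_CAPACITY_SCALE = 1024  # fixed-point scale used for utilization

# Hypothetical OPP table (kHz) for one frequency domain, sorted ascending.
OPP_FREQS_KHZ = [500_000, 800_000, 1_100_000, 1_400_000, 1_800_000]


def pct_to_util(pct):
    """Map a 0..100 percentage clamp onto the 0..1024 utilization scale."""
    if not 0 <= pct <= 100:
        raise ValueError("clamp percentage must be in [0, 100]")
    return pct * SCHED_CAPACITY_SCALE // 100


def next_opp_khz(util, opps=OPP_FREQS_KHZ):
    """Return the lowest OPP at or above the requested frequency.

    The request is max_freq * 1.25 * util / 1024 (assumed headroom),
    rounded up to the next table entry and saturated at the top OPP.
    """
    max_freq = opps[-1]
    target = (max_freq + (max_freq >> 2)) * util // SCHED_CAPACITY_SCALE
    idx = bisect_left(opps, target)
    return opps[min(idx, len(opps) - 1)]


if __name__ == "__main__":
    for pct in (25, 30, 35, 40, 100):
        print(pct, next_opp_khz(pct_to_util(pct)))
```

With this made-up table, clamps of 25%, 30% and 35% all resolve to the
same OPP, which is why hitting an exact OPP capacity from user-space
matters less than it may seem.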

Do you see specific use-cases where an exact OPP capacity is much
better than a percentage value ?

Of course there can be scenarios in which we want to clamp to a
specific OPP. But still, why should it be difficult for a platform
integrator to express it as a close enough percentage value ?

> The EAS thing might have these around; but I forgot if/how they're
> exposed to userspace (I'll have to soon look at the latest posting).

The new "Energy Model Management" framework can certainly be used to
get the list of OPPs for each frequency domain. IMO this could be
used to identify the maximum number of clamp groups we can have.
In this case, the discretization patch can translate a generic
percentage clamp into the closest OPP capacity... 

... but to me that's an internal detail which I'm not convinced we
need to expose to user-space.
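
As a rough illustration of that translation (a sketch only, not what
the discretization patch actually implements), the snippet below snaps
a percentage clamp to the nearest per-OPP capacity. The capacity table,
the tie-breaking rule and the function name are hypothetical.

```python
SCHED_CAPACITY_SCALE = 1024  # fixed-point scale used for capacity

# Hypothetical per-OPP capacities for one frequency domain (0..1024).
OPP_CAPACITIES = [284, 455, 626, 796, 1024]


def closest_opp_capacity(pct, capacities=OPP_CAPACITIES):
    """Snap a 0..100 percentage clamp to the nearest OPP capacity.

    Ties are resolved towards the lower capacity (an arbitrary choice
    made for this sketch).
    """
    if not 0 <= pct <= 100:
        raise ValueError("clamp percentage must be in [0, 100]")
    util = pct * SCHED_CAPACITY_SCALE // 100
    return min(capacities, key=lambda cap: (abs(cap - util), cap))


if __name__ == "__main__":
    # Every percentage collapses onto one of the table entries, so the
    # number of distinct clamp values is bounded by the number of OPPs.
    distinct = {closest_opp_capacity(p) for p in range(101)}
    print(sorted(distinct))
```

User-space keeps writing percentages; the number of distinct values
the kernel has to track is bounded by the size of the OPP table.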

IMHO we should instead focus just on defining a usable and generic
userspace interface. Then, platform specific tuning is something
user-space can do, either offline or on-line.

> But changing the clamp metric to something different than these values
> is going to be pain.

Maybe I don't completely get what you mean here... are you saying that
not using exact capacity values to define clamps is difficult ?
If that's the case why? Can you elaborate with an example ?

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi
