LKML Archive on lore.kernel.org
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
Date: Tue, 24 Jul 2018 16:39:16 +0100
Message-ID: <20180724153916.GA3275@e110439-lin>
In-Reply-To: <20180724132902.GI1934745@devbig577.frc2.facebook.com>

Hi Tejun,

I apologize in advance for yet another long reply; I did my best
below to summarize all the controversial points discussed so far.

If you have the patience (one more time) to go through the
following text, you'll find a set of precise clarifications and
questions I have for you.

Thank you again for your time.

On 24-Jul 06:29, Tejun Heo wrote:

[...]

> > What I describe here is just an additional hint to the scheduler which
> > enrich the above described model. Provided A and B are already
> > satisfied, when a task gets a chance to run it will be executed at a
> > min/max configured frequency. That's really all... there is not
> > additional impact on "resources allocation".
> 
> So, if it's a cpufreq range controller.  It'd have sth like
> cpu.freq.min and cpu.freq.max, where min defines the maximum minimum
> cpufreq its descendants can get and max defines the maximum cpufreq
> allowed in the subtree.  For an example, please refer to how
> memory.min and memory.max are defined.

I think you are still looking at just one usage of this interface,
which is likely mainly my fault, also because of the long time between
postings. Sorry for that...

Let me repost here an abstract of the cover letter with some
additional notes inline.

--- Cover Letter Abstract START ---

> > [...] utilization is a task specific property which is used by the scheduler
> > to know how much CPU bandwidth a task requires (under certain conditions).
> > Thus, the utilization clamp values defined either per-task or via the
> > CPU controller, can be used to represent tasks to the scheduler as
> > being bigger (or smaller) then what they really are.
          ^^^^^^^^^^^^^^^^^^^

This is a fundamental feature added by utilization clamping: it is a
task property which can be useful to the scheduler in many different
ways, and not "just" to bias frequency selection.
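
As a minimal sketch of what "representing a task as bigger (or
smaller)" means, assuming a hypothetical per-task clamp pair (the
struct and function names below are illustrative and are not taken
from the series), the scheduler would simply see the task's
utilization restricted to a [min, max] range:

```c
/* Full-scale utilization, matching the scheduler's
 * SCHED_CAPACITY_SCALE (1024). Used here only for illustration. */
#define UTIL_SCALE 1024u

/* Hypothetical clamp pair; not the data structure used by the series. */
struct util_clamp {
	unsigned int min;	/* lower bound: makes a small task look bigger */
	unsigned int max;	/* upper bound: makes a big task look smaller */
};

/* Return the utilization the scheduler would "see" for a task.
 * If min > max (a misconfiguration), min wins for util below min. */
static unsigned int clamped_util(unsigned int util, struct util_clamp c)
{
	if (util < c.min)
		return c.min;
	if (util > c.max)
		return c.max;
	return util;
}
```

Any consumer of utilization (frequency selection today, task
placement later) could use the clamped value instead of the raw one.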

> > Utilization clamping thus ultimately enable interesting additional
> > optimizations, especially on asymmetric capacity systems like Arm
> > big.LITTLE and DynamIQ CPUs, where:
> > 
> >  - boosting: small tasks are preferably scheduled on higher-capacity CPUs
> >    where, despite being less energy efficient, they can complete faster
> > 
> >  - clamping: big/background tasks are preferably scheduler on low-capacity CPUs
> >    where, being more energy efficient, they can still run but save power and
> >    thermal headroom for more important tasks.

The two points above are examples of how we can use utilization
clamping for something other than frequency selection.
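
To make the two examples a bit more concrete, here is a rough sketch
of capacity-aware placement. It is not the EAS placement code and it
ignores energy completely; pick_cpu() and the capacity values are
assumptions made up for illustration. It picks the lowest-capacity CPU
which can still fit the (already clamped) utilization of a task:

```c
#include <stddef.h>

/* Return the index of the lowest-capacity CPU whose capacity fits
 * "util". If no CPU fits, fall back to the highest-capacity CPU.
 * Returns -1 only when nr_cpus is 0. Ties go to the lowest index. */
static int pick_cpu(const unsigned int *capacity, size_t nr_cpus,
		    unsigned int util)
{
	int best = -1;		/* smallest CPU that fits */
	int biggest = -1;	/* fallback: largest CPU seen */

	for (size_t i = 0; i < nr_cpus; i++) {
		if (biggest < 0 || capacity[i] > capacity[biggest])
			biggest = (int)i;
		if (capacity[i] < util)
			continue;	/* task does not fit here */
		if (best < 0 || capacity[i] < capacity[best])
			best = (int)i;
	}
	return best >= 0 ? best : biggest;
}
```

With hypothetical capacities { 446, 446, 1024, 1024 } (two LITTLE and
two big CPUs), a small task boosted to a minimum utilization of 600
no longer fits a LITTLE CPU and lands on a big one, while a big task
capped to a maximum of 400 fits a LITTLE CPU.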

> > This additional usage of the utilization clamping is not presented in this
                                                      ^^^^^^^^^^^^^^^^^^^^^^^^

Is it acceptable to add a generic interface, provided we properly and
completely describe, both in the cover letter and in the related
changelogs, the future bits we can add?

> > series but it's an integral part of the Energy Aware Scheduler (EAS) feature
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The EAS scheduler, without the utilization clamping bits, does a great
job of scheduling tasks while saving energy. However, on every system,
we are also interested in other metrics, for example completion
time and power dissipation.

Whether certain tasks should be scheduled to optimize energy
efficiency, completion time and/or power dissipation is something we
can achieve only by:

1. adopting a proper task classification scheme
   => that's why CGroups are of interest

2. using a generic enough mechanism to describe certain task
   properties which affect all the metrics above,
   i.e. energy, speed and power
   => that's why utilization and its clamping are of interest

> > set. A similar solution (SchedTune) is already used on Android kernels, which
                                           ^^^^^^^^^^^^^^^^^^^^^^^

This _complete support_ is already actively and successfully used on
many Android devices...

> > targets both frequency selection and task placement biasing.
            ^^^^                     ^^^^^^^^^^^^^^^^^^

... to support _not only_ frequency selection.

> > This series provides the foundation bits to add similar features in mainline
                             ^^^^^^^^^^^^^^^
> > and its first simple client with the schedutil integration.
            ^^^^^^^^^^^^^^^^^^^

The solution presented here shows only the integration with
cpufreq/schedutil. However, since we are adding a user-space
interface, we have to make this new interface generic from the
beginning, so that it also supports the complete implementation we
will have at the end.

--- Cover Letter Abstract END ---


From my comments above I hope it's now clearer that "utilization
clamping" is not just a "cpufreq range controller" and, since we
will extend the internal usage of such an interface, we cannot now add
a user-space interface which targets just frequency control.

To summarize, we are proposing a generic interface which:

a) does not strictly enforce and/or grant any bandwidth to tasks and
   does not directly define how the CPU resource has to be partitioned
   among tasks

b) improves the way we can constrain the bandwidth consumed by TGs, by
   specifying a min/max "MIPS range" (in scheduler terms: utilization)
   at which the bandwidth can be consumed

c) is based on a fundamental task scheduler metric: utilization,
   since the "MIPS range" can be affected by the "type of CPUs" and
   not only by the "operating frequency"

d) can be used by the scheduler to bias "tasks placement" as well as
   "frequency selection"

e) does not provide the full implementation here, not only to keep the
   initial patchset limited in size but also because of some
   dependencies on other EAS bits which are currently under discussion
   on LKML.
   These different EAS features can still progress independently.

f) aims, to the best of our ability, at providing a complete use-case
   description both in the cover letter and in the related changelogs
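
To illustrate points c) and d), here is a simplified sketch of how a
(clamped) utilization could be turned into a frequency request. The
1.25 headroom factor mirrors what schedutil does, but freq_for_util()
itself, the capping against the maximum frequency and the
zero-capacity guard are simplifications made up for this example:

```c
/* Map a utilization value to a frequency request for one CPU:
 *   freq = 1.25 * max_freq * util / cpu_capacity
 * capped at max_freq. Units of max_freq (e.g. kHz) are preserved. */
static unsigned long freq_for_util(unsigned long max_freq,
				   unsigned long cpu_capacity,
				   unsigned long util)
{
	unsigned long long freq;

	if (cpu_capacity == 0)
		return max_freq;	/* defensive: avoid division by zero */

	/* max_freq + max_freq/4 gives the 1.25 headroom factor */
	freq = (unsigned long long)(max_freq + (max_freq >> 2)) * util;
	freq /= cpu_capacity;

	return freq > max_freq ? max_freq : (unsigned long)freq;
}
```

The same utilization value translates into a different fraction of
the maximum frequency depending on the capacity of the CPU the task
runs on, which is why the interface is expressed in terms of
utilization rather than frequency.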

Going back to one of your previous comments, when you say:

> What's described is computation bandwidth control but what's
> implemented is just frequency clamping.

Do we agree now that:

1. what we propose is not a "computational bandwidth control"
   mechanism and/or interface

2. what we implement is frequency clamping, but that's just one use
   case, chosen to keep the series small enough

3. despite 2) we need to add an interface which is generic enough to
   accommodate the other use-cases

4. the basic metric exposed (i.e. utilization) is used now for
   frequency clamping but the same one will be used for task placement
   biasing

?

And again, when you say:

> So, there are fundamental discrepancies between
> description+interface vs. what it actually does.

Is it acceptable to have a new interface which fits a wider
description?

With such a description, our aim is also to demonstrate that we are
_not_ adding a special-case new user-space interface but a generic
enough interface which can be properly extended in the future without
breaking existing functionality, just by continuing to improve it.

Best,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi
