From: Paul Turner <pjt@google.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Tejun Heo <tj@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
	linux-pm@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: [RFC v3 1/5] sched/core: add capacity constraints to CPU controller
Date: Thu, 30 Mar 2017 14:15:59 -0700	[thread overview]
Message-ID: <CAPM31RJtWrvAk54gnjtDqpjmRf98D-aEqvcO39Qz7BiJWc457A@mail.gmail.com> (raw)
In-Reply-To: <20170320180837.GB28391@e110439-lin>

On Mon, Mar 20, 2017 at 11:08 AM, Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
> On 20-Mar 13:15, Tejun Heo wrote:
>> Hello,
>>
>> On Tue, Feb 28, 2017 at 02:38:38PM +0000, Patrick Bellasi wrote:
>> > This patch extends the CPU controller by adding a couple of new
>> > attributes, capacity_min and capacity_max, which can be used to enforce
>> > bandwidth boosting and capping. More specifically:
>> >
>> > - capacity_min: defines the minimum capacity which should be granted
>> >                 (by schedutil) when a task in this group is running,
>> >                 i.e. the task will run at least at that capacity
>> >
>> > - capacity_max: defines the maximum capacity which can be granted
>> >                 (by schedutil) when a task in this group is running,
>> >                 i.e. the task can run up to that capacity
>>
>> cpu.capacity.min and cpu.capacity.max are the more conventional names.
>
> Ok, should be an easy renaming.
>
>> I'm not sure about the name capacity as it doesn't encode what it does
>> and is difficult to tell apart from cpu bandwidth limits.  I think
>> it'd be better to represent what it controls more explicitly.
>
> In the scheduler jargon, capacity represents the amount of computation
> that a CPU can provide, and it is usually defined to be 1024 for the
> biggest CPU (on non-SMP, i.e. heterogeneous, systems) running at the
> highest OPP (i.e. maximum frequency).
>
> It's true that it kind of overlaps with the concept of "bandwidth".
> However, the main difference here is that "bandwidth" is not frequency
> (and architecture) scaled.
> Thus, for example, assuming we have only one CPU with these two OPPs:
>
>    OPP | Frequency | Capacity
>      1 |    500MHz |      512
>      2 |      1GHz |     1024

I think exposing capacity in this manner is extremely challenging.
It's not normalized in any way between architectures, which places a
lot of the ABI in the API.

Have you considered any schemes for normalizing this in a reasonable fashion?
>
> a task running 60% of the time on that CPU, when the CPU is configured
> to run at 500MHz, is using 60% bandwidth from the bandwidth standpoint
> but, from the capacity standpoint, only 30% of the available capacity.
>
> IOW, bandwidth is purely time-based, while capacity factors in both
> frequency and architectural differences.
> Thus, while a "bandwidth" constraint limits the amount of time a task
> can use a CPU, independently from the "actual computation" performed,
> with the new "capacity" constraints we can enforce how much "actual
> computation" a task can perform per "unit of time".
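[Editorial note: the bandwidth-vs-capacity distinction above can be sketched numerically. This is an illustrative Python sketch, not kernel code; the 1024 scale and the OPP capacities are taken from the example table above.]

```python
# Scheduler capacity is normalized so that the biggest CPU at its
# highest OPP (maximum frequency) has capacity 1024.
MAX_CAPACITY = 1024

# OPPs from the example above: frequency (MHz) -> capacity.
OPP_CAPACITY = {500: 512, 1000: 1024}

def bandwidth_usage(runtime_fraction):
    """Bandwidth is purely temporal: the fraction of CPU time used."""
    return runtime_fraction

def capacity_usage(runtime_fraction, freq_mhz):
    """Capacity factors in frequency: scale the time fraction by the
    capacity of the OPP the CPU is running at."""
    return runtime_fraction * OPP_CAPACITY[freq_mhz] / MAX_CAPACITY

# A task running 60% of the time at 500 MHz:
print(bandwidth_usage(0.6))      # 60% bandwidth
print(capacity_usage(0.6, 500))  # but only 30% of available capacity
```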
>
>> > These attributes:
>> > a) are tunable at all hierarchy levels, i.e. root group too
>>
>> This usually is problematic because there should be a non-cgroup way
>> of configuring the feature in case cgroup isn't configured or used,
>> and it becomes awkward to have two separate mechanisms configuring the
>> same thing.  Maybe the feature is cgroup specific enough that it makes
>> sense here but this needs more explanation / justification.
>
> In the previous proposal I used to expose global tunables under
> procfs, e.g.:
>
>  /proc/sys/kernel/sched_capacity_min
>  /proc/sys/kernel/sched_capacity_max
>
> which can be used to define tunable root constraints when CGroups are
> not available, and become read-only when CGroups are.
>
> Can this be eventually an acceptable option?
>
> In any case I think that this feature will be mainly targeting CGroup
> based systems. Indeed, one of the main goals is to collect
> "application specific" information from "informed run-times". Being
> "application specific" means that we need a way to classify
> applications depending on the runtime context... and that capability
> in Linux is ultimately provided via the CGroup interface.
>
>> > b) allow creating subgroups of tasks which do not violate the
>> >    capacity constraints defined by the parent group.
>> >    Thus, tasks on a subgroup can only be more boosted and/or more
>>
>> For both limits and protections, the parent caps the maximum the
>> children can get.  At least that's what memcg does for memory.low.
>> Doing that makes sense for memcg because for memory the parent can
>> still do protections regardless of what its children are doing and it
>> makes delegation safe by default.
>
> Just to be more clear, the current proposal enforces:
>
> - capacity_max_child <= capacity_max_parent
>
>   Since, if a task is constrained to get only up to a certain amount
>   of capacity, then its children cannot use more than that... they
>   can only be further constrained.
>
> - capacity_min_child >= capacity_min_parent
>
>   Since, if a task has been boosted to run at least that fast, then
>   its children cannot be constrained to go slower without potentially
>   impacting the parent's performance.
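[Editorial note: the two invariants above can be sketched as a simple validation check. Illustrative Python only; the function and dictionary names are hypothetical, not the patch's actual interface.]

```python
def child_constraints_valid(parent, child):
    """Check the hierarchy invariants described above: a child may only
    be further capped (capacity_max no higher than the parent's) and
    further boosted (capacity_min no lower than the parent's)."""
    return (child["capacity_max"] <= parent["capacity_max"] and
            child["capacity_min"] >= parent["capacity_min"])

root = {"capacity_min": 256, "capacity_max": 1024}
good = {"capacity_min": 512, "capacity_max": 768}  # more boosted, more capped
bad  = {"capacity_min": 128, "capacity_max": 768}  # min below the parent's min

print(child_constraints_valid(root, good))  # True
print(child_constraints_valid(root, bad))   # False
```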
>
>> I understand why you would want a property like capacity to be the
>> other direction as that way you get more specific as you walk down the
>> tree for both limits and protections;
>
> Right, the protection scheme is defined in such a way that it never
> violates the parent's constraints.
>
>> however, I think we need to
>> think a bit more about it and ensure that the resulting interface
>> isn't confusing.
>
> Sure.
>
>> Would it work for capacity to behave the other
>> direction - ie. a parent's min restricting the highest min that its
>> descendants can get?  It's completely fine if that's weird.
>
> I had a thought about that possibility, but it did not convince me
> from the use-cases standpoint, at least for the ones I've considered.
>
> The reason is that capacity_min is used to implement a concept of
> "boosting" where, let's say, we want to "run a task faster than a
> minimum frequency". Assume that this constraint has been defined
> because we know that this task, and likely all its descendant threads,
> needs at least that capacity level to perform according to
> expectations.
>
> In that case, "refining down the hierarchy" may require boosting some
> threads further, but likely not less.
>
> Does this make sense?
>
> To me this seems to match quite well at least the Android/ChromeOS
> use-cases. I'm not sure whether there are different use-cases in,
> for example, the domain of managed containers.
>
>
>> Thanks.
>>
>> --
>> tejun
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi
