All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Tejun Heo <tj@kernel.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-api@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v9 12/16] sched/core: uclamp: Extend CPU's cgroup controller
Date: Mon, 3 Jun 2019 13:27:25 +0100	[thread overview]
Message-ID: <20190603122725.GB19426@darkstar> (raw)
In-Reply-To: <20190531153545.GE374014@devbig004.ftw2.facebook.com>

On 31-May 08:35, Tejun Heo wrote:

[...]

> > These attributes:
> > 
> > a) are available only for non-root nodes, both on default and legacy
> >    hierarchies, while system wide clamps are defined by a generic
> >    interface which does not depends on cgroups. This system wide
> >    interface enforces constraints on tasks in the root node.
> 
> I'd much prefer if they weren't entangled this way.  The system wide
> limits should work the same regardless of cgroup's existence.  cgroup
> can put further restriction on top but mere creation of cgroups with
> cpu controller enabled shouldn't take them out of the system-wide
> limits.

That's correct and what you describe matches, at least on its
intents, the current implementation provided in:

   [PATCH v9 14/16] sched/core: uclamp: Propagate system defaults to root group
   https://lore.kernel.org/lkml/20190515094459.10317-15-patrick.bellasi@arm.com/

System clamps always work the same way, independently from cgroups:
they define the upper bound for both min and max clamps.

When cgroups are not available, tasks specific clamps are always
capped by system clamps.

When cgroups are available, the root task group clamps are capped by
the system clamps, which affects its "effective" clamps and propagate
them down the hierarchy to child's "effective" clamps.
That's done in:

   [PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps
   https://lore.kernel.org/lkml/20190515094459.10317-14-patrick.bellasi@arm.com/


Example 1
---------

Here is an example of system and groups clamps aggregation:

        	        min                        max
system defaults         400                        600

cg_name        	        min       min.effective	   max	     max.effective
 /uclamp               1024       400              500       500
 /uclamp/app              512       400	             512       500
 /uclamp/app/pid_smalls	    100	      100              200       200
 /uclamp/app/pid_bigs       500	      400              700       500


The ".effective" clamps are used to define the actual clamp value to
apply to tasks, according to the aggregation rules defined in:

   [PATCH v9 15/16] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps
   https://lore.kernel.org/lkml/20190515094459.10317-16-patrick.bellasi@arm.com/

All the above, to me it means that:
 - cgroups are always capped by system clamps
 - cgroups can further restrict system clamps

Does that match with your view?

> > b) enforce effective constraints at each level of the hierarchy which
> >    are a restriction of the group requests considering its parent's
> >    effective constraints. Root group effective constraints are defined
> >    by the system wide interface.
> >    This mechanism allows each (non-root) level of the hierarchy to:
> >    - request whatever clamp values it would like to get
> >    - effectively get only up to the maximum amount allowed by its parent
> 
> I'll come back to this later.
> 
> > c) have higher priority than task-specific clamps, defined via
> >    sched_setattr(), thus allowing to control and restrict task requests
> 
> This sounds good.
> 
> > Add two new attributes to the cpu controller to collect "requested"
> > clamp values. Allow that at each non-root level of the hierarchy.
> > Validate local consistency by enforcing util.min < util.max.
> > Keep it simple by do not caring now about "effective" values computation
> > and propagation along the hierarchy.
> 
> So, the followings are what we're doing for hierarchical protection
> and limit propgations.
> 
> * Limits (high / max) default to max.  Protections (low / min) 0.  A
>   new cgroup by default doesn't constrain itself further and doesn't
>   have any protection.

Example 2
---------

Let say we have:

  /tg1:
        util_min=200 (as a protection)
        util_max=800 (as a limit)

the moment we create a subgroup /tg1/tg11, in v9 it is initialized
with the same limits _and protections_ of its father:

  /tg1/tg11:
        util_min=200 (protection inherited from /tg1)
        util_max=800 (limit inherited from /tg1)

Do you mean that we should have instead:

  /tg1/tg11:
        util_min=0   (no protection by default at creation time)
        util_max=800 (limit inherited from /tg1)


i.e. we need to reset the protection of a newly created subgroup?


> * A limit defines the upper ceiling for the subtree.  If an ancestor
>   has a limit of X, none of its descendants can have more than X.

That's correct, however we distinguish between "requested" and
"effective" values.


Example 3
---------

We can have:

cg_name        	        max       max.effective
 /uclamp/app            400                 400
 /uclamp/app/pid_bigs     500	              400


Which means that a subgroup can "request" a limit (max=500) higher
then its father (max=400), while still getting only up to what its
father allows (max.effective = 400).


Example 4
---------

Tracking the actual requested limit (max=500) it's useful to enforce
it once the father limit should be relaxed, for example we will have:

cg_name        	        max       max.effective
 /uclamp/app            600                 600
 /uclamp/app/pid_bigs     500	              500

where a subgroup gets not more than what it has been configured for.

This is the logic implemented by cpu_util_update_eff() in:

   [PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps
   https://lore.kernel.org/lkml/20190515094459.10317-14-patrick.bellasi@arm.com/


> * A protection defines the upper ceiling of protections for the
>   subtree.  If an andester has a protection of X, none of its
>   descendants can have more protection than X.

Right, that's the current behavior in v9.

> Note that there's no way for an ancestor to enforce protection its
> descendants.  It can only allow them to claim some.  This is
> intentional as the other end of the spectrum is either descendants
> losing the ability to further distribute protections as they see fit.

Ok, that means I need to update in v10 the initialization of subgroups
min clamps to be none by default as discussed in the above Example 2,
right?

[...]

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

  parent reply	other threads:[~2019-06-03 12:27 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-15  9:44 [PATCH v9 00/16] Add utilization clamping support Patrick Bellasi
2019-05-15  9:44 ` Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 01/16] sched/core: uclamp: Add CPU's clamp buckets refcounting Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 02/16] sched/core: uclamp: Add bucket local max tracking Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 03/16] sched/core: uclamp: Enforce last task's UCLAMP_MAX Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 04/16] sched/core: uclamp: Add system default clamps Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 05/16] sched/core: Allow sched_setattr() to use the current policy Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 06/16] sched/core: uclamp: Extend sched_setattr() to support utilization clamping Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 07/16] sched/core: uclamp: Reset uclamp values on RESET_ON_FORK Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 08/16] sched/core: uclamp: Set default clamps for RT tasks Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 09/16] sched/cpufreq: uclamp: Add clamps for FAIR and " Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 10/16] sched/core: uclamp: Add uclamp_util_with() Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 11/16] sched/fair: uclamp: Add uclamp support to energy_compute() Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 12/16] sched/core: uclamp: Extend CPU's cgroup controller Patrick Bellasi
2019-05-31 15:35   ` Tejun Heo
2019-06-03 12:24     ` Patrick Bellasi
2019-06-03 12:27     ` Patrick Bellasi [this message]
2019-06-05 14:03       ` Tejun Heo
2019-06-05 14:39         ` Patrick Bellasi
2019-06-05 14:44           ` Tejun Heo
2019-06-05 15:37             ` Patrick Bellasi
2019-06-05 15:39               ` Tejun Heo
2019-06-03 12:29     ` Patrick Bellasi
2019-06-05 14:09       ` Tejun Heo
2019-06-05 15:06         ` Patrick Bellasi
2019-06-05 15:27           ` Tejun Heo
2019-05-15  9:44 ` [PATCH v9 13/16] sched/core: uclamp: Propagate parent clamps Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 14/16] sched/core: uclamp: Propagate system defaults to root group Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 15/16] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps Patrick Bellasi
2019-05-15  9:44 ` [PATCH v9 16/16] sched/core: uclamp: Update CPU's refcount on TG's clamp changes Patrick Bellasi
2019-05-30 10:15 ` [PATCH v9 00/16] Add utilization clamping support Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190603122725.GB19426@darkstar \
    --to=patrick.bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=quentin.perret@arm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.