All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>,
	linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Paul Turner <pjt@google.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	John Stultz <john.stultz@linaro.org>,
	Todd Kjos <tkjos@android.com>, Tim Murray <timmurray@google.com>,
	Andres Oportus <andresoportus@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Juri Lelli <juri.lelli@arm.com>,
	Chris Redpath <chris.redpath@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>
Subject: Re: [RFC v3 0/5] Add capacity capping support to the CPU controller
Date: Wed, 12 Apr 2017 14:55:38 +0100	[thread overview]
Message-ID: <20170412135538.GM29455@e110439-lin> (raw)
In-Reply-To: <20170412121009.GD3093@worktop>

On 12-Apr 14:10, Peter Zijlstra wrote:
> Let me reply in parts as I read this.. easy things first :-)
> 
> On Tue, Apr 11, 2017 at 06:58:33PM +0100, Patrick Bellasi wrote:
> > On 10-Apr 09:36, Peter Zijlstra wrote:
> 
> > >  4) they have muddled semantics, because while its presented as a task
> > >     property, it very much is not.
> > 
> > Actually we always presented it as a task group property, while other
> > people suggested it should be a per-task API.
> > 
> > Still, IMO it could make sense also as a per-task API, for example
> > considering a specific RT task which we know in our system is
> > perfectly fine to always run it below a certain OPP.
> > 
> > Do you think it should be a per-task API?
> > Should we focus (at least initially) on providing a per task-group API?
> 
> Even for the cgroup interface, I think they should set a per-task
> property, not a group property.

Ok, right now using CGroups ans primary (and unique) interface, these
values are tracked as attributes of the CPU controller. Tasks gets
them by reading these attributes once they are binded to a CGroup.

Are you proposing to move these attributes within the task_struct?

In that case we should also defined a primary interface to set them,
any preferred proposal? sched_setattr(), prctl?

> > >  3) they have absolutely unacceptable overhead in implementation. Two
> > >     more RB tree operations per enqueue/dequeue is just not going to
> > >     happen.
> > 
> > This last point is about "implementation details", I'm pretty
> > confident that if we find an agreement on the previous point than
> > this last will be simple to solve.
> > 
> > Just to be clear, the rb-trees are per CPU and used to track just the
> > RUNNABLE tasks on each CPUs and, as I described in the previous
> > example, for the OPP biasing to work I think we really need an
> > aggregation mechanism.
> 
> I know its runnable, which is exactly what the regular RB tree in fair
> already tracks. That gets us 3 RB tree operations per scheduling
> operation, which is entirely ridiculous.
> 
> And while I disagree with the need to track this at all, see below, it
> can be done _much_ cheaper using a double augmented RB-tree, where you
> keep the min/max as heaps inside the regular RB-tree.

By regular rb-tree do you mean the cfs_rq->tasks_timeline?

Because in that case this would apply only to the FAIR class, while
the rb-tree we are using here are across classes.
Supporting both FAIR and RT I think is a worth having feature.

> > Ideally, every time a task is enqueue/dequeued from a CPU we want to
> > know what is the currently required capacity clamping. This requires
> > to maintain an ordered list of values and rb-trees are the most effective
> > solution.
> > 
> > Perhaps, if we accept a certain level of approximation, we can
> > potentially reduce the set of tracked values to a finite set (maybe
> > corresponding to the EM capacity values) and use just a per-cpu
> > ref-counting array.
> > Still the min/max aggregation will have a worst case O(N) search
> > complexity, where N is the number of finite values we want to use.
> 
> So the bigger point is that if the min/max is a per-task property (even
> if set through a cgroup interface), the min(max) / max(min) thing is
> wrong.

Perhaps I'm not following you here but, being per-task does not mean
that we need to aggregate these constraints by summing them (look
below)...

> If the min/max were to apply to each individual task's util, you'd end
> up with something like: Dom(\Sum util) = [min(1, \Sum min), min(1, \Sum max)].

... as you do here.

Let's use the usual simple example, where these per-tasks constraints
are configured:
- TaskA: capacity_min: 20% capacity_max: 80%
- TaskB: capacity_min: 40% capacity_max: 60%

This means that, at CPU level, we want to enforce the following
clamping depending on the tasks status:

 RUNNABLE tasks    capacity_min    capacity_max
A) TaskA                      20%             80%
B) TaskA,TaskB                40%             80%
C) TaskB                      40%             60%
 
In case C, TaskA gets a bigger boost while is co-scheduled with TaskB.

Notice that this CPU-level aggregation is used just for OPP selection
on that CPU, while for TaskA we still use capacity_min=20% when we are
looking for a CPU.

> Where a possible approximation is scaling the aggregate util into that
> domain.
> 

-- 
#include <best/regards.h>

Patrick Bellasi

  reply	other threads:[~2017-04-12 13:55 UTC|newest]

Thread overview: 66+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-02-28 14:38 [RFC v3 0/5] Add capacity capping support to the CPU controller Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 1/5] sched/core: add capacity constraints to " Patrick Bellasi
2017-03-13 10:46   ` Joel Fernandes (Google)
2017-03-15 11:20     ` Patrick Bellasi
2017-03-15 13:20       ` Joel Fernandes
2017-03-15 16:10         ` Paul E. McKenney
2017-03-15 16:44           ` Patrick Bellasi
2017-03-15 17:24             ` Paul E. McKenney
2017-03-15 17:57               ` Patrick Bellasi
2017-03-20 17:15   ` Tejun Heo
2017-03-20 17:36     ` Tejun Heo
2017-03-20 18:08     ` Patrick Bellasi
2017-03-23  0:28       ` Joel Fernandes (Google)
2017-03-23 10:32         ` Patrick Bellasi
2017-03-23 16:01           ` Tejun Heo
2017-03-23 18:15             ` Patrick Bellasi
2017-03-23 18:39               ` Tejun Heo
2017-03-24  6:37                 ` Joel Fernandes (Google)
2017-03-24 15:00                   ` Tejun Heo
2017-03-30 21:13                 ` Paul Turner
2017-03-24  7:02           ` Joel Fernandes (Google)
2017-03-30 21:15       ` Paul Turner
2017-04-01 16:25         ` Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 2/5] sched/core: track CPU's capacity_{min,max} Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 3/5] sched/core: sync capacity_{min,max} between slow and fast paths Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 4/5] sched/{core,cpufreq_schedutil}: add capacity clamping for FAIR tasks Patrick Bellasi
2017-02-28 14:38 ` [RFC v3 5/5] sched/{core,cpufreq_schedutil}: add capacity clamping for RT/DL tasks Patrick Bellasi
2017-03-13 10:08   ` Joel Fernandes (Google)
2017-03-15 11:40     ` Patrick Bellasi
2017-03-15 12:59       ` Joel Fernandes
2017-03-15 14:44         ` Juri Lelli
2017-03-15 16:13           ` Joel Fernandes
2017-03-15 16:24             ` Juri Lelli
2017-03-15 23:40               ` Joel Fernandes
2017-03-16 11:16                 ` Juri Lelli
2017-03-16 12:27                   ` Patrick Bellasi
2017-03-16 12:44                     ` Juri Lelli
2017-03-16 16:58                       ` Joel Fernandes
2017-03-16 17:17                         ` Juri Lelli
2017-03-15 11:41 ` [RFC v3 0/5] Add capacity capping support to the CPU controller Rafael J. Wysocki
2017-03-15 12:59   ` Patrick Bellasi
2017-03-16  1:04     ` Rafael J. Wysocki
2017-03-16  3:15       ` Joel Fernandes
2017-03-20 22:51         ` Rafael J. Wysocki
2017-03-21 11:01           ` Patrick Bellasi
2017-03-24 23:52             ` Rafael J. Wysocki
2017-03-16 12:23       ` Patrick Bellasi
2017-03-20 14:51 ` Tejun Heo
2017-03-20 17:22   ` Patrick Bellasi
2017-04-10  7:36     ` Peter Zijlstra
2017-04-11 17:58       ` Patrick Bellasi
2017-04-12 12:10         ` Peter Zijlstra
2017-04-12 13:55           ` Patrick Bellasi [this message]
2017-04-12 15:37             ` Peter Zijlstra
2017-04-13 11:33               ` Patrick Bellasi
2017-04-12 12:15         ` Peter Zijlstra
2017-04-12 13:34           ` Patrick Bellasi
2017-04-12 14:41             ` Peter Zijlstra
2017-04-12 12:22         ` Peter Zijlstra
2017-04-12 13:24           ` Patrick Bellasi
2017-04-12 12:48         ` Peter Zijlstra
2017-04-12 13:27           ` Patrick Bellasi
2017-04-12 14:34             ` Peter Zijlstra
2017-04-12 14:43               ` Patrick Bellasi
2017-04-12 16:14                 ` Peter Zijlstra
2017-04-13 10:34                   ` Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170412135538.GM29455@e110439-lin \
    --to=patrick.bellasi@arm.com \
    --cc=andresoportus@google.com \
    --cc=chris.redpath@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=john.stultz@linaro.org \
    --cc=juri.lelli@arm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=timmurray@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@android.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.