linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-api@vger.kernel.org, Ingo Molnar <mingo@redhat.com>,
	Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v7 01/15] sched/core: uclamp: Add CPU's clamp buckets refcounting
Date: Wed, 13 Mar 2019 15:59:54 +0000	[thread overview]
Message-ID: <20190313155954.jse2tyn5iqxm6wle@e110439-lin> (raw)
In-Reply-To: <20190313135238.GC5922@hirez.programming.kicks-ass.net>

On 13-Mar 14:52, Peter Zijlstra wrote:
> On Fri, Feb 08, 2019 at 10:05:40AM +0000, Patrick Bellasi wrote:
> > +/*
> > + * When a task is enqueued on a rq, the clamp bucket currently defined by the
> > + * task's uclamp::bucket_id is reference counted on that rq. This also
> > + * immediately updates the rq's clamp value if required.
> > + *
> > + * Since tasks know their specific value requested from user-space, we track
> > + * within each bucket the maximum value for tasks refcounted in that bucket.
> > + * This provide a further aggregation (local clamping) which allows to track
> > + * within each bucket the exact "requested" clamp value whenever all tasks
> > + * RUNNABLE in that bucket require the same clamp.
> > + */
> > +static inline void uclamp_rq_inc_id(struct task_struct *p, struct rq *rq,
> > +				    unsigned int clamp_id)
> > +{
> > +	unsigned int bucket_id = p->uclamp[clamp_id].bucket_id;
> > +	unsigned int rq_clamp, bkt_clamp, tsk_clamp;
> > +
> > +	rq->uclamp[clamp_id].bucket[bucket_id].tasks++;
> > +
> > +	/*
> > +	 * Local clamping: rq's buckets always track the max "requested"
> > +	 * clamp value from all RUNNABLE tasks in that bucket.
> > +	 */
> > +	tsk_clamp = p->uclamp[clamp_id].value;
> > +	bkt_clamp = rq->uclamp[clamp_id].bucket[bucket_id].value;
> > +	rq->uclamp[clamp_id].bucket[bucket_id].value = max(bkt_clamp, tsk_clamp);
> 
> So, if I read this correct:
> 
>  - here we track a max value in a bucket,
> 
> > +	rq_clamp = READ_ONCE(rq->uclamp[clamp_id].value);
> > +	WRITE_ONCE(rq->uclamp[clamp_id].value, max(rq_clamp, tsk_clamp));
> > +}
> > +
> > +/*
> > + * When a task is dequeued from a rq, the clamp bucket reference counted by
> > + * the task is released. If this is the last task reference counting the rq's
> > + * max active clamp value, then the rq's clamp value is updated.
> > + * Both the tasks reference counter and the rq's cached clamp values are
> > + * expected to be always valid, if we detect they are not we skip the updates,
> > + * enforce a consistent state and warn.
> > + */
> > +static inline void uclamp_rq_dec_id(struct task_struct *p, struct rq *rq,
> > +				    unsigned int clamp_id)
> > +{
> > +	unsigned int bucket_id = p->uclamp[clamp_id].bucket_id;
> > +	unsigned int rq_clamp, bkt_clamp;
> > +
> > +	SCHED_WARN_ON(!rq->uclamp[clamp_id].bucket[bucket_id].tasks);
> > +	if (likely(rq->uclamp[clamp_id].bucket[bucket_id].tasks))
> > +		rq->uclamp[clamp_id].bucket[bucket_id].tasks--;
> > +
> > +	/*
> > +	 * Keep "local clamping" simple and accept to (possibly) overboost
> > +	 * still RUNNABLE tasks in the same bucket.
> > +	 */
> > +	if (likely(rq->uclamp[clamp_id].bucket[bucket_id].tasks))
> > +		return;
> 
> (Oh man, I hope that generates semi sane code; long live CSE passes I
> suppose)

What do you mean ?

> But we never decrement that bkt_clamp value on dequeue.

We decrement the bkt_clamp value only when the bucket becomes empty
and thus we pass the condition above. That's what the comment above is
there to call out.


> > +	bkt_clamp = rq->uclamp[clamp_id].bucket[bucket_id].value;
> > +
> > +	/* The rq's clamp value is expected to always track the max */
> > +	rq_clamp = READ_ONCE(rq->uclamp[clamp_id].value);
> > +	SCHED_WARN_ON(bkt_clamp > rq_clamp);
> > +	if (bkt_clamp >= rq_clamp) {
> 
> head hurts, this reads ==, how can this ever not be so?

Never, given the current code, that's just defensive programming.

If in the future the accounting should be accidentally broken by some
refactoring we warn and fix the corrupted data structures at first
chance.

Is that so bad?

> > +		/*
> > +		 * Reset rq's clamp bucket value to its nominal value whenever
> > +		 * there are anymore RUNNABLE tasks refcounting it.
> 
> -ENOPARSE

That's related to the comment above, when you say we don't decrement
the bkt_clamp.

Because of backetization, we potentially end up tracking tasks with
different requested clamp values in the same bucket.

For example, with 20% bucket size, we can have:
  Task1: util_min=25%
  Task2: util_min=35%
accounted in the same bucket.

This ensure that while they are both running we boost 35%. If Task1
should run longer than Task2, Task1 will be "overboosted" until its
end. The bucket value will be reset to 20% (nominal value) when both
tasks are idle.


> > +		 */
> > +		rq->uclamp[clamp_id].bucket[bucket_id].value =
> > +			uclamp_bucket_value(rq_clamp);
> 
> But basically you decrement the bucket value to the nominal value.

Yes, at this point we know there are no more tasks in this bucket and
we reset its value.

> 
> > +		uclamp_rq_update(rq, clamp_id);
> > +	}
> > +}
> 
> Given all that, what is to stop the bucket value to climbing to
> uclamp_bucket_value(+1)-1 and staying there (provided there's someone
> runnable)?

Nothing... but that's an expected consequence of bucketization.

> Why are we doing this... ?

You can either decide to:

 a) always boost tasks to just the bucket nominal value
    thus always penalizing both Task1 and Task2 of the example above

 b) always boost tasks to the bucket "max" value
    thus always overboosting both Task1 and Task2 of the example above

The solution above instead has a very good property: in systems
where you have only few and well defined clamp values we always
provide the exact boost.

For example, if your system requires only 23% and 47% boost values
(totally random numbers), then you can always get the exact boost
required using just 3 bucksts or ~33% size each.

In systems where you don't know which boost values you will have, you
can still defined the maximum overboost granularity you accept for
each task by just tuning the number of clamp groups. For example, with
20 groups you can have a 5% max overboost.

-- 
#include <best/regards.h>

Patrick Bellasi

  reply	other threads:[~2019-03-13 16:00 UTC|newest]

Thread overview: 84+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-08 10:05 [PATCH v7 00/15] Add utilization clamping support Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 01/15] sched/core: uclamp: Add CPU's clamp buckets refcounting Patrick Bellasi
2019-03-12 12:52   ` Dietmar Eggemann
2019-03-13 15:15     ` Patrick Bellasi
2019-03-13 21:01       ` Suren Baghdasaryan
2019-03-14 14:54         ` Patrick Bellasi
2019-03-14 15:00       ` Patrick Bellasi
2019-03-12 15:20   ` Peter Zijlstra
2019-03-12 15:50     ` Patrick Bellasi
2019-03-13  8:19       ` Peter Zijlstra
2019-03-13 11:37         ` Patrick Bellasi
2019-03-13 13:40   ` Peter Zijlstra
2019-03-13 16:12     ` Patrick Bellasi
2019-03-13 17:22       ` Peter Zijlstra
2019-03-13 18:22         ` Patrick Bellasi
2019-03-13 19:48       ` Peter Zijlstra
2019-03-14 12:13         ` Patrick Bellasi
2019-03-14 13:32           ` Peter Zijlstra
2019-03-14 15:07             ` Patrick Bellasi
2019-03-14 19:18               ` Peter Zijlstra
2019-03-13 13:52   ` Peter Zijlstra
2019-03-13 15:59     ` Patrick Bellasi [this message]
2019-03-13 19:30       ` Peter Zijlstra
2019-03-14 11:03         ` Patrick Bellasi
2019-03-14 13:27           ` Peter Zijlstra
2019-03-13 19:39       ` Peter Zijlstra
2019-03-14 11:18         ` Patrick Bellasi
2019-03-13 21:23     ` Suren Baghdasaryan
2019-03-14 12:43       ` Patrick Bellasi
2019-03-13 14:06   ` Peter Zijlstra
2019-03-13 15:28     ` Patrick Bellasi
2019-03-13 14:09   ` Peter Zijlstra
2019-03-13 15:23     ` Patrick Bellasi
2019-03-13 19:46       ` Peter Zijlstra
2019-03-13 21:08         ` Suren Baghdasaryan
2019-03-14 12:22           ` Patrick Bellasi
2019-03-14 11:45         ` Patrick Bellasi
2019-03-13 21:32   ` Suren Baghdasaryan
2019-03-14 14:46     ` Patrick Bellasi
2019-03-14 15:29       ` Suren Baghdasaryan
2019-03-14 15:40         ` Patrick Bellasi
2019-03-14 16:39           ` Suren Baghdasaryan
2019-02-08 10:05 ` [PATCH v7 02/15] sched/core: uclamp: Enforce last task UCLAMP_MAX Patrick Bellasi
2019-03-13 14:10   ` Peter Zijlstra
2019-03-13 16:20     ` Patrick Bellasi
2019-03-13 17:29       ` Peter Zijlstra
2019-03-13 18:29         ` Patrick Bellasi
2019-03-13 14:12   ` Peter Zijlstra
2019-03-13 16:16     ` Patrick Bellasi
2019-03-14  0:29       ` Suren Baghdasaryan
2019-03-14 17:06         ` Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 03/15] sched/core: uclamp: Add system default clamps Patrick Bellasi
2019-03-13 14:32   ` Peter Zijlstra
2019-03-13 17:09     ` Patrick Bellasi
2019-03-13 19:58       ` Peter Zijlstra
2019-03-13 20:10       ` Peter Zijlstra
2019-03-15 13:41         ` Patrick Bellasi
2019-03-13 20:13   ` Peter Zijlstra
2019-03-13 20:18   ` Peter Zijlstra
2019-03-18 12:18     ` Patrick Bellasi
2019-03-18 13:10       ` Peter Zijlstra
2019-03-18 14:21         ` Patrick Bellasi
2019-03-18 14:29           ` Peter Zijlstra
2019-02-08 10:05 ` [PATCH v7 04/15] sched/core: Allow sched_setattr() to use the current policy Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 05/15] sched/core: uclamp: Extend sched_setattr() to support utilization clamping Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 06/15] sched/core: uclamp: Reset uclamp values on RESET_ON_FORK Patrick Bellasi
2019-03-13 20:52   ` Peter Zijlstra
2019-03-18 12:58     ` Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 07/15] sched/core: uclamp: Set default clamps for RT tasks Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 08/15] sched/cpufreq: uclamp: Add clamps for FAIR and " Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 09/15] sched/core: uclamp: Add uclamp_util_with() Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 10/15] sched/fair: uclamp: Add uclamp support to energy_compute() Patrick Bellasi
2019-03-06 17:21   ` Quentin Perret
2019-03-18 15:19     ` Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 11/15] sched/core: uclamp: Extend CPU's cgroup controller Patrick Bellasi
2019-02-14 15:48   ` Tejun Heo
2019-03-19 10:00     ` Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 12/15] sched/core: uclamp: Propagate parent clamps Patrick Bellasi
2019-03-14 16:17   ` Suren Baghdasaryan
2019-03-18 16:54     ` Patrick Bellasi
2019-03-18 16:58       ` Suren Baghdasaryan
2019-02-08 10:05 ` [PATCH v7 13/15] sched/core: uclamp: Propagate system defaults to root group Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 14/15] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps Patrick Bellasi
2019-02-08 10:05 ` [PATCH v7 15/15] sched/core: uclamp: Update CPU's refcount on TG's clamp changes Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190313155954.jse2tyn5iqxm6wle@e110439-lin \
    --to=patrick.bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=quentin.perret@arm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).