From: Suren Baghdasaryan <surenb@google.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
Viresh Kumar <viresh.kumar@linaro.org>,
Vincent Guittot <vincent.guittot@linaro.org>,
Paul Turner <pjt@google.com>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Morten Rasmussen <morten.rasmussen@arm.com>,
Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
Joel Fernandes <joelaf@google.com>,
Steve Muckle <smuckle@google.com>
Subject: Re: [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Date: Fri, 20 Jul 2018 17:25:31 -0700 [thread overview]
Message-ID: <CAJuCfpGvpX5U9rED+vX=wW624GLZo8f=-2kOLGrY1UJSEXRh6g@mail.gmail.com> (raw)
In-Reply-To: <20180720151156.GA31421@e110439-lin>
Hi Patrick,
On Fri, Jul 20, 2018 at 8:11 AM, Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
> Hi Suren,
> thanks for the review, all good point... some more comments follow
> inline.
>
> On 19-Jul 16:51, Suren Baghdasaryan wrote:
>> On Mon, Jul 16, 2018 at 1:28 AM, Patrick Bellasi
>> <patrick.bellasi@arm.com> wrote:
>
> [...]
>
>> > +/**
>> > + * uclamp_group_available: checks if a clamp group is available
>> > + * @clamp_id: the utilization clamp index (i.e. min or max clamp)
>> > + * @group_id: the group index in the given clamp_id
>> > + *
>> > + * A clamp group is not free if there is at least one SE which is sing a clamp
>>
>> Did you mean to say "single clamp"?
>
> No, it's "...at least one SE which is USING a clamp value..."
>
>> > + * value mapped on the specified clamp_id. These SEs are reference counted by
>> > + * the se_count of a uclamp_map entry.
>> > + *
>> > + * Return: true if there are no SE's mapped on the specified clamp
>> > + * index and group
>> > + */
>> > +static inline bool uclamp_group_available(int clamp_id, int group_id)
>> > +{
>> > + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +
>> > + return (uc_map[group_id].value == UCLAMP_NONE);
>>
>> The usage of UCLAMP_NONE is very confusing to me. It was not used at
>> all in the patch where it was introduced [1/12], here it's used as a
>> clamp value and in uclamp_group_find() it's used as group_id. Please
>> clarify the usage.
>
> Yes, it's meant to represent a "clamp not valid" condition, whatever
> it's a "clamp group" or a "clamp value"... perhaps the name can be
> improved.
>
>> I also feel UCLAMP_NONE does not really belong to
>> the uclamp_id enum because other elements there are indexes in
>> uclamp_maps and this one is a special value.
>
> Right, it looks a bit misplaced, I agree. I think I tried to set it
> using a #define but there was some issues I don't remember now...
> Anyway, I'll give it another go...
>
>
>> IMHO if both *group_id*
>> and *value* need a special value (-1) to represent
>> unused/uninitialized entry it would be better to use different
>> constants. Maybe UCLAMP_VAL_NONE and UCLAMP_GROUP_NONE?
>
> Yes, maybe we can use a
>
> #define UCLAMP_NOT_VALID -1
>
> and get rid the confusing enum entry.
>
> Will update it on v3.
>
Sounds good to me.
>> > +}
>
> [...]
>
>> > +/**
>> > + * uclamp_group_find: finds the group index of a utilization clamp group
>> > + * @clamp_id: the utilization clamp index (i.e. min or max clamping)
>> > + * @clamp_value: the utilization clamping value lookup for
>> > + *
>> > + * Verify if a group has been assigned to a certain clamp value and return
>> > + * its index to be used for accounting.
>> > + *
>> > + * Since only a limited number of utilization clamp groups are allowed, if no
>> > + * groups have been assigned for the specified value, a new group is assigned
>> > + * if possible. Otherwise an error is returned, meaning that an additional clamp
>> > + * value is not (currently) supported.
>> > + */
>> > +static int
>> > +uclamp_group_find(int clamp_id, unsigned int clamp_value)
>> > +{
>> > + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > + int free_group_id = UCLAMP_NONE;
>> > + unsigned int group_id = 0;
>> > +
>> > + for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) {
>> > + /* Keep track of first free clamp group */
>> > + if (uclamp_group_available(clamp_id, group_id)) {
>> > + if (free_group_id == UCLAMP_NONE)
>> > + free_group_id = group_id;
>> > + continue;
>> > + }
>> > + /* Return index of first group with same clamp value */
>> > + if (uc_map[group_id].value == clamp_value)
>> > + return group_id;
>> > + }
>> > + /* Default to first free clamp group */
>> > + if (group_id > CONFIG_UCLAMP_GROUPS_COUNT)
>>
>> Is the condition above needed? I think it's always true if you got here.
>> Also AFAIKT after the for loop you can just do:
>>
>> return (free_group_id != UCLAMP_NONE) ? free_group_id : -ENOSPC;
>
> Yes, you right... the code above can be simplified!
>
>>
>> > + group_id = free_group_id;
>> > + /* All clamp group already track different clamp values */
>> > + if (group_id == UCLAMP_NONE)
>> > + return -ENOSPC;
>> > + return group_id;
>> > +}
>
> [...]
>
>> > +static inline void uclamp_group_put(int clamp_id, int group_id)
>> > +{
>> > + struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > + unsigned long flags;
>> > +
>> > + /* Ignore SE's not yet attached */
>> > + if (group_id == UCLAMP_NONE)
>> > + return;
>> > +
>> > + /* Remove SE from this clamp group */
>> > + raw_spin_lock_irqsave(&uc_map[group_id].se_lock, flags);
>> > + uc_map[group_id].se_count -= 1;
>>
>> If uc_map[group_id].se_count was 0 before decrement you end up with
>> se_count == -1 and no reset for the element.
>
> Well... this should never happen, otherwise the refcounting is not
> working as expected.
>
> Maybe we can add (at least) a debug check and warning, something like:
>
> #ifdef SCHED_DEBUG
> if (unlikely(uc_map[group_id].se_count == 0)) {
> WARN(1, "invalid clamp group [%d:%d] refcount\n",
> clamp_id, group_id);
> uc_map[group_id].se_count = 1;
> }
> #endif
>
>> > + if (uc_map[group_id].se_count == 0)
>> > + uclamp_group_reset(clamp_id, group_id);
>> > + raw_spin_unlock_irqrestore(&uc_map[group_id].se_lock, flags);
>> > +}
>> > +
>
> [...]
>
>> > static inline int __setscheduler_uclamp(struct task_struct *p,
>> > const struct sched_attr *attr)
>> > {
>> > + struct uclamp_se *uc_se;
>> > + int retval = 0;
>> > +
>> > if (attr->sched_util_min > attr->sched_util_max)
>> > return -EINVAL;
>> > if (attr->sched_util_max > SCHED_CAPACITY_SCALE)
>> > return -EINVAL;
>> >
>> > - p->uclamp[UCLAMP_MIN] = attr->sched_util_min;
>> > - p->uclamp[UCLAMP_MAX] = attr->sched_util_max;
>> > + mutex_lock(&uclamp_mutex);
>> > +
>> > + /* Update min utilization clamp */
>> > + uc_se = &p->uclamp[UCLAMP_MIN];
>> > + retval |= uclamp_group_get(p, UCLAMP_MIN, uc_se,
>> > + attr->sched_util_min);
>> > +
>> > + /* Update max utilization clamp */
>> > + uc_se = &p->uclamp[UCLAMP_MAX];
>> > + retval |= uclamp_group_get(p, UCLAMP_MAX, uc_se,
>> > + attr->sched_util_max);
>> > +
>> > + mutex_unlock(&uclamp_mutex);
>> > +
>> > + /*
>> > + * If one of the two clamp values should fail,
>> > + * let the userspace know.
>> > + */
>> > + if (retval)
>> > + return -ENOSPC;
>>
>> Maybe a minor issue but this failure is ambiguous. It might mean:
>> 1. no clamp value was updated
>> 2. UCLAMP_MIN was updated but UCLAMP_MAX was not
>> 3. UCLAMP_MAX was updated but UCLAMP_MIN was not
>
> That's right, I put a bit of thought on that me too but at the end
> I've been convinced that the possibility to use a single syscall to
> set both clamps at the same time has some benefits for user-space.
>
> Maybe the current solution can be improved by supporting an (optional)
> strict semantic with an in-kernel roll-back in case one of the two
> uclamp_group_get should fail.
>
> The strict semantic with roll-back could be controller via an
> additional flag, e.g. SCHED_FLAG_UTIL_CLAMP_STRICT.
>
> When the flag is set, either we are able to set both the attributes or
> we roll-back. Otherwise, when the flag is not set, we keep the current
> behavior. i.e. we set what we can and report an error to notify
> userspace that one constraints was not enforced.
>
> The following snippet should implement this semantics:
>
> ---8<---
>
> /* Uclamp flags */
> #define SCHED_FLAG_UTIL_CLAMP_STRICT 0x11 /* Roll-back on failure */
> #define SCHED_FLAG_UTIL_CLAMP_MIN 0x12 /* Update util_min */
> #define SCHED_FLAG_UTIL_CLAMP_MAX 0x14 /* Update util_max */
> #define SCHED_FLAG_UTIL_CLAMP ( \
> SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX)
>
Having ability to update only min or only max this way might be indeed
very useful.
Instead of rolling back on failure I would suggest to check both
inputs first to make sure there won't be any error before updating.
This would remove the need for SCHED_FLAG_UTIL_CLAMP_STRICT (which I
think any user would want to set to 1 anyway).
Looks like uclamp_group_get() can fail only if uclamp_group_find()
fails to find a slot for uclamp_value or a free slot. So one way to do
this search before update is to call uclamp_group_find() for both
UCLAMP_MIN and UCLAMP_MAX beforehand and if they succeed then pass
obtained next_group_ids into uclamp_group_get() to avoid doing the
same search twice. This requires some refactoring of
uclamp_group_get() but I think the end result would be a cleaner and
more predictable solution.
> static inline int __setscheduler_uclamp(struct task_struct *p,
> const struct sched_attr *attr)
> {
> unsigned int uclamp_value_old = 0;
> unsigned int uclamp_value;
> struct uclamp_se *uc_se;
> int retval = 0;
>
> if (attr->sched_util_min > attr->sched_util_max)
> return -EINVAL;
> if (attr->sched_util_max > 100)
> return -EINVAL;
>
> mutex_lock(&uclamp_mutex);
>
> if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN))
> goto set_util_max;
>
> uc_se = &p->uclamp[UCLAMP_MIN];
> uclamp_value = scale_from_percent(attr->sched_util_min);
> if (uc_se->value == uclamp_value)
> goto set_util_max;
>
> /* Update min utilization clamp */
> uclamp_value_old = uc_se->value;
> retval |= uclamp_group_get(p, NULL, UCLAMP_MIN, uc_se, uclamp_value);
> if (retval &&
> attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT)
> goto exit_failure;
>
> set_util_max:
>
> if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX))
> goto exit_done;
>
> uc_se = &p->uclamp[UCLAMP_MAX];
> uclamp_value = scale_from_percent(attr->sched_util_max);
> if (uc_se->value == uclamp_value)
> goto exit_done;
>
> /* Update max utilization clamp */
> if (uclamp_group_get(p, NULL, UCLAMP_MAX,
> uc_se, uclamp_value))
> goto exit_rollback;
>
> exit_done:
> mutex_unlock(&uclamp_mutex);
> return retval;
>
> exit_rollback:
> if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN &&
> attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT) {
> uclamp_group_get(p, NULL, UCLAMP_MIN,
> uc_se, uclamp_value_old);
> }
> exit_failure:
> mutex_unlock(&uclamp_mutex);
>
> return -ENOSPC;
> }
>
> ---8<---
>
>
> Could that work better?
>
> The code is maybe a bit more convoluted... but perhaps it can be
> improved by encoding it in a loop.
>
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi
next prev parent reply other threads:[~2018-07-21 0:25 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-07-16 8:28 [PATCH v2 00/12] Add utilization clamping support Patrick Bellasi
2018-07-16 8:28 ` [PATCH v2 01/12] sched/core: uclamp: extend sched_setattr to support utilization clamping Patrick Bellasi
2018-07-17 17:50 ` Joel Fernandes
2018-07-18 8:42 ` Patrick Bellasi
2018-07-18 17:02 ` Joel Fernandes
2018-07-17 18:04 ` Joel Fernandes
2018-07-16 8:28 ` [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-19 23:51 ` Suren Baghdasaryan
2018-07-20 15:11 ` Patrick Bellasi
2018-07-21 0:25 ` Suren Baghdasaryan [this message]
2018-07-23 13:36 ` Patrick Bellasi
2018-07-16 8:28 ` [PATCH v2 03/12] sched/core: uclamp: add CPU's clamp groups accounting Patrick Bellasi
2018-07-20 20:25 ` Suren Baghdasaryan
2018-07-16 8:28 ` [PATCH v2 04/12] sched/core: uclamp: update CPU's refcount on clamp changes Patrick Bellasi
2018-07-16 8:28 ` [PATCH v2 05/12] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks Patrick Bellasi
2018-07-16 8:29 ` [PATCH v2 06/12] sched/cpufreq: uclamp: add utilization clamping for RT tasks Patrick Bellasi
2018-07-16 8:29 ` [PATCH v2 07/12] sched/core: uclamp: enforce last task UCLAMP_MAX Patrick Bellasi
2018-07-21 1:23 ` Suren Baghdasaryan
2018-07-23 15:02 ` Patrick Bellasi
2018-07-23 16:40 ` Suren Baghdasaryan
2018-07-16 8:29 ` [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller Patrick Bellasi
2018-07-21 2:37 ` Suren Baghdasaryan
2018-07-21 3:16 ` Suren Baghdasaryan
2018-07-23 15:17 ` Patrick Bellasi
2018-07-23 15:30 ` Tejun Heo
2018-07-23 17:22 ` Patrick Bellasi
2018-07-24 13:29 ` Tejun Heo
2018-07-24 15:39 ` Patrick Bellasi
2018-07-27 0:39 ` Joel Fernandes
2018-07-27 8:09 ` Quentin Perret
2018-07-16 8:29 ` [PATCH v2 09/12] sched/core: uclamp: map TG's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-16 8:29 ` [PATCH v2 10/12] sched/core: uclamp: use TG's clamps to restrict Task's clamps Patrick Bellasi
2018-07-22 3:05 ` Suren Baghdasaryan
2018-07-23 15:40 ` Patrick Bellasi
2018-07-23 17:11 ` Suren Baghdasaryan
2018-07-24 9:56 ` Patrick Bellasi
2018-07-24 15:28 ` Suren Baghdasaryan
2018-07-24 15:49 ` Patrick Bellasi
2018-07-16 8:29 ` [PATCH v2 11/12] sched/core: uclamp: update CPU's refcount on TG's clamp changes Patrick Bellasi
2018-07-22 3:17 ` Suren Baghdasaryan
2018-07-16 8:29 ` [PATCH v2 12/12] sched/core: uclamp: use percentage clamp values Patrick Bellasi
2018-07-22 4:04 ` Suren Baghdasaryan
2018-07-24 16:43 ` Patrick Bellasi
2018-07-24 17:11 ` Suren Baghdasaryan
2018-07-24 17:17 ` Patrick Bellasi
2018-07-17 13:03 ` [PATCH v2 00/12] Add utilization clamping support Joel Fernandes
2018-07-17 13:41 ` Patrick Bellasi
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJuCfpGvpX5U9rED+vX=wW624GLZo8f=-2kOLGrY1UJSEXRh6g@mail.gmail.com' \
--to=surenb@google.com \
--cc=dietmar.eggemann@arm.com \
--cc=joelaf@google.com \
--cc=juri.lelli@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=morten.rasmussen@arm.com \
--cc=patrick.bellasi@arm.com \
--cc=peterz@infradead.org \
--cc=pjt@google.com \
--cc=rafael.j.wysocki@intel.com \
--cc=smuckle@google.com \
--cc=tj@kernel.org \
--cc=tkjos@google.com \
--cc=vincent.guittot@linaro.org \
--cc=viresh.kumar@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).