LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Suren Baghdasaryan <surenb@google.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>
Subject: Re: [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Date: Fri, 20 Jul 2018 17:25:31 -0700
Message-ID: <CAJuCfpGvpX5U9rED+vX=wW624GLZo8f=-2kOLGrY1UJSEXRh6g@mail.gmail.com> (raw)
In-Reply-To: <20180720151156.GA31421@e110439-lin>

Hi Patrick,

On Fri, Jul 20, 2018 at 8:11 AM, Patrick Bellasi
<patrick.bellasi@arm.com> wrote:
> Hi Suren,
> thanks for the review, all good point... some more comments follow
> inline.
>
> On 19-Jul 16:51, Suren Baghdasaryan wrote:
>> On Mon, Jul 16, 2018 at 1:28 AM, Patrick Bellasi
>> <patrick.bellasi@arm.com> wrote:
>
> [...]
>
>> > +/**
>> > + * uclamp_group_available: checks if a clamp group is available
>> > + * @clamp_id: the utilization clamp index (i.e. min or max clamp)
>> > + * @group_id: the group index in the given clamp_id
>> > + *
>> > + * A clamp group is not free if there is at least one SE which is sing a clamp
>>
>> Did you mean to say "single clamp"?
>
> No, it's "...at least one SE which is USING a clamp value..."
>
>> > + * value mapped on the specified clamp_id. These SEs are reference counted by
>> > + * the se_count of a uclamp_map entry.
>> > + *
>> > + * Return: true if there are no SE's mapped on the specified clamp
>> > + *         index and group
>> > + */
>> > +static inline bool uclamp_group_available(int clamp_id, int group_id)
>> > +{
>> > +       struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +
>> > +       return (uc_map[group_id].value == UCLAMP_NONE);
>>
>> The usage of UCLAMP_NONE is very confusing to me. It was not used at
>> all in the patch where it was introduced [1/12], here it's used as a
>> clamp value and in uclamp_group_find() it's used as group_id. Please
>> clarify the usage.
>
> Yes, it's meant to represent a "clamp not valid" condition, whatever
> it's a "clamp group" or a "clamp value"... perhaps the name can be
> improved.
>
>> I also feel UCLAMP_NONE does not really belong to
>> the uclamp_id enum because other elements there are indexes in
>> uclamp_maps and this one is a special value.
>
> Right, it looks a bit misplaced, I agree. I think I tried to set it
> using a #define but there was some issues I don't remember now...
> Anyway, I'll give it another go...
>
>
>> IMHO if both *group_id*
>> and *value* need a special value (-1) to represent
>> unused/uninitialized entry it would be better to use different
>> constants. Maybe UCLAMP_VAL_NONE and UCLAMP_GROUP_NONE?
>
> Yes, maybe we can use a
>
>    #define UCLAMP_NOT_VALID -1
>
> and get rid the confusing enum entry.
>
> Will update it on v3.
>

Sounds good to me.

>> > +}
>
> [...]
>
>> > +/**
>> > + * uclamp_group_find: finds the group index of a utilization clamp group
>> > + * @clamp_id: the utilization clamp index (i.e. min or max clamping)
>> > + * @clamp_value: the utilization clamping value lookup for
>> > + *
>> > + * Verify if a group has been assigned to a certain clamp value and return
>> > + * its index to be used for accounting.
>> > + *
>> > + * Since only a limited number of utilization clamp groups are allowed, if no
>> > + * groups have been assigned for the specified value, a new group is assigned
>> > + * if possible. Otherwise an error is returned, meaning that an additional clamp
>> > + * value is not (currently) supported.
>> > + */
>> > +static int
>> > +uclamp_group_find(int clamp_id, unsigned int clamp_value)
>> > +{
>> > +       struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +       int free_group_id = UCLAMP_NONE;
>> > +       unsigned int group_id = 0;
>> > +
>> > +       for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) {
>> > +               /* Keep track of first free clamp group */
>> > +               if (uclamp_group_available(clamp_id, group_id)) {
>> > +                       if (free_group_id == UCLAMP_NONE)
>> > +                               free_group_id = group_id;
>> > +                       continue;
>> > +               }
>> > +               /* Return index of first group with same clamp value */
>> > +               if (uc_map[group_id].value == clamp_value)
>> > +                       return group_id;
>> > +       }
>> > +       /* Default to first free clamp group */
>> > +       if (group_id > CONFIG_UCLAMP_GROUPS_COUNT)
>>
>> Is the condition above needed? I think it's always true if you got here.
>> Also AFAIKT after the for loop you can just do:
>>
>> return (free_group_id != UCLAMP_NONE) ? free_group_id : -ENOSPC;
>
> Yes, you right... the code above can be simplified!
>
>>
>> > +               group_id = free_group_id;
>> > +       /* All clamp group already track different clamp values */
>> > +       if (group_id == UCLAMP_NONE)
>> > +               return -ENOSPC;
>> > +       return group_id;
>> > +}
>
> [...]
>
>> > +static inline void uclamp_group_put(int clamp_id, int group_id)
>> > +{
>> > +       struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
>> > +       unsigned long flags;
>> > +
>> > +       /* Ignore SE's not yet attached */
>> > +       if (group_id == UCLAMP_NONE)
>> > +               return;
>> > +
>> > +       /* Remove SE from this clamp group */
>> > +       raw_spin_lock_irqsave(&uc_map[group_id].se_lock, flags);
>> > +       uc_map[group_id].se_count -= 1;
>>
>> If uc_map[group_id].se_count was 0 before decrement you end up with
>> se_count == -1 and no reset for the element.
>
> Well... this should never happen, otherwise the refcounting is not
> working as expected.
>
> Maybe we can add (at least) a debug check and warning, something like:
>
> #ifdef SCHED_DEBUG
>         if (unlikely(uc_map[group_id].se_count == 0)) {
>                 WARN(1, "invalid clamp group [%d:%d] refcount\n",
>                                 clamp_id, group_id);
>                 uc_map[group_id].se_count = 1;
>         }
> #endif
>
>> > +       if (uc_map[group_id].se_count == 0)
>> > +               uclamp_group_reset(clamp_id, group_id);
>> > +       raw_spin_unlock_irqrestore(&uc_map[group_id].se_lock, flags);
>> > +}
>> > +
>
> [...]
>
>> >  static inline int __setscheduler_uclamp(struct task_struct *p,
>> >                                         const struct sched_attr *attr)
>> >  {
>> > +       struct uclamp_se *uc_se;
>> > +       int retval = 0;
>> > +
>> >         if (attr->sched_util_min > attr->sched_util_max)
>> >                 return -EINVAL;
>> >         if (attr->sched_util_max > SCHED_CAPACITY_SCALE)
>> >                 return -EINVAL;
>> >
>> > -       p->uclamp[UCLAMP_MIN] = attr->sched_util_min;
>> > -       p->uclamp[UCLAMP_MAX] = attr->sched_util_max;
>> > +       mutex_lock(&uclamp_mutex);
>> > +
>> > +       /* Update min utilization clamp */
>> > +       uc_se = &p->uclamp[UCLAMP_MIN];
>> > +       retval |= uclamp_group_get(p, UCLAMP_MIN, uc_se,
>> > +                                  attr->sched_util_min);
>> > +
>> > +       /* Update max utilization clamp */
>> > +       uc_se = &p->uclamp[UCLAMP_MAX];
>> > +       retval |= uclamp_group_get(p, UCLAMP_MAX, uc_se,
>> > +                                  attr->sched_util_max);
>> > +
>> > +       mutex_unlock(&uclamp_mutex);
>> > +
>> > +       /*
>> > +        * If one of the two clamp values should fail,
>> > +        * let the userspace know.
>> > +        */
>> > +       if (retval)
>> > +               return -ENOSPC;
>>
>> Maybe a minor issue but this failure is ambiguous. It might mean:
>> 1. no clamp value was updated
>> 2. UCLAMP_MIN was updated but UCLAMP_MAX was not
>> 3. UCLAMP_MAX was updated but UCLAMP_MIN was not
>
> That's right, I put a bit of thought on that me too but at the end
> I've been convinced that the possibility to use a single syscall to
> set both clamps at the same time has some benefits for user-space.
>
> Maybe the current solution can be improved by supporting an (optional)
> strict semantic with an in-kernel roll-back in case one of the two
> uclamp_group_get should fail.
>
> The strict semantic with roll-back could be controller via an
> additional flag, e.g. SCHED_FLAG_UTIL_CLAMP_STRICT.
>
> When the flag is set, either we are able to set both the attributes or
> we roll-back. Otherwise, when the flag is not set, we keep the current
> behavior. i.e. we set what we can and report an error to notify
> userspace that one constraints was not enforced.
>
> The following snippet should implement this semantics:
>
> ---8<---
>
> /* Uclamp flags */
> #define SCHED_FLAG_UTIL_CLAMP_STRICT    0x11 /* Roll-back on failure */
> #define SCHED_FLAG_UTIL_CLAMP_MIN       0x12 /* Update util_min */
> #define SCHED_FLAG_UTIL_CLAMP_MAX       0x14 /* Update util_max */
> #define SCHED_FLAG_UTIL_CLAMP           ( \
>                 SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX)
>

Having ability to update only min or only max this way might be indeed
very useful.
Instead of rolling back on failure I would suggest to check both
inputs first to make sure there won't be any error before updating.
This would remove the need for SCHED_FLAG_UTIL_CLAMP_STRICT (which I
think any user would want to set to 1 anyway).
Looks like uclamp_group_get() can fail only if uclamp_group_find()
fails to find a slot for uclamp_value or a free slot. So one way to do
this search before update is to call uclamp_group_find() for both
UCLAMP_MIN and UCLAMP_MAX beforehand and if they succeed then pass
obtained next_group_ids into uclamp_group_get() to avoid doing the
same search twice. This requires some refactoring of
uclamp_group_get() but I think the end result would be a cleaner and
more predictable solution.

> static inline int __setscheduler_uclamp(struct task_struct *p,
>                                         const struct sched_attr *attr)
> {
>         unsigned int uclamp_value_old = 0;
>         unsigned int uclamp_value;
>         struct uclamp_se *uc_se;
>         int retval = 0;
>
>         if (attr->sched_util_min > attr->sched_util_max)
>                 return -EINVAL;
>         if (attr->sched_util_max > 100)
>                 return -EINVAL;
>
>         mutex_lock(&uclamp_mutex);
>
>         if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN))
>                 goto set_util_max;
>
>         uc_se = &p->uclamp[UCLAMP_MIN];
>         uclamp_value = scale_from_percent(attr->sched_util_min);
>         if (uc_se->value == uclamp_value)
>                 goto set_util_max;
>
>         /* Update min utilization clamp */
>         uclamp_value_old = uc_se->value;
>         retval |= uclamp_group_get(p, NULL, UCLAMP_MIN, uc_se, uclamp_value);
>         if (retval &&
>             attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT)
>                 goto exit_failure;
>
> set_util_max:
>
>         if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX))
>                 goto exit_done;
>
>         uc_se = &p->uclamp[UCLAMP_MAX];
>         uclamp_value = scale_from_percent(attr->sched_util_max);
>         if (uc_se->value == uclamp_value)
>                 goto exit_done;
>
>         /* Update max utilization clamp */
>         if (uclamp_group_get(p, NULL, UCLAMP_MAX,
>                              uc_se, uclamp_value))
>                 goto exit_rollback;
>
> exit_done:
>         mutex_unlock(&uclamp_mutex);
>         return retval;
>
> exit_rollback:
>         if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN &&
>             attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT) {
>                 uclamp_group_get(p, NULL, UCLAMP_MIN,
>                                  uc_se, uclamp_value_old);
>         }
> exit_failure:
>         mutex_unlock(&uclamp_mutex);
>
>         return -ENOSPC;
> }
>
> ---8<---
>
>
> Could that work better?
>
> The code is maybe a bit more convoluted... but perhaps it can be
> improved by encoding it in a loop.
>
>
> --
> #include <best/regards.h>
>
> Patrick Bellasi

  reply index

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  8:28 [PATCH v2 00/12] Add utilization clamping support Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 01/12] sched/core: uclamp: extend sched_setattr to support utilization clamping Patrick Bellasi
2018-07-17 17:50   ` Joel Fernandes
2018-07-18  8:42     ` Patrick Bellasi
2018-07-18 17:02       ` Joel Fernandes
2018-07-17 18:04   ` Joel Fernandes
2018-07-16  8:28 ` [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-19 23:51   ` Suren Baghdasaryan
2018-07-20 15:11     ` Patrick Bellasi
2018-07-21  0:25       ` Suren Baghdasaryan [this message]
2018-07-23 13:36         ` Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 03/12] sched/core: uclamp: add CPU's clamp groups accounting Patrick Bellasi
2018-07-20 20:25   ` Suren Baghdasaryan
2018-07-16  8:28 ` [PATCH v2 04/12] sched/core: uclamp: update CPU's refcount on clamp changes Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 05/12] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 06/12] sched/cpufreq: uclamp: add utilization clamping for RT tasks Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 07/12] sched/core: uclamp: enforce last task UCLAMP_MAX Patrick Bellasi
2018-07-21  1:23   ` Suren Baghdasaryan
2018-07-23 15:02     ` Patrick Bellasi
2018-07-23 16:40       ` Suren Baghdasaryan
2018-07-16  8:29 ` [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller Patrick Bellasi
2018-07-21  2:37   ` Suren Baghdasaryan
2018-07-21  3:16     ` Suren Baghdasaryan
2018-07-23 15:17     ` Patrick Bellasi
2018-07-23 15:30   ` Tejun Heo
2018-07-23 17:22     ` Patrick Bellasi
2018-07-24 13:29       ` Tejun Heo
2018-07-24 15:39         ` Patrick Bellasi
2018-07-27  0:39         ` Joel Fernandes
2018-07-27  8:09           ` Quentin Perret
2018-07-16  8:29 ` [PATCH v2 09/12] sched/core: uclamp: map TG's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 10/12] sched/core: uclamp: use TG's clamps to restrict Task's clamps Patrick Bellasi
2018-07-22  3:05   ` Suren Baghdasaryan
2018-07-23 15:40     ` Patrick Bellasi
2018-07-23 17:11       ` Suren Baghdasaryan
2018-07-24  9:56         ` Patrick Bellasi
2018-07-24 15:28           ` Suren Baghdasaryan
2018-07-24 15:49             ` Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 11/12] sched/core: uclamp: update CPU's refcount on TG's clamp changes Patrick Bellasi
2018-07-22  3:17   ` Suren Baghdasaryan
2018-07-16  8:29 ` [PATCH v2 12/12] sched/core: uclamp: use percentage clamp values Patrick Bellasi
2018-07-22  4:04   ` Suren Baghdasaryan
2018-07-24 16:43     ` Patrick Bellasi
2018-07-24 17:11       ` Suren Baghdasaryan
2018-07-24 17:17         ` Patrick Bellasi
2018-07-17 13:03 ` [PATCH v2 00/12] Add utilization clamping support Joel Fernandes
2018-07-17 13:41   ` Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJuCfpGvpX5U9rED+vX=wW624GLZo8f=-2kOLGrY1UJSEXRh6g@mail.gmail.com' \
    --to=surenb@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=patrick.bellasi@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git
	git clone --mirror https://lore.kernel.org/lkml/8 lkml/git/8.git
	git clone --mirror https://lore.kernel.org/lkml/9 lkml/git/9.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org
	public-inbox-index lkml

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git