LKML Archive on lore.kernel.org
 help / color / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>
Subject: Re: [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Date: Fri, 20 Jul 2018 16:11:56 +0100
Message-ID: <20180720151156.GA31421@e110439-lin> (raw)
In-Reply-To: <CAJuCfpF6=L=0LrmNnJrTNPazT4dWKqNv+thhN0dwpKCgUzs9sg@mail.gmail.com>

Hi Suren,
thanks for the review, all good point... some more comments follow
inline.

On 19-Jul 16:51, Suren Baghdasaryan wrote:
> On Mon, Jul 16, 2018 at 1:28 AM, Patrick Bellasi
> <patrick.bellasi@arm.com> wrote:

[...]

> > +/**
> > + * uclamp_group_available: checks if a clamp group is available
> > + * @clamp_id: the utilization clamp index (i.e. min or max clamp)
> > + * @group_id: the group index in the given clamp_id
> > + *
> > + * A clamp group is not free if there is at least one SE which is sing a clamp
> 
> Did you mean to say "single clamp"?

No, it's "...at least one SE which is USING a clamp value..."

> > + * value mapped on the specified clamp_id. These SEs are reference counted by
> > + * the se_count of a uclamp_map entry.
> > + *
> > + * Return: true if there are no SE's mapped on the specified clamp
> > + *         index and group
> > + */
> > +static inline bool uclamp_group_available(int clamp_id, int group_id)
> > +{
> > +       struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
> > +
> > +       return (uc_map[group_id].value == UCLAMP_NONE);
> 
> The usage of UCLAMP_NONE is very confusing to me. It was not used at
> all in the patch where it was introduced [1/12], here it's used as a
> clamp value and in uclamp_group_find() it's used as group_id. Please
> clarify the usage.

Yes, it's meant to represent a "clamp not valid" condition, whatever
it's a "clamp group" or a "clamp value"... perhaps the name can be
improved.

> I also feel UCLAMP_NONE does not really belong to
> the uclamp_id enum because other elements there are indexes in
> uclamp_maps and this one is a special value.

Right, it looks a bit misplaced, I agree. I think I tried to set it
using a #define but there was some issues I don't remember now...
Anyway, I'll give it another go...


> IMHO if both *group_id*
> and *value* need a special value (-1) to represent
> unused/uninitialized entry it would be better to use different
> constants. Maybe UCLAMP_VAL_NONE and UCLAMP_GROUP_NONE?

Yes, maybe we can use a

   #define UCLAMP_NOT_VALID -1

and get rid the confusing enum entry.

Will update it on v3.

> > +}

[...]

> > +/**
> > + * uclamp_group_find: finds the group index of a utilization clamp group
> > + * @clamp_id: the utilization clamp index (i.e. min or max clamping)
> > + * @clamp_value: the utilization clamping value lookup for
> > + *
> > + * Verify if a group has been assigned to a certain clamp value and return
> > + * its index to be used for accounting.
> > + *
> > + * Since only a limited number of utilization clamp groups are allowed, if no
> > + * groups have been assigned for the specified value, a new group is assigned
> > + * if possible. Otherwise an error is returned, meaning that an additional clamp
> > + * value is not (currently) supported.
> > + */
> > +static int
> > +uclamp_group_find(int clamp_id, unsigned int clamp_value)
> > +{
> > +       struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
> > +       int free_group_id = UCLAMP_NONE;
> > +       unsigned int group_id = 0;
> > +
> > +       for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) {
> > +               /* Keep track of first free clamp group */
> > +               if (uclamp_group_available(clamp_id, group_id)) {
> > +                       if (free_group_id == UCLAMP_NONE)
> > +                               free_group_id = group_id;
> > +                       continue;
> > +               }
> > +               /* Return index of first group with same clamp value */
> > +               if (uc_map[group_id].value == clamp_value)
> > +                       return group_id;
> > +       }
> > +       /* Default to first free clamp group */
> > +       if (group_id > CONFIG_UCLAMP_GROUPS_COUNT)
> 
> Is the condition above needed? I think it's always true if you got here.
> Also AFAIKT after the for loop you can just do:
> 
> return (free_group_id != UCLAMP_NONE) ? free_group_id : -ENOSPC;

Yes, you right... the code above can be simplified!

> 
> > +               group_id = free_group_id;
> > +       /* All clamp group already track different clamp values */
> > +       if (group_id == UCLAMP_NONE)
> > +               return -ENOSPC;
> > +       return group_id;
> > +}

[...]

> > +static inline void uclamp_group_put(int clamp_id, int group_id)
> > +{
> > +       struct uclamp_map *uc_map = &uclamp_maps[clamp_id][0];
> > +       unsigned long flags;
> > +
> > +       /* Ignore SE's not yet attached */
> > +       if (group_id == UCLAMP_NONE)
> > +               return;
> > +
> > +       /* Remove SE from this clamp group */
> > +       raw_spin_lock_irqsave(&uc_map[group_id].se_lock, flags);
> > +       uc_map[group_id].se_count -= 1;
> 
> If uc_map[group_id].se_count was 0 before decrement you end up with
> se_count == -1 and no reset for the element.

Well... this should never happen, otherwise the refcounting is not
working as expected.

Maybe we can add (at least) a debug check and warning, something like:

#ifdef SCHED_DEBUG
	if (unlikely(uc_map[group_id].se_count == 0)) {
		WARN(1, "invalid clamp group [%d:%d] refcount\n",
				clamp_id, group_id);
		uc_map[group_id].se_count = 1;
        }
#endif

> > +       if (uc_map[group_id].se_count == 0)
> > +               uclamp_group_reset(clamp_id, group_id);
> > +       raw_spin_unlock_irqrestore(&uc_map[group_id].se_lock, flags);
> > +}
> > +

[...]

> >  static inline int __setscheduler_uclamp(struct task_struct *p,
> >                                         const struct sched_attr *attr)
> >  {
> > +       struct uclamp_se *uc_se;
> > +       int retval = 0;
> > +
> >         if (attr->sched_util_min > attr->sched_util_max)
> >                 return -EINVAL;
> >         if (attr->sched_util_max > SCHED_CAPACITY_SCALE)
> >                 return -EINVAL;
> >
> > -       p->uclamp[UCLAMP_MIN] = attr->sched_util_min;
> > -       p->uclamp[UCLAMP_MAX] = attr->sched_util_max;
> > +       mutex_lock(&uclamp_mutex);
> > +
> > +       /* Update min utilization clamp */
> > +       uc_se = &p->uclamp[UCLAMP_MIN];
> > +       retval |= uclamp_group_get(p, UCLAMP_MIN, uc_se,
> > +                                  attr->sched_util_min);
> > +
> > +       /* Update max utilization clamp */
> > +       uc_se = &p->uclamp[UCLAMP_MAX];
> > +       retval |= uclamp_group_get(p, UCLAMP_MAX, uc_se,
> > +                                  attr->sched_util_max);
> > +
> > +       mutex_unlock(&uclamp_mutex);
> > +
> > +       /*
> > +        * If one of the two clamp values should fail,
> > +        * let the userspace know.
> > +        */
> > +       if (retval)
> > +               return -ENOSPC;
> 
> Maybe a minor issue but this failure is ambiguous. It might mean:
> 1. no clamp value was updated
> 2. UCLAMP_MIN was updated but UCLAMP_MAX was not
> 3. UCLAMP_MAX was updated but UCLAMP_MIN was not

That's right, I put a bit of thought on that me too but at the end
I've been convinced that the possibility to use a single syscall to
set both clamps at the same time has some benefits for user-space.

Maybe the current solution can be improved by supporting an (optional)
strict semantic with an in-kernel roll-back in case one of the two
uclamp_group_get should fail.

The strict semantic with roll-back could be controller via an
additional flag, e.g. SCHED_FLAG_UTIL_CLAMP_STRICT.

When the flag is set, either we are able to set both the attributes or
we roll-back. Otherwise, when the flag is not set, we keep the current
behavior. i.e. we set what we can and report an error to notify
userspace that one constraints was not enforced.

The following snippet should implement this semantics:

---8<---

/* Uclamp flags */
#define SCHED_FLAG_UTIL_CLAMP_STRICT	0x11 /* Roll-back on failure */
#define SCHED_FLAG_UTIL_CLAMP_MIN	0x12 /* Update util_min */
#define SCHED_FLAG_UTIL_CLAMP_MAX	0x14 /* Update util_max */
#define SCHED_FLAG_UTIL_CLAMP		( \
		SCHED_FLAG_UTIL_CLAMP_MIN | SCHED_FLAG_UTIL_CLAMP_MAX)

static inline int __setscheduler_uclamp(struct task_struct *p,
					const struct sched_attr *attr)
{
	unsigned int uclamp_value_old = 0;
	unsigned int uclamp_value;
	struct uclamp_se *uc_se;
	int retval = 0;

	if (attr->sched_util_min > attr->sched_util_max)
		return -EINVAL;
	if (attr->sched_util_max > 100)
		return -EINVAL;

	mutex_lock(&uclamp_mutex);

	if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN))
		goto set_util_max;

	uc_se = &p->uclamp[UCLAMP_MIN];
	uclamp_value = scale_from_percent(attr->sched_util_min);
	if (uc_se->value == uclamp_value)
		goto set_util_max;

	/* Update min utilization clamp */
	uclamp_value_old = uc_se->value;
	retval |= uclamp_group_get(p, NULL, UCLAMP_MIN, uc_se, uclamp_value);
	if (retval &&
	    attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT)
		goto exit_failure;

set_util_max:

	if (!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MAX))
		goto exit_done;

	uc_se = &p->uclamp[UCLAMP_MAX];
	uclamp_value = scale_from_percent(attr->sched_util_max);
	if (uc_se->value == uclamp_value)
		goto exit_done;

	/* Update max utilization clamp */
	if (uclamp_group_get(p, NULL, UCLAMP_MAX,
			     uc_se, uclamp_value))
		goto exit_rollback;

exit_done:
	mutex_unlock(&uclamp_mutex);
	return retval;

exit_rollback:
	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN &&
	    attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_STRICT) {
		uclamp_group_get(p, NULL, UCLAMP_MIN,
				 uc_se, uclamp_value_old);
	}
exit_failure:
	mutex_unlock(&uclamp_mutex);

	return -ENOSPC;
}

---8<---


Could that work better?

The code is maybe a bit more convoluted... but perhaps it can be
improved by encoding it in a loop.


-- 
#include <best/regards.h>

Patrick Bellasi

  reply index

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  8:28 [PATCH v2 00/12] Add utilization clamping support Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 01/12] sched/core: uclamp: extend sched_setattr to support utilization clamping Patrick Bellasi
2018-07-17 17:50   ` Joel Fernandes
2018-07-18  8:42     ` Patrick Bellasi
2018-07-18 17:02       ` Joel Fernandes
2018-07-17 18:04   ` Joel Fernandes
2018-07-16  8:28 ` [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-19 23:51   ` Suren Baghdasaryan
2018-07-20 15:11     ` Patrick Bellasi [this message]
2018-07-21  0:25       ` Suren Baghdasaryan
2018-07-23 13:36         ` Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 03/12] sched/core: uclamp: add CPU's clamp groups accounting Patrick Bellasi
2018-07-20 20:25   ` Suren Baghdasaryan
2018-07-16  8:28 ` [PATCH v2 04/12] sched/core: uclamp: update CPU's refcount on clamp changes Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 05/12] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 06/12] sched/cpufreq: uclamp: add utilization clamping for RT tasks Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 07/12] sched/core: uclamp: enforce last task UCLAMP_MAX Patrick Bellasi
2018-07-21  1:23   ` Suren Baghdasaryan
2018-07-23 15:02     ` Patrick Bellasi
2018-07-23 16:40       ` Suren Baghdasaryan
2018-07-16  8:29 ` [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller Patrick Bellasi
2018-07-21  2:37   ` Suren Baghdasaryan
2018-07-21  3:16     ` Suren Baghdasaryan
2018-07-23 15:17     ` Patrick Bellasi
2018-07-23 15:30   ` Tejun Heo
2018-07-23 17:22     ` Patrick Bellasi
2018-07-24 13:29       ` Tejun Heo
2018-07-24 15:39         ` Patrick Bellasi
2018-07-27  0:39         ` Joel Fernandes
2018-07-27  8:09           ` Quentin Perret
2018-07-16  8:29 ` [PATCH v2 09/12] sched/core: uclamp: map TG's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 10/12] sched/core: uclamp: use TG's clamps to restrict Task's clamps Patrick Bellasi
2018-07-22  3:05   ` Suren Baghdasaryan
2018-07-23 15:40     ` Patrick Bellasi
2018-07-23 17:11       ` Suren Baghdasaryan
2018-07-24  9:56         ` Patrick Bellasi
2018-07-24 15:28           ` Suren Baghdasaryan
2018-07-24 15:49             ` Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 11/12] sched/core: uclamp: update CPU's refcount on TG's clamp changes Patrick Bellasi
2018-07-22  3:17   ` Suren Baghdasaryan
2018-07-16  8:29 ` [PATCH v2 12/12] sched/core: uclamp: use percentage clamp values Patrick Bellasi
2018-07-22  4:04   ` Suren Baghdasaryan
2018-07-24 16:43     ` Patrick Bellasi
2018-07-24 17:11       ` Suren Baghdasaryan
2018-07-24 17:17         ` Patrick Bellasi
2018-07-17 13:03 ` [PATCH v2 00/12] Add utilization clamping support Joel Fernandes
2018-07-17 13:41   ` Patrick Bellasi

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180720151156.GA31421@e110439-lin \
    --to=patrick.bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox