All of lore.kernel.org
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: Re: [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups
Date: Wed, 12 Sep 2018 16:56:19 +0100	[thread overview]
Message-ID: <20180912155619.GG1413@e110439-lin> (raw)
In-Reply-To: <20180912134945.GZ24106@hirez.programming.kicks-ass.net>

On 12-Sep 15:49, Peter Zijlstra wrote:
> On Tue, Aug 28, 2018 at 02:53:10PM +0100, Patrick Bellasi wrote:
> > +/**
> > + * Utilization's clamp group
> > + *
> > + * A utilization clamp group maps a "clamp value" (value), i.e.
> > + * util_{min,max}, to a "clamp group index" (group_id).
> > + */
> > +struct uclamp_se {
> > +	unsigned int value;
> > +	unsigned int group_id;
> > +};
> 
> > +/**
> > + * uclamp_map: reference counts a utilization "clamp value"
> > + * @value:    the utilization "clamp value" required
> > + * @se_count: the number of scheduling entities requiring the "clamp value"
> > + * @se_lock:  serialize reference count updates by protecting se_count
> 
> Why do you have a spinlock to serialize a single value? Don't we have
> atomics for that?

There are some code paths where it's used to protect clamp groups
mapping and initialization, e.g.

      uclamp_group_get()
          spin_lock()
              // initialize clamp group (if required) and then...
              se_count += 1
          spin_unlock()

Almost all these paths are triggered from user-space and protected
by a global uclamp_mutex, but fork/exit paths.

To serialize these paths I'm using the spinlock above, does it make
sense ? Can we use the global uclamp_mutex on forks/exit too ?

One additional observations is that, if in the future we want to add a
kernel space API, (e.g. driver asking for a new clamp value), maybe we
will need to have a serialized non-sleeping uclamp_group_get() API ?

> > + */
> > +struct uclamp_map {
> > +	int value;
> > +	int se_count;
> > +	raw_spinlock_t se_lock;
> > +};
> > +
> > +/**
> > + * uclamp_maps: maps each SEs "clamp value" into a CPUs "clamp group"
> > + *
> > + * Since only a limited number of different "clamp values" are supported, we
> > + * need to map each different clamp value into a "clamp group" (group_id) to
> > + * be used by the per-CPU accounting in the fast-path, when tasks are
> > + * enqueued and dequeued.
> > + * We also support different kind of utilization clamping, min and max
> > + * utilization for example, each representing what we call a "clamp index"
> > + * (clamp_id).
> > + *
> > + * A matrix is thus required to map "clamp values" to "clamp groups"
> > + * (group_id), for each "clamp index" (clamp_id), where:
> > + * - rows are indexed by clamp_id and they collect the clamp groups for a
> > + *   given clamp index
> > + * - columns are indexed by group_id and they collect the clamp values which
> > + *   maps to that clamp group
> > + *
> > + * Thus, the column index of a given (clamp_id, value) pair represents the
> > + * clamp group (group_id) used by the fast-path's per-CPU accounting.
> > + *
> > + * NOTE: first clamp group (group_id=0) is reserved for tracking of non
> > + * clamped tasks.  Thus we allocate one more slot than the value of
> > + * CONFIG_UCLAMP_GROUPS_COUNT.
> > + *
> > + * Here is the map layout and, right below, how entries are accessed by the
> > + * following code.
> > + *
> > + *                          uclamp_maps is a matrix of
> > + *          +------- UCLAMP_CNT by CONFIG_UCLAMP_GROUPS_COUNT+1 entries
> > + *          |                                |
> > + *          |                /---------------+---------------\
> > + *          |               +------------+       +------------+
> > + *          |  / UCLAMP_MIN | value      |       | value      |
> > + *          |  |            | se_count   |...... | se_count   |
> > + *          |  |            +------------+       +------------+
> > + *          +--+            +------------+       +------------+
> > + *             |            | value      |       | value      |
> > + *             \ UCLAMP_MAX | se_count   |...... | se_count   |
> > + *                          +-----^------+       +----^-------+
> > + *                                |                   |
> > + *                      uc_map =  +                   |
> > + *                     &uclamp_maps[clamp_id][0]      +
> > + *                                                clamp_value =
> > + *                                       uc_map[group_id].value
> > + */
> > +static struct uclamp_map uclamp_maps[UCLAMP_CNT]
> > +				    [CONFIG_UCLAMP_GROUPS_COUNT + 1]
> > +				    ____cacheline_aligned_in_smp;
> > +
> 
> I'm still completely confused by all this.
> 
> sizeof(uclamp_map) = 12
> 
> that array is 2*6=12 of those, so the whole thing is 144 bytes. which is
> more than 2 (64 byte) cachelines.

This data structure is *not* used in the hot-path, that's why I did not
care about fitting it exactly into few cache lines.

It's used to map a user-space "clamp value" into a kernel-space "clamp
group" when user-space:
 - changes a task specific clamp value
 - changes a cgroup clamp value
 - a task forks/exits

I assume we can consider all those as "slow" code paths, is that correct ?

At enqueue/dequeue time we use instead struct uclamp_cpu, introduced
by the next patch:

   [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting
   https://lore.kernel.org/lkml/20180828135324.21976-4-patrick.bellasi@arm.com/

That's where we refcount RUNNABLE tasks and we have to figure out
the current clamp value for a CPU.

That data structure, with CONFIG_UCLAMP_GROUPS_COUNT=5, is:

   struct uclamp_cpu {
           struct uclamp_group        group[2][6];          /*     0    96 */
           /* --- cacheline 1 boundary (64 bytes) was 32 bytes ago --- */
           int                        value[2];             /*    96     8 */
           int                        flags;                /*   104     4 */

           /* size: 108, cachelines: 2, members: 3 */
           /* last cacheline: 44 bytes */
   };

and we fit into 2 cache lines with this data layout:

   util_min[0..5] | util_max[0..5] | other data

> What's the purpose of that cacheline align statement?

In uclamp_maps, we still need to scan the array when a clamp value is
changed from user-space, i.e. the cases reported above. Thus, that
alignment is just to ensure that we minimize the number of cache lines
used. Does that make sense ?

Maybe that alignment implicitly generated by the compiler ?

> Note that without that apparently superfluous lock, it would be 8*12 =
> 96 bytes, which is 1.5 lines and would indeed suggest you default to
> GROUP_COUNT=7 by default to fill 2 lines.

Yes, will check better if we can count on just the uclamp_mutex

> Why are the min and max things torn up like that?  I'm fairly sure I
> asked some of that last time; but the above comments only try to explain
> what, not why.

We use that organization to speedup scanning for clamp values of the
same clamp_id. That's more important in the hot-path than above, where
we need to scan struct uclamp_cpu when a new aggregated clamp value
has to be computed. This is done in:

   [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting
   https://lore.kernel.org/lkml/20180828135324.21976-4-patrick.bellasi@arm.com/

Specifically:

   dequeue_task()
     uclamp_cpu_put()
       uclamp_cpu_put_id(clamp_id)
         uclamp_cpu_update(clamp_id)
            // Here we have an array scan by clamp_id

With the given data layout I reported above, when we update the
min_clamp value (boost) we have all the data required in a single
cache line.

If that makes sense, I can certainly improve the comment above to
justify its layout.

Cheers,
Patrick

-- 
#include <best/regards.h>

Patrick Bellasi

  reply	other threads:[~2018-09-12 15:56 UTC|newest]

Thread overview: 81+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-28 13:53 [PATCH v4 00/16] Add utilization clamping support Patrick Bellasi
2018-08-28 13:53 ` Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 01/16] sched/core: uclamp: extend sched_setattr to support utilization clamping Patrick Bellasi
2018-09-05 11:01   ` Juri Lelli
2018-08-28 13:53 ` [PATCH v4 02/16] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups Patrick Bellasi
2018-09-05 10:45   ` Juri Lelli
2018-09-06 13:48     ` Patrick Bellasi
2018-09-06 14:13       ` Juri Lelli
2018-09-06  8:17   ` Juri Lelli
2018-09-06 14:00     ` Patrick Bellasi
2018-09-08 23:47   ` Suren Baghdasaryan
2018-09-12 10:32     ` Patrick Bellasi
2018-09-12 13:49   ` Peter Zijlstra
2018-09-12 15:56     ` Patrick Bellasi [this message]
2018-09-12 16:12       ` Peter Zijlstra
2018-09-12 17:35         ` Patrick Bellasi
2018-09-12 17:42           ` Peter Zijlstra
2018-09-12 17:52             ` Patrick Bellasi
2018-09-13 19:14               ` Peter Zijlstra
2018-09-14  8:51                 ` Patrick Bellasi
2018-09-12 16:24   ` Peter Zijlstra
2018-09-12 17:42     ` Patrick Bellasi
2018-09-13 19:20       ` Peter Zijlstra
2018-09-14  8:47         ` Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 03/16] sched/core: uclamp: add CPU's clamp groups accounting Patrick Bellasi
2018-09-12 17:34   ` Peter Zijlstra
2018-09-12 17:44     ` Patrick Bellasi
2018-09-13 19:12   ` Peter Zijlstra
2018-09-14  9:07     ` Patrick Bellasi
2018-09-14 11:52       ` Peter Zijlstra
2018-09-14 13:41         ` Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 04/16] sched/core: uclamp: update CPU's refcount on clamp changes Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 05/16] sched/core: uclamp: enforce last task UCLAMP_MAX Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 06/16] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks Patrick Bellasi
2018-09-14  9:32   ` Peter Zijlstra
2018-09-14 13:19     ` Patrick Bellasi
2018-09-14 13:36       ` Peter Zijlstra
2018-09-14 13:57         ` Patrick Bellasi
2018-09-27 10:23           ` Quentin Perret
2018-08-28 13:53 ` [PATCH v4 07/16] sched/core: uclamp: extend cpu's cgroup controller Patrick Bellasi
2018-08-28 18:29   ` Randy Dunlap
2018-08-29  8:53     ` Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 08/16] sched/core: uclamp: propagate parent clamps Patrick Bellasi
2018-09-09  3:02   ` Suren Baghdasaryan
2018-09-12 12:51     ` Patrick Bellasi
2018-09-12 15:56       ` Suren Baghdasaryan
2018-09-11 15:18   ` Tejun Heo
2018-09-11 16:26     ` Patrick Bellasi
2018-09-11 16:28       ` Tejun Heo
2018-08-28 13:53 ` [PATCH v4 09/16] sched/core: uclamp: map TG's clamp values into CPU's clamp groups Patrick Bellasi
2018-09-09 18:52   ` Suren Baghdasaryan
2018-09-12 14:19     ` Patrick Bellasi
2018-09-12 15:53       ` Suren Baghdasaryan
2018-08-28 13:53 ` [PATCH v4 10/16] sched/core: uclamp: use TG's clamps to restrict Task's clamps Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 11/16] sched/core: uclamp: add system default clamps Patrick Bellasi
2018-09-10 16:20   ` Suren Baghdasaryan
2018-09-11 16:46     ` Patrick Bellasi
2018-09-11 19:25       ` Suren Baghdasaryan
2018-08-28 13:53 ` [PATCH v4 12/16] sched/core: uclamp: update CPU's refcount on TG's clamp changes Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 13/16] sched/core: uclamp: use percentage clamp values Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 14/16] sched/core: uclamp: request CAP_SYS_ADMIN by default Patrick Bellasi
2018-09-04 13:47   ` Juri Lelli
2018-09-06 14:40     ` Patrick Bellasi
2018-09-06 14:59       ` Juri Lelli
2018-09-06 17:21         ` Patrick Bellasi
2018-09-14 11:10       ` Peter Zijlstra
2018-09-14 14:07         ` Patrick Bellasi
2018-09-14 14:28           ` Peter Zijlstra
2018-09-17 12:27             ` Patrick Bellasi
2018-09-21  9:13               ` Peter Zijlstra
2018-09-24 15:14                 ` Patrick Bellasi
2018-09-24 15:56                   ` Peter Zijlstra
2018-09-24 17:23                     ` Patrick Bellasi
2018-09-24 16:26                   ` Peter Zijlstra
2018-09-24 17:19                     ` Patrick Bellasi
2018-09-25 15:49                   ` Peter Zijlstra
2018-09-26 10:43                     ` Patrick Bellasi
2018-09-27 10:00                     ` Quentin Perret
2018-09-26 17:51                 ` Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 15/16] sched/core: uclamp: add clamp group discretization support Patrick Bellasi
2018-08-28 13:53 ` [PATCH v4 16/16] sched/cpufreq: uclamp: add utilization clamping for RT tasks Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180912155619.GG1413@e110439-lin \
    --to=patrick.bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=quentin.perret@arm.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.