LKML Archive on lore.kernel.org
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v4 15/16] sched/core: uclamp: add clamp group discretization support
Date: Tue, 28 Aug 2018 14:53:23 +0100
Message-ID: <20180828135324.21976-16-patrick.bellasi@arm.com>
In-Reply-To: <20180828135324.21976-1-patrick.bellasi@arm.com>

A limited number of clamp groups is required to track the clamp groups
used by RUNNABLE tasks both effectively and efficiently at run-time.
However, this limit imposes some constraints on run-time usage.
Specifically, System Management Software should "reserve" all the
possible clamp values required at run-time to ensure there will always
be a clamp group to track them whenever required.

To fix this problem we can trade off CPU clamping precision for
efficiency by transforming each CPU's clamp groups into buckets of a
predefined range.

The number of clamp groups configured at compile time defines the range
of utilization clamp values tracked by each CPU clamp group. Thus, for
example, with the default:
   CONFIG_UCLAMP_GROUPS_COUNT 5
we will have 5 clamp groups tracking 20% utilization each and a task
with util_min=25% will have group_id=1.
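
As a minimal user-space sketch of the bucket arithmetic described above
(not the kernel code itself): it assumes clamp values are expressed as
percentages, so SCALE is 100 here, whereas the patch uses
SCHED_CAPACITY_SCALE. The names bucket_round() and bucket_index() are
hypothetical; the patch looks a group up by its rounded value rather than
computing an index directly.

```c
/* Assumed for illustration: 5 groups, clamp values in percent. */
#define GROUPS_COUNT 5
#define SCALE        100
#define GROUP_DELTA  (SCALE / GROUPS_COUNT)         /* width of one bucket */
#define GROUP_UPPER  (GROUP_DELTA * GROUPS_COUNT)   /* start of the top value */

/* Round a clamp value down to the lower bound of its bucket. */
static int bucket_round(int value)
{
	if (value <= 0)
		return value;
	if (value >= GROUP_UPPER)
		return SCALE;

	return GROUP_DELTA * (value / GROUP_DELTA);
}

/*
 * Hypothetical helper: bucket position of a value.
 * Only meaningful for values below GROUP_UPPER.
 */
static int bucket_index(int value)
{
	return bucket_round(value) / GROUP_DELTA;
}
```

Under these assumptions util_min=25% and util_min=35% both round to 20%
and land in bucket 1, while util_min=10% rounds to 0% and lands in
bucket 0.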

Scheduling entities keep tracking the specific value defined by
user-space, which can still be used for task placement biasing
decisions. However, at enqueue time tasks will be refcounted in the
clamp group whose range includes the task-specific clamp value.

Each CPU's clamp value will also be updated to aggregate and represent
at run-time the most restrictive value among those of the RUNNABLE tasks
refcounted by that group. Each time a CPU clamp group becomes empty we
reset its clamp value to the minimum value of the range it tracks.
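
The per-CPU aggregation described above can be sketched roughly as
follows. This is a simplified user-space model, not the kernel
implementation: struct cpu_clamp_group, group_get() and group_put() are
hypothetical names, and the reset is done here exactly when the group
becomes empty, as stated in the paragraph above.

```c
/* Hypothetical model of one CPU clamp group (bucket). */
struct cpu_clamp_group {
	int tasks;	/* RUNNABLE tasks currently refcounted here */
	int value;	/* max clamp value among those tasks */
	int base;	/* minimum (rounded) value of the tracked range */
};

/* Enqueue: refcount the task and track the max clamp value seen. */
static void group_get(struct cpu_clamp_group *g, int clamp_value)
{
	g->tasks++;
	if (clamp_value > g->value)
		g->value = clamp_value;
}

/* Dequeue: drop the refcount; reset the value once the group is empty. */
static void group_put(struct cpu_clamp_group *g)
{
	if (g->tasks > 0 && --g->tasks == 0)
		g->value = g->base;
}
```

With a group covering [20-39]%, enqueuing a task with util_min=25% sets
the group value to 25, enqueuing a second one with util_min=35% raises
it to 35, and the value goes back to 20 only after both tasks have been
dequeued.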

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v4:
 Message-ID: <20180809152313.lewfhufidhxb2qrk@darkstar>
 - implements the idea discussed in this thread
 Others:
 - new patch added in this version
 - rebased on v4.19-rc1
---
 include/linux/sched.h   | 13 ++++-----
 kernel/sched/core.c     | 59 ++++++++++++++++++++++++++++++++++++++++-
 kernel/sched/features.h |  5 ++++
 3 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index ca0a80881fa9..752fcd5d2cea 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -608,17 +608,18 @@ struct sched_dl_entity {
  * either tasks or task groups, to enforce the same clamp "value" for a given
  * clamp index.
  *
- * Scheduling entity's specific clamp group index can be different
- * from the effective clamp group index used at enqueue time since
- * task groups's clamps can be restricted by their parent task group.
+ * Scheduling entity's specific clamp value and group index can be different
+ * from the effective value and group index used at enqueue time. Indeed:
+ * - a task's clamps can be restricted by its task group's clamps
+ * - a task group's clamps can be restricted by its parent task group
  */
 struct uclamp_se {
 	unsigned int value;
 	unsigned int group_id;
 	/*
-	 * Effective task (group) clamp value and group index.
-	 * For task groups it's the value (eventually) enforced by a parent
-	 * task group.
+	 * Effective task (group) clamp value and group index:
+	 * for tasks: those used at enqueue time
+	 * for task groups: those (eventually) enforced by a parent task group
 	 */
 	struct {
 		unsigned int value;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8341ce580a9a..f71e15eaf152 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -807,6 +807,34 @@ static struct uclamp_map uclamp_maps[UCLAMP_CNT]
 #define UCLAMP_ENOSPC_FMT "Cannot allocate more than " \
 	__stringify(CONFIG_UCLAMP_GROUPS_COUNT) " UTIL_%s clamp groups\n"
 
+/*
+ * uclamp_round: round a clamp value to the closest trackable value
+ *
+ * The number of clamp groups, which is defined at compile time, allows
+ * tracking a finite number of different clamp values. This makes sense both
+ * from a practical standpoint, since we do not expect many different values
+ * on a real system, and for run-time efficiency.
+ *
+ * To ensure a clamp group is always available, this method
+ * discretizes a required value into one of the available clamp
+ * groups.
+ */
+static inline int uclamp_round(int value)
+{
+#define UCLAMP_GROUP_DELTA (SCHED_CAPACITY_SCALE / CONFIG_UCLAMP_GROUPS_COUNT)
+#define UCLAMP_GROUP_UPPER (UCLAMP_GROUP_DELTA * CONFIG_UCLAMP_GROUPS_COUNT)
+
+	if (unlikely(!sched_feat(UCLAMP_ROUNDING)))
+		return value;
+
+	if (value <= 0)
+		return value;
+	if (value >= UCLAMP_GROUP_UPPER)
+		return SCHED_CAPACITY_SCALE;
+
+	return UCLAMP_GROUP_DELTA * (value / UCLAMP_GROUP_DELTA);
+}
+
 /**
  * uclamp_group_available: checks if a clamp group is available
  * @clamp_id: the utilization clamp index (i.e. min or max clamp)
@@ -846,6 +874,9 @@ static inline void uclamp_group_init(int clamp_id, int group_id,
 	struct uclamp_cpu *uc_cpu;
 	int cpu;
 
+	/* Clamp groups are always initialized to the rounded clamp value */
+	clamp_value = uclamp_round(clamp_value);
+
 	/* Set clamp group map */
 	uc_map[group_id].value = clamp_value;
 	uc_map[group_id].se_count = 0;
@@ -892,6 +923,7 @@ uclamp_group_find(int clamp_id, unsigned int clamp_value)
 	int free_group_id = UCLAMP_NOT_VALID;
 	unsigned int group_id = 0;
 
+	clamp_value = uclamp_round(clamp_value);
 	for ( ; group_id <= CONFIG_UCLAMP_GROUPS_COUNT; ++group_id) {
 		/* Keep track of first free clamp group */
 		if (uclamp_group_available(clamp_id, group_id)) {
@@ -979,6 +1011,22 @@ static inline void uclamp_cpu_update(struct rq *rq, int clamp_id,
  *    task_struct::uclamp::effective::value
  * is updated to represent the clamp value corresponding to the taks effective
  * group index.
+ *
+ * Thus, the effective clamp value for a task is guaranteed to be in the range
+ * of the rounded clamp values of its effective clamp group. For example:
+ *  - CONFIG_UCLAMP_GROUPS_COUNT=5 => UCLAMP_GROUP_DELTA=20%
+ *  - TaskA:      util_min=25%     => clamp_group1: range [20-39]%
+ *  - TaskB:      util_min=35%     => clamp_group1: range [20-39]%
+ *  - TaskGroupA: util_min=10%     => clamp_group0: range [ 0-19]%
+ * Then, when TaskA is part of TaskGroupA, it will be:
+ *  - allocated in clamp_group1
+ *  - clamp_group1.value=25
+ *    while TaskA is running alone
+ *  - clamp_group1.value=35
+ *    once TaskB becomes RUNNABLE and for as long as TaskA is RUNNABLE
+ *  - clamp_group1.value=20
+ *    i.e. CPU's clamp group value is reset to the nominal rounded value,
+ *    while TaskA and TaskB are not RUNNABLE
  */
 static inline int uclamp_task_group_id(struct task_struct *p, int clamp_id)
 {
@@ -1106,6 +1154,10 @@ static inline void uclamp_cpu_get_id(struct task_struct *p,
 		uc_cpu->value[clamp_id] = clamp_value;
 	}
 
+	/* Track the max effective clamp value for each CPU's clamp group */
+	if (clamp_value > uc_cpu->group[clamp_id][group_id].value)
+		uc_cpu->group[clamp_id][group_id].value = clamp_value;
+
 	/*
 	 * If this is the new max utilization clamp value, then we can update
 	 * straight away the CPU clamp value. Otherwise, the current CPU clamp
@@ -1170,8 +1222,13 @@ static inline void uclamp_cpu_put_id(struct task_struct *p,
 		     cpu_of(rq), clamp_id, group_id);
 	}
 #endif
-	if (clamp_value >= uc_cpu->value[clamp_id])
+	if (clamp_value >= uc_cpu->value[clamp_id]) {
+		/* Reset CPU's clamp value to rounded clamp group value */
+		clamp_value = uclamp_group_value(clamp_id, group_id);
+		uc_cpu->group[clamp_id][group_id].value = clamp_value;
+
 		uclamp_cpu_update(rq, clamp_id, clamp_value);
+	}
 }
 
 /**
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index aad826aa55f8..5b7d0965b090 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -95,3 +95,8 @@ SCHED_FEAT(UTIL_EST, true)
  * Utilization clamping lazy update.
  */
 SCHED_FEAT(UCLAMP_LAZY_UPDATE, false)
+
+/*
+ * Utilization clamping discretization.
+ */
+SCHED_FEAT(UCLAMP_ROUNDING, true)
-- 
2.18.0


