LKML Archive on lore.kernel.org
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v3 09/14] sched/core: uclamp: propagate parent clamps
Date: Mon,  6 Aug 2018 17:39:41 +0100
Message-ID: <20180806163946.28380-10-patrick.bellasi@arm.com> (raw)
In-Reply-To: <20180806163946.28380-1-patrick.bellasi@arm.com>

In order to properly support hierarchical resource control, the cgroup
delegation model requires that attribute writes from a child group never
fail but are still (potentially) constrained by the parent's assigned
resources. This requires properly propagating and aggregating parent
attributes down to its descendants.

Let's implement this mechanism by adding a new "effective" clamp value
for each task group. The effective clamp value is defined as the smaller
of the group's own clamp value and the effective clamp value of its
parent. This is also the clamp value which is actually used to clamp
tasks in each task group.
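
As an illustration of the rule above, here is a minimal user-space sketch.
It is not the kernel implementation: struct group, propagate(),
write_clamp() and add_child() are hypothetical names introduced only for
this example, and the subtree-skipping optimization is left out.

```c
#include <assert.h>

#define MAX_CHILDREN 4

/* Hypothetical, simplified stand-in for a task group (not a kernel type). */
struct group {
	unsigned int value;      /* clamp requested for this group */
	unsigned int effective;  /* clamp actually enforced on its tasks */
	struct group *parent;
	struct group *children[MAX_CHILDREN];
	int nr_children;
};

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Link child under parent; the sketch supports only a few children. */
static void add_child(struct group *parent, struct group *child)
{
	assert(parent->nr_children < MAX_CHILDREN);
	child->parent = parent;
	parent->children[parent->nr_children++] = child;
}

/*
 * Recompute the effective clamp of g and of all its descendants,
 * visiting each parent before its children (pre-order).
 */
static void propagate(struct group *g)
{
	int i;

	g->effective = g->parent
		? min_uint(g->value, g->parent->effective)
		: g->value;

	for (i = 0; i < g->nr_children; i++)
		propagate(g->children[i]);
}

/* Store the requested value, then refresh the subtree rooted at g. */
static void write_clamp(struct group *g, unsigned int value)
{
	g->value = value;
	propagate(g);
}
```

With a chain root(1023) -> A(800) -> B(600), lowering A to 500 leaves B's
requested value at 600 but drops its effective value to 500; raising A
back to 900 restores B's effective value to 600.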

Since it can be interesting for tasks in a cgroup to know exactly what
the currently propagated/enforced configuration is, the effective clamp
values are exposed to user-space by means of a new pair of read-only
attributes: cpu.util.{min,max}.effective.
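
For example, a user-space tool could read one of these single-value
attributes as sketched below. This is only a sketch: read_clamp() is a
hypothetical helper, and the path passed to it depends on where the
cgroup hierarchy is mounted on a given system.

```c
#include <stdio.h>

/*
 * Read a single unsigned value from an attribute file.
 * Returns 0 on success and -1 if the file cannot be opened or parsed.
 */
static int read_clamp(const char *path, unsigned int *out)
{
	FILE *f = fopen(path, "r");
	int parsed;

	if (!f)
		return -1;

	parsed = fscanf(f, "%u", out);
	fclose(f);

	return parsed == 1 ? 0 : -1;
}
```

Reading both cpu.util.min and cpu.util.min.effective of the same group
this way and comparing the two values tells whether a parent is currently
constraining the group's request.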

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v3:
 Message-ID: <20180409222417.GK3126663@devbig577.frc2.facebook.com>
 - new patch in v3, to implement a suggestion from v1 review
---
 Documentation/admin-guide/cgroup-v2.rst | 25 +++++++-
 include/linux/sched.h                   |  5 ++
 kernel/sched/core.c                     | 81 +++++++++++++++++++++++--
 3 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 71244b55d901..c73ceaf496b2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -973,22 +973,43 @@ All time durations are in microseconds.
         A read-write single value file which exists on non-root cgroups.
         The default is "0", i.e. no bandwidth boosting.
 
-        The minimum utilization in the range [0, 1023].
+        The requested minimum utilization in the range [0, 1023].
 
         This interface allows reading and setting minimum utilization clamp
         values similar to the sched_setattr(2). This minimum utilization
         value is used to clamp the task specific minimum utilization clamp.
 
+  cpu.util.min.effective
+        A read-only single value file which exists on non-root cgroups and
+        reports minimum utilization clamp value currently enforced on a task
+        group.
+
+        The actual minimum utilization in the range [0, 1023].
+
+        This value can be lower than cpu.util.min in case a parent cgroup
+        is enforcing a more restrictive clamping on minimum utilization.
+
   cpu.util.max
         A read-write single value file which exists on non-root cgroups.
         The default is "1023". i.e. no bandwidth clamping
 
-        The maximum utilization in the range [0, 1023].
+        The requested maximum utilization in the range [0, 1023].
 
         This interface allows reading and setting maximum utilization clamp
         values similar to the sched_setattr(2). This maximum utilization
         value is used to clamp the task specific maximum utilization clamp.
 
+  cpu.util.max.effective
+        A read-only single value file which exists on non-root cgroups and
+        reports maximum utilization clamp value currently enforced on a task
+        group.
+
+        The actual maximum utilization in the range [0, 1023].
+
+        This value can be lower than cpu.util.max in case a parent cgroup
+        is enforcing a more restrictive clamping on max utilization.
+
+
 Memory
 ------
 
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8f48e64fb8a6..3fac2d098084 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -589,6 +589,11 @@ struct uclamp_se {
 	unsigned int value;
 	/* Utilization clamp group for this constraint */
 	unsigned int group_id;
+	/* Effective clamp for tasks in this group */
+	struct {
+		unsigned int value;
+		unsigned int group_id;
+	} effective;
 };
 
 union rcu_special {
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2ba55a4afffb..f692df3787bd 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1237,6 +1237,8 @@ static inline void init_uclamp_sched_group(void)
 		uc_se = &root_task_group.uclamp[clamp_id];
 		uc_se->value = uclamp_none(clamp_id);
 		uc_se->group_id = group_id;
+		uc_se->effective.value = uclamp_none(clamp_id);
+		uc_se->effective.group_id = group_id;
 
 		/* Attach root TG's clamp group */
 		uc_map[group_id].se_count = 1;
@@ -1266,6 +1268,10 @@ static inline int alloc_uclamp_sched_group(struct task_group *tg,
 
 		uc_se->value = parent->uclamp[clamp_id].value;
 		uc_se->group_id = UCLAMP_NOT_VALID;
+		uc_se->effective.value =
+			parent->uclamp[clamp_id].effective.value;
+		uc_se->effective.group_id =
+			parent->uclamp[clamp_id].effective.group_id;
 	}
 
 	return 1;
@@ -7197,6 +7203,44 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 }
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
+static void cpu_util_update_hier(struct cgroup_subsys_state *css,
+				 int clamp_id, int value)
+{
+	struct cgroup_subsys_state *top_css = css;
+	struct uclamp_se *uc_se, *uc_parent;
+
+	css_for_each_descendant_pre(css, top_css) {
+		/*
+		 * The first visited task group is top_css, whose clamp value
+		 * is the one passed as parameter. For descendant task
+		 * groups we consider their current value.
+		 */
+		uc_se = &css_tg(css)->uclamp[clamp_id];
+		if (css != top_css)
+			value = uc_se->value;
+		/*
+		 * Skip the whole subtree if the current effective clamp
+		 * already matches the TG's clamp value.
+		 * In this case, all the subtrees already have this value, or
+		 * a more restrictive one, as effective clamp.
+		 */
+		uc_parent = &css_tg(css)->parent->uclamp[clamp_id];
+		if (uc_se->effective.value == value &&
+		    uc_parent->effective.value >= value) {
+			css = css_rightmost_descendant(css);
+			continue;
+		}
+
+		/* Propagate the most restrictive effective value */
+		if (uc_parent->effective.value < value)
+			value = uc_parent->effective.value;
+		if (uc_se->effective.value == value)
+			continue;
+
+		uc_se->effective.value = value;
+	}
+}
+
 static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 				  struct cftype *cftype, u64 min_value)
 {
@@ -7217,6 +7261,9 @@ static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 	if (tg->uclamp[UCLAMP_MAX].value < min_value)
 		goto out;
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_hier(css, UCLAMP_MIN, min_value);
+
 out:
 	rcu_read_unlock();
 	mutex_unlock(&uclamp_mutex);
@@ -7244,6 +7291,9 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
 	if (tg->uclamp[UCLAMP_MIN].value > max_value)
 		goto out;
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_hier(css, UCLAMP_MAX, max_value);
+
 out:
 	rcu_read_unlock();
 	mutex_unlock(&uclamp_mutex);
@@ -7252,14 +7302,17 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
 }
 
 static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
-				  enum uclamp_id clamp_id)
+				  enum uclamp_id clamp_id,
+				  bool effective)
 {
 	struct task_group *tg;
 	u64 util_clamp;
 
 	rcu_read_lock();
 	tg = css_tg(css);
-	util_clamp = tg->uclamp[clamp_id].value;
+	util_clamp = effective
+		? tg->uclamp[clamp_id].effective.value
+		: tg->uclamp[clamp_id].value;
 	rcu_read_unlock();
 
 	return util_clamp;
@@ -7268,13 +7321,25 @@ static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
 static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css,
 				 struct cftype *cft)
 {
-	return cpu_uclamp_read(css, UCLAMP_MIN);
+	return cpu_uclamp_read(css, UCLAMP_MIN, false);
 }
 
 static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css,
 				 struct cftype *cft)
 {
-	return cpu_uclamp_read(css, UCLAMP_MAX);
+	return cpu_uclamp_read(css, UCLAMP_MAX, false);
+}
+
+static u64 cpu_util_min_effective_read_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MIN, true);
+}
+
+static u64 cpu_util_max_effective_read_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MAX, true);
 }
 #endif /* CONFIG_UCLAMP_TASK_GROUP */
 
@@ -7622,11 +7687,19 @@ static struct cftype cpu_legacy_files[] = {
 		.read_u64 = cpu_util_min_read_u64,
 		.write_u64 = cpu_util_min_write_u64,
 	},
+	{
+		.name = "util.min.effective",
+		.read_u64 = cpu_util_min_effective_read_u64,
+	},
 	{
 		.name = "util.max",
 		.read_u64 = cpu_util_max_read_u64,
 		.write_u64 = cpu_util_max_write_u64,
 	},
+	{
+		.name = "util.max.effective",
+		.read_u64 = cpu_util_max_effective_read_u64,
+	},
 #endif
 	{ }	/* Terminate */
 };
-- 
2.18.0


