From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	linux-api@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Paul Turner <pjt@google.com>,
	Quentin Perret <quentin.perret@arm.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v8 13/16] sched/core: uclamp: Propagate parent clamps
Date: Tue,  2 Apr 2019 11:41:49 +0100	[thread overview]
Message-ID: <20190402104153.25404-14-patrick.bellasi@arm.com> (raw)
In-Reply-To: <20190402104153.25404-1-patrick.bellasi@arm.com>

In order to properly support hierarchical resource control, the cgroup
delegation model requires that attribute writes from a child group never
fail but are still (potentially) constrained by the parent's assigned
resources. This requires properly propagating and aggregating parent
attributes down to its descendants.

Let's implement this mechanism by adding a new "effective" clamp value
for each task group. The effective clamp value is defined as the smaller
of a group's clamp value and its parent's effective clamp value. This is
the actual clamp value enforced on tasks in a task group.
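
To make the aggregation rule concrete, below is a minimal,
self-contained user-space C sketch; the struct and field names are
simplified stand-ins for illustration only, not the kernel's
task_group/uclamp_se types:

  #include <stdio.h>

  /* Simplified stand-ins for the kernel's task_group / uclamp_se */
  struct group {
          unsigned int req;      /* requested clamp (cpu.util.{min,max}) */
          unsigned int eff;      /* effective clamp, as propagated */
          struct group *parent;
  };

  /*
   * Effective clamp: the smaller of the group's requested value and
   * the parent's effective value.
   */
  static unsigned int effective_clamp(const struct group *g)
  {
          unsigned int value = g->req;

          if (g->parent && g->parent->eff < value)
                  value = g->parent->eff;
          return value;
  }

  int main(void)
  {
          struct group root   = { .req = 1024, .eff = 1024 };
          struct group parent = { .req = 512,  .parent = &root };
          struct group child  = { .req = 800,  .parent = &parent };

          parent.eff = effective_clamp(&parent); /* 512 */
          child.eff  = effective_clamp(&child);  /* 512: capped by parent */

          printf("parent.eff=%u child.eff=%u\n", parent.eff, child.eff);
          return 0;
  }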

Since user-space, e.g. system management software, can benefit from
knowing exactly which configuration is currently propagated and
enforced, the effective clamp values are exposed to user-space by means
of a new pair of read-only attributes: cpu.util.{min,max}.effective.
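
For completeness, a hedged sketch of how management software might read
the new attributes from user-space; the cgroup v2 mount point and group
name below are assumptions, only the file names come from this patch:

  #include <stdio.h>

  /* Assumed cgroup v2 mount point and group name, for illustration */
  #define GRP "/sys/fs/cgroup/mygroup/"

  static int read_u64(const char *path, unsigned long long *val)
  {
          FILE *f = fopen(path, "r");
          int ok;

          if (!f)
                  return -1;
          ok = (fscanf(f, "%llu", val) == 1);
          fclose(f);
          return ok ? 0 : -1;
  }

  int main(void)
  {
          unsigned long long min_eff, max_eff;

          if (!read_u64(GRP "cpu.util.min.effective", &min_eff) &&
              !read_u64(GRP "cpu.util.max.effective", &max_eff))
                  printf("effective clamps: min=%llu max=%llu\n",
                         min_eff, max_eff);
          return 0;
  }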

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>

---
Changes in v8:
 Message-ID: <20190313201010.GU2482@worktop.programming.kicks-ass.net>
 - to be aligned with the changes in the above message:
   add "effective" values uclamp_se instance beside the existing
   "requested" values instance
 Message-ID: <20190318165430.n222vuq4tv3ntbod@e110439-lin>
 - s/cpu_util_update_hier/cpu_util_update_eff/
---
 Documentation/admin-guide/cgroup-v2.rst |  19 +++++
 kernel/sched/core.c                     | 108 ++++++++++++++++++++++--
 kernel/sched/sched.h                    |   2 +
 3 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 47710a77f4fa..7aad2435e961 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -990,6 +990,16 @@ All time durations are in microseconds.
         values similar to the sched_setattr(2). This minimum utilization
         value is used to clamp the task specific minimum utilization clamp.
 
+  cpu.util.min.effective
+        A read-only single value file which exists on non-root cgroups and
+        reports the minimum utilization clamp value currently enforced
+        on a task group.
+
+        The actual minimum utilization in the range [0, 1024].
+
+        This value can be lower than cpu.util.min in case a parent cgroup
+        allows only smaller minimum utilization values.
+
   cpu.util.max
         A read-write single value file which exists on non-root cgroups.
         The default is "1024". i.e. no utilization capping
@@ -1000,6 +1010,15 @@ All time durations are in microseconds.
         values similar to the sched_setattr(2). This maximum utilization
         value is used to clamp the task specific maximum utilization clamp.
 
+  cpu.util.max.effective
+        A read-only single value file which exists on non-root cgroups and
+        reports the maximum utilization clamp value currently enforced
+        on a task group.
+
+        The actual maximum utilization in the range [0, 1024].
+
+        This value can be lower than cpu.util.max in case a parent cgroup
+        is enforcing a more restrictive clamping on max utilization.
 
 
 Memory
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index aeed2dd315cc..e09cf8881567 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -733,6 +733,18 @@ static void set_load_weight(struct task_struct *p, bool update_load)
 }
 
 #ifdef CONFIG_UCLAMP_TASK
+/*
+ * Serializes updates of utilization clamp values
+ *
+ * User-space (slow path) triggers utilization clamp value updates which
+ * can require updates to the (fast path) scheduler data structures used
+ * to support enqueue/dequeue operations.
+ * While the per-CPU rq lock protects fast-path update operations, user-space
+ * requests are serialized using a mutex to reduce the risk of conflicting
+ * updates or API abuses.
+ */
+static DEFINE_MUTEX(uclamp_mutex);
+
 /* Max allowed minimum utilization */
 unsigned int sysctl_sched_uclamp_util_min = SCHED_CAPACITY_SCALE;
 
@@ -1115,6 +1127,8 @@ static void __init init_uclamp(void)
 	unsigned int clamp_id;
 	int cpu;
 
+	mutex_init(&uclamp_mutex);
+
 	for_each_possible_cpu(cpu) {
 		memset(&cpu_rq(cpu)->uclamp, 0, sizeof(struct uclamp_rq));
 		cpu_rq(cpu)->uclamp_flags = 0;
@@ -1134,6 +1148,7 @@ static void __init init_uclamp(void)
 		uclamp_default[clamp_id] = uc_max;
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 		root_task_group.uclamp_req[clamp_id] = uc_max;
+		root_task_group.uclamp[clamp_id] = uc_max;
 #endif
 	}
 }
@@ -6730,8 +6745,10 @@ static inline int alloc_uclamp_sched_group(struct task_group *tg,
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 	int clamp_id;
 
-	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id)
+	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
 		tg->uclamp_req[clamp_id] = parent->uclamp_req[clamp_id];
+		tg->uclamp[clamp_id] = parent->uclamp[clamp_id];
+	}
 #endif
 
 	return 1;
@@ -6984,6 +7001,44 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 }
 
 #ifdef CONFIG_UCLAMP_TASK_GROUP
+static void cpu_util_update_eff(struct cgroup_subsys_state *css,
+				unsigned int clamp_id)
+{
+	struct cgroup_subsys_state *top_css = css;
+	struct uclamp_se *uc_se, *uc_parent;
+	unsigned int value;
+
+	css_for_each_descendant_pre(css, top_css) {
+		value = css_tg(css)->uclamp_req[clamp_id].value;
+
+		uc_parent = NULL;
+		if (css_tg(css)->parent)
+			uc_parent = &css_tg(css)->parent->uclamp[clamp_id];
+
+		/*
+		 * Skip the whole subtree if the current effective clamp
+		 * already matches the TG's requested clamp value.
+		 * In this case, all the descendants already have this value,
+		 * or a more restrictive one, as their effective clamp.
+		 */
+		uc_se = &css_tg(css)->uclamp[clamp_id];
+		if (uc_se->value == value &&
+		    uc_parent && uc_parent->value >= value) {
+			css = css_rightmost_descendant(css);
+			continue;
+		}
+
+		/* Propagate the most restrictive effective value */
+		if (uc_parent && uc_parent->value < value)
+			value = uc_parent->value;
+		if (uc_se->value == value)
+			continue;
+
+		uc_se->value = value;
+		uc_se->bucket_id = uclamp_bucket_id(value);
+	}
+}
+
 static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 				  struct cftype *cftype, u64 min_value)
 {
@@ -6993,6 +7048,7 @@ static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 	if (min_value > SCHED_CAPACITY_SCALE)
 		return -ERANGE;
 
+	mutex_lock(&uclamp_mutex);
 	rcu_read_lock();
 
 	tg = css_tg(css);
@@ -7011,8 +7067,12 @@ static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
 	tg->uclamp_req[UCLAMP_MIN].value = min_value;
 	tg->uclamp_req[UCLAMP_MIN].bucket_id = uclamp_bucket_id(min_value);
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_eff(css, UCLAMP_MIN);
+
 out:
 	rcu_read_unlock();
+	mutex_unlock(&uclamp_mutex);
 
 	return ret;
 }
@@ -7026,6 +7086,7 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
 	if (max_value > SCHED_CAPACITY_SCALE)
 		return -ERANGE;
 
+	mutex_lock(&uclamp_mutex);
 	rcu_read_lock();
 
 	tg = css_tg(css);
@@ -7044,21 +7105,28 @@ static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
 	tg->uclamp_req[UCLAMP_MAX].value = max_value;
 	tg->uclamp_req[UCLAMP_MAX].bucket_id = uclamp_bucket_id(max_value);
 
+	/* Update effective clamps to track the most restrictive value */
+	cpu_util_update_eff(css, UCLAMP_MAX);
+
 out:
 	rcu_read_unlock();
+	mutex_unlock(&uclamp_mutex);
 
 	return ret;
 }
 
 static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
-				  enum uclamp_id clamp_id)
+				  enum uclamp_id clamp_id,
+				  bool effective)
 {
 	struct task_group *tg;
 	u64 util_clamp;
 
 	rcu_read_lock();
 	tg = css_tg(css);
-	util_clamp = tg->uclamp_req[clamp_id].value;
+	util_clamp = effective
+		? tg->uclamp[clamp_id].value
+		: tg->uclamp_req[clamp_id].value;
 	rcu_read_unlock();
 
 	return util_clamp;
@@ -7067,13 +7135,25 @@ static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
 static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css,
 				 struct cftype *cft)
 {
-	return cpu_uclamp_read(css, UCLAMP_MIN);
+	return cpu_uclamp_read(css, UCLAMP_MIN, false);
 }
 
 static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css,
 				 struct cftype *cft)
 {
-	return cpu_uclamp_read(css, UCLAMP_MAX);
+	return cpu_uclamp_read(css, UCLAMP_MAX, false);
+}
+
+static u64 cpu_util_min_effective_read_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MIN, true);
+}
+
+static u64 cpu_util_max_effective_read_u64(struct cgroup_subsys_state *css,
+					   struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MAX, true);
 }
 #endif /* CONFIG_UCLAMP_TASK_GROUP */
 
@@ -7421,11 +7501,19 @@ static struct cftype cpu_legacy_files[] = {
 		.read_u64 = cpu_util_min_read_u64,
 		.write_u64 = cpu_util_min_write_u64,
 	},
+	{
+		.name = "util.min.effective",
+		.read_u64 = cpu_util_min_effective_read_u64,
+	},
 	{
 		.name = "util.max",
 		.read_u64 = cpu_util_max_read_u64,
 		.write_u64 = cpu_util_max_write_u64,
 	},
+	{
+		.name = "util.max.effective",
+		.read_u64 = cpu_util_max_effective_read_u64,
+	},
 #endif
 	{ }	/* Terminate */
 };
@@ -7601,12 +7689,22 @@ static struct cftype cpu_files[] = {
 		.read_u64 = cpu_util_min_read_u64,
 		.write_u64 = cpu_util_min_write_u64,
 	},
+	{
+		.name = "util.min.effective",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_util_min_effective_read_u64,
+	},
 	{
 		.name = "util.max",
 		.flags = CFTYPE_NOT_ON_ROOT,
 		.read_u64 = cpu_util_max_read_u64,
 		.write_u64 = cpu_util_max_write_u64,
 	},
+	{
+		.name = "util.max.effective",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_util_max_effective_read_u64,
+	},
 #endif
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index b46b6912beba..b0d4c5e98305 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -403,6 +403,8 @@ struct task_group {
 #ifdef CONFIG_UCLAMP_TASK_GROUP
 	/* Clamp values requested for a task group */
 	struct uclamp_se	uclamp_req[UCLAMP_CNT];
+	/* Effective clamp values used for a task group */
+	struct uclamp_se	uclamp[UCLAMP_CNT];
 #endif
 
 };
-- 
2.20.1


Thread overview: 50+ messages
2019-04-02 10:41 [PATCH v8 00/16] Add utilization clamping support Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 01/16] sched/core: uclamp: Add CPU's clamp buckets refcounting Patrick Bellasi
2019-04-06 23:51   ` Suren Baghdasaryan
2019-04-08 11:49     ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 02/16] sched/core: Add bucket local max tracking Patrick Bellasi
2019-04-15 14:51   ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 03/16] sched/core: uclamp: Enforce last task's UCLAMP_MAX Patrick Bellasi
2019-04-17 20:36   ` Suren Baghdasaryan
2019-05-07 10:10     ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 04/16] sched/core: uclamp: Add system default clamps Patrick Bellasi
2019-04-18  0:51   ` Suren Baghdasaryan
2019-05-07 10:38     ` Patrick Bellasi
2019-05-08 18:42   ` Peter Zijlstra
2019-05-09  8:43     ` Patrick Bellasi
2019-05-08 19:00   ` Peter Zijlstra
2019-05-09  8:45     ` Patrick Bellasi
2019-05-08 19:07   ` Peter Zijlstra
2019-05-08 19:15     ` Peter Zijlstra
2019-05-09  9:10       ` Patrick Bellasi
2019-05-09 11:53         ` Peter Zijlstra
2019-05-09 13:04           ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 05/16] sched/core: Allow sched_setattr() to use the current policy Patrick Bellasi
2019-05-08 19:21   ` Peter Zijlstra
2019-05-09  9:18     ` Patrick Bellasi
2019-05-09 11:55       ` Peter Zijlstra
2019-05-09 14:59     ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 06/16] sched/core: uclamp: Extend sched_setattr() to support utilization clamping Patrick Bellasi
2019-04-17 22:26   ` Suren Baghdasaryan
2019-05-07 11:13     ` Patrick Bellasi
2019-05-08 19:44       ` Peter Zijlstra
2019-05-09  9:24         ` Patrick Bellasi
2019-05-08 19:41   ` Peter Zijlstra
2019-05-09  9:23     ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 07/16] sched/core: uclamp: Reset uclamp values on RESET_ON_FORK Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 08/16] sched/core: uclamp: Set default clamps for RT tasks Patrick Bellasi
2019-04-17 23:07   ` Suren Baghdasaryan
2019-05-07 11:25     ` Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 09/16] sched/cpufreq: uclamp: Add clamps for FAIR and " Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 10/16] sched/core: uclamp: Add uclamp_util_with() Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 11/16] sched/fair: uclamp: Add uclamp support to energy_compute() Patrick Bellasi
2019-05-09 12:51   ` Peter Zijlstra
2019-04-02 10:41 ` [PATCH v8 12/16] sched/core: uclamp: Extend CPU's cgroup controller Patrick Bellasi
2019-04-18  0:12   ` Suren Baghdasaryan
2019-05-07 11:42     ` Patrick Bellasi
2019-04-02 10:41 ` Patrick Bellasi [this message]
2019-04-02 10:41 ` [PATCH v8 14/16] sched/core: uclamp: Propagate system defaults to root group Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 15/16] sched/core: uclamp: Use TG's clamps to restrict TASK's clamps Patrick Bellasi
2019-04-02 10:41 ` [PATCH v8 16/16] sched/core: uclamp: Update CPU's refcount on TG's clamp changes Patrick Bellasi
2019-05-09 13:02 ` [PATCH v8 00/16] Add utilization clamping support Peter Zijlstra
2019-05-09 13:09   ` Patrick Bellasi
