linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v2 10/12] sched/core: uclamp: use TG's clamps to restrict Task's clamps
Date: Mon, 16 Jul 2018 09:29:04 +0100	[thread overview]
Message-ID: <20180716082906.6061-11-patrick.bellasi@arm.com> (raw)
In-Reply-To: <20180716082906.6061-1-patrick.bellasi@arm.com>

When a task's util_clamp value is configured via sched_setattr(2), this
value has to be properly accounted in the corresponding clamp group
every time the task is enqueued and dequeued. When cgroups are also in
use, per-task clamp values have to be aggregated to those of the CPU's
controller's Task Group (TG) in which the task is currently living.

Let's update uclamp_cpu_get() to provide aggregation between the task
and the TG clamp values. Every time a task is enqueued, it will be
accounted in the clamp_group which defines the smaller clamp between the
task specific value and its TG value.

This also mimics what already happen for a task's CPU affinity mask when
the task is also living in a cpuset.  he overall idea is that cgroup
attributes are always used to restrict the per-task attributes.

Thus, this implementation allows to:

1. ensure cgroup clamps are always used to restrict task specific
   requests, i.e. boosted only up to a granted value or clamped at least
   to a certain value
2. implements a "nice-like" policy, where tasks are still allowed to
   request less then what enforced by their current TG

For this mecanisms to work properly, we need to implement a concept of
"effective" clamp group, which is used to track the currently most
restrictive clamp value each task is subject to.
The effective clamp is computed at enqueue time, by using an additional
   task_struct::uclamp_group_id
to keep track of the clamp group in which each task is currently
accounted into. This solution allows to update task constrains on
demand, only when they became RUNNABLE, to always get the least
restrictive clamp depending on the current TG's settings.

This solution allows also to better decouple the slow-path, where task
and task group clamp values are updated, from the fast-path, where the
most appropriate clamp value is tracked by refcounting clamp groups.

For consistency purposes, as well as to properly inform userspace, the
sched_getattr(2) call is updated to always return the properly
aggregated constrains as described above. This will also make
sched_getattr(2) a convenient userpace API to know the utilization
constraints enforced on a task by the cgroup's CPU controller.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Paul Turner <pjt@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 40 +++++++++++++++++++++++++++++++++++-----
 kernel/sched/sched.h  |  2 +-
 3 files changed, 38 insertions(+), 6 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 260aa8d3fca9..5dd76a27ec17 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -676,6 +676,8 @@ struct task_struct {
 	struct sched_dl_entity		dl;
 
 #ifdef CONFIG_UCLAMP_TASK
+	/* Clamp group the task is currently accounted into */
+	int				uclamp_group_id[UCLAMP_CNT];
 	/* Utlization clamp values for this task */
 	struct uclamp_se		uclamp[UCLAMP_CNT];
 #endif
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 04e758224e22..50613d3d5b83 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -971,8 +971,15 @@ static inline void uclamp_cpu_update(struct rq *rq, int clamp_id,
  * @rq: the CPU's rq where the clamp group has to be reference counted
  * @clamp_id: the utilization clamp (e.g. min or max utilization) to reference
  *
- * Once a task is enqueued on a CPU's RQ, the clamp group currently defined by
- * the task's uclamp.group_id is reference counted on that CPU.
+ * Once a task is enqueued on a CPU's RQ, the most restrictive clamp group,
+ * among the task specific and that of the task's cgroup one, is reference
+ * counted on that CPU.
+ *
+ * Since the CPUs reference counted clamp group can be either that of the task
+ * or of its cgroup, we keep track of the reference counted clamp group by
+ * storing its index (group_id) into the task's task_struct::uclamp_group_id.
+ * This group index will then be used at task's dequeue time to release the
+ * correct refcount.
  */
 static inline void uclamp_cpu_get_id(struct task_struct *p,
 				     struct rq *rq, int clamp_id)
@@ -982,18 +989,30 @@ static inline void uclamp_cpu_get_id(struct task_struct *p,
 	int clamp_value;
 	int group_id;
 
-	/* No task specific clamp values: nothing to do */
 	group_id = p->uclamp[clamp_id].group_id;
+	clamp_value = p->uclamp[clamp_id].value;
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	/* Use TG's clamp value to limit task specific values */
+	if (group_id == UCLAMP_NONE ||
+	    clamp_value >= task_group(p)->uclamp[clamp_id].value) {
+		clamp_value = task_group(p)->uclamp[clamp_id].value;
+		group_id = task_group(p)->uclamp[clamp_id].group_id;
+	}
+#else
+	/* No task specific clamp values: nothing to do */
 	if (group_id == UCLAMP_NONE)
 		return;
+#endif
 
 	/* Reference count the task into its current group_id */
 	uc_grp = &rq->uclamp.group[clamp_id][0];
 	uc_grp[group_id].tasks += 1;
 
+	/* Track the effective clamp group */
+	p->uclamp_group_id[clamp_id] = group_id;
+
 	/* Force clamp update on idle exit */
 	uc_cpu = &rq->uclamp;
-	clamp_value = p->uclamp[clamp_id].value;
 	if (unlikely(uc_cpu->flags & UCLAMP_FLAG_IDLE)) {
 		if (clamp_id == UCLAMP_MAX)
 			uc_cpu->flags &= ~UCLAMP_FLAG_IDLE;
@@ -1031,7 +1050,7 @@ static inline void uclamp_cpu_put_id(struct task_struct *p,
 	int group_id;
 
 	/* No task specific clamp values: nothing to do */
-	group_id = p->uclamp[clamp_id].group_id;
+	group_id = p->uclamp_group_id[clamp_id];
 	if (group_id == UCLAMP_NONE)
 		return;
 
@@ -1039,6 +1058,9 @@ static inline void uclamp_cpu_put_id(struct task_struct *p,
 	uc_grp = &rq->uclamp.group[clamp_id][0];
 	uc_grp[group_id].tasks -= 1;
 
+	/* Flag the task as not affecting any clamp index */
+	p->uclamp_group_id[clamp_id] = UCLAMP_NONE;
+
 	/* If this is not the last task, no updates are required */
 	if (uc_grp[group_id].tasks > 0)
 		return;
@@ -2848,6 +2870,7 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 #endif
 
 #ifdef CONFIG_UCLAMP_TASK
+	memset(&p->uclamp_group_id, UCLAMP_NONE, sizeof(p->uclamp_group_id));
 	p->uclamp[UCLAMP_MIN].value = 0;
 	p->uclamp[UCLAMP_MIN].group_id = UCLAMP_NONE;
 	p->uclamp[UCLAMP_MAX].value = SCHED_CAPACITY_SCALE;
@@ -5437,6 +5460,13 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 #ifdef CONFIG_UCLAMP_TASK
 	attr.sched_util_min = p->uclamp[UCLAMP_MIN].value;
 	attr.sched_util_max = p->uclamp[UCLAMP_MAX].value;
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	/*  Use cgroup enforced clamps to restrict task specific clamps */
+	if (task_group(p)->uclamp[UCLAMP_MIN].value < attr.sched_util_min)
+		attr.sched_util_min = task_group(p)->uclamp[UCLAMP_MIN].value;
+	if (task_group(p)->uclamp[UCLAMP_MAX].value < attr.sched_util_max)
+		attr.sched_util_max = task_group(p)->uclamp[UCLAMP_MAX].value;
+#endif
 #endif
 
 	rcu_read_unlock();
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1471a23e8f57..e3d5a2bc2f6c 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -2220,7 +2220,7 @@ static inline bool uclamp_group_active(struct uclamp_group *uc_grp,
  */
 static inline bool uclamp_task_affects(struct task_struct *p, int clamp_id)
 {
-	return (p->uclamp[clamp_id].group_id != UCLAMP_NONE);
+	return (p->uclamp_group_id[clamp_id] != UCLAMP_NONE);
 }
 
 /**
-- 
2.17.1


  parent reply	other threads:[~2018-07-16  8:30 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-16  8:28 [PATCH v2 00/12] Add utilization clamping support Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 01/12] sched/core: uclamp: extend sched_setattr to support utilization clamping Patrick Bellasi
2018-07-17 17:50   ` Joel Fernandes
2018-07-18  8:42     ` Patrick Bellasi
2018-07-18 17:02       ` Joel Fernandes
2018-07-17 18:04   ` Joel Fernandes
2018-07-16  8:28 ` [PATCH v2 02/12] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-19 23:51   ` Suren Baghdasaryan
2018-07-20 15:11     ` Patrick Bellasi
2018-07-21  0:25       ` Suren Baghdasaryan
2018-07-23 13:36         ` Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 03/12] sched/core: uclamp: add CPU's clamp groups accounting Patrick Bellasi
2018-07-20 20:25   ` Suren Baghdasaryan
2018-07-16  8:28 ` [PATCH v2 04/12] sched/core: uclamp: update CPU's refcount on clamp changes Patrick Bellasi
2018-07-16  8:28 ` [PATCH v2 05/12] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 06/12] sched/cpufreq: uclamp: add utilization clamping for RT tasks Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 07/12] sched/core: uclamp: enforce last task UCLAMP_MAX Patrick Bellasi
2018-07-21  1:23   ` Suren Baghdasaryan
2018-07-23 15:02     ` Patrick Bellasi
2018-07-23 16:40       ` Suren Baghdasaryan
2018-07-16  8:29 ` [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller Patrick Bellasi
2018-07-21  2:37   ` Suren Baghdasaryan
2018-07-21  3:16     ` Suren Baghdasaryan
2018-07-23 15:17     ` Patrick Bellasi
2018-07-23 15:30   ` Tejun Heo
2018-07-23 17:22     ` Patrick Bellasi
2018-07-24 13:29       ` Tejun Heo
2018-07-24 15:39         ` Patrick Bellasi
2018-07-27  0:39         ` Joel Fernandes
2018-07-27  8:09           ` Quentin Perret
2018-07-16  8:29 ` [PATCH v2 09/12] sched/core: uclamp: map TG's clamp values into CPU's clamp groups Patrick Bellasi
2018-07-16  8:29 ` Patrick Bellasi [this message]
2018-07-22  3:05   ` [PATCH v2 10/12] sched/core: uclamp: use TG's clamps to restrict Task's clamps Suren Baghdasaryan
2018-07-23 15:40     ` Patrick Bellasi
2018-07-23 17:11       ` Suren Baghdasaryan
2018-07-24  9:56         ` Patrick Bellasi
2018-07-24 15:28           ` Suren Baghdasaryan
2018-07-24 15:49             ` Patrick Bellasi
2018-07-16  8:29 ` [PATCH v2 11/12] sched/core: uclamp: update CPU's refcount on TG's clamp changes Patrick Bellasi
2018-07-22  3:17   ` Suren Baghdasaryan
2018-07-16  8:29 ` [PATCH v2 12/12] sched/core: uclamp: use percentage clamp values Patrick Bellasi
2018-07-22  4:04   ` Suren Baghdasaryan
2018-07-24 16:43     ` Patrick Bellasi
2018-07-24 17:11       ` Suren Baghdasaryan
2018-07-24 17:17         ` Patrick Bellasi
2018-07-17 13:03 ` [PATCH v2 00/12] Add utilization clamping support Joel Fernandes
2018-07-17 13:41   ` Patrick Bellasi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180716082906.6061-11-patrick.bellasi@arm.com \
    --to=patrick.bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).