LKML Archive on lore.kernel.org
 help / Atom feed
From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>, Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>, Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping
Date: Mon,  6 Aug 2018 17:39:33 +0100
Message-ID: <20180806163946.28380-2-patrick.bellasi@arm.com> (raw)
In-Reply-To: <20180806163946.28380-1-patrick.bellasi@arm.com>

The SCHED_DEADLINE scheduling class provides an advanced and formal
model to define tasks requirements which can be translated into proper
decisions for both task placements and frequencies selections.
Other classes have a more simplified model which is essentially based on
the relatively simple concept of POSIX priorities.

Such a simple priority based model however does not allow to exploit
some of the most advanced features of the Linux scheduler like, for
example, driving frequencies selection via the schedutil cpufreq
governor. However, also for non SCHED_DEADLINE tasks, it's still
interesting to define tasks properties which can be used to better
support certain scheduler decisions.

Utilization clamping aims at exposing to user-space a new set of
per-task attributes which can be used to provide the scheduler with some
hints about the expected/required utilization for a task.
This will allow to implement a more advanced per-task frequency control
mechanism which is not based just on a "passive" measured task
utilization but on a more "active" approach. For example, it could be
possible to boost interactive tasks, thus getting better performance, or
cap background tasks, thus being more energy efficient.
Ultimately, such a mechanism can be considered similar to the cpufreq's
powersave, performance and userspace governor but with a much fine
grained and per-task control.

Let's introduce a new API to set utilization clamping values for a
specified task by extending sched_setattr, a syscall which already
allows to define task specific properties for different scheduling
classes.
Specifically, a new pair of attributes allows to specify a minimum and
maximum utilization which the scheduler should consider for a task.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Paul Turner <pjt@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Steve Muckle <smuckle@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org

---
Changes in v3:
 Message-ID: <CAJuCfpF6=L=0LrmNnJrTNPazT4dWKqNv+thhN0dwpKCgUzs9sg@mail.gmail.com>
 - removed UCLAMP_NONE not used by this patch
 Others:
 - rebased on tip/sched/core

Changes in v2:
 - rebased on v4.18-rc4
 - move at the head of the series

As discussed at OSPM, using a [0..SCHED_CAPACITY_SCALE] range seems to
be acceptable. However, an additional patch has been added at the end of
the series which introduces a simple abstraction to use a more
generic [0..100] range.

At OSPM we also discarded the idea to "recycle" the usage of
sched_runtime and sched_period which would have made the API too
much complex for limited benefits.
---
 include/linux/sched.h            | 13 +++++++
 include/uapi/linux/sched.h       |  4 +-
 include/uapi/linux/sched/types.h | 64 +++++++++++++++++++++++++++-----
 init/Kconfig                     | 19 ++++++++++
 kernel/sched/core.c              | 39 +++++++++++++++++++
 5 files changed, 129 insertions(+), 10 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index e0f4f56c9310..42f6439378e1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -279,6 +279,14 @@ struct vtime {
 	u64			gtime;
 };
 
+enum uclamp_id {
+	UCLAMP_MIN = 0, /* Minimum utilization */
+	UCLAMP_MAX,     /* Maximum utilization */
+
+	/* Utilization clamping constraints count */
+	UCLAMP_CNT
+};
+
 struct sched_info {
 #ifdef CONFIG_SCHED_INFO
 	/* Cumulative counters: */
@@ -649,6 +657,11 @@ struct task_struct {
 #endif
 	struct sched_dl_entity		dl;
 
+#ifdef CONFIG_UCLAMP_TASK
+	/* Utlization clamp values for this task */
+	int				uclamp[UCLAMP_CNT];
+#endif
+
 #ifdef CONFIG_PREEMPT_NOTIFIERS
 	/* List of struct preempt_notifier: */
 	struct hlist_head		preempt_notifiers;
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 22627f80063e..c27d6e81517b 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -50,9 +50,11 @@
 #define SCHED_FLAG_RESET_ON_FORK	0x01
 #define SCHED_FLAG_RECLAIM		0x02
 #define SCHED_FLAG_DL_OVERRUN		0x04
+#define SCHED_FLAG_UTIL_CLAMP		0x08
 
 #define SCHED_FLAG_ALL	(SCHED_FLAG_RESET_ON_FORK	| \
 			 SCHED_FLAG_RECLAIM		| \
-			 SCHED_FLAG_DL_OVERRUN)
+			 SCHED_FLAG_DL_OVERRUN		| \
+			 SCHED_FLAG_UTIL_CLAMP)
 
 #endif /* _UAPI_LINUX_SCHED_H */
diff --git a/include/uapi/linux/sched/types.h b/include/uapi/linux/sched/types.h
index 10fbb8031930..7421cd25354d 100644
--- a/include/uapi/linux/sched/types.h
+++ b/include/uapi/linux/sched/types.h
@@ -21,8 +21,33 @@ struct sched_param {
  * the tasks may be useful for a wide variety of application fields, e.g.,
  * multimedia, streaming, automation and control, and many others.
  *
- * This variant (sched_attr) is meant at describing a so-called
- * sporadic time-constrained task. In such model a task is specified by:
+ * This variant (sched_attr) allows to define additional attributes to
+ * improve the scheduler knowledge about task requirements.
+ *
+ * Scheduling Class Attributes
+ * ===========================
+ *
+ * A subset of sched_attr attributes specifies the
+ * scheduling policy and relative POSIX attributes:
+ *
+ *  @size		size of the structure, for fwd/bwd compat.
+ *
+ *  @sched_policy	task's scheduling policy
+ *  @sched_nice		task's nice value      (SCHED_NORMAL/BATCH)
+ *  @sched_priority	task's static priority (SCHED_FIFO/RR)
+ *
+ * Certain more advanced scheduling features can be controlled by a
+ * predefined set of flags via the attribute:
+ *
+ *  @sched_flags	for customizing the scheduler behaviour
+ *
+ * Sporadic Time-Constrained Tasks Attributes
+ * ==========================================
+ *
+ * A subset of sched_attr attributes allows to describe a so-called
+ * sporadic time-constrained task.
+ *
+ * In such model a task is specified by:
  *  - the activation period or minimum instance inter-arrival time;
  *  - the maximum (or average, depending on the actual scheduling
  *    discipline) computation time of all instances, a.k.a. runtime;
@@ -34,14 +59,8 @@ struct sched_param {
  * than the runtime and must be completed by time instant t equal to
  * the instance activation time + the deadline.
  *
- * This is reflected by the actual fields of the sched_attr structure:
+ * This is reflected by the following fields of the sched_attr structure:
  *
- *  @size		size of the structure, for fwd/bwd compat.
- *
- *  @sched_policy	task's scheduling policy
- *  @sched_flags	for customizing the scheduler behaviour
- *  @sched_nice		task's nice value      (SCHED_NORMAL/BATCH)
- *  @sched_priority	task's static priority (SCHED_FIFO/RR)
  *  @sched_deadline	representative of the task's deadline
  *  @sched_runtime	representative of the task's runtime
  *  @sched_period	representative of the task's period
@@ -53,6 +72,28 @@ struct sched_param {
  * As of now, the SCHED_DEADLINE policy (sched_dl scheduling class) is the
  * only user of this new interface. More information about the algorithm
  * available in the scheduling class file or in Documentation/.
+ *
+ * Task Utilization Attributes
+ * ===========================
+ *
+ * A subset of sched_attr attributes allows to specify the utilization which
+ * should be expected by a task. These attributes allows to inform the
+ * scheduler about the utilization boundaries within which is safe to schedule
+ * the task. These utilization boundaries are valuable information to support
+ * scheduler decisions on both task placement and frequencies selection.
+ *
+ *  @sched_util_min	represents the minimum utilization
+ *  @sched_util_max	represents the maximum utilization
+ *
+ * Utilization is a value in the range [0..SCHED_CAPACITY_SCALE] which
+ * represents the percentage of CPU time used by a task when running at the
+ * maximum frequency on the highest capacity CPU of the system. Thus, for
+ * example, a 20% utilization task is a task running for 2ms every 10ms.
+ *
+ * A task with a min utilization value bigger then 0 is more likely to be
+ * scheduled on a CPU which can provide that bandwidth.
+ * A task with a max utilization value smaller then 1024 is more likely to be
+ * scheduled on a CPU which do not provide more then the required bandwidth.
  */
 struct sched_attr {
 	__u32 size;
@@ -70,6 +111,11 @@ struct sched_attr {
 	__u64 sched_runtime;
 	__u64 sched_deadline;
 	__u64 sched_period;
+
+	/* Utilization hints */
+	__u32 sched_util_min;
+	__u32 sched_util_max;
+
 };
 
 #endif /* _UAPI_LINUX_SCHED_TYPES_H */
diff --git a/init/Kconfig b/init/Kconfig
index 041f3a022122..1d45a6877d6f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -583,6 +583,25 @@ config HAVE_UNSTABLE_SCHED_CLOCK
 config GENERIC_SCHED_CLOCK
 	bool
 
+menu "Scheduler features"
+
+config UCLAMP_TASK
+	bool "Enable utilization clamping for RT/FAIR tasks"
+	depends on CPU_FREQ_GOV_SCHEDUTIL
+	default false
+	help
+	  This feature enables the scheduler to track the clamped utilization
+	  of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
+
+	  When this option is enabled, the user can specify a min and max CPU
+          bandwidth which is allowed for a task.
+	  The max bandwidth allows to clamp the maximum frequency a task can
+	  use, while the min bandwidth allows to define a minimum frequency a
+          task will always use.
+
+	  If in doubt, say N.
+
+endmenu
 #
 # For architectures that want to enable the support for NUMA-affine scheduler
 # balancing logic:
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index deafa9fe602b..2cabbbcaa447 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -716,6 +716,28 @@ static void set_load_weight(struct task_struct *p, bool update_load)
 	}
 }
 
+#ifdef CONFIG_UCLAMP_TASK
+static inline int __setscheduler_uclamp(struct task_struct *p,
+					const struct sched_attr *attr)
+{
+	if (attr->sched_util_min > attr->sched_util_max)
+		return -EINVAL;
+	if (attr->sched_util_max > SCHED_CAPACITY_SCALE)
+		return -EINVAL;
+
+	p->uclamp[UCLAMP_MIN] = attr->sched_util_min;
+	p->uclamp[UCLAMP_MAX] = attr->sched_util_max;
+
+	return 0;
+}
+#else /* CONFIG_UCLAMP_TASK */
+static inline int __setscheduler_uclamp(struct task_struct *p,
+					const struct sched_attr *attr)
+{
+	return -EINVAL;
+}
+#endif /* CONFIG_UCLAMP_TASK */
+
 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
 	if (!(flags & ENQUEUE_NOCLOCK))
@@ -2156,6 +2178,11 @@ static void __sched_fork(unsigned long clone_flags, struct task_struct *p)
 	p->se.cfs_rq			= NULL;
 #endif
 
+#ifdef CONFIG_UCLAMP_TASK
+	p->uclamp[UCLAMP_MIN] = 0;
+	p->uclamp[UCLAMP_MAX] = SCHED_CAPACITY_SCALE;
+#endif
+
 #ifdef CONFIG_SCHEDSTATS
 	/* Even if schedstat is disabled, there should not be garbage */
 	memset(&p->se.statistics, 0, sizeof(p->se.statistics));
@@ -4218,6 +4245,13 @@ static int __sched_setscheduler(struct task_struct *p,
 			return retval;
 	}
 
+	/* Configure utilization clamps for the task */
+	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP) {
+		retval = __setscheduler_uclamp(p, attr);
+		if (retval)
+			return retval;
+	}
+
 	/*
 	 * Make sure no PI-waiters arrive (or leave) while we are
 	 * changing the priority of the task:
@@ -4724,6 +4758,11 @@ SYSCALL_DEFINE4(sched_getattr, pid_t, pid, struct sched_attr __user *, uattr,
 	else
 		attr.sched_nice = task_nice(p);
 
+#ifdef CONFIG_UCLAMP_TASK
+	attr.sched_util_min = p->uclamp[UCLAMP_MIN];
+	attr.sched_util_max = p->uclamp[UCLAMP_MAX];
+#endif
+
 	rcu_read_unlock();
 
 	retval = sched_read_attr(uattr, &attr, size);
-- 
2.18.0


  reply index

Thread overview: 82+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-08-06 16:39 [PATCH v3 00/14] Add utilization clamping support Patrick Bellasi
2018-08-06 16:39 ` Patrick Bellasi [this message]
2018-08-06 16:50   ` [PATCH v3 01/14] sched/core: uclamp: extend sched_setattr to support utilization clamping Randy Dunlap
2018-08-09  8:39     ` Patrick Bellasi
2018-08-09 15:20       ` Randy Dunlap
2018-08-07  9:59   ` Juri Lelli
2018-08-13 12:14     ` Patrick Bellasi
2018-08-13 12:27       ` Juri Lelli
2018-08-07 12:35   ` Juri Lelli
2018-08-09  9:14     ` Patrick Bellasi
2018-08-09  9:50       ` Juri Lelli
2018-08-09 15:23         ` Patrick Bellasi
2018-08-10  7:50           ` Juri Lelli
2018-08-17 10:34           ` Quentin Perret
2018-08-17 10:57             ` Patrick Bellasi
2018-08-17 11:14               ` Quentin Perret
2018-08-06 16:39 ` [PATCH v3 02/14] sched/core: uclamp: map TASK's clamp values into CPU's clamp groups Patrick Bellasi
2018-08-14 11:25   ` Pavan Kondeti
2018-08-14 15:21     ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 03/14] sched/core: uclamp: add CPU's clamp groups accounting Patrick Bellasi
2018-08-14 15:44   ` Dietmar Eggemann
2018-08-14 16:49     ` Patrick Bellasi
2018-08-15  9:37       ` Dietmar Eggemann
2018-08-15 10:54         ` Patrick Bellasi
2018-08-15 10:59           ` Dietmar Eggemann
2018-08-16 13:32             ` Patrick Bellasi
2018-08-16 13:37               ` Quentin Perret
2018-08-16 13:45                 ` Dietmar Eggemann
2018-08-16 14:21                   ` Quentin Perret
2018-08-16 15:00                     ` Dietmar Eggemann
2018-08-17 11:04   ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 04/14] sched/core: uclamp: update CPU's refcount on clamp changes Patrick Bellasi
2018-08-15 15:02   ` Dietmar Eggemann
2018-08-16 13:22     ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 05/14] sched/cpufreq: uclamp: add utilization clamping for FAIR tasks Patrick Bellasi
2018-08-08 13:18   ` Vincent Guittot
2018-08-09 15:30     ` Patrick Bellasi
2018-08-15 15:30   ` Dietmar Eggemann
2018-08-16 13:53     ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 06/14] sched/cpufreq: uclamp: add utilization clamping for RT tasks Patrick Bellasi
2018-08-07 13:26   ` Juri Lelli
2018-08-09 15:34     ` Patrick Bellasi
2018-08-09 16:03       ` Vincent Guittot
2018-08-13 10:12         ` Patrick Bellasi
2018-08-13 10:50           ` Juri Lelli
2018-08-13 12:07           ` Vincent Guittot
2018-08-13 12:09             ` Vincent Guittot
2018-08-13 12:49             ` Patrick Bellasi
2018-08-13 14:06               ` Vincent Guittot
2018-08-13 15:01                 ` Patrick Bellasi
2018-08-16 10:34                   ` Dietmar Eggemann
2018-08-16 13:40                     ` Patrick Bellasi
2018-08-07 13:54   ` Quentin Perret
2018-08-09 15:41     ` Patrick Bellasi
2018-08-09 15:55       ` Quentin Perret
2018-08-13 10:17         ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 07/14] sched/core: uclamp: enforce last task UCLAMP_MAX Patrick Bellasi
2018-08-16 15:43   ` Dietmar Eggemann
2018-08-16 16:47     ` Patrick Bellasi
2018-08-16 17:10       ` Dietmar Eggemann
2018-08-16 17:27         ` Patrick Bellasi
2018-08-16 17:20   ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 08/14] sched/core: uclamp: extend cpu's cgroup controller Patrick Bellasi
2018-08-17 12:21   ` Dietmar Eggemann
2018-08-17 14:24     ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 09/14] sched/core: uclamp: propagate parent clamps Patrick Bellasi
2018-08-16  9:09   ` Pavan Kondeti
2018-08-16 14:07     ` Patrick Bellasi
2018-08-17 13:43   ` Dietmar Eggemann
2018-08-17 14:45     ` Patrick Bellasi
2018-08-17 15:50       ` Dietmar Eggemann
2018-08-20 10:01         ` Dietmar Eggemann
2018-08-20 12:28           ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 10/14] sched/core: uclamp: map TG's clamp values into CPU's clamp groups Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 11/14] sched/core: uclamp: use TG's clamps to restrict Task's clamps Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 12/14] sched/core: uclamp: add system default clamps Patrick Bellasi
2018-08-16  9:13   ` Pavan Kondeti
2018-08-16 14:37     ` Patrick Bellasi
2018-08-20 10:18   ` Dietmar Eggemann
2018-08-20 12:27     ` Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 13/14] sched/core: uclamp: update CPU's refcount on TG's clamp changes Patrick Bellasi
2018-08-06 16:39 ` [PATCH v3 14/14] sched/core: uclamp: use percentage clamp values Patrick Bellasi

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180806163946.28380-2-patrick.bellasi@arm.com \
    --to=patrick.bellasi@arm.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=joelaf@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=morten.rasmussen@arm.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rafael.j.wysocki@intel.com \
    --cc=smuckle@google.com \
    --cc=surenb@google.com \
    --cc=tj@kernel.org \
    --cc=tkjos@google.com \
    --cc=vincent.guittot@linaro.org \
    --cc=viresh.kumar@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox