From: Patrick Bellasi <patrick.bellasi@arm.com>
To: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>,
	Suren Baghdasaryan <surenb@google.com>
Subject: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
Date: Mon, 16 Jul 2018 09:29:02 +0100
Message-ID: <20180716082906.6061-9-patrick.bellasi@arm.com>
In-Reply-To: <20180716082906.6061-1-patrick.bellasi@arm.com>

The cgroup's CPU controller allows assigning a specified (maximum)
bandwidth to the tasks of a group. However, this bandwidth is defined and
enforced only on a temporal basis, without considering the actual
frequency a CPU is running at. Thus, the amount of computation completed
by a task within an allocated bandwidth can be very different depending
on the actual frequency the CPU is running that task at. The amount of
computation can also be affected by the specific CPU a task is running
on, especially on asymmetric capacity systems like Arm's big.LITTLE.

With the availability of schedutil, the scheduler is now able to drive
frequency selection based on actual task utilization. Moreover, the
utilization clamping support provides a mechanism to bias the frequency
selection operated by schedutil depending on constraints assigned to the
tasks currently RUNNABLE on a CPU.

Given the above mechanisms, it is now possible to extend the cpu
controller to specify the minimum (or maximum) utilization which a task
is expected (or allowed) to generate.
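
As a rough illustration of how a clamp biases frequency selection, the
following user-space sketch (not kernel code) clamps a utilization value
into [util_min, util_max] and then maps it to a frequency. The mapping is
modelled on the frequency-invariant case of schedutil's get_next_freq(),
which requests about 1.25 * max_freq * util / max, assuming a CPU of full
capacity (max = 1024). The helpers clamp_util() and next_freq_khz() are
hypothetical, and the final cap at max_freq only stands in for the cpufreq
policy limits.

```c
#include <assert.h>

/*
 * Utilization uses the scheduler capacity scale, where 1024
 * (SCHED_CAPACITY_SCALE) is the capacity of the biggest CPU at its
 * maximum frequency.
 */
#define CAPACITY_SCALE 1024u

/* Hypothetical helper: force util into the [util_min, util_max] range. */
static unsigned int clamp_util(unsigned int util, unsigned int util_min,
			       unsigned int util_max)
{
	assert(util_min <= util_max);
	if (util < util_min)
		return util_min;	/* boosting */
	if (util > util_max)
		return util_max;	/* capping */
	return util;
}

/*
 * Hypothetical helper modelled on schedutil's get_next_freq() on a
 * frequency-invariant system: freq = 1.25 * max_freq * util / max,
 * then capped at max_freq as a stand-in for the cpufreq policy limits.
 */
static unsigned int next_freq_khz(unsigned int util, unsigned int max_freq)
{
	unsigned long long freq = (unsigned long long)max_freq + (max_freq >> 2);

	freq = freq * util / CAPACITY_SCALE;
	return freq > max_freq ? max_freq : (unsigned int)freq;
}
```

For example, with a 2 GHz maximum frequency, a small task with
utilization 100 boosted by util_min = 512 requests 1250000 kHz, while a
big background task with utilization 900 capped by util_max = 256
requests 625000 kHz.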

Constraints on minimum and maximum utilization allowed for tasks in a
CPU cgroup can improve the control on the actual amount of CPU bandwidth
consumed by tasks.

Utilization clamping constraints are useful not only to bias frequency
selection when a task is running, but also to better support certain
scheduler decisions regarding task placement. For example, on asymmetric
capacity systems, a utilization clamp value can be conveniently used to
place important interactive tasks on more capable CPUs or to run low
priority and background tasks on more energy efficient CPUs.

The ultimate goal of utilization clamping is thus to enable:

- boosting: by selecting a higher capacity CPU and/or higher execution
  frequency for small tasks which are affecting the user interactive
  experience.

- capping: by selecting more energy efficient CPUs or lower execution
  frequency, for big tasks which are mainly related to background
  activities, and thus without a direct impact on the user experience.

Thus, a proper extension of the cpu controller with utilization clamping
support will make this controller even more suitable for integration
with advanced system management software (e.g. Android). Indeed, an
informed user-space can provide rich information hints to the scheduler
regarding the tasks it's going to schedule.

This patch extends the CPU controller by adding a couple of new
attributes, util_min and util_max, which can be used to enforce
utilization boosting and capping of tasks. Specifically:

- util_min: defines the minimum utilization which should be considered,
  e.g. when schedutil selects the frequency for a CPU while a task in
  this group is RUNNABLE, i.e. the task will run at least at a minimum
  frequency which corresponds to the util_min utilization

- util_max: defines the maximum utilization which should be considered,
  e.g. when schedutil selects the frequency for a CPU while a task in
  this group is RUNNABLE, i.e.
  the task will run up to a maximum frequency which corresponds to the
  util_max utilization

These attributes:

a) are available only for non-root nodes, both on default and legacy
   hierarchies
b) do not enforce any constraints and/or dependency between the parent
   and its child nodes, thus relying on the delegation model and
   permission settings defined by the system management software
c) allow (eventually) further restricting task-specific clamps defined
   via sched_setattr(2)

This patch provides the basic support to expose the two new attributes
and to validate their run-time updates.

Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Cc: Todd Kjos <tkjos@google.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-pm@vger.kernel.org
---
 Documentation/admin-guide/cgroup-v2.rst |  25 ++++
 init/Kconfig                            |  22 +++
 kernel/sched/core.c                     | 186 ++++++++++++++++++++++++
 kernel/sched/sched.h                    |   5 +
 4 files changed, 238 insertions(+)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 8a2c52d5c53b..328c011cc105 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth limit models for
 normal scheduling policy and absolute bandwidth allocation model for
 realtime scheduling policy.
 
+By default, cycles distribution is time-based and it does not
+account for the frequency at which tasks are executed.
+The (optional) utilization clamping support allows enforcing a minimum
+bandwidth, which should always be provided by a CPU, and a maximum bandwidth,
+which should never be exceeded by a CPU.
+
 WARNING: cgroup2 doesn't yet support control of realtime processes and
 the cpu controller can only be enabled when all RT processes are in
 the root cgroup.  Be aware that system management software may already
@@ -963,6 +969,25 @@ All time durations are in microseconds.
 	$PERIOD duration.  "max" for $MAX indicates no limit.  If only
 	one number is written, $MAX is updated.
 
+  cpu.util_min
+	A read-write single value file which exists on non-root cgroups.
+	The default is "0", i.e. no bandwidth boosting.
+
+	The minimum utilization in the range [0, 1023].
+
+	This interface allows reading and setting minimum utilization clamp
+	values similarly to sched_setattr(2). This minimum utilization
+	value is used to clamp the task-specific minimum utilization clamp.
+
+  cpu.util_max
+	A read-write single value file which exists on non-root cgroups.
+	The default is "1023", i.e. no bandwidth clamping.
+
+	The maximum utilization in the range [0, 1023].
+
+	This interface allows reading and setting maximum utilization clamp
+	values similarly to sched_setattr(2). This maximum utilization
+	value is used to clamp the task-specific maximum utilization clamp.
 
 Memory
 ------
diff --git a/init/Kconfig b/init/Kconfig
index 0a377ad7c166..d7e2b74637ff 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -792,6 +792,28 @@ config RT_GROUP_SCHED
 
 endif #CGROUP_SCHED
 
+config UCLAMP_TASK_GROUP
+	bool "Utilization clamping per group of tasks"
+	depends on CGROUP_SCHED
+	depends on UCLAMP_TASK
+	default n
+	help
+	  This feature enables the scheduler to track the clamped utilization
+	  of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
+
+	  When this option is enabled, the user can specify a min and max
+	  CPU bandwidth which is allowed for each single task in a group.
+	  The max bandwidth allows clamping the maximum frequency a task
+	  can use, while the min bandwidth allows defining a minimum
+	  frequency a task will always use.
+
+	  When task group based utilization clamping is enabled, any
+	  task-specific clamp value is constrained by the cgroup-specified
+	  clamp value: neither the minimum nor the maximum task clamp can
+	  be bigger than the corresponding clamp defined at task group level.
+
+	  If in doubt, say N.
+
 config CGROUP_PIDS
 	bool "PIDs controller"
 	help
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0cb6e0aa4faa..30b1d894f978 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1227,6 +1227,74 @@ static inline int uclamp_group_get(struct task_struct *p,
 	return 0;
 }
 
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+/**
+ * init_uclamp_sched_group: initialize data structures required for TG's
+ *                          utilization clamping
+ */
+static inline void init_uclamp_sched_group(void)
+{
+	struct uclamp_map *uc_map;
+	struct uclamp_se *uc_se;
+	int group_id;
+	int clamp_id;
+
+	/* The root TG is statically assigned to the first clamp group */
+	group_id = 0;
+
+	/* Initialize the root TG to default (none) clamp values */
+	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
+		uc_map = &uclamp_maps[clamp_id][0];
+
+		/* Map root TG's clamp value */
+		uclamp_group_init(clamp_id, group_id, uclamp_none(clamp_id));
+
+		/* Init root TG's clamp group */
+		uc_se = &root_task_group.uclamp[clamp_id];
+		uc_se->value = uclamp_none(clamp_id);
+		uc_se->group_id = group_id;
+
+		/* Attach root TG's clamp group */
+		uc_map[group_id].se_count = 1;
+	}
+}
+
+/**
+ * alloc_uclamp_sched_group: initialize a new TG for utilization clamping
+ * @tg: the newly created task group
+ * @parent: its parent task group
+ *
+ * A newly created task group inherits its utilization clamp values, for all
+ * clamp indexes, from its parent task group.
+ * This ensures that its values are properly initialized and that the task
+ * group is accounted in the same parent's group index.
+ *
+ * Return: 0 on error
+ */
+static inline int alloc_uclamp_sched_group(struct task_group *tg,
+					   struct task_group *parent)
+{
+	struct uclamp_se *uc_se;
+	int clamp_id;
+
+	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
+		uc_se = &tg->uclamp[clamp_id];
+
+		uc_se->value = parent->uclamp[clamp_id].value;
+		uc_se->group_id = UCLAMP_NONE;
+	}
+
+	return 1;
+}
+#else /* CONFIG_UCLAMP_TASK_GROUP */
+static inline void init_uclamp_sched_group(void) { }
+static inline int alloc_uclamp_sched_group(struct task_group *tg,
+					   struct task_group *parent)
+{
+	return 1;
+}
+#endif /* CONFIG_UCLAMP_TASK_GROUP */
+
 static inline int __setscheduler_uclamp(struct task_struct *p,
 					const struct sched_attr *attr)
 {
@@ -1289,11 +1357,18 @@ static void __init init_uclamp(void)
 			raw_spin_lock_init(&uc_map[group_id].se_lock);
 		}
 	}
+
+	init_uclamp_sched_group();
 }
 
 #else /* CONFIG_UCLAMP_TASK */
 static inline void uclamp_cpu_get(struct rq *rq, struct task_struct *p) { }
 static inline void uclamp_cpu_put(struct rq *rq, struct task_struct *p) { }
+static inline int alloc_uclamp_sched_group(struct task_group *tg,
+					   struct task_group *parent)
+{
+	return 1;
+}
 static inline int __setscheduler_uclamp(struct task_struct *p,
 					const struct sched_attr *attr)
 {
@@ -6890,6 +6965,9 @@ struct task_group *sched_create_group(struct task_group *parent)
 	if (!alloc_rt_sched_group(tg, parent))
 		goto err;
 
+	if (!alloc_uclamp_sched_group(tg, parent))
+		goto err;
+
 	return tg;
 
 err:
@@ -7110,6 +7188,88 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
 		sched_move_task(task);
 }
 
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
+				  struct cftype *cftype, u64 min_value)
+{
+	struct task_group *tg;
+	int ret = -EINVAL;
+
+	if (min_value > SCHED_CAPACITY_SCALE)
+		return -ERANGE;
+
+	mutex_lock(&uclamp_mutex);
+	rcu_read_lock();
+
+	tg = css_tg(css);
+	if (tg->uclamp[UCLAMP_MIN].value == min_value) {
+		ret = 0;
+		goto out;
+	}
+	if (tg->uclamp[UCLAMP_MAX].value < min_value)
+		goto out;
+
+out:
+	rcu_read_unlock();
+	mutex_unlock(&uclamp_mutex);
+
+	return ret;
+}
+
+static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
+				  struct cftype *cftype, u64 max_value)
+{
+	struct task_group *tg;
+	int ret = -EINVAL;
+
+	if (max_value > SCHED_CAPACITY_SCALE)
+		return -ERANGE;
+
+	mutex_lock(&uclamp_mutex);
+	rcu_read_lock();
+
+	tg = css_tg(css);
+	if (tg->uclamp[UCLAMP_MAX].value == max_value) {
+		ret = 0;
+		goto out;
+	}
+	if (tg->uclamp[UCLAMP_MIN].value > max_value)
+		goto out;
+
+out:
+	rcu_read_unlock();
+	mutex_unlock(&uclamp_mutex);
+
+	return ret;
+}
+
+static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
+				  enum uclamp_id clamp_id)
+{
+	struct task_group *tg;
+	u64 util_clamp;
+
+	rcu_read_lock();
+	tg = css_tg(css);
+	util_clamp = tg->uclamp[clamp_id].value;
+	rcu_read_unlock();
+
+	return util_clamp;
+}
+
+static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css,
+				 struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MIN);
+}
+
+static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css,
+				 struct cftype *cft)
+{
+	return cpu_uclamp_read(css, UCLAMP_MAX);
+}
+#endif /* CONFIG_UCLAMP_TASK_GROUP */
+
 #ifdef CONFIG_FAIR_GROUP_SCHED
 static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
 				struct cftype *cftype, u64 shareval)
@@ -7437,6 +7597,18 @@ static struct cftype cpu_legacy_files[] = {
 		.read_u64 = cpu_rt_period_read_uint,
 		.write_u64 = cpu_rt_period_write_uint,
 	},
+#endif
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	{
+		.name = "util_min",
+		.read_u64 = cpu_util_min_read_u64,
+		.write_u64 = cpu_util_min_write_u64,
+	},
+	{
+		.name = "util_max",
+		.read_u64 = cpu_util_max_read_u64,
+		.write_u64 = cpu_util_max_write_u64,
+	},
 #endif
 	{ }	/* Terminate */
 };
@@ -7604,6 +7776,20 @@ static struct cftype cpu_files[] = {
 		.seq_show = cpu_max_show,
 		.write = cpu_max_write,
 	},
+#endif
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	{
+		.name = "util_min",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_util_min_read_u64,
+		.write_u64 = cpu_util_min_write_u64,
+	},
+	{
+		.name = "util_max",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.read_u64 = cpu_util_max_read_u64,
+		.write_u64 = cpu_util_max_write_u64,
+	},
 #endif
 	{ }	/* terminate */
 };
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 7e4f10c507b7..1471a23e8f57 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -389,6 +389,11 @@ struct task_group {
 #endif
 
 	struct cfs_bandwidth	cfs_bandwidth;
+
+#ifdef CONFIG_UCLAMP_TASK_GROUP
+	struct uclamp_se	uclamp[UCLAMP_CNT];
+#endif
+
 };
 
 #ifdef CONFIG_FAIR_GROUP_SCHED
-- 
2.17.1
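
The cgroup write handlers in this patch only validate a new value: a
value equal to the current one returns 0, while any other value falls
through to "out" with ret still set to -EINVAL, so nothing is stored yet.
The user-space model below is a sketch, not the kernel implementation. It
applies the same -ERANGE and -EINVAL checks but stores accepted values,
and then derives a task's effective clamps following the rule stated in
the Kconfig help text, i.e. neither task clamp may exceed the
corresponding group clamp. The struct uclamp_pair type and all function
names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Same value as the kernel's SCHED_CAPACITY_SCALE (1 << 10). */
#define CAPACITY_SCALE 1024u

/* Hypothetical pair of clamp values, used for both groups and tasks. */
struct uclamp_pair {
	unsigned int util_min;	/* 0 means no boosting */
	unsigned int util_max;	/* CAPACITY_SCALE means no capping */
};

/* Checks as in cpu_util_min_write_u64(); unlike the patch, stores the value. */
static int tg_write_util_min(struct uclamp_pair *tg, uint64_t min_value)
{
	if (min_value > CAPACITY_SCALE)
		return -ERANGE;
	if (min_value > tg->util_max)
		return -EINVAL;	/* would invert the [min, max] range */
	tg->util_min = (unsigned int)min_value;
	return 0;
}

/* Checks as in cpu_util_max_write_u64(); unlike the patch, stores the value. */
static int tg_write_util_max(struct uclamp_pair *tg, uint64_t max_value)
{
	if (max_value > CAPACITY_SCALE)
		return -ERANGE;
	if (max_value < tg->util_min)
		return -EINVAL;	/* would invert the [min, max] range */
	tg->util_max = (unsigned int)max_value;
	return 0;
}

static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Assumed restriction rule, taken from the Kconfig help text: neither task
 * clamp may be bigger than the corresponding group clamp.
 */
static struct uclamp_pair effective_clamps(struct uclamp_pair task,
					   struct uclamp_pair tg)
{
	struct uclamp_pair eff = {
		.util_min = min_uint(task.util_min, tg.util_min),
		.util_max = min_uint(task.util_max, tg.util_max),
	};

	/* Holds whenever both inputs satisfy util_min <= util_max. */
	assert(eff.util_min <= eff.util_max);
	return eff;
}
```

Taking the minimum of both clamps keeps the effective range well formed:
if the task and the group each satisfy util_min <= util_max, then the
smaller of the two minimums cannot exceed the smaller of the two maximums,
which is what the assert() in effective_clamps() checks.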