From: Suren Baghdasaryan <surenb@google.com>
To: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: linux-kernel@vger.kernel.org, linux-pm@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Tejun Heo <tj@kernel.org>,
	"Rafael J . Wysocki" <rafael.j.wysocki@intel.com>,
	Viresh Kumar <viresh.kumar@linaro.org>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Paul Turner <pjt@google.com>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Morten Rasmussen <morten.rasmussen@arm.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Todd Kjos <tkjos@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Steve Muckle <smuckle@google.com>
Subject: Re: [PATCH v2 08/12] sched/core: uclamp: extend cpu's cgroup controller
Date: Fri, 20 Jul 2018 19:37:24 -0700
Message-ID: <CAJuCfpE4FbtrwbXNCjj=pXAvTiTLw7z1aLS4+-28X=y4V+SJ-Q@mail.gmail.com>
In-Reply-To: <20180716082906.6061-9-patrick.bellasi@arm.com>

On Mon, Jul 16, 2018 at 1:29 AM, Patrick Bellasi <patrick.bellasi@arm.com> wrote:
> The cgroup's CPU controller allows to assign a specified (maximum)
> bandwidth to the tasks of a group. However this bandwidth is defined and
> enforced only on a temporal base, without considering the actual
> frequency a CPU is running on. Thus, the amount of computation completed
> by a task within an allocated bandwidth can be very different depending
> on the actual frequency the CPU is running that task.
> The amount of computation can be affected also by the specific CPU a
> task is running on, especially when running on asymmetric capacity
> systems like Arm's big.LITTLE.
>
> With the availability of schedutil, the scheduler is now able
> to drive frequency selections based on actual task utilization.
> Moreover, the utilization clamping support provides a mechanism to
> bias the frequency selection operated by schedutil depending on
> constraints assigned to the tasks currently RUNNABLE on a CPU.
>
> Give the above mechanisms, it is now possible to extend the cpu
> controller to specify what is the minimum (or maximum) utilization which
> a task is expected (or allowed) to generate.
> Constraints on minimum and maximum utilization allowed for tasks in a
> CPU cgroup can improve the control on the actual amount of CPU bandwidth
> consumed by tasks.
>
> Utilization clamping constraints are useful not only to bias frequency
> selection, when a task is running, but also to better support certain
> scheduler decisions regarding task placement. For example, on
> asymmetric capacity systems, a utilization clamp value can be
> conveniently used to enforce important interactive tasks on more capable
> CPUs or to run low priority and background tasks on more energy
> efficient CPUs.
>
> The ultimate goal of utilization clamping is thus to enable:
>
> - boosting: by selecting an higher capacity CPU and/or higher execution
>             frequency for small tasks which are affecting the user
>             interactive experience.
>
> - capping: by selecting more energy efficiency CPUs or lower execution
>            frequency, for big tasks which are mainly related to
>            background activities, and thus without a direct impact on
>            the user experience.
>
> Thus, a proper extension of the cpu controller with utilization clamping
> support will make this controller even more suitable for integration
> with advanced system management software (e.g. Android).
> Indeed, an informed user-space can provide rich information hints to the
> scheduler regarding the tasks it's going to schedule.
>
> This patch extends the CPU controller by adding a couple of new
> attributes, util_min and util_max, which can be used to enforce task's
> utilization boosting and capping. Specifically:
>
> - util_min: defines the minimum utilization which should be considered,
>             e.g. when schedutil selects the frequency for a CPU while a
>             task in this group is RUNNABLE.
>             i.e. the task will run at least at a minimum frequency which
>             corresponds to the min_util utilization
>
> - util_max: defines the maximum utilization which should be considered,
>             e.g. when schedutil selects the frequency for a CPU while a
>             task in this group is RUNNABLE.
>             i.e. the task will run up to a maximum frequency which
>             corresponds to the max_util utilization
>
> These attributes:
>
> a) are available only for non-root nodes, both on default and legacy
>    hierarchies
> b) do not enforce any constraints and/or dependency between the parent
>    and its child nodes, thus relying on the delegation model and
>    permission settings defined by the system management software
> c) allow to (eventually) further restrict task-specific clamps defined
>    via sched_setattr(2)
>
> This patch provides the basic support to expose the two new attributes
> and to validate their run-time updates.
>
> Signed-off-by: Patrick Bellasi <patrick.bellasi@arm.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Cc: Viresh Kumar <viresh.kumar@linaro.org>
> Cc: Todd Kjos <tkjos@google.com>
> Cc: Joel Fernandes <joelaf@google.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: linux-kernel@vger.kernel.org
> Cc: linux-pm@vger.kernel.org
> ---
>  Documentation/admin-guide/cgroup-v2.rst |  25 ++++
>  init/Kconfig                            |  22 +++
>  kernel/sched/core.c                     | 186 ++++++++++++++++++++++++
>  kernel/sched/sched.h                    |   5 +
>  4 files changed, 238 insertions(+)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 8a2c52d5c53b..328c011cc105 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -904,6 +904,12 @@ controller implements weight and absolute bandwidth limit models for
>  normal scheduling policy and absolute bandwidth allocation model for
>  realtime scheduling policy.
>
> +Cycles distribution is based, by default, on a temporal base and it
> +does not account for the frequency at which tasks are executed.
> +The (optional) utilization clamping support allows to enforce a minimum
> +bandwidth, which should always be provided by a CPU, and a maximum bandwidth,
> +which should never be exceeded by a CPU.
> +
>  WARNING: cgroup2 doesn't yet support control of realtime processes and
>  the cpu controller can only be enabled when all RT processes are in
>  the root cgroup.  Be aware that system management software may already
> @@ -963,6 +969,25 @@ All time durations are in microseconds.
>         $PERIOD duration.  "max" for $MAX indicates no limit.  If only
>         one number is written, $MAX is updated.
>
> +  cpu.util_min
> +        A read-write single value file which exists on non-root cgroups.
> +        The default is "0", i.e. no bandwidth boosting.
> +
> +        The minimum utilization in the range [0, 1023].
> +
> +        This interface allows reading and setting minimum utilization clamp
> +        values similar to the sched_setattr(2). This minimum utilization
> +        value is used to clamp the task specific minimum utilization clamp.
> +
> +  cpu.util_max
> +        A read-write single value file which exists on non-root cgroups.
> +        The default is "1023". i.e. no bandwidth clamping
> +
> +        The maximum utilization in the range [0, 1023].
> +
> +        This interface allows reading and setting maximum utilization clamp
> +        values similar to the sched_setattr(2). This maximum utilization
> +        value is used to clamp the task specific maximum utilization clamp.
>
>  Memory
>  ------
> diff --git a/init/Kconfig b/init/Kconfig
> index 0a377ad7c166..d7e2b74637ff 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -792,6 +792,28 @@ config RT_GROUP_SCHED
>
>  endif #CGROUP_SCHED
>
> +config UCLAMP_TASK_GROUP
> +	bool "Utilization clamping per group of tasks"
> +	depends on CGROUP_SCHED
> +	depends on UCLAMP_TASK
> +	default n
> +	help
> +	  This feature enables the scheduler to track the clamped utilization
> +	  of each CPU based on RUNNABLE tasks currently scheduled on that CPU.
> +
> +	  When this option is enabled, the user can specify a min and max
> +	  CPU bandwidth which is allowed for each single task in a group.
> +	  The max bandwidth allows to clamp the maximum frequency a task
> +	  can use, while the min bandwidth allows to define a minimum
> +	  frequency a task will always use.
> +
> +	  When task group based utilization clamping is enabled, an eventually
> +	  specified task-specific clamp value is constrained by the cgroup
> +	  specified clamp value. Both minimum and maximum task clamping cannot
> +	  be bigger than the corresponding clamping defined at task group level.
> +
> +	  If in doubt, say N.
> +
>  config CGROUP_PIDS
>  	bool "PIDs controller"
>  	help
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0cb6e0aa4faa..30b1d894f978 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1227,6 +1227,74 @@ static inline int uclamp_group_get(struct task_struct *p,
>  	return 0;
>  }
>
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +/**
> + * init_uclamp_sched_group: initialize data structures required for TG's
> + *                          utilization clamping
> + */
> +static inline void init_uclamp_sched_group(void)
> +{
> +	struct uclamp_map *uc_map;
> +	struct uclamp_se *uc_se;
> +	int group_id;
> +	int clamp_id;
> +
> +	/* Root TG's is statically assigned to the first clamp group */
> +	group_id = 0;
> +
> +	/* Initialize root TG's to default (none) clamp values */
> +	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> +		uc_map = &uclamp_maps[clamp_id][0];
> +
> +		/* Map root TG's clamp value */
> +		uclamp_group_init(clamp_id, group_id, uclamp_none(clamp_id));
> +
> +		/* Init root TG's clamp group */
> +		uc_se = &root_task_group.uclamp[clamp_id];
> +		uc_se->value = uclamp_none(clamp_id);
> +		uc_se->group_id = group_id;
> +
> +		/* Attach root TG's clamp group */
> +		uc_map[group_id].se_count = 1;
> +	}
> +}
> +
> +/**
> + * alloc_uclamp_sched_group: initialize a new TG's for utilization clamping
> + * @tg: the newly created task group
> + * @parent: its parent task group
> + *
> + * A newly created task group inherits its utilization clamp values, for all
> + * clamp indexes, from its parent task group.
> + * This ensures that its values are properly initialized and that the task
> + * group is accounted in the same parent's group index.
> + *
> + * Return: !0 on error
> + */
> +static inline int alloc_uclamp_sched_group(struct task_group *tg,
> +					   struct task_group *parent)
> +{
> +	struct uclamp_se *uc_se;
> +	int clamp_id;
> +
> +	for (clamp_id = 0; clamp_id < UCLAMP_CNT; ++clamp_id) {
> +		uc_se = &tg->uclamp[clamp_id];
> +
> +		uc_se->value = parent->uclamp[clamp_id].value;
> +		uc_se->group_id = UCLAMP_NONE;
> +	}
> +
> +	return 1;
> +}
> +#else /* CONFIG_UCLAMP_TASK_GROUP */
> +static inline void init_uclamp_sched_group(void) { }
> +static inline int alloc_uclamp_sched_group(struct task_group *tg,
> +					   struct task_group *parent)
> +{
> +	return 1;
> +}
> +#endif /* CONFIG_UCLAMP_TASK_GROUP */
> +
>  static inline int __setscheduler_uclamp(struct task_struct *p,
>  					const struct sched_attr *attr)
>  {
> @@ -1289,11 +1357,18 @@ static void __init init_uclamp(void)
>  			raw_spin_lock_init(&uc_map[group_id].se_lock);
>  		}
>  	}
> +
> +	init_uclamp_sched_group();
>  }
>
>  #else /* CONFIG_UCLAMP_TASK */
>  static inline void uclamp_cpu_get(struct rq *rq, struct task_struct *p) { }
>  static inline void uclamp_cpu_put(struct rq *rq, struct task_struct *p) { }
> +static inline int alloc_uclamp_sched_group(struct task_group *tg,
> +					   struct task_group *parent)
> +{
> +	return 1;
> +}
>  static inline int __setscheduler_uclamp(struct task_struct *p,
>  					const struct sched_attr *attr)
>  {
> @@ -6890,6 +6965,9 @@ struct task_group *sched_create_group(struct task_group *parent)
>  	if (!alloc_rt_sched_group(tg, parent))
>  		goto err;
>
> +	if (!alloc_uclamp_sched_group(tg, parent))
> +		goto err;
> +
>  	return tg;
>
>  err:
> @@ -7110,6 +7188,88 @@ static void cpu_cgroup_attach(struct cgroup_taskset *tset)
>  		sched_move_task(task);
>  }
>
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +static int cpu_util_min_write_u64(struct cgroup_subsys_state *css,
> +				  struct cftype *cftype, u64 min_value)
> +{
> +	struct task_group *tg;
> +	int ret = -EINVAL;
> +
> +	if (min_value > SCHED_CAPACITY_SCALE)
> +		return -ERANGE;
> +
> +	mutex_lock(&uclamp_mutex);
> +	rcu_read_lock();
> +
> +	tg = css_tg(css);
> +	if (tg->uclamp[UCLAMP_MIN].value == min_value) {
> +		ret = 0;
> +		goto out;
> +	}
> +	if (tg->uclamp[UCLAMP_MAX].value < min_value)
> +		goto out;
> +

+	tg->uclamp[UCLAMP_MIN].value = min_value;
+	ret = 0;

Are these assignments missing or am I missing something? Same for
cpu_util_max_write_u64().

> +out:
> +	rcu_read_unlock();
> +	mutex_unlock(&uclamp_mutex);
> +
> +	return ret;
> +}
> +
> +static int cpu_util_max_write_u64(struct cgroup_subsys_state *css,
> +				  struct cftype *cftype, u64 max_value)
> +{
> +	struct task_group *tg;
> +	int ret = -EINVAL;
> +
> +	if (max_value > SCHED_CAPACITY_SCALE)
> +		return -ERANGE;
> +
> +	mutex_lock(&uclamp_mutex);
> +	rcu_read_lock();
> +
> +	tg = css_tg(css);
> +	if (tg->uclamp[UCLAMP_MAX].value == max_value) {
> +		ret = 0;
> +		goto out;
> +	}
> +	if (tg->uclamp[UCLAMP_MIN].value > max_value)
> +		goto out;
> +
> +out:
> +	rcu_read_unlock();
> +	mutex_unlock(&uclamp_mutex);
> +
> +	return ret;
> +}
> +
> +static inline u64 cpu_uclamp_read(struct cgroup_subsys_state *css,
> +				  enum uclamp_id clamp_id)
> +{
> +	struct task_group *tg;
> +	u64 util_clamp;
> +
> +	rcu_read_lock();
> +	tg = css_tg(css);
> +	util_clamp = tg->uclamp[clamp_id].value;
> +	rcu_read_unlock();
> +
> +	return util_clamp;
> +}
> +
> +static u64 cpu_util_min_read_u64(struct cgroup_subsys_state *css,
> +				 struct cftype *cft)
> +{
> +	return cpu_uclamp_read(css, UCLAMP_MIN);
> +}
> +
> +static u64 cpu_util_max_read_u64(struct cgroup_subsys_state *css,
> +				 struct cftype *cft)
> +{
> +	return cpu_uclamp_read(css, UCLAMP_MAX);
> +}
> +#endif /* CONFIG_UCLAMP_TASK_GROUP */
> +
>  #ifdef CONFIG_FAIR_GROUP_SCHED
>  static int cpu_shares_write_u64(struct cgroup_subsys_state *css,
>  				struct cftype *cftype, u64 shareval)
> @@ -7437,6 +7597,18 @@ static struct cftype cpu_legacy_files[] = {
>  		.read_u64 = cpu_rt_period_read_uint,
>  		.write_u64 = cpu_rt_period_write_uint,
>  	},
> +#endif
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	{
> +		.name = "util_min",
> +		.read_u64 = cpu_util_min_read_u64,
> +		.write_u64 = cpu_util_min_write_u64,
> +	},
> +	{
> +		.name = "util_max",
> +		.read_u64 = cpu_util_max_read_u64,
> +		.write_u64 = cpu_util_max_write_u64,
> +	},
>  #endif
>  	{ }	/* Terminate */
>  };
> @@ -7604,6 +7776,20 @@ static struct cftype cpu_files[] = {
>  		.seq_show = cpu_max_show,
>  		.write = cpu_max_write,
>  	},
> +#endif
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	{
> +		.name = "util_min",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = cpu_util_min_read_u64,
> +		.write_u64 = cpu_util_min_write_u64,
> +	},
> +	{
> +		.name = "util_max",
> +		.flags = CFTYPE_NOT_ON_ROOT,
> +		.read_u64 = cpu_util_max_read_u64,
> +		.write_u64 = cpu_util_max_write_u64,
> +	},
>  #endif
>  	{ }	/* terminate */
>  };
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 7e4f10c507b7..1471a23e8f57 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -389,6 +389,11 @@ struct task_group {
>  #endif
>
>  	struct cfs_bandwidth	cfs_bandwidth;
> +
> +#ifdef CONFIG_UCLAMP_TASK_GROUP
> +	struct uclamp_se	uclamp[UCLAMP_CNT];
> +#endif
> +
>  };
>
>  #ifdef CONFIG_FAIR_GROUP_SCHED
> --
> 2.17.1
>
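
To make the question above concrete, here is a minimal userspace sketch of the validation done by the two quoted write handlers, with the assignments the review comment proposes added. It is an illustration only, not the patch author's code: struct tg_clamps and the util_min_write()/util_max_write() helpers are hypothetical stand-ins for struct task_group and the cgroup callbacks, SCHED_CAPACITY_SCALE is defined locally with its kernel value of 1024, and the uclamp_mutex/RCU locking is left out.

```c
#include <errno.h>
#include <stdint.h>

/* Same value as the kernel's SCHED_CAPACITY_SCALE (1 << 10). */
#define SCHED_CAPACITY_SCALE 1024

enum uclamp_id { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };

/* Hypothetical stand-in for the uclamp[] array the patch adds to task_group. */
struct tg_clamps {
	unsigned int value[UCLAMP_CNT];
};

/* Mirrors the checks in cpu_util_min_write_u64(), locking omitted. */
static int util_min_write(struct tg_clamps *tg, uint64_t min_value)
{
	if (min_value > SCHED_CAPACITY_SCALE)
		return -ERANGE;

	/* Nothing to do if the value is unchanged. */
	if (tg->value[UCLAMP_MIN] == min_value)
		return 0;

	/* The minimum clamp must not exceed the group's maximum clamp. */
	if (tg->value[UCLAMP_MAX] < min_value)
		return -EINVAL;

	/* The store the review comment suggests is missing. */
	tg->value[UCLAMP_MIN] = (unsigned int)min_value;
	return 0;
}

/* Mirrors the checks in cpu_util_max_write_u64(), locking omitted. */
static int util_max_write(struct tg_clamps *tg, uint64_t max_value)
{
	if (max_value > SCHED_CAPACITY_SCALE)
		return -ERANGE;

	if (tg->value[UCLAMP_MAX] == max_value)
		return 0;

	/* The maximum clamp must not drop below the group's minimum clamp. */
	if (tg->value[UCLAMP_MIN] > max_value)
		return -EINVAL;

	tg->value[UCLAMP_MAX] = (unsigned int)max_value;
	return 0;
}
```

Without the final store, a write that passes every check would still return -EINVAL and leave the value unchanged, which is the behaviour the comment is asking about. Starting from an assumed state of min 0 and max 1024, writing 512 to the minimum should succeed, and a later attempt to write 256 to the maximum should be rejected with -EINVAL.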