From: Patrick Bellasi <patrick.bellasi@arm.com>
To: subhra mazumdar <subhra.mazumdar@oracle.com>
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
mingo@redhat.com, tglx@linutronix.de, steven.sistare@oracle.com,
dhaval.giani@oracle.com, daniel.lezcano@linaro.org,
vincent.guittot@linaro.org, viresh.kumar@linaro.org,
tim.c.chen@linux.intel.com, mgorman@techsingularity.net,
parth@linux.ibm.com
Subject: Re: [RFC PATCH 1/9] sched,cgroup: Add interface for latency-nice
Date: Thu, 05 Sep 2019 11:05:18 +0100 [thread overview]
Message-ID: <87pnkf2h41.fsf@arm.com> (raw)
In-Reply-To: <20190830174944.21741-2-subhra.mazumdar@oracle.com>
We already commented on adding the cgroup API after the per-task API.
However, for the cgroup bits it will be super important to have
[ +tejun ]
in CC, since here we are discussing the idea of adding a new cpu
controller attribute.
There are opinions about which kinds of attributes can be added to
cgroups, and I'm sure a "latency-nice" attribute will generate an
interesting discussion. :)
LPC is coming up; perhaps we can get the chance to have a chat with
Tejun about the manoeuvring space in this area.
On Fri, Aug 30, 2019 at 18:49:36 +0100, subhra mazumdar wrote...
> Add Cgroup interface for latency-nice. Each CPU Cgroup adds a new file
> "latency-nice" which is shared by all the threads in that Cgroup.
>
> Signed-off-by: subhra mazumdar <subhra.mazumdar@oracle.com>
> ---
> include/linux/sched.h | 1 +
> kernel/sched/core.c | 40 ++++++++++++++++++++++++++++++++++++++++
> kernel/sched/fair.c | 1 +
> kernel/sched/sched.h | 8 ++++++++
> 4 files changed, 50 insertions(+)
>
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 1183741..b4a79c3 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -631,6 +631,7 @@ struct task_struct {
> int static_prio;
> int normal_prio;
> unsigned int rt_priority;
> + u64 latency_nice;
I guess we can save some bits here... or, if we are very brave, maybe
we can explore the possibility of packing all the prios into a single u64?
( ( (tomatoes target here) ) )
> const struct sched_class *sched_class;
> struct sched_entity se;
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 874c427..47969bc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5976,6 +5976,7 @@ void __init sched_init(void)
> init_dl_rq(&rq->dl);
> #ifdef CONFIG_FAIR_GROUP_SCHED
> root_task_group.shares = ROOT_TASK_GROUP_LOAD;
> + root_task_group.latency_nice = LATENCY_NICE_DEFAULT;
> INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
> rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
> /*
> @@ -6345,6 +6346,7 @@ static void sched_change_group(struct task_struct *tsk, int type)
> */
> tg = container_of(task_css_check(tsk, cpu_cgrp_id, true),
> struct task_group, css);
> + tsk->latency_nice = tg->latency_nice;
> tg = autogroup_task_group(tsk, tg);
> tsk->sched_task_group = tg;
>
> @@ -6812,6 +6814,34 @@ static u64 cpu_rt_period_read_uint(struct cgroup_subsys_state *css,
> }
> #endif /* CONFIG_RT_GROUP_SCHED */
>
> +static u64 cpu_latency_nice_read_u64(struct cgroup_subsys_state *css,
> + struct cftype *cft)
> +{
> + struct task_group *tg = css_tg(css);
> +
> + return tg->latency_nice;
> +}
> +
> +static int cpu_latency_nice_write_u64(struct cgroup_subsys_state *css,
> + struct cftype *cft, u64 latency_nice)
> +{
> + struct task_group *tg = css_tg(css);
> + struct css_task_iter it;
> + struct task_struct *p;
> +
> + if (latency_nice < LATENCY_NICE_MIN || latency_nice > LATENCY_NICE_MAX)
> + return -ERANGE;
> +
> + tg->latency_nice = latency_nice;
> +
> + css_task_iter_start(css, 0, &it);
> + while ((p = css_task_iter_next(&it)))
> + p->latency_nice = latency_nice;
Once (and if) the cgroup API is added, we can avoid this (potentially
massive) "update on write" in favour of an "on demand composition at
wakeup time".
We don't care about updating the latency-nice of NON RUNNABLE tasks,
do we?
AFAIK, we need that value only (or mostly) at wakeup time. Thus, when a
task wakes up we can easily compose (and eventually cache) its
current latency-nice value by considering, in priority order:
- the system-wide upper bound
- the task group restriction
- the task-specific relaxation
Something similar to what we already do for uclamp composition with this
patch currently in tip/sched/core:
commit 3eac870a3247 ("sched/uclamp: Use TG's clamps to restrict TASK's clamps")
> + css_task_iter_end(&it);
> +
> + return 0;
> +}
> +
> static struct cftype cpu_legacy_files[] = {
> #ifdef CONFIG_FAIR_GROUP_SCHED
> {
> @@ -6848,6 +6878,11 @@ static struct cftype cpu_legacy_files[] = {
> .write_u64 = cpu_rt_period_write_uint,
> },
> #endif
> + {
> + .name = "latency-nice",
> + .read_u64 = cpu_latency_nice_read_u64,
> + .write_u64 = cpu_latency_nice_write_u64,
> + },
> { } /* Terminate */
> };
>
> @@ -7015,6 +7050,11 @@ static struct cftype cpu_files[] = {
> .write = cpu_max_write,
> },
> #endif
> + {
> + .name = "latency-nice",
> + .read_u64 = cpu_latency_nice_read_u64,
> + .write_u64 = cpu_latency_nice_write_u64,
> + },
> { } /* terminate */
> };
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index f35930f..b08d00c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -10479,6 +10479,7 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
> goto err;
>
> tg->shares = NICE_0_LOAD;
> + tg->latency_nice = LATENCY_NICE_DEFAULT;
^^^^^^^^^^^^^^^^^^^^
Maybe NICE_0_LATENCY would be better, for consistency?
> init_cfs_bandwidth(tg_cfs_bandwidth(tg));
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index b52ed1a..365c928 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -143,6 +143,13 @@ static inline void cpu_load_update_active(struct rq *this_rq) { }
> #define NICE_0_LOAD (1L << NICE_0_LOAD_SHIFT)
>
> /*
> + * Latency-nice default value
> + */
> +#define LATENCY_NICE_DEFAULT 5
> +#define LATENCY_NICE_MIN 1
> +#define LATENCY_NICE_MAX 100
The values 1 and 5 look kind of arbitrary.
For the range specifically, I already commented in this other message:
Message-ID: <87r24v2i14.fsf@arm.com>
https://lore.kernel.org/lkml/87r24v2i14.fsf@arm.com/
> +
> +/*
> * Single value that decides SCHED_DEADLINE internal math precision.
> * 10 -> just above 1us
> * 9 -> just above 0.5us
> @@ -362,6 +369,7 @@ struct cfs_bandwidth {
> /* Task group related information */
> struct task_group {
> struct cgroup_subsys_state css;
> + u64 latency_nice;
>
> #ifdef CONFIG_FAIR_GROUP_SCHED
> /* schedulable entities of this group on each CPU */
Best,
Patrick
--
#include <best/regards.h>
Patrick Bellasi