* [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path
@ 2020-06-24 17:26 Qais Yousef
  2020-06-24 17:26 ` [PATCH v3 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
  ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Qais Yousef @ 2020-06-24 17:26 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: Qais Yousef, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
    Chris Redpath, Lukasz Luba, linux-kernel

This series attempts to address the report that uclamp logic could be
expensive sometimes and shows a regression in netperf UDP_STREAM under
certain conditions.

The first patch is a fix for how struct uclamp_rq is initialized, which is
required by the 2nd patch, which contains the real 'fix'.

Worth noting that the root cause of the overhead is believed to be system
specific or related to certain code/data layout issues, leading to worse
I/D $ performance.

Different systems exhibited different behaviors, and the regression did
disappear in certain kernel versions while attempting to reproduce.

More info can be found here:

https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

Having the static key seemed the best way to ensure the effect of uclamp is
minimized for kernels that compile it in but don't have a userspace that
uses it. This will allow distros to distribute uclamp-capable kernels by
default without having to compromise on performance for some systems that
could be affected.

Changes in v3:

  * Avoid double negatives and rename the static key to uclamp_used.
  * Unconditionally enable the static key through any of the paths where
    the user can modify the default uclamp value.
  * Use a C99 named struct initializer for struct uclamp_rq, which is
    easier to read than the memset().
Changes in v2:

  * Add more info in the commit message about the result of perf diff to
    demonstrate that the activate/deactivate_task pressure is reduced in
    the fast path.
  * Fix a sparse warning reported by the test robot.
  * Add an extra commit about using static_branch_likely() instead of
    static_branch_unlikely().

Thanks

--
Qais Yousef

Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-kernel@vger.kernel.org

Qais Yousef (2):
  sched/uclamp: Fix initialization of struct uclamp_rq
  sched/uclamp: Protect uclamp fast path code with static key

 kernel/sched/core.c | 75 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 66 insertions(+), 9 deletions(-)

--
2.17.1

^ permalink raw reply	[flat|nested] 9+ messages in thread
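For illustration, the opt-in gating the cover letter describes behaves roughly like the userspace C sketch below. All names here (`uclamp_used`, `enqueue_task_sketch()`, `sched_uclamp_enable_sketch()`) are illustrative stand-ins; the kernel uses `DEFINE_STATIC_KEY_FALSE()` and runtime branch patching via `static_branch_enable()`, not a plain variable.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace analogue of the sched_uclamp_used static key: a default-off
 * flag that lets the fast path bail out before touching any uclamp
 * state. In the kernel the branch is patched at runtime; here a plain
 * bool stands in for that mechanism.
 */
static bool uclamp_used;     /* default off, like DEFINE_STATIC_KEY_FALSE() */
static int uclamp_work_done; /* counts how often the clamp accounting ran */

static void enqueue_task_sketch(void)
{
	/* Fast path: skip all uclamp work until someone opts in. */
	if (!uclamp_used)
		return;

	uclamp_work_done++; /* stand-in for uclamp_rq_inc() accounting */
}

static void sched_uclamp_enable_sketch(void)
{
	/* Flipped by any opt-in knob; one-way, stays on until "reboot". */
	uclamp_used = true;
}
```

Each of the three opt-in knobs (sched_setattr(), the sysctl, the cgroup file) maps to one call of the enable helper; until that happens, enqueue/dequeue pay only the cost of the (patched-out) branch.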
* [PATCH v3 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-24 17:26 [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
@ 2020-06-24 17:26 ` Qais Yousef
  2020-06-25  0:09   ` Valentin Schneider
  2020-06-24 17:26 ` [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
  2020-06-25 15:21 ` [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba
  2 siblings, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2020-06-24 17:26 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: Qais Yousef, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
    Chris Redpath, Lukasz Luba, linux-kernel

struct uclamp_rq was zeroed out entirely on the assumption that the first
call to uclamp_rq_inc() would initialize it correctly in accordance with
the default settings.

But when the next patch introduces a static key to skip
uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
will fail to perform any frequency changes because
rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such, which
means all rqs are capped to 0 by default.

Fix it by making sure we do proper initialization at init without relying
on uclamp_rq_inc() doing it later.
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-kernel@vger.kernel.org
---
 kernel/sched/core.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8fe2ac910bed..235b2cae00a0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1248,6 +1248,20 @@ static void uclamp_fork(struct task_struct *p)
 	}
 }

+static void __init init_uclamp_rq(struct rq *rq)
+{
+	enum uclamp_id clamp_id;
+	struct uclamp_rq *uc_rq = rq->uclamp;
+
+	for_each_clamp_id(clamp_id) {
+		uc_rq[clamp_id] = (struct uclamp_rq) {
+			.value = uclamp_none(clamp_id)
+		};
+	}
+
+	rq->uclamp_flags = 0;
+}
+
 static void __init init_uclamp(void)
 {
 	struct uclamp_se uc_max = {};
@@ -1256,11 +1270,8 @@ static void __init init_uclamp(void)

 	mutex_init(&uclamp_mutex);

-	for_each_possible_cpu(cpu) {
-		memset(&cpu_rq(cpu)->uclamp, 0,
-		       sizeof(struct uclamp_rq)*UCLAMP_CNT);
-		cpu_rq(cpu)->uclamp_flags = 0;
-	}
+	for_each_possible_cpu(cpu)
+		init_uclamp_rq(cpu_rq(cpu));

 	for_each_clamp_id(clamp_id) {
 		uclamp_se_set(&init_task.uclamp_req[clamp_id],

--
2.17.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
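A minimal, compilable sketch of what the fix above does, using simplified stand-in types (the real `struct uclamp_rq` has more fields than shown here): zero-initialization leaves `UCLAMP_MAX` at 0, whereas the C99 designated initializer gives each clamp its proper default.

```c
#include <assert.h>

/* Simplified stand-ins for the kernel types; illustration only. */
enum { UCLAMP_MIN, UCLAMP_MAX, UCLAMP_CNT };
#define SCHED_CAPACITY_SCALE 1024

struct uclamp_rq_sketch {
	unsigned int value;
};

/* Default clamp: min clamps to 0, max clamps to full capacity. */
static unsigned int uclamp_none(int clamp_id)
{
	return clamp_id == UCLAMP_MAX ? SCHED_CAPACITY_SCALE : 0;
}

static void init_uclamp_rq_sketch(struct uclamp_rq_sketch uc_rq[UCLAMP_CNT])
{
	for (int clamp_id = 0; clamp_id < UCLAMP_CNT; clamp_id++) {
		/* C99 designated initializer, as in the patch. */
		uc_rq[clamp_id] = (struct uclamp_rq_sketch) {
			.value = uclamp_none(clamp_id)
		};
	}
}
```

With the old memset() both slots would read 0, so a key-gated schedutil would see every rq capped at 0; after this init, `UCLAMP_MAX` starts at `SCHED_CAPACITY_SCALE` as intended.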
* Re: [PATCH v3 1/2] sched/uclamp: Fix initialization of struct uclamp_rq
  2020-06-24 17:26 ` [PATCH v3 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
@ 2020-06-25  0:09   ` Valentin Schneider
  0 siblings, 0 replies; 9+ messages in thread
From: Valentin Schneider @ 2020-06-25 0:09 UTC (permalink / raw)
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 24/06/20 18:26, Qais Yousef wrote:
> struct uclamp_rq was zeroed out entirely in assumption that in the first
> call to uclamp_rq_inc() they'd be initialized correctly in accordance to
> default settings.
>
> But when next patch introduces a static key to skip
> uclamp_rq_{inc,dec}() until userspace opts in to use uclamp, schedutil
> will fail to perform any frequency changes because the
> rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such. Which
> means all rqs are capped to 0 by default.
>
> Fix it by making sure we do proper initialization at init without
> relying on uclamp_rq_inc() doing it later.
>
> Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
> Signed-off-by: Qais Yousef <qais.yousef@arm.com>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
> Cc: Chris Redpath <chris.redpath@arm.com>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: linux-kernel@vger.kernel.org

Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>

^ permalink raw reply	[flat|nested] 9+ messages in thread
* [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-24 17:26 [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
  2020-06-24 17:26 ` [PATCH v3 1/2] sched/uclamp: Fix initialization of struct uclamp_rq Qais Yousef
@ 2020-06-24 17:26 ` Qais Yousef
  2020-06-25  0:16   ` Valentin Schneider
  2020-06-25 15:21 ` [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba
  2 siblings, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2020-06-24 17:26 UTC (permalink / raw)
To: Ingo Molnar, Peter Zijlstra
Cc: Qais Yousef, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi,
    Chris Redpath, Lukasz Luba, linux-kernel

There is a report that when uclamp is enabled, a netperf UDP test
regresses compared to a kernel compiled without uclamp.

https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/

While investigating the root cause, there was no sign that the uclamp
code does anything particularly expensive, but it could suffer from bad
cache behavior under certain circumstances that are yet to be understood.

https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/

To reduce the pressure on the fast path anyway, add a static key that by
default will skip executing the uclamp logic in the
enqueue/dequeue_task() fast path until it's needed.

As soon as the user starts using util clamp by:

	1. Changing the uclamp value of a task with sched_setattr()
	2. Modifying the default sysctl_sched_util_clamp_{min, max}
	3. Modifying the default cpu.uclamp.{min, max} value in cgroup

we flip the static key, now that the user has opted to use util clamp,
effectively re-introducing the uclamp logic in the enqueue/dequeue_task()
fast path. It stays on from that point forward until the next reboot.

This should help minimize the effect of util clamp on workloads that
don't need it but still allow distros to ship their kernels with uclamp
compiled in by default.
SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since now we can end up
with an unbalanced call to uclamp_rq_dec_id() if we flip the key while a
task is running in the rq. Since we know it is harmless, we just quietly
return if we attempt a uclamp_rq_dec_id() when
rq->uclamp[].bucket[].tasks is 0.

The following results demonstrate how this helps on a 2-socket Xeon E5
2x10-cores system.

                          nouclamp                 uclamp      uclamp-static-key
Hmean     send-64         162.43 (   0.00%)      157.84 *  -2.82%*      163.39 *   0.59%*
Hmean     send-128        324.71 (   0.00%)      314.78 *  -3.06%*      326.18 *   0.45%*
Hmean     send-256        641.55 (   0.00%)      628.67 *  -2.01%*      648.12 *   1.02%*
Hmean     send-1024      2525.28 (   0.00%)     2448.26 *  -3.05%*     2543.73 *   0.73%*
Hmean     send-2048      4836.14 (   0.00%)     4712.08 *  -2.57%*     4867.69 *   0.65%*
Hmean     send-3312      7540.83 (   0.00%)     7425.45 *  -1.53%*     7621.06 *   1.06%*
Hmean     send-4096      9124.53 (   0.00%)     8948.82 *  -1.93%*     9276.25 *   1.66%*
Hmean     send-8192     15589.67 (   0.00%)    15486.35 *  -0.66%*    15819.98 *   1.48%*
Hmean     send-16384    26386.47 (   0.00%)    25752.25 *  -2.40%*    26773.74 *   1.47%*

The perf diff between nouclamp and uclamp-static-key when uclamp is
disabled in the fast path:

     8.73%     -1.55%  [kernel.kallsyms]        [k] try_to_wake_up
     0.07%     +0.04%  [kernel.kallsyms]        [k] deactivate_task
     0.13%     -0.02%  [kernel.kallsyms]        [k] activate_task

The diff between nouclamp and uclamp-static-key when uclamp is enabled in
the fast path:

     8.73%     -0.72%  [kernel.kallsyms]        [k] try_to_wake_up
     0.13%     +0.39%  [kernel.kallsyms]        [k] activate_task
     0.07%     +0.38%  [kernel.kallsyms]        [k] deactivate_task

Reported-by: Mel Gorman <mgorman@suse.de>
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
Cc: Chris Redpath <chris.redpath@arm.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: linux-kernel@vger.kernel.org
---

This takes a different approach to PSI, which introduces a config option

```
CONFIG_PSI_DEFAULT_DISABLED

	Require boot parameter to enable pressure stall information
	tracking (NEW)

	boot param psi
```

via commit e0c274472d5d ("psi: make disabling/enabling easier for vendor
kernels").

uclamp has clearer points of entry when userspace would like to use it, so
we can automatically flip the switch if the kernel is running on a
userspace that wants to use util clamp, without any extra
userspace-visible switches.

I wanted to make this dependent on schedutil being the governor too, but
besides the complexity, uclamp is used for capacity awareness. We could
certainly construct a more complex condition, but I'm not sure it's worth
it. Open to hear more opinions and points of view on this :)

 kernel/sched/core.c | 54 +++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 235b2cae00a0..44e03d4fd19d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -794,6 +794,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
 /* All clamps are required to be less or equal than these values */
 static struct uclamp_se uclamp_default[UCLAMP_CNT];

+/*
+ * This static key is used to reduce the uclamp overhead in the fast path. It
+ * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
+ *
+ * This allows users to continue to enable uclamp in their kernel config with
+ * minimum uclamp overhead in the fast path.
+ *
+ * As soon as userspace modifies any of the uclamp knobs, the static key is
+ * enabled, since we have an actual user that makes use of uclamp
+ * functionality.
+ *
+ * The knobs that would enable this static key are:
+ *
+ *   * A task modifying its uclamp value with sched_setattr().
+ *   * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
+ *   * An admin modifying the cgroup cpu.uclamp.{min, max}.
+ */
+static DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
+
 /* Integer rounded range for each bucket */
 #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)

@@ -994,9 +1013,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
 	lockdep_assert_held(&rq->lock);

 	bucket = &uc_rq->bucket[uc_se->bucket_id];
-	SCHED_WARN_ON(!bucket->tasks);
-	if (likely(bucket->tasks))
-		bucket->tasks--;
+
+	/*
+	 * This could happen if sched_uclamp_used was enabled while the
+	 * current task was running, hence we could end up with an unbalanced
+	 * call to uclamp_rq_dec_id().
+	 */
+	if (unlikely(!bucket->tasks))
+		return;
+
+	bucket->tasks--;
 	uc_se->active = false;

 	/*
@@ -1032,6 +1058,13 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
 {
 	enum uclamp_id clamp_id;

+	/*
+	 * Avoid any overhead until uclamp is actually used by userspace,
+	 * including the branch if we use static_branch_likely().
+	 */
+	if (!static_branch_unlikely(&sched_uclamp_used))
+		return;
+
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;

@@ -1047,6 +1080,13 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
 {
 	enum uclamp_id clamp_id;

+	/*
+	 * Avoid any overhead until uclamp is actually used by userspace,
+	 * including the branch if we use static_branch_likely().
+	 */
+	if (!static_branch_unlikely(&sched_uclamp_used))
+		return;
+
 	if (unlikely(!p->sched_class->uclamp_enabled))
 		return;

@@ -1155,8 +1195,10 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
 		update_root_tg = true;
 	}

-	if (update_root_tg)
+	if (update_root_tg) {
 		uclamp_update_root_tg();
+		static_branch_enable(&sched_uclamp_used);
+	}

 	/*
 	 * We update all RUNNABLE tasks only when task groups are in use.
@@ -1221,6 +1263,8 @@ static void __setscheduler_uclamp(struct task_struct *p,
 	if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
 		return;

+	static_branch_enable(&sched_uclamp_used);
+
 	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
 		uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
 			      attr->sched_util_min, true);
@@ -7387,6 +7431,8 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
 	if (req.ret)
 		return req.ret;

+	static_branch_enable(&sched_uclamp_used);
+
 	mutex_lock(&uclamp_mutex);
 	rcu_read_lock();

--
2.17.1

^ permalink raw reply related	[flat|nested] 9+ messages in thread
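The unbalanced inc/dec situation the commit message describes can be sketched as follows, with simplified userspace stand-ins (the `_sketch` names and the bare counter are illustrative only, not the kernel code): a task enqueued before the key flips never went through `uclamp_rq_inc_id()`, so its later dequeue must tolerate a zero bucket count.

```c
#include <assert.h>

/*
 * Sketch of why SCHED_WARN_ON(!bucket->tasks) had to go: flipping the
 * key while a task is running means its dequeue arrives without a
 * matching enqueue-side increment. Quietly returning avoids
 * underflowing the bucket refcount.
 */
static unsigned int bucket_tasks; /* stand-in for rq->uclamp[].bucket[].tasks */

static void uclamp_rq_inc_id_sketch(void)
{
	bucket_tasks++;
}

static void uclamp_rq_dec_id_sketch(void)
{
	/* Unbalanced dec is possible after the key flips mid-run. */
	if (!bucket_tasks)
		return;

	bucket_tasks--;
}
```

After the first full enqueue/dequeue cycle with the key enabled, the counts are balanced again, which is why the patch treats the transient imbalance as harmless.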
* Re: [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-24 17:26 ` [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
@ 2020-06-25  0:16   ` Valentin Schneider
  2020-06-25 11:00     ` Qais Yousef
  0 siblings, 1 reply; 9+ messages in thread
From: Valentin Schneider @ 2020-06-25 0:16 UTC (permalink / raw)
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

Hi Qais,

On 24/06/20 18:26, Qais Yousef wrote:
[...]
> ---
>
> This takes a different approach to PSI which introduces a config option
>
> ```
> CONFIG_PSI_DEFAULT_DISABLED
>
> 	Require boot parameter to enable pressure stall information
> 	tracking (NEW)
>
> 	boot param psi
> ```
>
> via commit e0c274472d5d "psi: make disabling/enabling easier for vendor kernels"
>
> uclamp has a clearer points of entry when userspace would like to use it so we
> can automatically flip the switch if the kernel is running on a userspace that
> wants to user utilclamp without any extra userspace visible switches.
>
> I wanted to make this dependent on schedutil being the governor too, but beside
> the complexity, uclamp is used for capacity awareness. We could certainly
> construct a more complex condition, but I'm not sure it's worth it. Open to
> hear more opinions and points of views on this :)
>

I think the toggling conditions are good as they are. However, speaking of
schedutil, doesn't this patch break the RT frequency boost? Mind you, it
might be too late for me to be thinking about this stuff.

In schedutil_cpu_util(), when uclamp isn't compiled in, we have an explicit
'goto max'.
When uclamp *is* compiled in, that's taken care of by the
"natural" RT uclamp aggregation... Which doesn't happen until we flip the
static key.

It's yucky, but if you declare the key in the internal sched header, you
could reuse it in the existing 'goto max' (or sysctl value, when we make
that tweakable) path.

>
> kernel/sched/core.c | 54 +++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 50 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 235b2cae00a0..44e03d4fd19d 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -794,6 +794,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
> /* All clamps are required to be less or equal than these values */
> static struct uclamp_se uclamp_default[UCLAMP_CNT];
>
> +/*
> + * This static key is used to reduce the uclamp overhead in the fast path. It
> + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> + *
> + * This allows users to continue to enable uclamp in their kernel config with
> + * minimum uclamp overhead in the fast path.
> + *
> + * As soon as userspace modifies any of the uclamp knobs, the static key is
> + * enabled, since we have an actual users that make use of uclamp
> + * functionality.
> + *
> + * The knobs that would enable this static key are:
> + *
> + * * A task modifying its uclamp value with sched_setattr().

That one makes it not just userspace, right? While the sched_setattr()
stuff is expected to be unexported, it isn't ATM and we may expect some
modules to ask for a uclamp API eventually.

> + * * An admin modifying the sysctl_sched_uclamp_{min, max} via procfs.
> + * * An admin modifying the cgroup cpu.uclamp.{min, max}
> + */
> +static DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);
> +
> /* Integer rounded range for each bucket */
> #define UCLAMP_BUCKET_DELTA DIV_ROUND_CLOSEST(SCHED_CAPACITY_SCALE, UCLAMP_BUCKETS)
>
> @@ -994,9 +1013,16 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
> 	lockdep_assert_held(&rq->lock);
>
> 	bucket = &uc_rq->bucket[uc_se->bucket_id];
> -	SCHED_WARN_ON(!bucket->tasks);
> -	if (likely(bucket->tasks))
> -		bucket->tasks--;
> +
> +	/*
> +	 * This could happen if sched_uclamp_used was enabled while the
> +	 * current task was running, hence we could end up with unbalanced call
> +	 * to uclamp_rq_dec_id().
> +	 */
> +	if (unlikely(!bucket->tasks))
> +		return;
> +
> +	bucket->tasks--;
> 	uc_se->active = false;
>
> 	/*
> @@ -1032,6 +1058,13 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
> {
> 	enum uclamp_id clamp_id;
>
> +	/*
> +	 * Avoid any overhead until uclamp is actually used by the userspace.
> +	 * Including the branch if we use static_branch_likely()

I think that second point is made clear by the first one, but YMMV.

> +	 */
> +	if (!static_branch_unlikely(&sched_uclamp_used))
> +		return;
> +
> 	if (unlikely(!p->sched_class->uclamp_enabled))
> 		return;
>
> @@ -1047,6 +1080,13 @@ static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
> {
> 	enum uclamp_id clamp_id;
>
> +	/*
> +	 * Avoid any overhead until uclamp is actually used by the userspace.
> +	 * Including the branch if we use static_branch_likely()
> +	 */
> +	if (!static_branch_unlikely(&sched_uclamp_used))
> +		return;
> +
> 	if (unlikely(!p->sched_class->uclamp_enabled))
> 		return;
>
> @@ -1155,8 +1195,10 @@ int sysctl_sched_uclamp_handler(struct ctl_table *table, int write,
> 		update_root_tg = true;
> 	}
>
> -	if (update_root_tg)
> +	if (update_root_tg) {
> 		uclamp_update_root_tg();
> +		static_branch_enable(&sched_uclamp_used);

I don't think it matters ATM, but shouldn't we flip that *before* updating
the TG's to avoid any future surprises?

> +	}
>
> 	/*
> 	 * We update all RUNNABLE tasks only when task groups are in use.
> @@ -1221,6 +1263,8 @@ static void __setscheduler_uclamp(struct task_struct *p,
> 	if (likely(!(attr->sched_flags & SCHED_FLAG_UTIL_CLAMP)))
> 		return;
>
> +	static_branch_enable(&sched_uclamp_used);
> +
> 	if (attr->sched_flags & SCHED_FLAG_UTIL_CLAMP_MIN) {
> 		uclamp_se_set(&p->uclamp_req[UCLAMP_MIN],
> 			      attr->sched_util_min, true);
> @@ -7387,6 +7431,8 @@ static ssize_t cpu_uclamp_write(struct kernfs_open_file *of, char *buf,
> 	if (req.ret)
> 		return req.ret;
>
> +	static_branch_enable(&sched_uclamp_used);
> +
> 	mutex_lock(&uclamp_mutex);
> 	rcu_read_lock();

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-25  0:16   ` Valentin Schneider
@ 2020-06-25 11:00     ` Qais Yousef
  2020-06-25 11:26       ` Valentin Schneider
  0 siblings, 1 reply; 9+ messages in thread
From: Qais Yousef @ 2020-06-25 11:00 UTC (permalink / raw)
To: Valentin Schneider
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

Hi Valentin

On 06/25/20 01:16, Valentin Schneider wrote:
>
> Hi Qais,
>
> On 24/06/20 18:26, Qais Yousef wrote:
> [...]
> > ---
> >
> > This takes a different approach to PSI which introduces a config option
> >
> > ```
> > CONFIG_PSI_DEFAULT_DISABLED
> >
> > 	Require boot parameter to enable pressure stall information
> > 	tracking (NEW)
> >
> > 	boot param psi
> > ```
> >
> > via commit e0c274472d5d "psi: make disabling/enabling easier for vendor kernels"
> >
> > uclamp has a clearer points of entry when userspace would like to use it so we
> > can automatically flip the switch if the kernel is running on a userspace that
> > wants to user utilclamp without any extra userspace visible switches.
> >
> > I wanted to make this dependent on schedutil being the governor too, but beside
> > the complexity, uclamp is used for capacity awareness. We could certainly
> > construct a more complex condition, but I'm not sure it's worth it. Open to
> > hear more opinions and points of views on this :)
> >
>
> I think the toggling conditions are good as they are. However, speaking of
> schedutil, doesn't this patch break the RT frequency boost? Mind you it
> might be too late for me to be thinking about this stuff.

Good catch. I did test RT, but I just realized the RT test ran after it
enabled uclamp again. So I missed the case when it wasn't enabled.

> In schedutil_cpu_util(), when uclamp isn't compiled it, we have an explicit
> 'goto max'. When uclamp *is* compiled in, that's taken care of by the
> "natural" RT uclamp aggregation... Which doesn't happen until we flip the
> static key.
>
> It's yucky, but if you declare the key in the internal sched header, you
> could reuse it in the existing 'goto max' (or sysctl value, when we make
> that tweakable) path.

Not sure if this is the best way forward. I need to think about it.

While I am not keen on enabling in-kernel users of util clamp, there was
already an attempt to do so. This approach will not allow us to implement
something in the future for that. Which maybe is what we want..

> >
> > kernel/sched/core.c | 54 +++++++++++++++++++++++++++++++++++++++++----
> > 1 file changed, 50 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index 235b2cae00a0..44e03d4fd19d 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -794,6 +794,25 @@ unsigned int sysctl_sched_uclamp_util_max = SCHED_CAPACITY_SCALE;
> > /* All clamps are required to be less or equal than these values */
> > static struct uclamp_se uclamp_default[UCLAMP_CNT];
> >
> > +/*
> > + * This static key is used to reduce the uclamp overhead in the fast path. It
> > + * only disables the call to uclamp_rq_{inc, dec}() in enqueue/dequeue_task().
> > + *
> > + * This allows users to continue to enable uclamp in their kernel config with
> > + * minimum uclamp overhead in the fast path.
> > + *
> > + * As soon as userspace modifies any of the uclamp knobs, the static key is
> > + * enabled, since we have an actual users that make use of uclamp
> > + * functionality.
> > + *
> > + * The knobs that would enable this static key are:
> > + *
> > + * * A task modifying its uclamp value with sched_setattr().
>
> That one makes it not just userspace, right? While the sched_setattr()
> stuff is expected to be unexported, it isn't ATM and we may expect some
> modules to ask for a uclamp API eventually.

This has already come up in another thread [1]. I think in-kernel users
shouldn't go through this path. I will propose something to give stronger
guarantees in this regard.

> > -	if (update_root_tg)
> > +	if (update_root_tg) {
> > 		uclamp_update_root_tg();
> > +		static_branch_enable(&sched_uclamp_used);
>
> I don't think it matters ATM, but shouldn't we flip that *before* updating
> the TG's to avoid any future surprises?

What sort of surprises are you thinking of?

Thanks

--
Qais Yousef

[1] https://lore.kernel.org/lkml/20200624175236.nblndmg6dfq2vr2u@e107158-lin.cambridge.arm.com/

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key
  2020-06-25 11:00     ` Qais Yousef
@ 2020-06-25 11:26       ` Valentin Schneider
  2020-06-25 11:34         ` Qais Yousef
  0 siblings, 1 reply; 9+ messages in thread
From: Valentin Schneider @ 2020-06-25 11:26 UTC (permalink / raw)
To: Qais Yousef
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
    Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel

On 25/06/20 12:00, Qais Yousef wrote:
> Hi Valentin
>
> On 06/25/20 01:16, Valentin Schneider wrote:
>> In schedutil_cpu_util(), when uclamp isn't compiled it, we have an explicit
>> 'goto max'. When uclamp *is* compiled in, that's taken care of by the
>> "natural" RT uclamp aggregation... Which doesn't happen until we flip the
>> static key.
>>
>> It's yucky, but if you declare the key in the internal sched header, you
>> could reuse it in the existing 'goto max' (or sysctl value, when we make
>> that tweakable) path.
>
> Not sure if this is the best way forward. I need to think about it.
> While I am not keen on enabling in kernel users of util clamp, but there was
> already an attempt to do so. This approach will not allow us to implement
> something in the future for that. Which maybe is what we want..
>

Just to be clear, I'm not suggesting to add any in-kernel toggling of
uclamp outside of the scheduler: by keeping that to the internal sched
header & schedutil, we're keeping it contained to internal scheduler land.
Since a diff is worth a thousand words, here's what I was thinking of (not
even compiled):

---
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index 7fbaee24c824..68731c3316ef 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -210,7 +210,7 @@ unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs,
 	unsigned long dl_util, util, irq;
 	struct rq *rq = cpu_rq(cpu);

-	if (!IS_BUILTIN(CONFIG_UCLAMP_TASK) &&
+	if ((!IS_BUILTIN(CONFIG_UCLAMP_TASK) || !static_branch_likely(&sched_uclamp_used)) &&
 	    type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) {
 		return max;
 	}
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 1d4e94c1e5fe..3fd5c792f247 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1638,6 +1638,7 @@ static const_debug __maybe_unused unsigned int sysctl_sched_features =

 extern struct static_key_false sched_numa_balancing;
 extern struct static_key_false sched_schedstats;
+extern struct static_key_false sched_uclamp_used;

 static inline u64 global_rt_period(void)
 {
---

>> > + * As soon as userspace modifies any of the uclamp knobs, the static key is
>> > + * enabled, since we have an actual users that make use of uclamp
>> > + * functionality.
>> > + *
>> > + * The knobs that would enable this static key are:
>> > + *
>> > + * * A task modifying its uclamp value with sched_setattr().
>>
>> That one makes it not just userspace, right? While the sched_setattr()
>> stuff is expected to be unexported, it isn't ATM and we may expect some
>> modules to ask for a uclamp API eventually.
>
> This has already come up with another thread [1]. I think in-kernel users
> shouldn't go through this path. I will propose something to give stronger
> guarantees in this regard.
>

True; and they'll have to go through another path anyway once the unexport
thing happens.
>> > -	if (update_root_tg)
>> > +	if (update_root_tg) {
>> > 		uclamp_update_root_tg();
>> > +		static_branch_enable(&sched_uclamp_used);
>>
>> I don't think it matters ATM, but shouldn't we flip that *before* updating
>> the TG's to avoid any future surprises?
>
> What sort of surprises are you thinking of?
>

Say if we end up adding static key checks in some other uclamp functions
(which are called in the TG update) and don't change this here, someone
will have to scratch their heads to figure out the key enablement needs to
be moved one line higher. It's harmless future-proofing, I think.

> Thanks

^ permalink raw reply related	[flat|nested] 9+ messages in thread
* Re: [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key 2020-06-25 11:26 ` Valentin Schneider @ 2020-06-25 11:34 ` Qais Yousef 0 siblings, 0 replies; 9+ messages in thread From: Qais Yousef @ 2020-06-25 11:34 UTC (permalink / raw) To: Valentin Schneider Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman, Patrick Bellasi, Chris Redpath, Lukasz Luba, linux-kernel On 06/25/20 12:26, Valentin Schneider wrote: > > On 25/06/20 12:00, Qais Yousef wrote: > > Hi Valentin > > > > On 06/25/20 01:16, Valentin Schneider wrote: > >> In schedutil_cpu_util(), when uclamp isn't compiled it, we have an explicit > >> 'goto max'. When uclamp *is* compiled in, that's taken care of by the > >> "natural" RT uclamp aggregation... Which doesn't happen until we flip the > >> static key. > >> > >> It's yucky, but if you declare the key in the internal sched header, you > >> could reuse it in the existing 'goto max' (or sysctl value, when we make > >> that tweakable) path. > > > > Not sure if this is the best way forward. I need to think about it. > > While I am not keen on enabling in kernel users of util clamp, but there was > > already an attempt to do so. This approach will not allow us to implement > > something in the future for that. Which maybe is what we want.. > > > > Just to be clear, I'm not suggesting to add any in-kernel toggling of > uclamp outside of the scheduler: by keeping that to the internal sched > header & schedutil, we're keeping it contained to internal scheduler land. > > Since a diff is worth a thousand words, here's what I was thinking of (not > even compiled): Yeah I understood and already thought about this. But this approach could prevent potential in kernel-users. Whether we care or not about this now, I don't know. But it seems the simplest thing to do. 
> >> > -	if (update_root_tg)
> >> > +	if (update_root_tg) {
> >> >  		uclamp_update_root_tg();
> >> > +		static_branch_enable(&sched_uclamp_used);
> >>
> >> I don't think it matters ATM, but shouldn't we flip that *before* updating
> >> the TG's to avoid any future surprises?
> >
> > What sort of surprises are you thinking of?
> >
>
> Say if we end up adding static key checks in some other uclamp functions
> (which are called in the TG update) and don't change this here, someone
> will have to scratch their heads to figure out the key enablement needs to
> be moved one line higher. It's harmless future-proofing, I think.

Hehe okay, I'll change that.

Thanks

--
Qais Yousef

^ permalink raw reply	[flat|nested] 9+ messages in thread
* Re: [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path
  2020-06-24 17:26 [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
  2020-06-24 17:26 ` [PATCH v3 1/2] sched/uclamp: Fix initialization of strut uclamp_rq Qais Yousef
  2020-06-24 17:26 ` [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
@ 2020-06-25 15:21 ` Lukasz Luba
  2 siblings, 0 replies; 9+ messages in thread
From: Lukasz Luba @ 2020-06-25 15:21 UTC (permalink / raw)
  To: Qais Yousef
  Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Patrick Bellasi, Chris Redpath, linux-kernel

Hi Qais,

On 6/24/20 6:26 PM, Qais Yousef wrote:
> This series attempts to address the report that uclamp logic could be
> expensive sometimes and shows a regression in netperf UDP_STREAM under
> certain conditions.
>
> The first patch is a fix for how struct uclamp_rq is initialized, which is
> required by the 2nd patch, which contains the real 'fix'.
>
> Worth noting that the root cause of the overhead is believed to be system
> specific, or related to potential code/data layout issues leading to worse
> I/D $ performance.
>
> Different systems exhibited different behaviors, and the regression did
> disappear in certain kernel versions while attempting to reproduce.
>
> More info can be found here:
>
> https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/
>
> Having the static key seemed the best thing to do to ensure the effect of
> uclamp is minimized for kernels that compile it in but don't have a
> userspace that uses it. This will allow distros to distribute uclamp
> capable kernels by default without having to compromise on performance
> for some systems that could be affected.
>
> Changes in v3:
>  * Avoid double negatives and rename the static key to uclamp_used.
>  * Unconditionally enable the static key through any of the paths where
>    the user can modify the default uclamp value.
>  * Use a C99 named struct initializer for struct uclamp_rq, which is
>    easier to read than the memset().
>
> Changes in v2:
>  * Add more info in the commit message about the result of perf diff to
>    demonstrate that the activate/deactivate_task pressure is reduced in
>    the fast path.
>  * Fix sparse warning reported by the test robot.
>  * Add an extra commit about using static_branch_likely() instead of
>    static_branch_unlikely().
>
> Thanks
>
> --
> Qais Yousef
>
> Cc: Juri Lelli <juri.lelli@redhat.com>
> Cc: Vincent Guittot <vincent.guittot@linaro.org>
> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
> Cc: Steven Rostedt <rostedt@goodmis.org>
> Cc: Ben Segall <bsegall@google.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Patrick Bellasi <patrick.bellasi@matbug.net>
> Cc: Chris Redpath <chris.redpath@arm.com>
> Cc: Lukasz Luba <lukasz.luba@arm.com>
> Cc: linux-kernel@vger.kernel.org
>
> Qais Yousef (2):
>   sched/uclamp: Fix initialization of strut uclamp_rq
>   sched/uclamp: Protect uclamp fast path code with static key
>
>  kernel/sched/core.c | 75 +++++++++++++++++++++++++++++++++++++++------
>  1 file changed, 66 insertions(+), 9 deletions(-)
>

The results for this v3 series from mmtests netperf-udp (30x each UDP size)
are good.
                      v5.7-rc7-base-noucl    v5.7-rc7-ucl-tsk-nofix  v5.7-rc7-ucl-tsk-grp-fix_v3
Hmean     send-64         62.15 (   0.00%)       59.65 *  -4.02%*       65.83 *   5.93%*
Hmean     send-128       122.88 (   0.00%)      119.37 *  -2.85%*      133.20 *   8.40%*
Hmean     send-256       244.85 (   0.00%)      234.26 *  -4.32%*      264.01 *   7.83%*
Hmean     send-1024      919.24 (   0.00%)      880.67 *  -4.20%*     1005.54 *   9.39%*
Hmean     send-2048     1689.45 (   0.00%)     1647.54 *  -2.48%*     1845.64 *   9.25%*
Hmean     send-3312     2542.36 (   0.00%)     2485.23 *  -2.25%*     2729.11 *   7.35%*
Hmean     send-4096     2935.69 (   0.00%)     2861.09 *  -2.54%*     3161.16 *   7.68%*
Hmean     send-8192     4800.35 (   0.00%)     4680.09 *  -2.51%*     5090.38 *   6.04%*
Hmean     send-16384    7473.66 (   0.00%)     7349.60 *  -1.66%*     7786.42 *   4.18%*
Hmean     recv-64         62.15 (   0.00%)       59.65 *  -4.03%*       65.82 *   5.91%*
Hmean     recv-128       122.88 (   0.00%)      119.37 *  -2.85%*      133.20 *   8.40%*
Hmean     recv-256       244.84 (   0.00%)      234.26 *  -4.32%*      264.01 *   7.83%*
Hmean     recv-1024      919.24 (   0.00%)      880.67 *  -4.20%*     1005.54 *   9.39%*
Hmean     recv-2048     1689.44 (   0.00%)     1647.54 *  -2.48%*     1845.06 *   9.21%*
Hmean     recv-3312     2542.36 (   0.00%)     2485.23 *  -2.25%*     2728.74 *   7.33%*
Hmean     recv-4096     2935.69 (   0.00%)     2861.09 *  -2.54%*     3160.74 *   7.67%*
Hmean     recv-8192     4800.35 (   0.00%)     4678.15 *  -2.55%*     5090.36 *   6.04%*
Hmean     recv-16384    7473.63 (   0.00%)     7349.52 *  -1.66%*     7786.25 *   4.18%*

I am happy to re-run a v4 if there is one, but for now:

Tested-by: Lukasz Luba <lukasz.luba@arm.com>

Regards,
Lukasz

^ permalink raw reply	[flat|nested] 9+ messages in thread
end of thread, other threads:[~2020-06-25 15:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-24 17:26 [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Qais Yousef
2020-06-24 17:26 ` [PATCH v3 1/2] sched/uclamp: Fix initialization of strut uclamp_rq Qais Yousef
2020-06-25  0:09   ` Valentin Schneider
2020-06-24 17:26 ` [PATCH v3 2/2] sched/uclamp: Protect uclamp fast path code with static key Qais Yousef
2020-06-25  0:16   ` Valentin Schneider
2020-06-25 11:00     ` Qais Yousef
2020-06-25 11:26       ` Valentin Schneider
2020-06-25 11:34         ` Qais Yousef
2020-06-25 15:21 ` [PATCH v3 0/2] sched: Optionally skip uclamp logic in fast path Lukasz Luba