linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Patrick Bellasi <patrick.bellasi@matbug.net>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>,
	Mel Gorman <mgorman@suse.de>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Quentin Perret <qperret@google.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	linux-doc@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fs <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Fri, 05 Jun 2020 09:55:41 +0200	[thread overview]
Message-ID: <875zc60ww2.derkling@matbug.net> (raw)
In-Reply-To: <20200603165200.v2ypeagziht7kxdw@e107158-lin.cambridge.arm.com>


Hi Qais,

On Wed, Jun 03, 2020 at 18:52:00 +0200, Qais Yousef <qais.yousef@arm.com> wrote...

> On 06/03/20 16:59, Vincent Guittot wrote:
>> When I want to stress the fast path i usually use "perf bench sched pipe -T "
>> The tip/sched/core on my arm octo core gives the following results for
>> 20 iterations of perf bench sched pipe -T -l 50000
>> 
>> all uclamp config disabled  50035.4(+/- 0.334%)
>> all uclamp config enabled  48749.8(+/- 0.339%)   -2.64%

I use to run the same test but I don't remember such big numbers :/

>> It's quite easy to reproduce and probably easier to study the impact
>
> Thanks Vincent. This is very useful!
>
> I could reproduce that on my Juno.
>
> One of the codepath I was suspecting seems to affect it.
>
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0464569f26a7..9f48090eb926 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,10 +1063,12 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
>          * e.g. due to future modification, warn and fixup the expected value.
>          */
>         SCHED_WARN_ON(bucket->value > rq_clamp);
> +#if 0
>         if (bucket->value >= rq_clamp) {
>                 bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
>                 WRITE_ONCE(uc_rq->value, bkt_clamp);
>         }
> +#endif

Yep, that's likely where we have most of the overhead at dequeue time,
sine _sometimes_ we need to update the cpu's clamp value.

However, while running perf sched pipe, I expect:
 - all tasks to have the same clamp value
 - all CPUs to have _always_ at least one RUNNABLE task

Given these two conditions above, if the CPU is never "CFS idle" (i.e.
without RUNNABLE CFS tasks), the code above should never be triggered.
More on that later...

>  }
>
>  static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
>
>
>
> uclamp_rq_max_value() could be expensive as it loops over all buckets.

It loops over UCLAMP_CNT values which are defined to fit into a single
$L. That was the optimal space/time complexity compromise we found to
get the MAX of a set of values.

> Commenting this whole path out strangely doesn't just 'fix' it,
> but produces  better results to no-uclamp kernel :-/
>
> # ./perf bench -r 20 sched pipe -T -l 50000
> Without uclamp:		5039
> With uclamp:		4832
> With uclamp+patch:	5729

I explain it below: with that code removed you never decrease the CPU's
uclamp value. Thus, the first time you schedule an RT task you go to MAX
OPP and stay there forever.

> It might be because schedutil gets biased differently by uclamp..? If I move to
> performance governor these numbers almost double.
>
> I don't know. But this promoted me to look closer and

Just to resume, when a task is dequeued we can have only these cases:

- uclamp(task) < uclamp(cpu):
  this happens when the task was co-scheduled with other tasks with
  higher clamp values which are still RUNNABLE.
  In this case there are no uclamp(cpu) updates.

- uclamp(task) == uclamp(cpu):
  this happens when the task was one of the tasks defining the current
  uclamp(cpu) value, which is defined to track the MAX of the RUNNABLE
  tasks clamp values.

In this last case we _not_ always need to do a uclamp(cpu) update.
Indeed the update is required _only_ when that task was _the last_ task
defining the current uclamp(cpu) value.

In this case we use uclamp_rq_max_value() to do a linear scan of
UCLAMP_CNT values which fits into a single cache line.

> I think I spotted a bug where in the if condition we check for '>='
> instead of '>', causing us to take the supposedly impossible fail safe
> path.

The fail safe path is when the '>' condition matches, which is what the
SCHED_WARN_ON tell us. Indeed, we never expect uclamp(cpu) to be bigger
than one of its RUNNABLE tasks. If that should happen we WARN and fix
the cpu clamp value for the best.

The normal path is instead '=' and, according to by previous resume,
it's expected to be executed _only_ when we dequeue the last task of the
clamp group defining the current uclamp(cpu) value.

> Mind trying with the below patch please?
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0464569f26a7..50d66d4016ff 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1063,7 +1063,7 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
>          * e.g. due to future modification, warn and fixup the expected value.
>          */
>         SCHED_WARN_ON(bucket->value > rq_clamp);
> -       if (bucket->value >= rq_clamp) {
> +       if (bucket->value > rq_clamp) {
>                 bkt_clamp = uclamp_rq_max_value(rq, clamp_id, uc_se->value);
>                 WRITE_ONCE(uc_rq->value, bkt_clamp);
>         }

This patch is thus bogus, since we never expect to have uclamp(cpu)
bigger than uclamp(task) and thus we will never reset the clamp value of
a cpu.


  parent reply	other threads:[~2020-06-05  7:55 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-11 15:40 [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-05-11 15:40 ` [PATCH 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
2020-05-11 17:18 ` [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-05-12  2:10 ` Pavan Kondeti
2020-05-12 11:46   ` Qais Yousef
2020-05-15 11:08 ` Patrick Bellasi
2020-05-18  8:31 ` Dietmar Eggemann
2020-05-18 16:49   ` Qais Yousef
2020-05-28 13:23 ` Peter Zijlstra
2020-05-28 15:58   ` Qais Yousef
2020-05-28 16:11     ` Peter Zijlstra
2020-05-28 16:51       ` Qais Yousef
2020-05-28 18:29         ` Peter Zijlstra
2020-05-28 19:08           ` Patrick Bellasi
2020-05-28 19:20           ` Dietmar Eggemann
2020-05-29  9:11           ` Qais Yousef
2020-05-29 10:21         ` Mel Gorman
2020-05-29 15:11           ` Qais Yousef
2020-05-29 16:02             ` Mel Gorman
2020-05-29 16:05               ` Qais Yousef
2020-05-29 10:08       ` Mel Gorman
2020-05-29 16:04         ` Qais Yousef
2020-05-29 16:57           ` Mel Gorman
2020-06-02 16:46         ` Dietmar Eggemann
2020-06-03  8:29           ` Patrick Bellasi
2020-06-03 10:10             ` Mel Gorman
2020-06-03 14:59               ` Vincent Guittot
2020-06-03 16:52                 ` Qais Yousef
2020-06-04 12:14                   ` Vincent Guittot
2020-06-05 10:45                     ` Qais Yousef
2020-06-09 15:29                       ` Vincent Guittot
2020-06-08 12:31                     ` Qais Yousef
2020-06-08 13:06                       ` Valentin Schneider
2020-06-08 14:44                       ` Steven Rostedt
2020-06-11 10:13                         ` Qais Yousef
2020-06-09 17:10                       ` Vincent Guittot
2020-06-11 10:24                         ` Qais Yousef
2020-06-11 12:01                           ` Vincent Guittot
2020-06-23 15:44                             ` Qais Yousef
2020-06-24  8:45                               ` Vincent Guittot
2020-06-05  7:55                   ` Patrick Bellasi [this message]
2020-06-05 11:32                     ` Qais Yousef
2020-06-05 13:27                       ` Patrick Bellasi
2020-06-03  9:40           ` Mel Gorman
2020-06-03 12:41             ` Qais Yousef
2020-06-04 13:40               ` Mel Gorman
2020-06-05 10:58                 ` Qais Yousef
2020-06-11 10:58                 ` Qais Yousef
2020-06-16 11:08                   ` Qais Yousef
2020-06-16 13:56                     ` Lukasz Luba
  -- strict thread matches above, loose matches on Subject: below --
2020-04-03 12:30 Qais Yousef
2020-04-14 18:21 ` Patrick Bellasi
2020-04-15  7:46   ` Patrick Bellasi
2020-04-20 15:04     ` Qais Yousef
2020-04-20  8:24   ` Dietmar Eggemann
2020-04-20 15:19     ` Qais Yousef
2020-04-21  0:52       ` Steven Rostedt
2020-04-21 11:16         ` Dietmar Eggemann
2020-04-21 11:23           ` Qais Yousef
2020-04-20 14:50   ` Qais Yousef
2020-04-15 10:11 ` Quentin Perret
2020-04-20 15:08   ` Qais Yousef
2020-04-20  8:29 ` Dietmar Eggemann
2020-04-20 15:13   ` Qais Yousef
2020-04-21 11:18     ` Dietmar Eggemann
2020-04-21 11:27       ` Qais Yousef
2020-04-22 10:59         ` Dietmar Eggemann
2020-04-22 13:13           ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875zc60ww2.derkling@matbug.net \
    --to=patrick.bellasi@matbug.net \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=vincent.guittot@linaro.org \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).