All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Qais Yousef <qais.yousef@arm.com>
Cc: Mel Gorman <mgorman@suse.de>,
	Patrick Bellasi <patrick.bellasi@matbug.net>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Juri Lelli <juri.lelli@redhat.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Quentin Perret <qperret@google.com>,
	Valentin Schneider <valentin.schneider@arm.com>,
	Pavan Kondeti <pkondeti@codeaurora.org>,
	linux-doc@vger.kernel.org,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-fs <linux-fsdevel@vger.kernel.org>
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Tue, 9 Jun 2020 19:10:07 +0200	[thread overview]
Message-ID: <CAKfTPtCKS-2RoaMHhKGigjzc7dhXhx0z3dYNQLD3Q9aRC_tCnw@mail.gmail.com> (raw)
In-Reply-To: <20200608123102.6sdhdhit7lac5cfl@e107158-lin.cambridge.arm.com>

On Mon, 8 Jun 2020 at 14:31, Qais Yousef <qais.yousef@arm.com> wrote:
>
> On 06/04/20 14:14, Vincent Guittot wrote:
>
> [...]
>
> > I have tried your patch and I don't see any difference compared to
> > previous tests. Let me give you more details of my setup:
> > I create 3 levels of cgroups and usually run the tests in the 4 levels
> > (which includes root). The result above are for the root level
> >
> > But I see a difference at other levels:
> >
> >                            root           level 1       level 2       level 3
> >
> > /w patch uclamp disable     50097         46615         43806         41078
> > tip uclamp enable           48706(-2.78%) 45583(-2.21%) 42851(-2.18%)
> > 40313(-1.86%)
> > /w patch uclamp enable      48882(-2.43%) 45774(-1.80%) 43108(-1.59%)
> > 40667(-1.00%)
> >
> > Whereas tip with uclamp stays around 2% behind tip without uclamp, the
> > diff of uclamp with your patch tends to decrease when we increase the
> > number of level
>
> So I did try to dig more into this, but I think it's either not a good
> reproducer or what we're observing here is uArch level latencies caused by the
> new code that seem to produce a bigger knock on effect than what they really
> are.
>
> First, CONFIG_FAIR_GROUP_SCHED is 'expensive', for some definition of
> expensive..

yes, enabling CONFIG_FAIR_GROUP_SCHED adds an overhead

>
> *** uclamp disabled/fair group enabled ***
>
>         # Executed 50000 pipe operations between two threads
>
>              Total time: 0.958 [sec]
>
>               19.177100 usecs/op
>                   52145 ops/sec
>
> *** uclamp disabled/fair group disabled ***
>
>         # Executed 50000 pipe operations between two threads
>              Total time: 0.808 [sec]
>
>              16.176200 usecs/op
>                  61819 ops/sec
>
> So there's a 15.6% drop in ops/sec when enabling this option. I think it's good
> to look at the absolutely number of usecs/op, Fair group adds around
> 3 usecs/op.
>
> I dropped FAIR_GROUP_SCHED from my config to eliminate this overhead and focus
> on solely on uclamp overhead.

Have you checked that both tests run at the root level ?
Your function-graph log below shows several calls to
update_cfs_group() which means that your trace below has not been made
at root level but most probably at the 3rd level and I wonder if you
used the same setup for running the benchmark above. This could
explain such huge difference because I don't have such difference on
my platform but more around 2%

For uclamp disable/fair group enable/ function graph enable :  47994ops/sec
For uclamp disable/fair group disable/ function graph enable : 49107ops/sec

>
> With uclamp enabled but no fair group I get
>
> *** uclamp enabled/fair group disabled ***
>
>         # Executed 50000 pipe operations between two threads
>              Total time: 0.856 [sec]
>
>              17.125740 usecs/op
>                  58391 ops/sec
>
> The drop is 5.5% in ops/sec. Or 1 usecs/op.
>
> I don't know what's the expectation here. 1 us could be a lot, but I don't
> think we expect the new code to take more than few 100s of ns anyway. If you
> add potential caching effects, reaching 1 us wouldn't be that hard.
>
> Note that in my runs I chose performance governor and use `taskset 0x2` to

You might want to set 2 CPUs in your cpumask instead of 1 in order to
have 1 CPU for each thread

> force running on a big core to make sure the runs are repeatable.

I also use performance governor but don't pinned tasks because I use smp.

>
> On Juno-r2 I managed to scrap most of the 1 us with the below patch. It seems
> there was weird branching behavior that affects the I$ in my case. It'd be good
> to try it out to see if it makes a difference for you.

The perf are slightly worse on my setup:
For uclamp enable/fair group disable/ function graph enable : 48413ops/sec
with patch  below : 47804os/sec

>
> The I$ effect is my best educated guess. Perf doesn't catch this path and
> I couldn't convince it to look at cache and branch misses between 2 specific
> points.
>
> Other subtle code shuffling did have weird effect on the result too. One worthy
> one is making uclamp_rq_dec() noinline gains back ~400 ns. Making
> uclamp_rq_inc() noinline *too* cancels this gain out :-/
>
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 0464569f26a7..0835ee20a3c7 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1071,13 +1071,11 @@ static inline void uclamp_rq_dec_id(struct rq *rq, struct task_struct *p,
>
>  static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
>  {
> -       enum uclamp_id clamp_id;
> -
>         if (unlikely(!p->sched_class->uclamp_enabled))
>                 return;
>
> -       for_each_clamp_id(clamp_id)
> -               uclamp_rq_inc_id(rq, p, clamp_id);
> +       uclamp_rq_inc_id(rq, p, UCLAMP_MIN);
> +       uclamp_rq_inc_id(rq, p, UCLAMP_MAX);
>
>         /* Reset clamp idle holding when there is one RUNNABLE task */
>         if (rq->uclamp_flags & UCLAMP_FLAG_IDLE)
> @@ -1086,13 +1084,11 @@ static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
>
>  static inline void uclamp_rq_dec(struct rq *rq, struct task_struct *p)
>  {
> -       enum uclamp_id clamp_id;
> -
>         if (unlikely(!p->sched_class->uclamp_enabled))
>                 return;
>
> -       for_each_clamp_id(clamp_id)
> -               uclamp_rq_dec_id(rq, p, clamp_id);
> +       uclamp_rq_dec_id(rq, p, UCLAMP_MIN);
> +       uclamp_rq_dec_id(rq, p, UCLAMP_MAX);
>  }
>
>  static inline void
>
>
> FWIW I fail to see activate/deactivate_task in perf record. They don't show up
> on the list which means this micro benchmark doesn't stress them as Mel's test
> does.

Strange because I have been able to trace them.

>
> Worth noting that I did try running the same test on 2 vCPU VirtualBox VM and
> 64 vCPU qemu and I couldn't spot a difference when uclamp was enabled/disabled
> in these 2 environments.
>
> >
> > Beside this, that's also interesting to notice the ~6% of perf impact
> > between each level for the same image
>
> Beside my observation above, I captured this function_graph when
> FAIR_GROUP_SCHED is enabled. What I pasted below is a particularly bad
> deactivation, it's not always that costly.
>
> This ran happened was recorded with uclamp disabled.
>
> I admit I don't know how much of these numbers is ftrace overhead. When trying
> to capture similar runs for uclamp, the numbers didn't add up compared to
> running the test without ftrace generating the graph. If juno is suffering from
> bad branching costs in this path, then I suspect ftrace will amplify this as
> AFAIU it'll cause extra jumps on entry and exit.
>
>
>
>       sched-pipe-6532  [001]  9407.276302: funcgraph_entry:                   |  deactivate_task() {
>       sched-pipe-6532  [001]  9407.276302: funcgraph_entry:                   |    dequeue_task_fair() {
>       sched-pipe-6532  [001]  9407.276303: funcgraph_entry:                   |      update_curr() {
>       sched-pipe-6532  [001]  9407.276304: funcgraph_entry:        0.780 us   |        update_min_vruntime();
>       sched-pipe-6532  [001]  9407.276306: funcgraph_entry:                   |        cpuacct_charge() {
>       sched-pipe-6532  [001]  9407.276306: funcgraph_entry:        0.820 us   |          __rcu_read_lock();
>       sched-pipe-6532  [001]  9407.276308: funcgraph_entry:        0.740 us   |          __rcu_read_unlock();
>       sched-pipe-6532  [001]  9407.276309: funcgraph_exit:         3.980 us   |        }
>       sched-pipe-6532  [001]  9407.276310: funcgraph_entry:        0.720 us   |        __rcu_read_lock();
>       sched-pipe-6532  [001]  9407.276312: funcgraph_entry:        0.720 us   |        __rcu_read_unlock();
>       sched-pipe-6532  [001]  9407.276313: funcgraph_exit:         9.840 us   |      }
>       sched-pipe-6532  [001]  9407.276314: funcgraph_entry:                   |      __update_load_avg_se() {
>       sched-pipe-6532  [001]  9407.276315: funcgraph_entry:        0.720 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276316: funcgraph_exit:         2.260 us   |      }
>       sched-pipe-6532  [001]  9407.276317: funcgraph_entry:                   |      __update_load_avg_cfs_rq() {
>       sched-pipe-6532  [001]  9407.276318: funcgraph_entry:        0.860 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276319: funcgraph_exit:         2.340 us   |      }
>       sched-pipe-6532  [001]  9407.276320: funcgraph_entry:        0.760 us   |      clear_buddies();
>       sched-pipe-6532  [001]  9407.276321: funcgraph_entry:        0.800 us   |      account_entity_dequeue();
>       sched-pipe-6532  [001]  9407.276323: funcgraph_entry:        0.720 us   |      update_cfs_group();
>       sched-pipe-6532  [001]  9407.276324: funcgraph_entry:        0.740 us   |      update_min_vruntime();
>       sched-pipe-6532  [001]  9407.276326: funcgraph_entry:        0.720 us   |      set_next_buddy();
>       sched-pipe-6532  [001]  9407.276327: funcgraph_entry:                   |      __update_load_avg_se() {
>       sched-pipe-6532  [001]  9407.276328: funcgraph_entry:        0.740 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276329: funcgraph_exit:         2.220 us   |      }
>       sched-pipe-6532  [001]  9407.276330: funcgraph_entry:                   |      __update_load_avg_cfs_rq() {
>       sched-pipe-6532  [001]  9407.276331: funcgraph_entry:        0.740 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276332: funcgraph_exit:         2.180 us   |      }
>       sched-pipe-6532  [001]  9407.276333: funcgraph_entry:                   |      update_cfs_group() {
>       sched-pipe-6532  [001]  9407.276334: funcgraph_entry:                   |        reweight_entity() {
>       sched-pipe-6532  [001]  9407.276335: funcgraph_entry:                   |          update_curr() {
>       sched-pipe-6532  [001]  9407.276335: funcgraph_entry:        0.720 us   |            __calc_delta();
>       sched-pipe-6532  [001]  9407.276337: funcgraph_entry:        0.740 us   |            update_min_vruntime();
>       sched-pipe-6532  [001]  9407.276338: funcgraph_exit:         3.560 us   |          }
>       sched-pipe-6532  [001]  9407.276339: funcgraph_entry:        0.720 us   |          account_entity_dequeue();
>       sched-pipe-6532  [001]  9407.276340: funcgraph_entry:        0.720 us   |          account_entity_enqueue();
>       sched-pipe-6532  [001]  9407.276342: funcgraph_exit:         7.860 us   |        }
>       sched-pipe-6532  [001]  9407.276342: funcgraph_exit:         9.280 us   |      }
>       sched-pipe-6532  [001]  9407.276343: funcgraph_entry:                   |      __update_load_avg_se() {
>       sched-pipe-6532  [001]  9407.276344: funcgraph_entry:        0.720 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276345: funcgraph_exit:         2.180 us   |      }
>       sched-pipe-6532  [001]  9407.276346: funcgraph_entry:                   |      __update_load_avg_cfs_rq() {
>       sched-pipe-6532  [001]  9407.276347: funcgraph_entry:        0.740 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276348: funcgraph_exit:         2.180 us   |      }
>       sched-pipe-6532  [001]  9407.276349: funcgraph_entry:                   |      update_cfs_group() {
>       sched-pipe-6532  [001]  9407.276350: funcgraph_entry:                   |        reweight_entity() {
>       sched-pipe-6532  [001]  9407.276350: funcgraph_entry:                   |          update_curr() {
>       sched-pipe-6532  [001]  9407.276351: funcgraph_entry:        0.740 us   |            __calc_delta();
>       sched-pipe-6532  [001]  9407.276353: funcgraph_entry:        0.720 us   |            update_min_vruntime();
>       sched-pipe-6532  [001]  9407.276354: funcgraph_exit:         3.580 us   |          }
>       sched-pipe-6532  [001]  9407.276355: funcgraph_entry:        0.740 us   |          account_entity_dequeue();
>       sched-pipe-6532  [001]  9407.276356: funcgraph_entry:        0.720 us   |          account_entity_enqueue();
>       sched-pipe-6532  [001]  9407.276358: funcgraph_exit:         7.960 us   |        }
>       sched-pipe-6532  [001]  9407.276358: funcgraph_exit:         9.400 us   |      }
>       sched-pipe-6532  [001]  9407.276360: funcgraph_entry:                   |      __update_load_avg_se() {
>       sched-pipe-6532  [001]  9407.276360: funcgraph_entry:        0.740 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276362: funcgraph_exit:         2.220 us   |      }
>       sched-pipe-6532  [001]  9407.276362: funcgraph_entry:                   |      __update_load_avg_cfs_rq() {
>       sched-pipe-6532  [001]  9407.276363: funcgraph_entry:        0.740 us   |        __accumulate_pelt_segments();
>       sched-pipe-6532  [001]  9407.276365: funcgraph_exit:         2.160 us   |      }
>       sched-pipe-6532  [001]  9407.276366: funcgraph_entry:                   |      update_cfs_group() {
>       sched-pipe-6532  [001]  9407.276367: funcgraph_entry:                   |        reweight_entity() {
>       sched-pipe-6532  [001]  9407.276368: funcgraph_entry:                   |          update_curr() {
>       sched-pipe-6532  [001]  9407.276368: funcgraph_entry:        0.720 us   |            __calc_delta();
>       sched-pipe-6532  [001]  9407.276370: funcgraph_entry:        0.720 us   |            update_min_vruntime();
>       sched-pipe-6532  [001]  9407.276371: funcgraph_exit:         3.540 us   |          }
>       sched-pipe-6532  [001]  9407.276372: funcgraph_entry:        0.740 us   |          account_entity_dequeue();
>       sched-pipe-6532  [001]  9407.276373: funcgraph_entry:        0.720 us   |          account_entity_enqueue();
>       sched-pipe-6532  [001]  9407.276375: funcgraph_exit:         7.840 us   |        }
>       sched-pipe-6532  [001]  9407.276375: funcgraph_exit:         9.300 us   |      }
>       sched-pipe-6532  [001]  9407.276376: funcgraph_entry:        0.720 us   |      hrtick_update();
>       sched-pipe-6532  [001]  9407.276377: funcgraph_exit:       + 75.000 us  |    }
>       sched-pipe-6532  [001]  9407.276378: funcgraph_exit:       + 76.700 us  |  }
>
>
> Cheers
>
> --
> Qais Yousef

  parent reply	other threads:[~2020-06-09 17:10 UTC|newest]

Thread overview: 68+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-11 15:40 [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-05-11 15:40 ` [PATCH 2/2] Documentation/sysctl: Document uclamp sysctl knobs Qais Yousef
2020-05-11 17:18 ` [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value Qais Yousef
2020-05-12  2:10 ` Pavan Kondeti
2020-05-12 11:46   ` Qais Yousef
2020-05-15 11:08 ` Patrick Bellasi
2020-05-18  8:31 ` Dietmar Eggemann
2020-05-18 16:49   ` Qais Yousef
2020-05-28 13:23 ` Peter Zijlstra
2020-05-28 15:58   ` Qais Yousef
2020-05-28 16:11     ` Peter Zijlstra
2020-05-28 16:51       ` Qais Yousef
2020-05-28 18:29         ` Peter Zijlstra
2020-05-28 19:08           ` Patrick Bellasi
2020-05-28 19:20           ` Dietmar Eggemann
2020-05-29  9:11           ` Qais Yousef
2020-05-29 10:21         ` Mel Gorman
2020-05-29 15:11           ` Qais Yousef
2020-05-29 16:02             ` Mel Gorman
2020-05-29 16:05               ` Qais Yousef
2020-05-29 10:08       ` Mel Gorman
2020-05-29 16:04         ` Qais Yousef
2020-05-29 16:57           ` Mel Gorman
2020-06-02 16:46         ` Dietmar Eggemann
2020-06-03  8:29           ` Patrick Bellasi
2020-06-03 10:10             ` Mel Gorman
2020-06-03 14:59               ` Vincent Guittot
2020-06-03 16:52                 ` Qais Yousef
2020-06-04 12:14                   ` Vincent Guittot
2020-06-05 10:45                     ` Qais Yousef
2020-06-09 15:29                       ` Vincent Guittot
2020-06-08 12:31                     ` Qais Yousef
2020-06-08 13:06                       ` Valentin Schneider
2020-06-08 14:44                       ` Steven Rostedt
2020-06-11 10:13                         ` Qais Yousef
2020-06-09 17:10                       ` Vincent Guittot [this message]
2020-06-11 10:24                         ` Qais Yousef
2020-06-11 12:01                           ` Vincent Guittot
2020-06-23 15:44                             ` Qais Yousef
2020-06-24  8:45                               ` Vincent Guittot
2020-06-05  7:55                   ` Patrick Bellasi
2020-06-05 11:32                     ` Qais Yousef
2020-06-05 13:27                       ` Patrick Bellasi
2020-06-03  9:40           ` Mel Gorman
2020-06-03 12:41             ` Qais Yousef
2020-06-04 13:40               ` Mel Gorman
2020-06-05 10:58                 ` Qais Yousef
2020-06-11 10:58                 ` Qais Yousef
2020-06-16 11:08                   ` Qais Yousef
2020-06-16 13:56                     ` Lukasz Luba
  -- strict thread matches above, loose matches on Subject: below --
2020-04-03 12:30 Qais Yousef
2020-04-14 18:21 ` Patrick Bellasi
2020-04-15  7:46   ` Patrick Bellasi
2020-04-20 15:04     ` Qais Yousef
2020-04-20  8:24   ` Dietmar Eggemann
2020-04-20 15:19     ` Qais Yousef
2020-04-21  0:52       ` Steven Rostedt
2020-04-21 11:16         ` Dietmar Eggemann
2020-04-21 11:23           ` Qais Yousef
2020-04-20 14:50   ` Qais Yousef
2020-04-15 10:11 ` Quentin Perret
2020-04-20 15:08   ` Qais Yousef
2020-04-20  8:29 ` Dietmar Eggemann
2020-04-20 15:13   ` Qais Yousef
2020-04-21 11:18     ` Dietmar Eggemann
2020-04-21 11:27       ` Qais Yousef
2020-04-22 10:59         ` Dietmar Eggemann
2020-04-22 13:13           ` Qais Yousef

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtCKS-2RoaMHhKGigjzc7dhXhx0z3dYNQLD3Q9aRC_tCnw@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=bsegall@google.com \
    --cc=corbet@lwn.net \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=keescook@chromium.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mcgrof@kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=patrick.bellasi@matbug.net \
    --cc=peterz@infradead.org \
    --cc=pkondeti@codeaurora.org \
    --cc=qais.yousef@arm.com \
    --cc=qperret@google.com \
    --cc=rdunlap@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=valentin.schneider@arm.com \
    --cc=yzaikin@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.