From: Mel Gorman <mgorman@suse.de>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Qais Yousef <qais.yousef@arm.com>, Ingo Molnar <mingo@redhat.com>,
Randy Dunlap <rdunlap@infradead.org>,
Jonathan Corbet <corbet@lwn.net>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Luis Chamberlain <mcgrof@kernel.org>,
Kees Cook <keescook@chromium.org>,
Iurii Zaikin <yzaikin@google.com>,
Quentin Perret <qperret@google.com>,
Valentin Schneider <valentin.schneider@arm.com>,
Patrick Bellasi <patrick.bellasi@matbug.net>,
Pavan Kondeti <pkondeti@codeaurora.org>,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched/uclamp: Add a new sysctl to control RT default boost value
Date: Fri, 29 May 2020 11:08:06 +0100 [thread overview]
Message-ID: <20200529100806.GA3070@suse.de> (raw)
In-Reply-To: <20200528161112.GI2483@worktop.programming.kicks-ass.net>
On Thu, May 28, 2020 at 06:11:12PM +0200, Peter Zijlstra wrote:
> > FWIW, I think you're referring to Mel's notice in OSPM regarding the overhead.
> > Trying to see what goes on in there.
>
> Indeed, that one. The fact that regular distros cannot enable this
> feature due to performance overhead is unfortunate. It means there is a
> lot less potential for this stuff.
During that talk, I was vague about the cost, admitted I had not looked
too closely at mainline performance and had since deleted the data, given
that the problem was first spotted in early April. If I heard someone
else making statements like I did at the talk, I would consider them a bit
vague, potentially FUD, possibly wrong and worth rechecking myself. As
for distributions "cannot enable this": we could, but I was unwilling
to pay the cost for a feature no one had asked for yet. If they had, I
would endeavour to put it behind static branches and disable it by default
(as happened with PSI). I was contacted offlist about my comments
at OSPM and gathered new data to respond properly. For the record, here
is an edited version of my response;
--8<--
(Some context deleted that is not relevant)
> Does it need any special admin configuration for system
> services, cgroups, scripts, etc?
Nothing special -- out of box configuration. Tests were executed via
mmtests.
> Which mmtests config file did you use?
>
I used network-netperf-unbound and network-netperf-cstate.
network-netperf-unbound is usually the default but for some issues, I
use the cstate configuration to limit C-states.
For a perf profile, I used network-netperf-cstate-small and
network-netperf-unbound-small to limit the amount of profile data that
was collected. Just collecting data for 64 byte buffers was enough.
> The server that I am going to configure is x86_64 numa, not arm64.
That's fine, I didn't actually test arm64 at all.
> I have a 2 socket 24 CPUs X86 server (4 NUMA nodes, AMD Opteron 6174,
> L2 512KB/cpu, L3 6MB/node, RAM 40GB/node).
> Which machine did you run it on?
>
It was a 2-socket Haswell machine (E5-2670 v3) with 2 NUMA nodes. I used
5.7-rc7 with the openSUSE Leap 15.1 kernel configuration as a baseline.
I compared with and without uclamp enabled.
For network-netperf-unbound I see
netperf-udp
                                 5.7.0-rc7             5.7.0-rc7
                                with-clamp         without-clamp
Hmean     send-64        238.52 (   0.00%)      257.28 *   7.87%*
Hmean     send-128       477.10 (   0.00%)      511.57 *   7.23%*
Hmean     send-256       945.53 (   0.00%)      982.50 *   3.91%*
Hmean     send-1024     3655.74 (   0.00%)     3846.98 *   5.23%*
Hmean     send-2048     6926.84 (   0.00%)     7247.04 *   4.62%*
Hmean     send-3312    10767.47 (   0.00%)    10976.73 (   1.94%)
Hmean     send-4096    12821.77 (   0.00%)    13506.03 *   5.34%*
Hmean     send-8192    22037.72 (   0.00%)    22275.29 (   1.08%)
Hmean     send-16384   35935.31 (   0.00%)    34737.63 *  -3.33%*
Hmean     recv-64        238.52 (   0.00%)      257.28 *   7.87%*
Hmean     recv-128       477.10 (   0.00%)      511.57 *   7.23%*
Hmean     recv-256       945.45 (   0.00%)      982.50 *   3.92%*
Hmean     recv-1024     3655.74 (   0.00%)     3846.98 *   5.23%*
Hmean     recv-2048     6926.84 (   0.00%)     7246.51 *   4.62%*
Hmean     recv-3312    10767.47 (   0.00%)    10975.93 (   1.94%)
Hmean     recv-4096    12821.76 (   0.00%)    13506.02 *   5.34%*
Hmean     recv-8192    22037.71 (   0.00%)    22274.55 (   1.07%)
Hmean     recv-16384   35934.82 (   0.00%)    34737.50 *  -3.33%*
netperf-tcp
                                 5.7.0-rc7             5.7.0-rc7
                                with-clamp         without-clamp
Min       64            2004.71 (   0.00%)     2033.23 (   1.42%)
Min       128           3657.58 (   0.00%)     3733.35 (   2.07%)
Min       256           6063.25 (   0.00%)     6105.67 (   0.70%)
Min       1024         18152.50 (   0.00%)    18487.00 (   1.84%)
Min       2048         28544.54 (   0.00%)    29218.11 (   2.36%)
Min       3312         33962.06 (   0.00%)    36094.97 (   6.28%)
Min       4096         36234.82 (   0.00%)    38223.60 (   5.49%)
Min       8192         42324.06 (   0.00%)    43328.72 (   2.37%)
Min       16384        44323.33 (   0.00%)    45315.21 (   2.24%)
Hmean     64            2018.36 (   0.00%)     2038.53 *   1.00%*
Hmean     128           3700.12 (   0.00%)     3758.20 *   1.57%*
Hmean     256           6236.14 (   0.00%)     6212.77 (  -0.37%)
Hmean     1024         18214.97 (   0.00%)    18601.01 *   2.12%*
Hmean     2048         28749.56 (   0.00%)    29728.26 *   3.40%*
Hmean     3312         34585.50 (   0.00%)    36345.09 *   5.09%*
Hmean     4096         36777.62 (   0.00%)    38576.17 *   4.89%*
Hmean     8192         43149.08 (   0.00%)    43903.77 *   1.75%*
Hmean     16384        45478.27 (   0.00%)    46372.93 (   1.97%)
The cstate-limited config had similar results for UDP_STREAM but was
mostly indifferent for TCP_STREAM.
So for UDP_STREAM, there is a fairly sizable difference with uclamp. There
are caveats: netperf is not 100% stable from a performance perspective on
NUMA machines. That has improved quite a bit with 5.7, but the results
should still be treated with care.
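As an aside on reading the tables: the Hmean rows are harmonic means of
the per-iteration throughput samples (appropriate for averaging rates),
and the bracketed figure is the difference relative to the first column.
The arithmetic is roughly the following; the sample values are made up
for illustration, not taken from these runs:

```python
# Harmonic mean and relative difference as reported in mmtests tables.
# The throughput samples below are hypothetical.

def hmean(samples):
    """Harmonic mean: the right average for rates like Mbits/sec."""
    return len(samples) / sum(1.0 / s for s in samples)

def rel_diff(baseline, candidate):
    """Percentage difference relative to the baseline column."""
    return (candidate - baseline) / baseline * 100.0

with_clamp = [238.1, 239.0, 238.5]       # hypothetical send-64 samples
without_clamp = [257.0, 257.6, 257.2]

base = hmean(with_clamp)
cand = hmean(without_clamp)
print(f"{base:.2f} vs {cand:.2f}: {rel_diff(base, cand):+.2f}%")
```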
When I first saw a problem, I was using ftrace looking for latencies and
uclamp appeared to crop up. As I didn't actually need uclamp and there was
no user request to support it, I simply dropped it from the master config
so the change would be propagated to any distro we release with a 5.x kernel.
From a perf profile, it's not particularly obvious that uclamp is
involved, so this could be in error, but I doubt it. A diff of
without-clamp vs with-clamp looks like
# Event 'cycles:ppp'
#
# Baseline  Delta Abs  Shared Object     Symbol
# ........  .........  ................  ..............................
#
     9.59%     -2.87%  [kernel.vmlinux]  [k] poll_idle
     0.19%     +1.85%  [kernel.vmlinux]  [k] activate_task
               +1.17%  [kernel.vmlinux]  [k] dequeue_task
               +0.89%  [kernel.vmlinux]  [k] update_rq_clock.part.73
     3.88%     +0.73%  [kernel.vmlinux]  [k] try_to_wake_up
     3.17%     +0.68%  [kernel.vmlinux]  [k] __schedule
     1.16%     -0.60%  [kernel.vmlinux]  [k] __update_load_avg_cfs_rq
     2.20%     -0.54%  [kernel.vmlinux]  [k] resched_curr
     2.08%     -0.29%  [kernel.vmlinux]  [k] _raw_spin_lock_irqsave
     0.44%     -0.29%  [kernel.vmlinux]  [k] cpus_share_cache
     1.13%     +0.23%  [kernel.vmlinux]  [k] _raw_spin_lock_bh
A lot of the uclamp functions appear to be inlined, so it is not
particularly obvious from a raw profile, but it shows up in the annotated
profile in activate_task and dequeue_task for example. In the case of
dequeue_task, uclamp_rq_dec_id() is extremely expensive according to the
annotated profile.
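For reference, a profile diff like the one above can be generated with
perf; roughly as follows, with the durations and file names illustrative
rather than what was actually used here:

```shell
# Record a system-wide cycles profile on each kernel, then compare.
perf record -a -e cycles:ppp -o perf.data.old -- sleep 30  # without-clamp
perf record -a -e cycles:ppp -o perf.data     -- sleep 30  # with-clamp
perf diff perf.data.old perf.data   # Baseline / Delta Abs columns as above
perf annotate dequeue_task          # per-instruction cost of a hot symbol
```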
I'm afraid I did not dig into this deeply once I knew I could just disable
it even within the distribution.
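For completeness, the "behind static branches" approach mentioned earlier
follows the usual kernel pattern, sketched below. This is kernel-only
illustrative code; the key and function names are mine, not taken from
the actual uclamp implementation:

```c
/*
 * Sketch of static-branch gating, modelled on how PSI is disabled by
 * default. Identifiers here are illustrative.
 */
#include <linux/jump_label.h>

/* Patches the fast path to a nop until the feature is first used. */
DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);

static void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
{
	/* A single patched nop when disabled: near-zero overhead. */
	if (!static_branch_unlikely(&sched_uclamp_used))
		return;

	/* ... the real clamp accounting would go here ... */
}

/* Flipped once, e.g. from a sysctl handler, when uclamp is enabled. */
static void uclamp_enable(void)
{
	static_branch_enable(&sched_uclamp_used);
}
```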
--
Mel Gorman
SUSE Labs