From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
juri.lelli@redhat.com, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io,
chris.hyser@oracle.com, patrick.bellasi@matbug.net,
pjt@google.com, pavel@ucw.cz, qperret@google.com,
tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org,
kprateek.nayak@amd.com, yu.c.chen@intel.com,
youssefesmat@chromium.org, joel@joelfernandes.org, efault@gmx.de
Subject: [PATCH 00/17] sched: EEVDF using latency-nice
Date: Tue, 28 Mar 2023 11:26:22 +0200
Message-ID: <20230328092622.062917921@infradead.org>
Hi!
Latest version of the EEVDF [1] patches.
Many changes since last time; most notably it now fully replaces CFS and uses
lag based placement for migrations. Smaller changes include:
 - uses scale_load_down() for avg_vruntime; I measured the max delta to be ~44
   bits on a system/cgroup based kernel build.
 - fixed a bunch of reweight / cgroup placement issues
 - adaptive placement strategy for smaller slices
 - rename se->lag to se->vlag
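For readers new to the terms above, here is a minimal sketch of a weighted average vruntime and the per-entity vlag derived from it. It is not the kernel code: the Entity class, the function names, and the simplified placement rule (which ignores how the arriving entity's own weight shifts the average) are assumptions made for illustration. Python integers do not overflow, so the bit-width concern behind scale_load_down() is only noted in a comment.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    weight: int    # load weight, assumed already scaled down
    vruntime: int  # virtual runtime, arbitrary units


def avg_vruntime(entities, min_vruntime):
    """Weighted average V = sum(w_i * v_i) / sum(w_i).

    Keys are taken relative to min_vruntime so the w_i * v_i products stay
    small; in a fixed-width implementation this is what bounds the number
    of bits needed for the running sum.
    """
    total_weight = sum(e.weight for e in entities)
    if total_weight == 0:
        return min_vruntime
    weighted = sum(e.weight * (e.vruntime - min_vruntime) for e in entities)
    return min_vruntime + weighted // total_weight


def vlag(entity, V):
    """Virtual lag: positive means the entity is owed service."""
    return V - entity.vruntime


def place_with_lag(V, saved_vlag):
    """Simplified lag based placement: re-enter at the same distance from V
    that the entity had when it left (sleep or migration)."""
    return V - saved_vlag


if __name__ == "__main__":
    queue = [Entity("a", 2, 100), Entity("b", 1, 400)]
    V = avg_vruntime(queue, min_vruntime=100)
    for e in queue:
        print(e.name, "vlag =", vlag(e, V))
```

The weighted lags w_i * (V - v_i) sum to zero by construction, which is the property that makes V usable as the "fair" reference point on any runqueue.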
There's a bunch of RFC patches at the end and one DEBUG patch. Of those, the
PLACE_BONUS patch is a mixed bag of pain. A number of benchmarks regress
because EEVDF is actually fair and gives a 100% parent vs a 50% child a 67%/33%
split (stress-futex, stress-nanosleep, starve, etc.) instead of a 50%/50%
split that sleeper bonus achieves. Mostly I think these benchmarks are somewhat
artificial/daft but who knows.
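One way to read the 67%/33% figure, as a worked example. The assumptions are mine: the "50% child" is a task that, run alone, alternates equal run and sleep periods; both tasks have equal weight; and the sleep length is fixed in wall-clock time. The function name fair_split is hypothetical.

```python
from fractions import Fraction


def fair_split(run, sleep):
    """Return (hog_share, sleeper_share) of CPU time over one cycle.

    The sleeper needs `run` units of CPU per cycle. While both tasks are
    runnable a fair scheduler gives each half the CPU, so that phase lasts
    2 * run of wall time. The hog then has the CPU to itself for `sleep`.
    """
    contended = 2 * run
    cycle = contended + sleep
    sleeper_share = Fraction(run, cycle)
    hog_share = 1 - sleeper_share
    return hog_share, sleeper_share


if __name__ == "__main__":
    hog, sleeper = fair_split(run=1, sleep=1)
    print(f"parent {float(hog):.0%}, child {float(sleeper):.0%}")
```

With run == sleep the cycle is three units long, two of which go to the parent. Getting to 50%/50% requires crediting the child for the time it slept, which is what the sleeper bonus does.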
The PLACE_BONUS thing horribly messes up things like hackbench and latency-nice
because it places things too far to the left in the tree. Basically it messes
with the whole 'when', by placing a task back in history you're putting a
burden on the now to accommodate catching up. More tinkering required.
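The "burden on the now" can be seen in a toy model. This is a sketch, not the kernel's pick path: it assumes equal weights, a fixed slice, V as a plain average, and ties broken by name; every identifier here is made up for illustration. A task is eligible when its vruntime is at or below V, and among the eligible tasks the one with the earliest virtual deadline runs.

```python
from dataclasses import dataclass
from fractions import Fraction

SLICE = 3  # virtual-time units per request; equal weights assumed


@dataclass
class Entity:
    name: str
    vruntime: int

    @property
    def deadline(self):
        # Virtual deadline: start of the request plus its length.
        return self.vruntime + SLICE


def avg_vruntime(queue):
    return Fraction(sum(e.vruntime for e in queue), len(queue))


def pick_eevdf(queue):
    """Earliest eligible virtual deadline first."""
    V = avg_vruntime(queue)
    eligible = [e for e in queue if e.vruntime <= V]
    return min(eligible, key=lambda e: (e.deadline, e.name))


def simulate(queue, picks):
    order = []
    for _ in range(picks):
        e = pick_eevdf(queue)
        e.vruntime += SLICE  # run one full slice
        order.append(e.name)
    return order


def wake_with_offset(offset, picks=5):
    """Two tasks are running; a third wakes `offset` to the left of V."""
    queue = [Entity("a", 100), Entity("b", 100)]
    V = avg_vruntime(queue)
    queue.append(Entity("c", int(V) - offset))
    return simulate(queue, picks)


if __name__ == "__main__":
    print("placed at V:       ", wake_with_offset(0))
    print("placed 12 to left: ", wake_with_offset(12))
```

In this model a task placed 12 units to the left with a slice of 3 is picked four times in a row, and it also drags V down so that the tasks already running stop being eligible until it has caught up.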
But overall the thing seems to be fairly usable and could do with more
extensive testing.
[1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564
Results:
hackbench -g $nr_cpu + cyclictest --policy other results:

                                EEVDF   CFS

             # Min Latencies:   00054
LNICE(19)    # Avg Latencies:   00660
             # Max Latencies:   23103

             # Min Latencies:   00052   00053
LNICE(0)     # Avg Latencies:   00318   00687
             # Max Latencies:   08593   13913

             # Min Latencies:   00054
LNICE(-19)   # Avg Latencies:   00055
             # Max Latencies:   00061
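To put the LNICE(0) row in relative terms, the small calculation below expresses the EEVDF latencies as a change against CFS. The numbers are copied from the table above; the helper name relative_change is just for illustration.

```python
def relative_change(new, old):
    """Percentage change of `new` relative to `old` (negative = lower)."""
    return 100.0 * (new - old) / old


# LNICE(0) cyclictest latencies from the table above.
eevdf = {"min": 52, "avg": 318, "max": 8593}
cfs = {"min": 53, "avg": 687, "max": 13913}

if __name__ == "__main__":
    for key in ("min", "avg", "max"):
        change = relative_change(eevdf[key], cfs[key])
        print(f"{key}: {change:+.1f}% vs CFS")
```

Only the LNICE(0) row has a CFS counterpart, so the other two rows cannot be compared this way.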
Some preliminary results from Chen Yu on a slightly older version:
schbench (95% tail latency, lower is better)
==============================================================================
case            nr_instance     baseline (std%)  compare% ( std%)
normal          25%             1.00 (2.49%)     -81.2% (4.27%)
normal          50%             1.00 (2.47%)     -84.5% (0.47%)
normal          75%             1.00 (2.5%)      -81.3% (1.27%)
normal          100%            1.00 (3.14%)     -79.2% (0.72%)
normal          125%            1.00 (3.07%)     -77.5% (0.85%)
normal          150%            1.00 (3.35%)     -76.4% (0.10%)
normal          175%            1.00 (3.06%)     -76.2% (0.56%)
normal          200%            1.00 (3.11%)     -76.3% (0.39%)
==============================================================================
hackbench (throughput, higher is better)
==============================================================================
case            nr_instance     baseline(std%)   compare%( std%)
threads-pipe    25%             1.00 (<2%)       -17.5 (<2%)
threads-socket  25%             1.00 (<2%)       -1.9 (<2%)
threads-pipe    50%             1.00 (<2%)       +6.7 (<2%)
threads-socket  50%             1.00 (<2%)       -6.3 (<2%)
threads-pipe    100%            1.00 (3%)        +110.1 (3%)
threads-socket  100%            1.00 (<2%)       -40.2 (<2%)
threads-pipe    150%            1.00 (<2%)       +125.4 (<2%)
threads-socket  150%            1.00 (<2%)       -24.7 (<2%)
threads-pipe    200%            1.00 (<2%)       -89.5 (<2%)
threads-socket  200%            1.00 (<2%)       -27.4 (<2%)
process-pipe    25%             1.00 (<2%)       -15.0 (<2%)
process-socket  25%             1.00 (<2%)       -3.9 (<2%)
process-pipe    50%             1.00 (<2%)       -0.4 (<2%)
process-socket  50%             1.00 (<2%)       -5.3 (<2%)
process-pipe    100%            1.00 (<2%)       +62.0 (<2%)
process-socket  100%            1.00 (<2%)       -39.5 (<2%)
process-pipe    150%            1.00 (<2%)       +70.0 (<2%)
process-socket  150%            1.00 (<2%)       -20.3 (<2%)
process-pipe    200%            1.00 (<2%)       +79.2 (<2%)
process-socket  200%            1.00 (<2%)       -22.4 (<2%)
==============================================================================
stress-ng (throughput, higher is better)
==============================================================================
case            nr_instance     baseline(std%)   compare%( std%)
switch          25%             1.00 (<2%)       -6.5 (<2%)
switch          50%             1.00 (<2%)       -9.2 (<2%)
switch          75%             1.00 (<2%)       -1.2 (<2%)
switch          100%            1.00 (<2%)       +11.1 (<2%)
switch          125%            1.00 (<2%)       -16.7 (9%)
switch          150%            1.00 (<2%)       -13.6 (<2%)
switch          175%            1.00 (<2%)       -16.2 (<2%)
switch          200%            1.00 (<2%)       -19.4 (<2%)
fork            50%             1.00 (<2%)       -0.1 (<2%)
fork            75%             1.00 (<2%)       -0.3 (<2%)
fork            100%            1.00 (<2%)       -0.1 (<2%)
fork            125%            1.00 (<2%)       -6.9 (<2%)
fork            150%            1.00 (<2%)       -8.8 (<2%)
fork            200%            1.00 (<2%)       -3.3 (<2%)
futex           25%             1.00 (<2%)       -3.2 (<2%)
futex           50%             1.00 (3%)        -19.9 (5%)
futex           75%             1.00 (6%)        -19.1 (2%)
futex           100%            1.00 (16%)       -30.5 (10%)
futex           125%            1.00 (25%)       -39.3 (11%)
futex           150%            1.00 (20%)       -27.2 (17%)
futex           175%            1.00 (<2%)       -18.6 (<2%)
futex           200%            1.00 (<2%)       -47.5 (<2%)
nanosleep       25%             1.00 (<2%)       -0.1 (<2%)
nanosleep       50%             1.00 (<2%)       -0.0 (<2%)
nanosleep       75%             1.00 (<2%)       +15.2 (<2%)
nanosleep       100%            1.00 (<2%)       -26.4 (<2%)
nanosleep       125%            1.00 (<2%)       -1.3 (<2%)
nanosleep       150%            1.00 (<2%)       +2.1 (<2%)
nanosleep       175%            1.00 (<2%)       +8.3 (<2%)
nanosleep       200%            1.00 (<2%)       +2.0 (<2%)
==============================================================================
unixbench (throughput, higher is better)
==============================================================================
case            nr_instance     baseline(std%)   compare%( std%)
spawn           125%            1.00 (<2%)       +8.1 (<2%)
context1        100%            1.00 (6%)        +17.4 (6%)
context1        75%             1.00 (13%)       +18.8 (8%)
==============================================================================
netperf (throughput, higher is better)
==============================================================================
case            nr_instance     baseline(std%)   compare%( std%)
UDP_RR          25%             1.00 (<2%)       -1.5% (<2%)
UDP_RR          50%             1.00 (<2%)       -0.3% (<2%)
UDP_RR          75%             1.00 (<2%)       +12.5% (<2%)
UDP_RR          100%            1.00 (<2%)       -4.3% (<2%)
UDP_RR          125%            1.00 (<2%)       -4.9% (<2%)
UDP_RR          150%            1.00 (<2%)       -4.7% (<2%)
UDP_RR          175%            1.00 (<2%)       -6.1% (<2%)
UDP_RR          200%            1.00 (<2%)       -6.6% (<2%)
TCP_RR          25%             1.00 (<2%)       -1.4% (<2%)
TCP_RR          50%             1.00 (<2%)       -0.2% (<2%)
TCP_RR          75%             1.00 (<2%)       -3.9% (<2%)
TCP_RR          100%            1.00 (2%)        +3.6% (5%)
TCP_RR          125%            1.00 (<2%)       -4.2% (<2%)
TCP_RR          150%            1.00 (<2%)       -6.0% (<2%)
TCP_RR          175%            1.00 (<2%)       -7.4% (<2%)
TCP_RR          200%            1.00 (<2%)       -8.4% (<2%)
==============================================================================
---
Also available at:
git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/eevdf
---
Parth Shah (1):
      sched: Introduce latency-nice as a per-task attribute

Peter Zijlstra (14):
      sched/fair: Add avg_vruntime
      sched/fair: Remove START_DEBIT
      sched/fair: Add lag based placement
      rbtree: Add rb_add_augmented_cached() helper
      sched/fair: Implement an EEVDF like policy
      sched: Commit to lag based placement
      sched/smp: Use lag to simplify cross-runqueue placement
      sched: Commit to EEVDF
      sched/debug: Rename min_granularity to base_slice
      sched: Merge latency_offset into slice
      sched/eevdf: Better handle mixed slice length
      sched/eevdf: Sleeper bonus
      sched/eevdf: Minimal vavg option
      sched/eevdf: Debug / validation crud

Vincent Guittot (2):
      sched/fair: Add latency_offset
      sched/fair: Add sched group latency support
 Documentation/admin-guide/cgroup-v2.rst |   10 +
 include/linux/rbtree_augmented.h        |   26 +
 include/linux/sched.h                   |    6 +
 include/uapi/linux/sched.h              |    4 +-
 include/uapi/linux/sched/types.h        |   19 +
 init/init_task.c                        |    3 +-
 kernel/sched/core.c                     |   65 +-
 kernel/sched/debug.c                    |   49 +-
 kernel/sched/fair.c                     | 1199 ++++++++++++++++---------------
 kernel/sched/features.h                 |   29 +-
 kernel/sched/sched.h                    |   23 +-
 tools/include/uapi/linux/sched.h        |    4 +-
 12 files changed, 794 insertions(+), 643 deletions(-)