From: Peter Zijlstra <peterz@infradead.org>
To: mingo@kernel.org, vincent.guittot@linaro.org
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
	juri.lelli@redhat.com, dietmar.eggemann@arm.com,
	rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com, corbet@lwn.net, qyousef@layalina.io,
	chris.hyser@oracle.com, patrick.bellasi@matbug.net,
	pjt@google.com, pavel@ucw.cz, qperret@google.com,
	tim.c.chen@linux.intel.com, joshdon@google.com, timj@gnu.org,
	kprateek.nayak@amd.com, yu.c.chen@intel.com,
	youssefesmat@chromium.org, joel@joelfernandes.org, efault@gmx.de
Subject: [PATCH 00/17] sched: EEVDF using latency-nice
Date: Tue, 28 Mar 2023 11:26:22 +0200
Message-ID: <20230328092622.062917921@infradead.org>

Hi!

Latest version of the EEVDF [1] patches.

Many changes since last time; most notably it now fully replaces CFS and uses
lag based placement for migrations (a rough sketch of the idea follows the
list below). Smaller changes include:

 - uses scale_load_down() for avg_vruntime; I measured the max delta to be ~44
   bits on a system/cgroup based kernel build.
 - fixed a bunch of reweight / cgroup placement issues
 - adaptive placement strategy for smaller slices
 - rename se->lag to se->vlag
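
The lag based placement mentioned above works roughly as follows (a userspace
toy model for illustration, not the actual fair.c code): on dequeue, remember
how far behind or ahead of the queue's load-weighted average vruntime the
entity was (its vlag); on (re)enqueue, possibly on another CPU's runqueue,
place it at that queue's average minus the remembered lag, so that sleeping or
migrating neither gains nor loses service. In the kernel the weights feeding
the average are scale_load_down()'d to keep the weight*delta products inside
64 bits (hence the ~44 bits remark above); the toy simply uses a 128-bit
intermediate.

  #include <stdio.h>
  #include <stdint.h>

  struct entity {
          uint64_t weight;
          int64_t  vruntime;
          int64_t  vlag;          /* lag preserved across dequeue/enqueue */
  };

  /* load-weighted average vruntime of the runnable entities */
  static int64_t avg_vruntime(const struct entity *rq, int n)
  {
          __int128 sum = 0;
          uint64_t w = 0;

          for (int i = 0; i < n; i++) {
                  sum += (__int128)rq[i].weight * rq[i].vruntime;
                  w   += rq[i].weight;
          }
          return w ? (int64_t)(sum / w) : 0;
  }

  /* dequeue: how much service is this entity still owed (+) or has it
   * over-consumed (-) relative to the rest of the queue? */
  static void dequeue_entity(struct entity *se, const struct entity *rq, int n)
  {
          se->vlag = avg_vruntime(rq, n) - se->vruntime;
  }

  /* enqueue (same or different runqueue): re-establish that relative
   * position instead of starting from scratch */
  static void place_entity(struct entity *se, const struct entity *rq, int n)
  {
          se->vruntime = avg_vruntime(rq, n) - se->vlag;
  }

  int main(void)
  {
          struct entity src[] = { { 1024, 1000 }, { 1024, 3000 } };
          struct entity dst[] = { { 1024, 9000 }, { 1024, 9000 } };
          struct entity se = src[0];      /* at 1000, behind src avg of 2000 */

          dequeue_entity(&se, src, 2);    /* vlag = +1000: still owed service */
          place_entity(&se, dst, 2);      /* placed at 9000 - 1000 = 8000 */

          printf("vlag=%lld, placed at vruntime=%lld\n",
                 (long long)se.vlag, (long long)se.vruntime);
          return 0;
  }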

There are a bunch of RFC patches at the end and one DEBUG patch. Of those, the
PLACE_BONUS patch is a mixed bag of pain. A number of benchmarks regress
(stress-futex, stress-nanosleep, starve, etc.) because EEVDF is actually fair
and gives a 100% parent competing with a 50% child a 67%/33% split instead of
the 50%/50% split that the sleeper bonus achieves. Mostly I think these
benchmarks are somewhat artificial/daft, but who knows.

The PLACE_BONUS thing horribly messes up things like hackbench and latency-nice
because it places things too far to the left in the tree. Basically it messes
with the whole 'when': by placing a task back in history you put a burden on
the now to accommodate catching up. More tinkering required.

But overall the thing seems fairly usable and could do with more extensive
testing.

[1] https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=805acf7726282721504c8f00575d91ebfd750564
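
For those without [1] handy, the rule the series implements boils down to:
an entity is eligible when its vruntime is not ahead of the queue's
load-weighted average vruntime, and among the eligible entities the one with
the earliest virtual deadline (vruntime plus its requested slice scaled into
virtual time) runs next. A minimal userspace toy model of that pick rule (not
the actual fair.c code; names and the nice-0 weight of 1024 are illustrative):

  #include <stdio.h>
  #include <stdint.h>

  struct entity {
          const char *name;
          uint64_t weight;        /* load weight; 1024 ~ nice 0 */
          int64_t  vruntime;      /* virtual runtime consumed so far */
          int64_t  slice;         /* requested slice, in ns */
  };

  static int64_t avg_vruntime(const struct entity *es, int n)
  {
          __int128 sum = 0;
          uint64_t w = 0;

          for (int i = 0; i < n; i++) {
                  sum += (__int128)es[i].weight * es[i].vruntime;
                  w   += es[i].weight;
          }
          return w ? (int64_t)(sum / w) : 0;
  }

  /* virtual deadline: vruntime + slice converted into virtual time */
  static int64_t vdeadline(const struct entity *e)
  {
          return e->vruntime + e->slice * 1024 / (int64_t)e->weight;
  }

  /* EEVDF: earliest virtual deadline among the eligible entities */
  static const struct entity *pick_eevdf(const struct entity *es, int n)
  {
          int64_t avg = avg_vruntime(es, n);
          const struct entity *best = NULL;

          for (int i = 0; i < n; i++) {
                  if (es[i].vruntime > avg)       /* not eligible */
                          continue;
                  if (!best || vdeadline(&es[i]) < vdeadline(best))
                          best = &es[i];
          }
          return best;
  }

  int main(void)
  {
          struct entity es[] = {
                  { "batch",          1024, 0, 3000000 },
                  { "latency-hungry", 1024, 0, 1000000 },
          };

          /* both eligible; the shorter request has the earlier deadline */
          printf("picked: %s\n", pick_eevdf(es, 2)->name);
          return 0;
  }

Since the slice is what latency-nice ends up steering (patch 13 merges
latency_offset into the slice), a latency-sensitive task asks for a shorter
slice and thereby gets earlier virtual deadlines, without being able to
consume more than its fair share overall.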

Results:

  hackbench -g $nr_cpu + cyclictest --policy other results (latencies in usec):

			EEVDF			 CFS

		# Min Latencies: 00054
  LNICE(19)	# Avg Latencies: 00660
		# Max Latencies: 23103

		# Min Latencies: 00052		00053
  LNICE(0)	# Avg Latencies: 00318		00687
		# Max Latencies: 08593		13913

		# Min Latencies: 00054
  LNICE(-19)	# Avg Latencies: 00055
		# Max Latencies: 00061
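
(LNICE(n) is the latency-nice value the cyclictest run was given; roughly, a
lower value asks for a shorter slice and hence lower latency. With this series
it is set through the extended sched_attr. Below is a hedged userspace sketch,
which assumes the sched_latency_nice field and the SCHED_FLAG_LATENCY_NICE
flag these patches add; the struct is declared locally since distro headers do
not carry the new field yet.)

  #include <stdio.h>
  #include <stdint.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/syscall.h>

  #ifndef SCHED_FLAG_LATENCY_NICE
  #define SCHED_FLAG_LATENCY_NICE 0x80    /* value as proposed by this series */
  #endif

  struct sched_attr {
          uint32_t size;
          uint32_t sched_policy;
          uint64_t sched_flags;
          int32_t  sched_nice;
          uint32_t sched_priority;
          /* SCHED_DEADLINE */
          uint64_t sched_runtime, sched_deadline, sched_period;
          /* utilization clamps */
          uint32_t sched_util_min, sched_util_max;
          /* new in this series */
          int32_t  sched_latency_nice;
  };

  static int set_latency_nice(pid_t pid, int latency_nice)
  {
          struct sched_attr attr;

          memset(&attr, 0, sizeof(attr));
          attr.size = sizeof(attr);
          attr.sched_policy = 0;                          /* SCHED_OTHER */
          attr.sched_flags = SCHED_FLAG_LATENCY_NICE;
          attr.sched_latency_nice = latency_nice;         /* -20 .. 19 */

          return syscall(SYS_sched_setattr, pid, &attr, 0);
  }

  int main(void)
  {
          if (set_latency_nice(0, -19))                   /* current task */
                  perror("sched_setattr");
          return 0;
  }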


Some preliminary results from Chen Yu on a slightly older version:

  schbench  (95% tail latency, lower is better)
  =================================================================================
  case                    nr_instance            baseline (std%)    compare% ( std%)
  normal                   25%                     1.00  (2.49%)    -81.2%   (4.27%)
  normal                   50%                     1.00  (2.47%)    -84.5%   (0.47%)
  normal                   75%                     1.00  (2.5%)     -81.3%   (1.27%)
  normal                  100%                     1.00  (3.14%)    -79.2%   (0.72%)
  normal                  125%                     1.00  (3.07%)    -77.5%   (0.85%)
  normal                  150%                     1.00  (3.35%)    -76.4%   (0.10%)
  normal                  175%                     1.00  (3.06%)    -76.2%   (0.56%)
  normal                  200%                     1.00  (3.11%)    -76.3%   (0.39%)
  ==================================================================================

  hackbench (throughput, higher is better)
  ==============================================================================
  case                    nr_instance            baseline(std%)  compare%( std%)
  threads-pipe              25%                      1.00 (<2%)    -17.5 (<2%)
  threads-socket            25%                      1.00 (<2%)    -1.9 (<2%)
  threads-pipe              50%                      1.00 (<2%)     +6.7 (<2%)
  threads-socket            50%                      1.00 (<2%)    -6.3  (<2%)
  threads-pipe              100%                     1.00 (3%)     +110.1 (3%)
  threads-socket            100%                     1.00 (<2%)    -40.2 (<2%)
  threads-pipe              150%                     1.00 (<2%)    +125.4 (<2%)
  threads-socket            150%                     1.00 (<2%)    -24.7 (<2%)
  threads-pipe              200%                     1.00 (<2%)    -89.5 (<2%)
  threads-socket            200%                     1.00 (<2%)    -27.4 (<2%)
  process-pipe              25%                      1.00 (<2%)    -15.0 (<2%)
  process-socket            25%                      1.00 (<2%)    -3.9 (<2%)
  process-pipe              50%                      1.00 (<2%)    -0.4  (<2%)
  process-socket            50%                      1.00 (<2%)    -5.3  (<2%)
  process-pipe              100%                     1.00 (<2%)    +62.0 (<2%)
  process-socket            100%                     1.00 (<2%)    -39.5  (<2%)
  process-pipe              150%                     1.00 (<2%)    +70.0 (<2%)
  process-socket            150%                     1.00 (<2%)    -20.3 (<2%)
  process-pipe              200%                     1.00 (<2%)    +79.2 (<2%)
  process-socket            200%                     1.00 (<2%)    -22.4  (<2%)
  ==============================================================================

  stress-ng (throughput, higher is better)
  ==============================================================================
  case                    nr_instance            baseline(std%)  compare%( std%)
  switch                  25%                      1.00 (<2%)    -6.5 (<2%)
  switch                  50%                      1.00 (<2%)    -9.2 (<2%)
  switch                  75%                      1.00 (<2%)    -1.2 (<2%)
  switch                  100%                     1.00 (<2%)    +11.1 (<2%)
  switch                  125%                     1.00 (<2%)    -16.7 (9%)
  switch                  150%                     1.00 (<2%)    -13.6 (<2%)
  switch                  175%                     1.00 (<2%)    -16.2 (<2%)
  switch                  200%                     1.00 (<2%)    -19.4 (<2%)
  fork                    50%                      1.00 (<2%)    -0.1 (<2%)
  fork                    75%                      1.00 (<2%)    -0.3 (<2%)
  fork                    100%                     1.00 (<2%)    -0.1 (<2%)
  fork                    125%                     1.00 (<2%)    -6.9 (<2%)
  fork                    150%                     1.00 (<2%)    -8.8 (<2%)
  fork                    200%                     1.00 (<2%)    -3.3 (<2%)
  futex                   25%                      1.00 (<2%)    -3.2 (<2%)
  futex                   50%                      1.00 (3%)     -19.9 (5%)
  futex                   75%                      1.00 (6%)     -19.1 (2%)
  futex                   100%                     1.00 (16%)    -30.5 (10%)
  futex                   125%                     1.00 (25%)    -39.3 (11%)
  futex                   150%                     1.00 (20%)    -27.2 (17%)
  futex                   175%                     1.00 (<2%)    -18.6 (<2%)
  futex                   200%                     1.00 (<2%)    -47.5 (<2%)
  nanosleep               25%                      1.00 (<2%)    -0.1 (<2%)
  nanosleep               50%                      1.00 (<2%)    -0.0 (<2%)
  nanosleep               75%                      1.00 (<2%)    +15.2 (<2%)
  nanosleep               100%                     1.00 (<2%)    -26.4 (<2%)
  nanosleep               125%                     1.00 (<2%)    -1.3 (<2%)
  nanosleep               150%                     1.00 (<2%)    +2.1  (<2%)
  nanosleep               175%                     1.00 (<2%)    +8.3 (<2%)
  nanosleep               200%                     1.00 (<2%)    +2.0 (<2%)
  ===============================================================================

  unixbench (throughput, higher is better)
  ==============================================================================
  case                    nr_instance            baseline(std%)  compare%( std%)
  spawn                   125%                      1.00 (<2%)    +8.1 (<2%)
  context1                100%                      1.00 (6%)     +17.4 (6%)
  context1                75%                       1.00 (13%)    +18.8 (8%)
  =================================================================================

  netperf  (throughput, higher is better)
  ===========================================================================
  case                    nr_instance          baseline(std%)  compare%( std%)
  UDP_RR                  25%                   1.00    (<2%)    -1.5%  (<2%)
  UDP_RR                  50%                   1.00    (<2%)    -0.3%  (<2%)
  UDP_RR                  75%                   1.00    (<2%)    +12.5% (<2%)
  UDP_RR                 100%                   1.00    (<2%)    -4.3%  (<2%)
  UDP_RR                 125%                   1.00    (<2%)    -4.9%  (<2%)
  UDP_RR                 150%                   1.00    (<2%)    -4.7%  (<2%)
  UDP_RR                 175%                   1.00    (<2%)    -6.1%  (<2%)
  UDP_RR                 200%                   1.00    (<2%)    -6.6%  (<2%)
  TCP_RR                  25%                   1.00    (<2%)    -1.4%  (<2%)
  TCP_RR                  50%                   1.00    (<2%)    -0.2%  (<2%)
  TCP_RR                  75%                   1.00    (<2%)    -3.9%  (<2%)
  TCP_RR                 100%                   1.00    (2%)     +3.6%  (5%)
  TCP_RR                 125%                   1.00    (<2%)    -4.2%  (<2%)
  TCP_RR                 150%                   1.00    (<2%)    -6.0%  (<2%)
  TCP_RR                 175%                   1.00    (<2%)    -7.4%  (<2%)
  TCP_RR                 200%                   1.00    (<2%)    -8.4%  (<2%)
  ==========================================================================


---
Also available at:

  git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git sched/eevdf

---
Parth Shah (1):
      sched: Introduce latency-nice as a per-task attribute

Peter Zijlstra (14):
      sched/fair: Add avg_vruntime
      sched/fair: Remove START_DEBIT
      sched/fair: Add lag based placement
      rbtree: Add rb_add_augmented_cached() helper
      sched/fair: Implement an EEVDF like policy
      sched: Commit to lag based placement
      sched/smp: Use lag to simplify cross-runqueue placement
      sched: Commit to EEVDF
      sched/debug: Rename min_granularity to base_slice
      sched: Merge latency_offset into slice
      sched/eevdf: Better handle mixed slice length
      sched/eevdf: Sleeper bonus
      sched/eevdf: Minimal vavg option
      sched/eevdf: Debug / validation crud

Vincent Guittot (2):
      sched/fair: Add latency_offset
      sched/fair: Add sched group latency support

 Documentation/admin-guide/cgroup-v2.rst |   10 +
 include/linux/rbtree_augmented.h        |   26 +
 include/linux/sched.h                   |    6 +
 include/uapi/linux/sched.h              |    4 +-
 include/uapi/linux/sched/types.h        |   19 +
 init/init_task.c                        |    3 +-
 kernel/sched/core.c                     |   65 +-
 kernel/sched/debug.c                    |   49 +-
 kernel/sched/fair.c                     | 1199 ++++++++++++++++---------------
 kernel/sched/features.h                 |   29 +-
 kernel/sched/sched.h                    |   23 +-
 tools/include/uapi/linux/sched.h        |    4 +-
 12 files changed, 794 insertions(+), 643 deletions(-)


Thread overview: 55+ messages
2023-03-28  9:26 Peter Zijlstra [this message]
2023-03-28  9:26 ` [PATCH 01/17] sched: Introduce latency-nice as a per-task attribute Peter Zijlstra
2023-03-28  9:26 ` [PATCH 02/17] sched/fair: Add latency_offset Peter Zijlstra
2023-03-28  9:26 ` [PATCH 03/17] sched/fair: Add sched group latency support Peter Zijlstra
2023-03-28  9:26 ` [PATCH 04/17] sched/fair: Add avg_vruntime Peter Zijlstra
2023-03-28 23:57   ` Josh Don
2023-03-29  7:50     ` Peter Zijlstra
2023-04-05 19:13       ` Peter Zijlstra
2023-03-28  9:26 ` [PATCH 05/17] sched/fair: Remove START_DEBIT Peter Zijlstra
2023-03-28  9:26 ` [PATCH 06/17] sched/fair: Add lag based placement Peter Zijlstra
2023-04-03  9:18   ` Chen Yu
2023-04-05  9:47     ` Peter Zijlstra
2023-04-06  3:03       ` Chen Yu
2023-04-13 15:42       ` Chen Yu
2023-04-13 15:55         ` Chen Yu
2023-03-28  9:26 ` [PATCH 07/17] rbtree: Add rb_add_augmented_cached() helper Peter Zijlstra
2023-03-28  9:26 ` [PATCH 08/17] sched/fair: Implement an EEVDF like policy Peter Zijlstra
2023-03-29  1:26   ` Josh Don
2023-03-29  8:02     ` Peter Zijlstra
2023-03-29  8:06     ` Peter Zijlstra
2023-03-29  8:22       ` Peter Zijlstra
2023-03-29 18:48         ` Josh Don
2023-03-29  8:12     ` Peter Zijlstra
2023-03-29 18:54       ` Josh Don
2023-03-29  8:18     ` Peter Zijlstra
2023-03-29 14:35   ` Vincent Guittot
2023-03-30  8:01     ` Peter Zijlstra
2023-03-30 17:05       ` Vincent Guittot
2023-04-04 12:00         ` Peter Zijlstra
2023-03-28  9:26 ` [PATCH 09/17] sched: Commit to lag based placement Peter Zijlstra
2023-03-28  9:26 ` [PATCH 10/17] sched/smp: Use lag to simplify cross-runqueue placement Peter Zijlstra
2023-03-28  9:26 ` [PATCH 11/17] sched: Commit to EEVDF Peter Zijlstra
2023-03-28  9:26 ` [PATCH 12/17] sched/debug: Rename min_granularity to base_slice Peter Zijlstra
2023-03-28  9:26 ` [PATCH 13/17] sched: Merge latency_offset into slice Peter Zijlstra
2023-03-28  9:26 ` [PATCH 14/17] sched/eevdf: Better handle mixed slice length Peter Zijlstra
2023-03-31 15:26   ` Vincent Guittot
2023-04-04  9:29     ` Peter Zijlstra
2023-04-04 13:50       ` Joel Fernandes
2023-04-05  5:41         ` Mike Galbraith
2023-04-05  8:35         ` Peter Zijlstra
2023-04-05 20:05           ` Joel Fernandes
2023-04-14 11:18             ` Phil Auld
2023-04-16  5:10               ` Joel Fernandes
     [not found]   ` <20230401232355.336-1-hdanton@sina.com>
2023-04-02  2:40     ` Mike Galbraith
2023-03-28  9:26 ` [PATCH 15/17] [RFC] sched/eevdf: Sleeper bonus Peter Zijlstra
2023-03-29  9:10   ` Mike Galbraith
2023-03-28  9:26 ` [PATCH 16/17] [RFC] sched/eevdf: Minimal vavg option Peter Zijlstra
2023-03-28  9:26 ` [PATCH 17/17] [DEBUG] sched/eevdf: Debug / validation crud Peter Zijlstra
2023-04-03  7:42 ` [PATCH 00/17] sched: EEVDF using latency-nice Shrikanth Hegde
2023-04-10  3:13 ` David Vernet
2023-04-11  2:09   ` David Vernet
     [not found] ` <20230410082307.1327-1-hdanton@sina.com>
2023-04-11 10:15   ` Mike Galbraith
     [not found]   ` <20230411133333.1790-1-hdanton@sina.com>
2023-04-11 14:56     ` Mike Galbraith
     [not found]     ` <20230412025042.1413-1-hdanton@sina.com>
2023-04-12  4:05       ` Mike Galbraith
2023-04-25 12:32 ` Phil Auld
