From: Daniel Jordan <daniel.m.jordan@oracle.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: mingo@kernel.org, vincent.guittot@linaro.org,
	linux-kernel@vger.kernel.org, juri.lelli@redhat.com,
	dietmar.eggemann@arm.com, rostedt@goodmis.org,
	bsegall@google.com, mgorman@suse.de, bristot@redhat.com,
	corbet@lwn.net, qyousef@layalina.io, chris.hyser@oracle.com,
	patrick.bellasi@matbug.net, pjt@google.com, pavel@ucw.cz,
	qperret@google.com, tim.c.chen@linux.intel.com,
	joshdon@google.com, timj@gnu.org, kprateek.nayak@amd.com,
	yu.c.chen@intel.com, youssefesmat@chromium.org,
	joel@joelfernandes.org, efault@gmx.de, tglx@linutronix.de
Subject: Re: [PATCH 00/15] sched: EEVDF and latency-nice and/or slice-attr
Date: Wed, 23 Aug 2023 20:52:26 -0400
Message-ID: <dlbtvvm5cewqzh5bcpl4cqhcwxmnnjb6pdle5jzywiiznlactd@cmhnpim42m3p>
In-Reply-To: <20230531115839.089944915@infradead.org>

Hi Peter,

On Wed, May 31, 2023 at 01:58:39PM +0200, Peter Zijlstra wrote:
> 
> Hi!
> 
> Latest version of the EEVDF [1] patches.
> 
> The only real change since last time is the fix for tick-preemption [2], and a 
> simple safe-guard for the mixed slice heuristic.

We're seeing regressions from EEVDF with SPEC CPU, a database workload,
and a Java workload.  We tried SPEC CPU on five systems, and here are
numbers from one of them (a high-core-count, two-socket x86 machine).

    SPECrate2017 oversubscribed by 2x (two copies of the test per CPU)

    Base: v6.3-based kernel
    EEVDF: Base + patches from May 31 [0]

    Performance comparison: >0 if EEVDF wins

    Integer
     
     -0.5% 500.perlbench_r
     -6.6% 502.gcc_r
     -8.7% 505.mcf_r
     -9.2% 520.omnetpp_r
     -6.6% 523.xalancbmk_r
     -0.7% 525.x264_r
     -2.1% 531.deepsjeng_r
     -0.4% 541.leela_r
     -0.3% 548.exchange2_r
     -2.6% 557.xz_r
     
     -3.8% Est(*) SPECrate2017_int_base
     
    Floating Point
     
     -0.6% 503.bwaves_r
     -1.3% 507.cactuBSSN_r
     -0.8% 508.namd_r
    -17.8% 510.parest_r
      0.3% 511.povray_r
     -1.0% 519.lbm_r
     -7.7% 521.wrf_r
     -2.4% 526.blender_r
     -6.1% 527.cam4_r
     -2.0% 538.imagick_r
      0.1% 544.nab_r
     -0.7% 549.fotonik3d_r
    -11.3% 554.roms_r
     
     -4.1% Est(*) SPECrate2017_fp_base
     
    (*) SPEC CPU Fair Use rules require that results from tests with
        non-production components be marked as estimates.

The other machines show similarly consistent regressions.  We've also tried a
v6.5-rc4-based kernel with the latest EEVDF patches from tip/sched/core,
including the recent fixlet "sched/eevdf: Curb wakeup-preemption".  I can post
the rest of the numbers, but I'm trying to keep this on the shorter side for
now.

Running the database workload on a two-socket x86 server, we see
regressions of up to 6% when the number of users exceeds the number of
CPUs.

With the Java workload on another two-socket x86 server, we see a 10%
regression.

We're investigating the other benchmarks, but here's what I've found so far
with SPEC CPU.  Some schedstats showed that EEVDF is tick-preemption happy
(schedstat patches below).  These stats were taken over 1 minute near the
middle of a ~26-minute benchmark (502.gcc_r).

    Base: v6.5-rc4-based kernel
    EEVDF: Base + the latest EEVDF patches from tip/sched/core

    schedstat                     Base            EEVDF

    sched                    1,243,911        3,947,251

    tick_check_preempts     12,899,049
    tick_preempts            1,022,998

    check_deadline                           15,878,463
    update_deadline                           3,895,530
    preempt_deadline                          3,751,580
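
As an aside, here's a minimal userspace sketch of how these one-minute deltas
can be collected: snapshot a /proc/schedstat field summed across all CPUs,
sleep, snapshot again, and subtract.  This is not part of the patches below,
and the default field index is an assumption based on where the seq_printf()
changes in those patches append the new counters.

/* schedstat_delta.c: build with gcc -O2 -o schedstat_delta schedstat_delta.c */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Sum one whitespace-separated field (0-based) across all "cpuN" lines. */
static unsigned long long sum_field(int field)
{
	FILE *f = fopen("/proc/schedstat", "r");
	char line[512];
	unsigned long long total = 0;

	if (!f) {
		perror("/proc/schedstat");
		exit(1);
	}
	while (fgets(line, sizeof(line), f)) {
		char *tok;
		int i;

		if (strncmp(line, "cpu", 3))	/* skip version/domain lines */
			continue;
		for (tok = strtok(line, " \n"), i = 0; tok;
		     tok = strtok(NULL, " \n"), i++) {
			if (i == field) {
				total += strtoull(tok, NULL, 10);
				break;
			}
		}
	}
	fclose(f);
	return total;
}

int main(int argc, char **argv)
{
	/*
	 * Assumed layout: token 0 is "cpuN" and tokens 1-9 are the existing
	 * rq stats, so the counters added below land at tokens 10-11 (base
	 * patch) or 10-12 (eevdf patch).
	 */
	int field = argc > 1 ? atoi(argv[1]) : 10;
	unsigned long long before = sum_field(field);

	sleep(60);	/* the one-minute window used for the table above */
	printf("delta over 60s: %llu\n", sum_field(field) - before);
	return 0;
}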

In both kernels, tick preemption is primarily what drives schedule()s.
Preemptions happen over three times more often with EEVDF.  In the base
kernel, tick preemption happens after a task has run through its ideal
timeslice, a proportional share of sched_latency, so two tasks sharing a CPU
each get 12ms on a server with enough CPUs (sched_latency being 24ms).  With
EEVDF, a task's base slice determines when it gets tick-preempted, and that's
3ms by default.  SPEC CPU doesn't seem to like the increased scheduling of
EEVDF in a CPU-bound load like this.  When I set the base_slice_ns sysctl to
12000000 (12ms), the regression disappears.
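
To make the arithmetic concrete, here's a small standalone sketch (plain
userspace C, not kernel code) comparing the two regimes.  The 24ms and 3ms
defaults come from the description above; the simplified slice formula
assumes a few equal-weight tasks and ignores the min_granularity floor and
weight scaling that the real sched_slice() applies.

#include <stdio.h>

#define NSEC_PER_MSEC	1000000ULL

static const unsigned long long sched_latency = 24 * NSEC_PER_MSEC;
static const unsigned long long base_slice = 3 * NSEC_PER_MSEC;

/* Pre-EEVDF: equal-weight tasks split sched_latency evenly, so the
 * timeslice shrinks as competition on the CPU grows. */
static unsigned long long cfs_slice(unsigned int nr_running)
{
	return sched_latency / nr_running;
}

/* EEVDF: the tick compares runtime against the fixed base slice,
 * independent of nr_running. */
static unsigned long long eevdf_slice(unsigned int nr_running)
{
	(void)nr_running;
	return base_slice;
}

int main(void)
{
	unsigned int nr;

	for (nr = 1; nr <= 4; nr++)
		printf("nr_running=%u: cfs %llums, eevdf %llums\n", nr,
		       cfs_slice(nr) / NSEC_PER_MSEC,
		       eevdf_slice(nr) / NSEC_PER_MSEC);
	return 0;
}

For two runnable tasks this gives 12ms versus 3ms, which lines up with the
roughly 3-4x jump in preemptions in the table above and with the regression
disappearing once base_slice_ns is raised to 12000000.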

I'm still thinking about how to fix it.  Pre-EEVDF, tick preemption was
more flexible in that a task's timeslice could change depending on how
much competition it had on the same CPU.  With EEVDF the timeslice is
fixed no matter what else is running, and growing or shrinking it
depending on nr_running wouldn't honor whatever deadline was already set
for the task (see the sketch below).
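
As a toy illustration of that last point (userspace C with made-up values;
calc_delta_fair() is approximated as slice * NICE_0_WEIGHT / weight): the
virtual deadline vd_i = ve_i + r_i / w_i is computed once per request in
update_deadline(), quoted in the second patch below, so changing the slice
mid-request can't move a deadline that has already been handed out.

#include <stdio.h>

#define NICE_0_WEIGHT	1024ULL

/* vd_i = ve_i + r_i / w_i, approximating calc_delta_fair(). */
static unsigned long long vdeadline(unsigned long long vruntime,
				    unsigned long long slice,
				    unsigned long long weight)
{
	return vruntime + slice * NICE_0_WEIGHT / weight;
}

int main(void)
{
	unsigned long long ve = 0;

	/* Deadline handed out with a 12ms request while competition was low. */
	unsigned long long vd = vdeadline(ve, 12000000, NICE_0_WEIGHT);

	/* If nr_running then doubles and the slice were halved, a freshly
	 * computed deadline would be much earlier... */
	unsigned long long vd_rescaled = vdeadline(ve, 6000000, NICE_0_WEIGHT);

	/* ...but update_deadline() only recomputes once vruntime passes the
	 * old deadline, so the running task keeps the stale one. */
	printf("deadline in use: ve+%llums, rescaled would be ve+%llums\n",
	       vd / 1000000, vd_rescaled / 1000000);
	return 0;
}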

The schedstat patch for the base:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index b3e25be58e2b..fb5a35aa07ec 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -4996,6 +4996,8 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
        struct sched_entity *se;
        s64 delta;

+       schedstat_inc(rq_of(cfs_rq)->tick_check_preempts);
+
        /*
         * When many tasks blow up the sched_period; it is possible that
         * sched_slice() reports unusually large results (when many tasks are
@@ -5005,6 +5007,7 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)

        delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
        if (delta_exec > ideal_runtime) {
+               schedstat_inc(rq_of(cfs_rq)->tick_preempts);
                resched_curr(rq_of(cfs_rq));
                /*
                 * The current task ran long enough, ensure it doesn't get
@@ -5028,8 +5031,10 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
        if (delta < 0)
                return;

-       if (delta > ideal_runtime)
+       if (delta > ideal_runtime) {
+               schedstat_inc(rq_of(cfs_rq)->tick_preempts);
                resched_curr(rq_of(cfs_rq));
+       }
 }

 static void
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index e93e006a942b..1bf12e271756 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1123,6 +1123,10 @@ struct rq {
        /* try_to_wake_up() stats */
        unsigned int            ttwu_count;
        unsigned int            ttwu_local;
+
+       /* tick preempt stats */
+       unsigned int            tick_check_preempts;
+       unsigned int            tick_preempts;
 #endif

 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..7997b8538b72 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -133,12 +133,13 @@ static int show_schedstat(struct seq_file *seq, void *v)

                /* runqueue-specific stats */
                seq_printf(seq,
-                   "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+                   "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u",
                    cpu, rq->yld_count,
                    rq->sched_count, rq->sched_goidle,
                    rq->ttwu_count, rq->ttwu_local,
                    rq->rq_cpu_time,
-                   rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+                   rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+                   rq->tick_check_preempts, rq->tick_preempts);

                seq_printf(seq, "\n");


The schedstat patch for eevdf:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index cffec98724f3..675f4bbac471 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -975,18 +975,21 @@ static void clear_buddies(struct cfs_rq *cfs_rq, struct sched_entity *se);
  */
 static void update_deadline(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+       schedstat_inc(rq_of(cfs_rq)->check_deadline);
        if ((s64)(se->vruntime - se->deadline) < 0)
                return;

        /*
         * EEVDF: vd_i = ve_i + r_i / w_i
         */
+       schedstat_inc(rq_of(cfs_rq)->update_deadline);
        se->deadline = se->vruntime + calc_delta_fair(se->slice, se);

        /*
         * The task has consumed its request, reschedule.
         */
        if (cfs_rq->nr_running > 1) {
+               schedstat_inc(rq_of(cfs_rq)->preempt_deadline);
                resched_curr(rq_of(cfs_rq));
                clear_buddies(cfs_rq, se);
        }
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 93c2dc80143f..c44b59556367 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1129,6 +1129,11 @@ struct rq {
        /* try_to_wake_up() stats */
        unsigned int            ttwu_count;
        unsigned int            ttwu_local;
+
+       /* update_deadline() stats */
+       unsigned int            check_deadline;
+       unsigned int            update_deadline;
+       unsigned int            preempt_deadline;
 #endif

 #ifdef CONFIG_CPU_IDLE
diff --git a/kernel/sched/stats.c b/kernel/sched/stats.c
index 857f837f52cb..2a8bd742507d 100644
--- a/kernel/sched/stats.c
+++ b/kernel/sched/stats.c
@@ -133,12 +133,14 @@ static int show_schedstat(struct seq_file *seq, void *v)

                /* runqueue-specific stats */
                seq_printf(seq,
-                   "cpu%d %u 0 %u %u %u %u %llu %llu %lu",
+                   "cpu%d %u 0 %u %u %u %u %llu %llu %lu %u %u %u",
                    cpu, rq->yld_count,
                    rq->sched_count, rq->sched_goidle,
                    rq->ttwu_count, rq->ttwu_local,
                    rq->rq_cpu_time,
-                   rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount);
+                   rq->rq_sched_info.run_delay, rq->rq_sched_info.pcount,
+                   rq->check_deadline, rq->update_deadline,
+                   rq->preempt_deadline);

                seq_printf(seq, "\n");


[0] https://lore.kernel.org/all/20230531115839.089944915@infradead.org/
