All of lore.kernel.org
 help / color / mirror / Atom feed
From: Youssef Esmat <youssefesmat@chromium.org>
To: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	 Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	 Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	 Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	 Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org,
	 Tobias Huschle <huschle@linux.ibm.com>,
	Luis Machado <luis.machado@arm.com>,
	 Chen Yu <yu.c.chen@intel.com>,
	Abel Wu <wuyun.abel@bytedance.com>,
	 Tianchen Ding <dtcccc@linux.alibaba.com>,
	Xuewen Yan <xuewen.yan94@gmail.com>,
	 "Gautham R. Shenoy" <gautham.shenoy@amd.com>
Subject: Re: [RFC PATCH 1/1] sched/eevdf: Skip eligibility check for current entity during wakeup preemption
Date: Mon, 25 Mar 2024 10:13:17 -0500	[thread overview]
Message-ID: <CA+q576OCK24VSp+s4OLD2ogO48i95y39_JO=zV=TwHSEg3_b1w@mail.gmail.com> (raw)
In-Reply-To: <20240325060226.1540-2-kprateek.nayak@amd.com>

On Mon, Mar 25, 2024 at 1:03 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> With the curr entity's eligibility check, a wakeup preemption is very
> likely when an entity with positive lag joins the runqueue pushing the
> avg_vruntime of the runqueue backwards, making the vruntime of the
> current entity ineligible. This leads to aggressive wakeup preemption
> which was previously guarded by wakeup_granularity_ns in legacy CFS.
> Below figure depicts one such aggressive preemption scenario with EEVDF
> in DeathStarBench [1]:
>
>                                 deadline for Nginx
>                    |
>         +-------+  |                    |
>     /-- | Nginx | -|------------------> |
>     |   +-------+  |                    |
>     |              |
>     |   -----------|-------------------------------> vruntime timeline
>     |              \--> rq->avg_vruntime
>     |
>     |   wakes service on the same runqueue since system is busy
>     |
>     |   +---------+|
>     \-->| Service || (service has +ve lag pushes avg_vruntime backwards)
>         +---------+|
>           |        |
>    wakeup |     +--|-----+                       |
>  preempts \---->| N|ginx | --------------------> | {deadline for Nginx}
>                 +--|-----+                       |
>            (Nginx ineligible)
>         -----------|-------------------------------> vruntime timeline
>                    \--> rq->avg_vruntime
>
> When NGINX server is involuntarily switched out, it cannot accept any
> incoming request, leading to longer turn around time for the clients and
> thus loss in DeathStarBench throughput.
>
>     ==================================================================
>     Test          : DeathStarBench
>     Units         : Normalized latency
>     Interpretation: Lower is better
>     Statistic     : Mean
>     ==================================================================
>     tip         1.00
>     eevdf       1.14 (+14.61%)
>
> For current running task, skip eligibility check in pick_eevdf() if it
> has not exhausted the slice promised to it during selection despite the
> situation having changed since. The behavior is guarded by
> RUN_TO_PARITY_WAKEUP sched_feat to simplify testing. With
> RUN_TO_PARITY_WAKEUP enabled, performance loss seen with DeathStarBench
> since the merge of EEVDF disappears. Following are the results from
> testing on a Dual Socket 3rd Generation EPYC server (2 x 64C/128T):
>
>     ==================================================================
>     Test          : DeathStarBench
>     Units         : Normalized throughput
>     Interpretation: Higher is better
>     Statistic     : Mean
>     ==================================================================
>     Pinning      scaling     tip            run-to-parity-wakeup(pct imp)
>      1CCD           1       1.00            1.16 (%diff: 16%)
>      2CCD           2       1.00            1.03 (%diff: 3%)
>      4CCD           4       1.00            1.12 (%diff: 12%)
>      8CCD           8       1.00            1.05 (%diff: 6%)
>
> With spec_rstack_overflow=off, the DeathStarBench performance with the
> proposed solution is same as the performance on v6.5 release before
> EEVDF was merged.

Thanks for sharing this Prateek.
We actually noticed we could also gain performance by disabling
eligibility checks (but disable it on all paths).
The following are a few threads we had on the topic:

Discussion around eligibility:
https://lore.kernel.org/lkml/CA+q576MS0-MV1Oy-eecvmYpvNT3tqxD8syzrpxQ-Zk310hvRbw@mail.gmail.com/
Some of our results:
https://lore.kernel.org/lkml/CA+q576Mov1jpdfZhPBoy_hiVh3xSWuJjXdP3nS4zfpqfOXtq7Q@mail.gmail.com/
Sched feature to disable eligibility:
https://lore.kernel.org/lkml/20231013030213.2472697-1-youssefesmat@chromium.org/

>
> This may lead to newly waking task waiting longer for its turn on the
> CPU, however, testing on the same system did not reveal any consistent
> regressions with the standard benchmarks.
>
> Link: https://github.com/delimitrou/DeathStarBench/ [1]
> Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> ---
>  kernel/sched/fair.c     | 24 ++++++++++++++++++++----
>  kernel/sched/features.h |  1 +
>  2 files changed, 21 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 6a16129f9a5c..a9b145a4eab0 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -875,7 +875,7 @@ struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
>   *
>   * Which allows tree pruning through eligibility.
>   */
> -static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
> +static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq, bool wakeup_preempt)
>  {
>         struct rb_node *node = cfs_rq->tasks_timeline.rb_root.rb_node;
>         struct sched_entity *se = __pick_first_entity(cfs_rq);
> @@ -889,7 +889,23 @@ static struct sched_entity *pick_eevdf(struct cfs_rq *cfs_rq)
>         if (cfs_rq->nr_running == 1)
>                 return curr && curr->on_rq ? curr : se;
>
> -       if (curr && (!curr->on_rq || !entity_eligible(cfs_rq, curr)))
> +       if (curr && !curr->on_rq)
> +               curr = NULL;
> +
> +       /*
> +        * When an entity with positive lag wakes up, it pushes the
> +        * avg_vruntime of the runqueue backwards. This may causes the
> +        * current entity to be ineligible soon into its run leading to
> +        * wakeup preemption.
> +        *
> +        * To prevent such aggressive preemption of the current running
> +        * entity during task wakeups, skip the eligibility check if the
> +        * slice promised to the entity since its selection has not yet
> +        * elapsed.
> +        */
> +       if (curr &&
> +           !(sched_feat(RUN_TO_PARITY_WAKEUP) && wakeup_preempt && curr->vlag == curr->deadline) &&
> +           !entity_eligible(cfs_rq, curr))
>                 curr = NULL;
>
>         /*
> @@ -5460,7 +5476,7 @@ pick_next_entity(struct cfs_rq *cfs_rq)
>             cfs_rq->next && entity_eligible(cfs_rq, cfs_rq->next))
>                 return cfs_rq->next;
>
> -       return pick_eevdf(cfs_rq);
> +       return pick_eevdf(cfs_rq, false);
>  }
>
>  static bool check_cfs_rq_runtime(struct cfs_rq *cfs_rq);
> @@ -8340,7 +8356,7 @@ static void check_preempt_wakeup_fair(struct rq *rq, struct task_struct *p, int
>         /*
>          * XXX pick_eevdf(cfs_rq) != se ?
>          */
> -       if (pick_eevdf(cfs_rq) == pse)
> +       if (pick_eevdf(cfs_rq, true) == pse)
>                 goto preempt;
>
>         return;
> diff --git a/kernel/sched/features.h b/kernel/sched/features.h
> index 143f55df890b..027bab5b4031 100644
> --- a/kernel/sched/features.h
> +++ b/kernel/sched/features.h
> @@ -7,6 +7,7 @@
>  SCHED_FEAT(PLACE_LAG, true)
>  SCHED_FEAT(PLACE_DEADLINE_INITIAL, true)
>  SCHED_FEAT(RUN_TO_PARITY, true)
> +SCHED_FEAT(RUN_TO_PARITY_WAKEUP, true)
>
>  /*
>   * Prefer to schedule the task we woke last (assuming it failed
> --
> 2.34.1
>

  reply	other threads:[~2024-03-25 15:13 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-25  6:02 [RFC PATCH 0/1] sched/eevdf: Curb wakeup preemption further K Prateek Nayak
2024-03-25  6:02 ` [RFC PATCH 1/1] sched/eevdf: Skip eligibility check for current entity during wakeup preemption K Prateek Nayak
2024-03-25 15:13   ` Youssef Esmat [this message]
2024-03-26  3:06     ` K Prateek Nayak
2024-04-17  6:08       ` K Prateek Nayak
2024-04-24 15:07   ` Peter Zijlstra
2024-04-25  3:34     ` K Prateek Nayak
2024-03-28 10:26 ` [RFC PATCH 0/1] sched/eevdf: Curb wakeup preemption further Madadi Vineeth Reddy
2024-03-29  3:16   ` K Prateek Nayak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CA+q576OCK24VSp+s4OLD2ogO48i95y39_JO=zV=TwHSEg3_b1w@mail.gmail.com' \
    --to=youssefesmat@chromium.org \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dtcccc@linux.alibaba.com \
    --cc=gautham.shenoy@amd.com \
    --cc=huschle@linux.ibm.com \
    --cc=juri.lelli@redhat.com \
    --cc=kprateek.nayak@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=luis.machado@arm.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=wuyun.abel@bytedance.com \
    --cc=xuewen.yan94@gmail.com \
    --cc=yu.c.chen@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.