From: Barry Song <21cnbao@gmail.com>
To: Roman Gushchin <guro@fb.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Mel Gorman <mgorman@techsingularity.net>,
bpf@vger.kernel.org, LKML <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH rfc 4/6] sched: cfs: add bpf hooks to control wakeup and tick preemption
Date: Fri, 1 Oct 2021 16:35:58 +1300
Message-ID: <CAGsJ_4xr0Xg3B1seT5_kcb26ZQgWaakR8QGOB-N62wehfXkt_Q@mail.gmail.com>
In-Reply-To: <20210916162451.709260-5-guro@fb.com>
On Fri, Sep 17, 2021 at 4:36 AM Roman Gushchin <guro@fb.com> wrote:
>
> This patch adds 3 hooks to control wakeup and tick preemption:
> cfs_check_preempt_tick
> cfs_check_preempt_wakeup
> cfs_wakeup_preempt_entity
>
> The first one allows forcing or suppressing a preemption from tick
> context. An obvious usage example is to minimize the number of
> involuntary context switches and decrease the associated latency
> penalty by (conditionally) giving tasks or task groups an extended
> execution slice. It can be used instead of tweaking
> sysctl_sched_min_granularity.
>
> The second one is called from the wakeup preemption code and allows
> redefining whether a newly woken task should preempt the execution
> of the current task. This is useful to minimize the number of
> preemptions of latency-sensitive tasks. To some extent it's a more
> flexible analog of sysctl_sched_wakeup_granularity.
This reminds me of Mel's recent work, which might be relevant:
sched/fair: Scale wakeup granularity relative to nr_running
https://lore.kernel.org/lkml/20210920142614.4891-3-mgorman@techsingularity.net/
>
> The third one is similar, but it tweaks the wakeup_preempt_entity()
> function, which is called not only from a wakeup context, but also
> from pick_next_task(), which makes it possible to influence the
> decision on which task will run next.
>
> It's a place for a discussion whether we need both these hooks or only
> one of them: the second is more powerful, but depends more on the
> current implementation. In any case, bpf hooks are not an ABI, so it's
> not a deal breaker.
I am also curious whether a similar hook could help with tuning
newidle_balance/sched_migration_cost, as discussed in this thread:
https://lore.kernel.org/lkml/ef3b3e55-8be9-595f-6d54-886d13a7e2fd@hisilicon.com/
Those static values do not seem to be universal: different topologies might
need different settings, but tuning them dynamically in the kernel seems to
be extremely difficult.
>
> The idea of the wakeup_preempt_entity hook belongs to Rik van Riel. He
> also contributed a lot to the whole patchset by providing his ideas,
> recommendations and feedback for earlier (non-public) versions.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>
> ---
> include/linux/bpf_sched.h | 1 +
> include/linux/sched_hook_defs.h | 4 +++-
> kernel/sched/fair.c | 27 +++++++++++++++++++++++++++
> 3 files changed, 31 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/bpf_sched.h b/include/linux/bpf_sched.h
> index 6e773aecdff7..5c238aeb853c 100644
> --- a/include/linux/bpf_sched.h
> +++ b/include/linux/bpf_sched.h
> @@ -40,6 +40,7 @@ static inline RET bpf_sched_##NAME(__VA_ARGS__) \
> { \
> return DEFAULT; \
> }
> +#include <linux/sched_hook_defs.h>
> #undef BPF_SCHED_HOOK
>
> static inline bool bpf_sched_enabled(void)
> diff --git a/include/linux/sched_hook_defs.h b/include/linux/sched_hook_defs.h
> index 14344004e335..f075b32698cd 100644
> --- a/include/linux/sched_hook_defs.h
> +++ b/include/linux/sched_hook_defs.h
> @@ -1,2 +1,4 @@
> /* SPDX-License-Identifier: GPL-2.0 */
> -BPF_SCHED_HOOK(int, 0, dummy, void)
> +BPF_SCHED_HOOK(int, 0, cfs_check_preempt_tick, struct sched_entity *curr, unsigned long delta_exec)
> +BPF_SCHED_HOOK(int, 0, cfs_check_preempt_wakeup, struct task_struct *curr, struct task_struct *p)
> +BPF_SCHED_HOOK(int, 0, cfs_wakeup_preempt_entity, struct sched_entity *curr, struct sched_entity *se)
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index ff69f245b939..35ea8911b25c 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -21,6 +21,7 @@
> * Copyright (C) 2007 Red Hat, Inc., Peter Zijlstra
> */
> #include "sched.h"
> +#include <linux/bpf_sched.h>
>
> /*
> * Targeted preemption latency for CPU-bound tasks:
> @@ -4447,6 +4448,16 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
>
> ideal_runtime = sched_slice(cfs_rq, curr);
> delta_exec = curr->sum_exec_runtime - curr->prev_sum_exec_runtime;
> +
> + if (bpf_sched_enabled()) {
> + int ret = bpf_sched_cfs_check_preempt_tick(curr, delta_exec);
> +
> + if (ret < 0)
> + return;
> + else if (ret > 0)
> + resched_curr(rq_of(cfs_rq));
> + }
> +
> if (delta_exec > ideal_runtime) {
> resched_curr(rq_of(cfs_rq));
> /*
> @@ -7083,6 +7094,13 @@ wakeup_preempt_entity(struct sched_entity *curr, struct sched_entity *se)
> {
> s64 gran, vdiff = curr->vruntime - se->vruntime;
>
> + if (bpf_sched_enabled()) {
> + int ret = bpf_sched_cfs_wakeup_preempt_entity(curr, se);
> +
> + if (ret)
> + return ret;
> + }
> +
> if (vdiff <= 0)
> return -1;
>
> @@ -7168,6 +7186,15 @@ static void check_preempt_wakeup(struct rq *rq, struct task_struct *p, int wake_
> likely(!task_has_idle_policy(p)))
> goto preempt;
>
> + if (bpf_sched_enabled()) {
> + int ret = bpf_sched_cfs_check_preempt_wakeup(current, p);
> +
> + if (ret < 0)
> + return;
> + else if (ret > 0)
> + goto preempt;
> + }
> +
> /*
> * Batch and idle tasks do not preempt non-idle tasks (their preemption
> * is driven by the tick):
> --
> 2.31.1
>
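To check my reading of the return convention used by the three hooks, here
is a minimal userspace sketch in C. It is only a model under stated
assumptions, not kernel code: model_check_preempt_tick(),
model_wakeup_preempt_entity(), extend_slice_hook() and EXTEND_LIMIT_NS are
hypothetical names, and gran is passed in directly where the kernel would
call wakeup_gran(se). As far as I can tell, a negative value suppresses the
preemption, a positive value forces it, and zero falls back to the default
CFS decision. In the tick hook the patch does not return after
resched_curr(), so the default checks still run, but a reschedule has
already been requested at that point.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical limit: the example policy extends a slice up to 10 ms. */
#define EXTEND_LIMIT_NS 10000000ULL

/* Stand-in for a BPF program verdict: <0 suppress, 0 no opinion, >0 force. */
typedef int (*tick_hook_fn)(uint64_t delta_exec);

/* Example policy: suppress tick preemption until EXTEND_LIMIT_NS has run. */
static int extend_slice_hook(uint64_t delta_exec)
{
	return delta_exec < EXTEND_LIMIT_NS ? -1 : 0;
}

/*
 * Model of the patched check_preempt_tick(). Returns true when a
 * reschedule would be requested. Only the first default check
 * (delta_exec > ideal_runtime) is modelled.
 */
static bool model_check_preempt_tick(tick_hook_fn hook, uint64_t delta_exec,
				     uint64_t ideal_runtime)
{
	if (hook) {
		int ret = hook(delta_exec);

		if (ret < 0)
			return false;	/* hook suppressed the preemption */
		if (ret > 0)
			return true;	/* hook forced a reschedule */
	}
	return delta_exec > ideal_runtime;
}

/*
 * Model of the patched wakeup_preempt_entity(). A non-zero hook verdict
 * is returned as is; otherwise the default vruntime comparison applies.
 */
static int model_wakeup_preempt_entity(int hook_ret, int64_t curr_vruntime,
				       int64_t se_vruntime, int64_t gran)
{
	int64_t vdiff = curr_vruntime - se_vruntime;

	if (hook_ret)
		return hook_ret;
	if (vdiff <= 0)
		return -1;
	if (vdiff > gran)
		return 1;
	return 0;
}
```

With extend_slice_hook, a task that has run 5 ms against a 3 ms ideal
slice is not preempted at the tick, while after 12 ms the hook has no
opinion and the default comparison requests the reschedule.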
Thanks
barry