linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Joel Fernandes <joel@joelfernandes.org>
To: Tejun Heo <tj@kernel.org>
Cc: torvalds@linux-foundation.org, mingo@redhat.com,
	peterz@infradead.org,  juri.lelli@redhat.com,
	vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
	 rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
	bristot@redhat.com,  vschneid@redhat.com, ast@kernel.org,
	daniel@iogearbox.net, andrii@kernel.org,  martin.lau@kernel.org,
	joshdon@google.com, brho@google.com, pjt@google.com,
	 derkling@google.com, haoluo@google.com, dvernet@meta.com,
	 dschatzberg@meta.com, dskarlat@cs.cmu.edu, riel@surriel.com,
	 changwoo@igalia.com, himadrics@inria.fr, memxor@gmail.com,
	 linux-kernel@vger.kernel.org, bpf@vger.kernel.org,
	kernel-team@meta.com,  Andrea Righi <andrea.righi@canonical.com>
Subject: Re: [PATCH 12/36] sched_ext: Implement BPF extensible scheduler class
Date: Thu, 25 Apr 2024 17:28:40 -0400	[thread overview]
Message-ID: <CAEXW_YR02g=DetfwM98ZoveWEbGbGGfb1KAikcBeC=Pkvqf4OA@mail.gmail.com> (raw)
In-Reply-To: <Zf9Tz2wHT6KYtqEG@slm.duckdns.org>

On Sat, Mar 23, 2024 at 6:12 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello, Joel.
>
> On Fri, Mar 22, 2024 at 10:37:32PM -0400, Joel Fernandes wrote:
> ...
> > I was wondering about the comment above related to 'wakeup_preempt', could
> > you clarify why it is not useful (NOOP) in the sched-ext class?
> >
> > wakeup_preempt() may be called via:
> > sched_ttwu_pending() ->
> >       ttwu_do_activate() ->
> >               wakeup_preempt()
> >
> >
> > at which point the enqueue of the task could have already happened via:
> >
> > sched_ttwu_pending() ->
> >       ttwu_do_activate() ->
> >               activate_task() ->
> >                       enqueue_task()
> >
> > But the comment above says "task isn't tied to the CPU" ?
>
> In sched_ext, a scheduling queue isn't tied to a particular CPU. For
> example, it's trivial to share a single global scheduling queue across the
> whole system or any subset of CPUs. To support this behavior, tasks can be
> hot-migrated in the dispatch path just before it starts executing:
>
>  https://github.com/sched-ext/sched_ext/blob/sched_ext/kernel/sched/ext.c#L1335
>
> So, the CPU picked by ops.select_cpu() in the enqueue path often doesn't
> determine the CPU the task is going to execute on. If the picked CPU matches
> the CPU the task is eventually going to run on, there's a small performance
> advantage as the later hot migration can be avoided.
>
> As the task isn't actually tied to the CPU being picked, it's a bit awkward
> to ask "does this task preempt this CPU?" Instead, preemption is implemented
> by calling scx_bpf_kick_cpu() w/ SCX_KICK_PREEMPT or using the
> SCX_ENQ_PREEMPT flag from the enqueue path which allows preempting any CPU.
>

Got it. I took some time to look at it some more. Now I am wondering
why check_preempt_curr() has to be separately implemented for a class
and why the enqueue() handler of each class cannot take care of
preempting curr via setting resched flags.

The only reason I can see is that, activate_task() is not always
followed by a check_preempt_curr() and sometimes there is an
unconditional resched_curr() happening following the call to
activate_task().

But such issues don't affect sched_ext in its current form I guess.

Btw, if sched_ext were to be implemented as a higher priority class
above CFS [1], then check_preempt_curr() may preempt without even
calling the class's check_preempt_curr() :

void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags)
{
        if (p->sched_class == rq->curr->sched_class)
                rq->curr->sched_class->check_preempt_curr(rq, p, flags);
        else if (sched_class_above(p->sched_class, rq->curr->sched_class))
                resched_curr(rq);

But if I understand, sched_ext is below CFS at the moment, so that
should not be an issue.

[1] By the way, now that I brought up the higher priority class thing,
I might as well discuss it here :-D :

One of my use cases is about scheduling high priority latency sensitive threads:
I think if sched_ext could have 2 classes, one lower than CFS and one
above CFS, that would be beneficial to those who want a gradual
transition to use scx, instead of switching all tasks to scx at once.

One reason is EAS (in CFS).  It may be beneficial for people to use
the existing EAS for everything but latency critical tasks (much like
how people use RT class for those). This is quite involved and
reimplementing EAS in BPF may be quite a project. Not that it
shouldn't be implemented that way, but EAS is about a decade old with
all kinds of energy modeling, math and what not. Having scx higher
than cfs alongside the lower one is less of an invasive approach than
switching everything on the system to scx.

Do you have any opinions on that? If it makes sense, I can work on
such an implementation.

Another reason for this is, general purpose systems run very varied
workloads, and big dramatic changes are likely to be reverted due to
power and performance regressions.  Hence, the request for a higher
scx, so that we (high priority task scx users) can take baby steps.

thanks,

 - Joel

  reply	other threads:[~2024-04-25 21:28 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-11  2:47 [PATCHSET v5] sched: Implement BPF extensible scheduler class Tejun Heo
2023-11-11  2:47 ` [PATCH 01/36] cgroup: Implement cgroup_show_cftypes() Tejun Heo
2023-11-11  2:47 ` [PATCH 02/36] sched: Restructure sched_class order sanity checks in sched_init() Tejun Heo
2023-11-11  2:47 ` [PATCH 03/36] sched: Allow sched_cgroup_fork() to fail and introduce sched_cancel_fork() Tejun Heo
2023-11-11  2:47 ` [PATCH 04/36] sched: Add sched_class->reweight_task() Tejun Heo
2023-11-11  2:47 ` [PATCH 05/36] sched: Add sched_class->switching_to() and expose check_class_changing/changed() Tejun Heo
2023-11-11  2:47 ` [PATCH 06/36] sched: Factor out cgroup weight conversion functions Tejun Heo
2023-11-11  2:47 ` [PATCH 07/36] sched: Expose css_tg() and __setscheduler_prio() Tejun Heo
2023-11-11  2:47 ` [PATCH 08/36] sched: Enumerate CPU cgroup file types Tejun Heo
2023-11-11  2:47 ` [PATCH 09/36] sched: Add @reason to sched_class->rq_{on|off}line() Tejun Heo
2023-11-11  2:47 ` [PATCH 10/36] sched: Add normal_policy() Tejun Heo
2023-11-11  2:47 ` [PATCH 11/36] sched_ext: Add boilerplate for extensible scheduler class Tejun Heo
2023-11-11  2:47 ` [PATCH 12/36] sched_ext: Implement BPF " Tejun Heo
2023-11-13 13:34   ` Changwoo Min
2023-11-14 19:07     ` Tejun Heo
2023-11-13 20:04   ` Andrea Righi
2023-11-14 19:07     ` Tejun Heo
2023-11-23  8:07   ` Andrea Righi
2023-11-25 19:59     ` Tejun Heo
2023-11-26  9:05       ` Andrea Righi
2023-12-07  2:04   ` [PATCH] scx: set p->scx.ops_state using atomic_long_set_release Changwoo Min
2023-12-08  0:16     ` Tejun Heo
2024-03-23  2:37   ` [PATCH 12/36] sched_ext: Implement BPF extensible scheduler class Joel Fernandes
2024-03-23 22:12     ` Tejun Heo
2024-04-25 21:28       ` Joel Fernandes [this message]
2024-04-26 16:57         ` Barret Rhoden
2024-04-26 21:58         ` Tejun Heo
2023-11-11  2:47 ` [PATCH 13/36] sched_ext: Add scx_simple and scx_example_qmap example schedulers Tejun Heo
2023-11-12  4:17   ` kernel test robot
2023-11-12 18:06     ` Tejun Heo
2023-11-11  2:47 ` [PATCH 14/36] sched_ext: Add sysrq-S which disables the BPF scheduler Tejun Heo
2023-11-11  2:47 ` [PATCH 15/36] sched_ext: Implement runnable task stall watchdog Tejun Heo
2023-11-11  2:47 ` [PATCH 16/36] sched_ext: Allow BPF schedulers to disallow specific tasks from joining SCHED_EXT Tejun Heo
2023-11-11  2:47 ` [PATCH 17/36] sched_ext: Allow BPF schedulers to switch all eligible tasks into sched_ext Tejun Heo
2023-11-11  2:47 ` [PATCH 18/36] sched_ext: Print sched_ext info when dumping stack Tejun Heo
2023-11-14 19:23   ` [PATCH v2 " Tejun Heo
2023-11-11  2:47 ` [PATCH 19/36] sched_ext: Implement scx_bpf_kick_cpu() and task preemption support Tejun Heo
2023-11-11  2:47 ` [PATCH 20/36] sched_ext: Add a central scheduler which makes all scheduling decisions on one CPU Tejun Heo
2023-11-11  2:47 ` [PATCH 21/36] sched_ext: Make watchdog handle ops.dispatch() looping stall Tejun Heo
2023-11-11  2:47 ` [PATCH 22/36] sched_ext: Add task state tracking operations Tejun Heo
2023-11-11  2:47 ` [PATCH 23/36] sched_ext: Implement tickless support Tejun Heo
2023-11-11  2:47 ` [PATCH 24/36] sched_ext: Track tasks that are subjects of the in-flight SCX operation Tejun Heo
2023-11-11  2:47 ` [PATCH 25/36] sched_ext: Add cgroup support Tejun Heo
2023-11-11  2:47 ` [PATCH 26/36] sched_ext: Add a cgroup-based core-scheduling scheduler Tejun Heo
2023-11-11  2:47 ` [PATCH 27/36] sched_ext: Add a cgroup scheduler which uses flattened hierarchy Tejun Heo
2023-11-11  2:47 ` [PATCH 28/36] sched_ext: Implement SCX_KICK_WAIT Tejun Heo
2023-11-11  2:47 ` [PATCH 29/36] sched_ext: Implement sched_ext_ops.cpu_acquire/release() Tejun Heo
2023-11-11  2:47 ` [PATCH 30/36] sched_ext: Implement sched_ext_ops.cpu_online/offline() Tejun Heo
2023-11-11  2:47 ` [PATCH 31/36] sched_ext: Implement core-sched support Tejun Heo
2023-11-11  2:47 ` [PATCH 32/36] sched_ext: Add vtime-ordered priority queue to dispatch_q's Tejun Heo
2023-11-11  2:47 ` [PATCH 33/36] sched_ext: Documentation: scheduler: Document extensible scheduler class Tejun Heo
2023-11-11  2:48 ` [PATCH 34/36] sched_ext: Add a basic, userland vruntime scheduler Tejun Heo
2023-11-11  2:48 ` [PATCH 35/36] sched_ext: Add scx_rusty, a rust userspace hybrid scheduler Tejun Heo
2023-11-11  2:48 ` [PATCH 36/36] sched_ext: Add scx_layered, a highly configurable multi-layer scheduler Tejun Heo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEXW_YR02g=DetfwM98ZoveWEbGbGGfb1KAikcBeC=Pkvqf4OA@mail.gmail.com' \
    --to=joel@joelfernandes.org \
    --cc=andrea.righi@canonical.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brho@google.com \
    --cc=bristot@redhat.com \
    --cc=bsegall@google.com \
    --cc=changwoo@igalia.com \
    --cc=daniel@iogearbox.net \
    --cc=derkling@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=dschatzberg@meta.com \
    --cc=dskarlat@cs.cmu.edu \
    --cc=dvernet@meta.com \
    --cc=haoluo@google.com \
    --cc=himadrics@inria.fr \
    --cc=joshdon@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kernel-team@meta.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=memxor@gmail.com \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=riel@surriel.com \
    --cc=rostedt@goodmis.org \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).