From: Tejun Heo <>
To: Josh Don <>
Cc: Ingo Molnar <>,
	Peter Zijlstra <>,
	Juri Lelli <>,
	Vincent Guittot <>,
	Dietmar Eggemann <>,
	Steven Rostedt <>,
	Ben Segall <>, Mel Gorman <>,
	Daniel Bristot de Oliveira <>,
	Paul Turner <>,
	David Rientjes <>,
	Oleg Rombakh <>,
	Viresh Kumar <>,
	Steve Sistare <>, Rik van Riel <>
Subject: Re: [PATCH] sched: cgroup SCHED_IDLE support
Date: Wed, 16 Jun 2021 11:42:05 -0400
Message-ID: <>
In-Reply-To: <>


On Tue, Jun 08, 2021 at 04:11:32PM -0700, Josh Don wrote:
> This extends SCHED_IDLE to cgroups.
> Interface: cgroup/cpu.idle.
>  0: default behavior
> Extending SCHED_IDLE to cgroups means that we incorporate the existing
> aspects of SCHED_IDLE; a SCHED_IDLE cgroup will count all of its
> descendant threads towards the idle_h_nr_running count of all of its
> ancestor cgroups. Thus, sched_idle_rq() will work properly.
> Additionally, SCHED_IDLE cgroups are configured with minimum weight.
> There are two key differences between the per-task and per-cgroup
> SCHED_IDLE interface:
> - The cgroup interface allows tasks within a SCHED_IDLE hierarchy to
> maintain their relative weights. The entity that is "idle" is the
> cgroup, not the tasks themselves.
> - Since the idle entity is the cgroup, our SCHED_IDLE wakeup preemption
> decision is not made by comparing the current task with the woken task,
> but rather by comparing their matching sched_entity.
> A typical use-case for this is a user that creates an idle and a
> non-idle subtree. The non-idle subtree will dominate competition vs
> the idle subtree, but the idle subtree will still be high priority
> vs other users on the system. The latter is accomplished via comparing
> matching sched_entity in the wakeup preemption path (this could also be
> improved by making the sched_idle_rq() decision dependent on the
> perspective of a specific task).
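
To make the "matching sched_entity" comparison above concrete, here is a toy
model of it. This is only a sketch, not the patch's code: the Entity class,
find_matching() and idle_wakeup_preempts() are hypothetical names, and the
rule modeled is simply "walk both tasks up until the entities are siblings,
then compare the idle flags at that level".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(eq=False)
class Entity:
    """Toy stand-in for a sched_entity: a task or a cgroup (hypothetical model)."""
    name: str
    parent: Optional["Entity"] = None
    idle: bool = False  # models cpu.idle=1 on a cgroup

    def depth(self) -> int:
        d, e = 0, self
        while e.parent is not None:
            d, e = d + 1, e.parent
        return d


def find_matching(a: Entity, b: Entity) -> Tuple[Entity, Entity]:
    """Walk both entities up until they share the same parent."""
    da, db = a.depth(), b.depth()
    while da > db:
        a, da = a.parent, da - 1
    while db > da:
        b, db = b.parent, db - 1
    while a.parent is not b.parent:
        a, b = a.parent, b.parent
    return a, b


def idle_wakeup_preempts(curr: Entity, woken: Entity) -> bool:
    """True if, at the matching level, curr's side is idle and woken's is not."""
    c, w = find_matching(curr, woken)
    return c.idle and not w.idle


# The use case from the description: one user with an idle and a
# non-idle subtree, plus another user on the system.
user_a = Entity("user_a")
user_b = Entity("user_b")
batch = Entity("batch", parent=user_a, idle=True)
interactive = Entity("interactive", parent=user_a)
t_batch = Entity("t_batch", parent=batch)
t_inter = Entity("t_inter", parent=interactive)
t_other = Entity("t_other", parent=user_b)

within_user = idle_wakeup_preempts(t_batch, t_inter)   # compares batch vs interactive
across_users = idle_wakeup_preempts(t_batch, t_other)  # compares user_a vs user_b
```

In this model the idle flag only matters between siblings: t_inter gets the
idle preemption against t_batch, while t_other does not, because the
comparison happens at the user_a/user_b level where neither side is idle.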

A high-level problem that I see with the proposal is that it would bake
the current recursive implementation into the interface. The semantics of
the currently exposed interface, at least the weight-based part, are
abstract and don't necessarily dictate how the scheduling is actually
performed. Adding this would mean that we're now codifying the current
behavior of fully nested scheduling into the interface.
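
As a rough illustration of what "abstract" means for the weight-based part:
the weights alone determine a share for each leaf, and that share can be
computed without saying anything about whether the scheduler picks tasks by
recursing through nested queues. The sketch below is only that, a sketch;
flat_shares() and the tree layout are made up for illustration, and it
assumes every leaf is continuously runnable.

```python
from fractions import Fraction


def flat_shares(tree, share=Fraction(1)):
    """Flatten a weight hierarchy into per-leaf CPU shares.

    tree maps name -> (weight, children), where children is a dict of the
    same shape or None for a leaf. Hypothetical layout for illustration.
    """
    total = sum(weight for weight, _ in tree.values())
    out = {}
    for name, (weight, children) in tree.items():
        s = share * Fraction(weight, total)
        if children:
            out.update(flat_shares(children, s))
        else:
            out[name] = s
    return out


# Two top-level groups of equal weight; A splits 1:3 between its children.
tree = {
    "A": (100, {"a1": (100, None), "a2": (300, None)}),
    "B": (100, None),
}
shares = flat_shares(tree)
```

Here a1, a2 and B end up with 1/8, 3/8 and 1/2 of the CPU. Any
implementation that delivers those shares satisfies the weights, whereas a
rule defined in terms of which entity is compared at which level does not
flatten in the same way.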

There are several practical challenges with the current implementation
caused by the full nesting. For example, nesting levels are expensive for
context-switch-heavy applications, often costing more than 1% per level, and
heuristics which assume a global queue may behave unexpectedly, i.e. we can
create conditions where things like idle-wakeup boost behave very
differently depending on whether tasks are inside a cgroup or not, even when
the eventual relative weights and past usages are similar.

Can you please give more details on why this is beneficial? Is the benefit
mostly around making configuration easy or are there actual scheduling
behaviors that you can't achieve otherwise?




Thread overview: 19+ messages
2021-06-08 23:11 [PATCH] sched: cgroup SCHED_IDLE support Josh Don
2021-06-10 12:53 ` Dietmar Eggemann
2021-06-10 19:14   ` Josh Don
2021-06-11 16:43     ` Dietmar Eggemann
2021-06-11 23:34       ` Josh Don
2021-06-15 10:06         ` Dietmar Eggemann
2021-06-15 23:30           ` Josh Don
2021-06-25  9:24           ` Peter Zijlstra
2021-06-16 15:42 ` Tejun Heo [this message]
2021-06-17  1:01   ` Josh Don
2021-06-26  9:57     ` Tejun Heo
2021-06-29  4:57       ` Josh Don
2021-06-25  8:08   ` Peter Zijlstra
2021-06-26 10:06     ` Tejun Heo
2021-06-26 11:42     ` Rik van Riel
2021-06-25  8:14 ` Peter Zijlstra
2021-06-26  0:18   ` Josh Don
2021-06-25  8:20 ` Peter Zijlstra
2021-06-26  0:35   ` Josh Don
