linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Josh Don <joshdon@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org,
	Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
Date: Mon, 31 Oct 2022 11:50:12 -1000
Message-ID: <Y2BDFNpkSawKnE9S@slm.duckdns.org>
In-Reply-To: <CABk29Nu=XcjwRxnGBtKHfknxnDPpspghou06+W0fufnkGF6NkA@mail.gmail.com>

Hello,

On Mon, Oct 31, 2022 at 02:22:42PM -0700, Josh Don wrote:
> > So, TJ has been complaining about us throttling in kernel-space, causing
> > grief when we also happen to hold a mutex or some other resource and has
> > been prodding us to only throttle at the return-to-user boundary.
> 
> Yea, we've been having similar priority inversion issues. It isn't
> limited to CFS bandwidth though, such problems are also pretty easy to
> hit with configurations of shares, cpumasks, and SCHED_IDLE. I've

We need to distinguish between work-conserving and non-work-conserving
control schemes. Work-conserving ones - such as shares and idle - shouldn't
affect the aggregate amount of work the system can perform. There may be
local and temporary priority inversions but they shouldn't affect the
throughput of the system and the scheduler should be able to make the
eventual resource distribution conform to the configured targets.

CPU affinity and BW control are not work-conserving and thus cause a
different class of problems. While it is possible to slow down a system with
overly restrictive CPU affinities, it's a lot harder to do so severely
compared to BW control because no matter what you do, there's still at least
one CPU which can make full forward progress. With BW control, it's really easy
to stall the entire system almost completely because we're giving userspace
the ability to stall tasks for an arbitrary amount of time at random places
in the kernel. This is what the cgroup1 freezer did, and it had exactly the
same problems.
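
To make the distinction concrete, here is a toy single-CPU model, not kernel
code. The function names, the tick granularity and the way groups are picked
are all assumptions made for illustration. With shares, an idle CPU is always
handed to whoever is runnable; with a quota, a group that has used up its
budget waits for the next period even when nothing else wants the CPU.

```python
def run_shares(demand, weights, ticks):
    """Work-conserving toy model: every tick goes to some runnable group."""
    done = {g: 0 for g in demand}
    for _ in range(ticks):
        runnable = [g for g in demand if done[g] < demand[g]]
        if not runnable:
            break
        # Pick the runnable group that is furthest behind its weighted share.
        g = min(runnable, key=lambda name: done[name] / weights[name])
        done[g] += 1
    return done


def run_quota(demand, quota, period, ticks):
    """Non-work-conserving toy model: out of quota means waiting, even on an idle CPU."""
    done = {g: 0 for g in demand}
    used = {g: 0 for g in demand}
    for t in range(ticks):
        if t % period == 0:
            used = {g: 0 for g in demand}  # refill at each period boundary
        runnable = [g for g in demand
                    if done[g] < demand[g] and used[g] < quota[g]]
        if not runnable:
            continue  # the CPU idles although work may remain
        g = min(runnable, key=lambda name: done[name])
        done[g] += 1
        used[g] += 1
    return done


# Group "a" has plenty of work, group "b" has none.
shares_result = run_shares({"a": 100, "b": 0}, {"a": 1, "b": 9}, ticks=100)
quota_result = run_quota({"a": 100, "b": 0}, {"a": 2, "b": 10}, period=10, ticks=100)
```

In this model the low-weight group still gets the whole CPU under shares
because nobody else is runnable, while under the quota it is capped at its
budget per period and the rest of the CPU time is simply lost.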

> chatted with the folks working on the proxy execution patch series,
> and it seems like that could be a better generic solution to these
> types of issues.

Care to elaborate?

> Throttle at return-to-user seems only mildly beneficial, and then only
> really with preemptive kernels. Still pretty easy to get inversion
> issues, e.g. a thread holding a kernel mutex wake back up into a
> hierarchy that is currently throttled, or a thread holding a kernel
> mutex exists in the hierarchy being throttled but is currently waiting
> to run.

I don't follow. If you only throttle at predefined safe spots, the easiest
place being the kernel-user boundary, you cannot get system-wide stalls from
BW restrictions, which is something the kernel shouldn't allow userspace to
cause. In your example, a thread holding a kernel mutex waking back up into
a hierarchy that is currently throttled should keep running in the kernel
until it encounters such a safe throttling point, by which time it would have
released the kernel mutex, and only then throttle.
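
As a rough illustration of the difference, the sketch below counts how long an
unthrottled task waits for a mutex whose holder gets throttled. It is a
hypothetical tick-based model, not the scheduler's actual mechanism;
`waiter_delay` and all of its parameters are made up for this example.

```python
def waiter_delay(critical_len, throttle_at, throttle_len, defer_to_user_boundary):
    """Ticks until a mutex held by a task in a throttled group is released.

    The holder takes the mutex at tick 0 and needs `critical_len` ticks of CPU
    time before it releases it. A throttle request lands at tick `throttle_at`
    and lasts `throttle_len` ticks. A second, unthrottled task wants the mutex
    from tick 0, so the return value is also how long that task is blocked.
    """
    progress = 0         # CPU ticks the holder has spent in the critical section
    throttled_until = 0  # holder is kept off the CPU while tick < throttled_until
    tick = 0
    while progress < critical_len:
        if tick == throttle_at and not defer_to_user_boundary:
            # Throttle right where the task happens to be, mutex and all.
            throttled_until = tick + throttle_len
        # With deferral, the request is only honoured after the mutex has been
        # released at the kernel-user boundary, so it cannot delay the waiter.
        if tick >= throttled_until:
            progress += 1
        tick += 1
    return tick


immediate = waiter_delay(critical_len=5, throttle_at=2, throttle_len=100,
                         defer_to_user_boundary=False)
deferred = waiter_delay(critical_len=5, throttle_at=2, throttle_len=100,
                        defer_to_user_boundary=True)
```

With these numbers the waiter is blocked for 105 ticks when the holder is
throttled in place and for 5 ticks when throttling is deferred to the safe
spot. The throttled task still serves its full throttle period in the deferred
case, just not while holding something others need.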

Thanks.

-- 
tejun
