linux-kernel.vger.kernel.org archive mirror
From: Tejun Heo <tj@kernel.org>
To: Josh Don <joshdon@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	Valentin Schneider <vschneid@redhat.com>,
	linux-kernel@vger.kernel.org,
	Joel Fernandes <joel@joelfernandes.org>
Subject: Re: [PATCH v2] sched: async unthrottling for cfs bandwidth
Date: Tue, 1 Nov 2022 11:49:55 -1000
Message-ID: <Y2GUg8CiI68ZBznr@slm.duckdns.org>
In-Reply-To: <CABk29Nua8ZsDfhY+x+VfYDkbkjfXLXTZ5JMVR9uiBygraxDM+g@mail.gmail.com>

Hello,

On Tue, Nov 01, 2022 at 01:56:29PM -0700, Josh Don wrote:
> Maybe walking through an example would be helpful? I don't know if
> there's anything super specific. For cgroup_mutex for example, the
> same global mutex is being taken for things like cgroup mkdir and
> cgroup proc attach, regardless of which part of the hierarchy is being
> modified. So, we end up sharing that mutex between random job threads
> (ie. that may be manipulating their own cgroup sub-hierarchy), and
> control plane threads, which are attempting to manage root-level
> cgroups. Bad things happen when the cgroup_mutex (or similar) is held
> by a random thread which blocks and is of low scheduling priority,
> since when it wakes back up it may take quite a while for it to run
> again (whether that low priority be due to CFS bandwidth, sched_idle,
> or even just O(hundreds) of threads on a cpu). Starving out the
> control plane causes us significant issues, since that affects machine
> health. cgroup manipulation is not a hot path operation, but the
> control plane tends to hit it fairly often, and so those things
> combine at our scale to produce this rare problem.

I keep asking because I'm curious about the specific details of the
contentions. Control plane locking up is obviously bad but it can usually
tolerate some latencies - stalling out multiple seconds (or longer) can be
catastrophic but tens or hundreds of millisecs occasionally usually isn't.

The only times we've seen latency spikes from the CPU side which were enough
to cause system-level failures were when there were severe restrictions
through bw control. Other cases sure are possible but unless you grab these
mutexes while IDLE inside a heavily contended cgroup (which is a bit silly)
you gotta push *really* hard.

If most of the problems were with cpu bw control, fixing that should do for
the time being. Otherwise, we'll have to think about finishing kernfs
locking granularity improvements and doing something similar to cgroup
locking too.

Thanks.

-- 
tejun

Thread overview: 25+ messages
2022-10-26 22:44 [PATCH v2] sched: async unthrottling for cfs bandwidth Josh Don
2022-10-31 13:04 ` Peter Zijlstra
2022-10-31 21:22   ` Josh Don
2022-10-31 21:50     ` Tejun Heo
2022-10-31 23:15       ` Josh Don
2022-10-31 23:53         ` Tejun Heo
2022-11-01  1:01           ` Josh Don
2022-11-01  1:45             ` Tejun Heo
2022-11-01 19:11               ` Josh Don
2022-11-01 19:15                 ` Tejun Heo
2022-11-01 20:56                   ` Josh Don
2022-11-01 21:49                     ` Tejun Heo [this message]
2022-11-01 21:59                       ` Josh Don
2022-11-01 22:38                         ` Tejun Heo
2022-11-02 17:10                           ` Michal Koutný
2022-11-02 17:18                             ` Tejun Heo
2022-10-31 21:56   ` Benjamin Segall
2022-11-02  8:40     ` Peter Zijlstra
2022-11-11  0:14       ` Josh Don
2022-11-02 16:59 ` Michal Koutný
2022-11-03  0:10   ` Josh Don
2022-11-03 10:11     ` Michal Koutný
2022-11-16  3:01   ` Josh Don
2022-11-16  9:57     ` Michal Koutný
2022-11-16 21:45       ` Josh Don
