From: Peter Zijlstra <peterz@infradead.org>
To: Phil Auld <pauld@redhat.com>
Cc: Dave Chiluk <chiluk+linux@indeed.com>,
Ben Segall <bsegall@google.com>, Peter Oskolkov <posk@posk.io>,
Ingo Molnar <mingo@redhat.com>,
cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
Brendan Gregg <bgregg@netflix.com>, Kyle Anderson <kwa@yelp.com>,
Gabriel Munos <gmunoz@netflix.com>,
John Hammond <jhammond@indeed.com>,
Cong Wang <xiyou.wangcong@gmail.com>,
Jonathan Corbet <corbet@lwn.net>,
linux-doc@vger.kernel.org
Subject: Re: [PATCH v6 1/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
Date: Fri, 26 Jul 2019 20:14:32 +0200
Message-ID: <20190726181432.GR31381@hirez.programming.kicks-ass.net>
In-Reply-To: <20190723171307.GC2947@lorien.usersys.redhat.com>

On Tue, Jul 23, 2019 at 01:13:09PM -0400, Phil Auld wrote:
> Hi Dave,
>
> On Tue, Jul 23, 2019 at 11:44:26AM -0500 Dave Chiluk wrote:
> > It has been observed that highly-threaded, non-cpu-bound applications
> > running under cpu.cfs_quota_us constraints can hit a high percentage of
> > periods throttled while simultaneously not consuming the allocated
> > amount of quota. This use case is typical of user-interactive non-cpu
> > bound applications, such as those running in kubernetes or mesos when
> > run on multiple cpu cores.
> >
> > This has been root caused to cpu-local run queues being allocated per-cpu
> > bandwidth slices and then not fully using those slices within the period,
> > at which point the slice and its quota expire. This expiration of unused
> > slices results in applications not being able to utilize the quota
> > they have been allocated.
> >
> > The non-expiration of per-cpu slices was recently fixed by
> > 'commit 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift
> > condition")'. Prior to that it appears that this had been broken since
> > at least 'commit 51f2176d74ac ("sched/fair: Fix unlocked reads of some
> > cfs_b->quota/period")' which was introduced in v3.16-rc1 in 2014. That
> > added the following conditional which resulted in slices never being
> > expired.
> >
> > 	if (cfs_rq->runtime_expires != cfs_b->runtime_expires) {
> > 		/* extend local deadline, drift is bounded above by 2 ticks */
> > 		cfs_rq->runtime_expires += TICK_NSEC;
> >
> > Because this was broken for nearly 5 years, and has recently been fixed
> > and is now being noticed by many users running kubernetes
> > (https://github.com/kubernetes/kubernetes/issues/67577) it is my opinion
> > that the mechanisms around expiring runtime should be removed
> > altogether.
> >
> > This allows quota already allocated to per-cpu run-queues to live longer
> > than the period boundary. This allows threads on runqueues that do not
> > use much CPU to continue to use their remaining slice over a longer
> > period of time than cpu.cfs_period_us. This helps prevent the
> > above condition of hitting throttling while also not fully utilizing
> > your cpu quota.
> >
> > This theoretically allows a machine to use slightly more than its
> > allotted quota in some periods. This overflow would be bounded by the
> > remaining quota left on each per-cpu runqueue. This is typically no
> > more than min_cfs_rq_runtime=1ms per cpu. For CPU bound tasks this will
> > change nothing, as they should theoretically fully utilize all of their
> > quota in each period. For user-interactive tasks as described above this
> > provides a much better user/application experience as their cpu
> > utilization will more closely match the amount they requested when they
> > hit throttling. This means that cpu limits no longer strictly apply per
> > period for non-cpu bound applications, but that they are still accurate
> > over longer timeframes.
> >
> > This greatly improves performance of high-thread-count, non-cpu bound
> > applications with low cfs_quota_us allocation on high-core-count
> > machines. In the case of an artificial testcase (10ms/100ms of quota on
> > an 80 CPU machine), this commit resulted in almost 30x performance
> > improvement, while still maintaining correct cpu quota restrictions.
> > That testcase is available at https://github.com/indeedeng/fibtest.
> >
> > Fixes: 512ac999d275 ("sched/fair: Fix bandwidth timer clock drift condition")
> > Signed-off-by: Dave Chiluk <chiluk+linux@indeed.com>
> > Reviewed-by: Ben Segall <bsegall@google.com>
>
> This still works for me. The documentation reads pretty well, too. Good job.
>
> Feel free to add my Acked-by: or Reviewed-by: Phil Auld <pauld@redhat.com>.

Thanks guys!
Thread overview: 40+ messages
2019-05-17 19:30 [PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu slices Dave Chiluk
2019-05-23 18:44 ` [PATCH v2 0/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices Dave Chiluk
2019-05-23 18:44 ` [PATCH v2 1/1] " Dave Chiluk
2019-05-23 21:01 ` Peter Oskolkov
2019-05-24 14:32 ` Phil Auld
2019-05-24 15:14 ` Dave Chiluk
2019-05-24 15:59 ` Phil Auld
2019-05-24 16:28 ` Peter Oskolkov
2019-05-24 21:35 ` Dave Chiluk
2019-05-24 22:07 ` Peter Oskolkov
2019-05-28 22:25 ` Dave Chiluk
2019-05-24 8:55 ` Peter Zijlstra
2019-05-29 19:08 ` [PATCH v3 0/1] " Dave Chiluk
2019-05-29 19:08 ` [PATCH v3 1/1] " Dave Chiluk
2019-05-29 19:28 ` Phil Auld
2019-05-29 19:50 ` bsegall
2019-05-29 21:05 ` bsegall
2019-05-30 17:53 ` Dave Chiluk
2019-05-30 20:44 ` bsegall
[not found] ` <1561391404-14450-1-git-send-email-chiluk+linux@indeed.com>
2019-06-24 15:50 ` [PATCH v4 1/1] sched/fair: Return all runtime when cfs_b has very little remaining Dave Chiluk
2019-06-24 17:33 ` bsegall
2019-06-26 22:10 ` Dave Chiluk
2019-06-27 20:18 ` bsegall
2019-06-27 19:09 ` [PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices Dave Chiluk
2019-06-27 19:49 ` [PATCH v5 0/1] " Dave Chiluk
2019-06-27 19:49 ` [PATCH v5 1/1] " Dave Chiluk
2019-07-01 20:15 ` bsegall
2019-07-11 9:51 ` Peter Zijlstra
2019-07-11 17:46 ` bsegall
[not found] ` <CAC=E7cV4sO50NpYOZ06n_BkZTcBqf1KQp83prc+oave3ircBrw@mail.gmail.com>
2019-07-12 18:01 ` bsegall
2019-07-12 22:09 ` bsegall
2019-07-15 15:44 ` Dave Chiluk
2019-07-16 19:58 ` bsegall
2019-07-23 16:44 ` [PATCH v6 0/1] " Dave Chiluk
2019-07-23 16:44 ` [PATCH v6 1/1] " Dave Chiluk
2019-07-23 17:13 ` Phil Auld
2019-07-23 22:12 ` Dave Chiluk
2019-07-23 23:26 ` Phil Auld
2019-07-26 18:14 ` Peter Zijlstra [this message]
2019-08-08 10:53 ` [tip:sched/core] " tip-bot for Dave Chiluk