From: bsegall@google.com
To: Dave Chiluk <chiluk+linux@indeed.com>
Cc: Phil Auld <pauld@redhat.com>, Peter Oskolkov <posk@posk.io>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
	Brendan Gregg <bgregg@netflix.com>, Kyle Anderson <kwa@yelp.com>,
	Gabriel Munos <gmunoz@netflix.com>,
	John Hammond <jhammond@indeed.com>,
	Cong Wang <xiyou.wangcong@gmail.com>,
	Jonathan Corbet <corbet@lwn.net>,
	linux-doc@vger.kernel.org, pjt@google.com
Subject: Re: [PATCH v3 1/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices
Date: Thu, 30 May 2019 13:44:55 -0700
Message-ID: <xm26zhn3y8mw.fsf@bsegall-linux.svl.corp.google.com>
In-Reply-To: <CAC=E7cU9GetuKVQE1HxXsSuOKgyxezXUmSH2ZDHOrLio_YZi1g@mail.gmail.com> (Dave Chiluk's message of "Thu, 30 May 2019 12:53:37 -0500")

Dave Chiluk <chiluk+linux@indeed.com> writes:

> On Wed, May 29, 2019 at 02:05:55PM -0700, bsegall@google.com wrote:
>> Dave Chiluk <chiluk+linux@indeed.com> writes:
>>
>> Yeah, having run the test, stranding only 1 ms per cpu rather than 5
>> doesn't help if you only have 10 ms of quota and even 10 threads/cpus.
>> The slack timer isn't important in this test, though I think it probably
>> should be changed.
> My min_cfs_rq_runtime was already set to 1ms.

Yeah, I meant the 1ms of min_cfs_rq_runtime versus the 5ms slice that
would be stranded if the slack return path were broken.
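
To make the arithmetic concrete, here is a minimal user-space sketch (not
kernel code). The helper names stranded_ns() and usable_ns() are
hypothetical, and the model assumes every one of ncpus runqueues holds
back exactly min_cfs_rq_runtime and never returns it within the period.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/* Runtime stranded if each of ncpus runqueues keeps min_rt_ns (assumption). */
static int64_t stranded_ns(int64_t min_rt_ns, int ncpus)
{
	assert(min_rt_ns >= 0 && ncpus >= 0);
	return min_rt_ns * ncpus;
}

/* Quota left for threads doing real work in one period, floored at zero. */
static int64_t usable_ns(int64_t quota_ns, int64_t min_rt_ns, int ncpus)
{
	int64_t left = quota_ns - stranded_ns(min_rt_ns, ncpus);

	return left > 0 ? left : 0;
}
```

Under this model, 10 ms of quota across 10 cpus leaves nothing usable
whether 1 ms or 5 ms is stranded per cpu, which is why lowering the
stranded amount from 5 to 1 does not help in that configuration.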

>
> Additionally raising the amount of quota from 10ms to 50ms or even
> 100ms, still results in throttling without full quota usage.
>
>> Decreasing min_cfs_rq_runtime helps, but would mean that we have to pull
>> quota more often / always. The worst case here I think is where you
>> run/sleep for ~1ns, so you wind up taking the lock twice every
>> min_cfs_rq_runtime: once for assign and once to return all but min,
>> which you then use up doing short run/sleep. I suppose that determines
>> how much we care about this overhead at all.
> I'm not so concerned about how inefficiently the user-space application
> runs, as that's up to the individual developer.

What I was worried about is increasing scheduler overhead, which is
something we generally try to prevent.
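
As a rough illustration of that overhead, the sketch below counts
worst-case lock acquisitions for the short run/sleep pattern described
above. It is a back-of-the-envelope model, not a measurement:
worst_case_locks_per_sec() is a hypothetical helper, and it assumes
exactly two acquisitions of the cfs_b lock (one to assign, one to return)
per min_cfs_rq_runtime of cpu time consumed on one cpu.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

/*
 * Worst-case cfs_b lock acquisitions per second of consumed cpu time,
 * assuming two acquisitions (assign + return) per min_rt_ns of runtime.
 */
static int64_t worst_case_locks_per_sec(int64_t min_rt_ns)
{
	assert(min_rt_ns > 0);
	return 2 * (NSEC_PER_SEC / min_rt_ns);
}
```

With min_cfs_rq_runtime at 1 ms this gives 2000 acquisitions per second
of consumed runtime; at 100 us it gives 20000, so the cost scales
inversely with the value chosen.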

> The fibtest testcase is
> purely my approximation of what a Java application with lots of worker
> threads might do, as I didn't have a great deterministic Java
> reproducer, and I feared posting Java to LKML.  I'm more concerned with
> the fact that the user requested 10ms/period or 100ms/period and they
> hit throttling while simultaneously not seeing that amount of cpu usage.
> i.e. on an 8 core machine, if I run:
> $ ./runfibtest 1
> Iterations Completed(M): 1886
> Throttled for: 51
> CPU Usage (msecs) = 507
> $ ./runfibtest 8
> Iterations Completed(M): 1274
> Throttled for: 52
> CPU Usage (msecs) = 380
>
> You see that in the 8 core case, where we have 7 do-nothing threads on
> cpus 1-7, we see only 380 ms of usage and 52 periods of throttling
> when we should have received ~500ms of cpu usage.
>
> Looking more closely at the __return_cfs_rq_runtime logic I noticed
>         if (cfs_b->quota != RUNTIME_INF &&
>             cfs_rq->runtime_expires == cfs_b->runtime_expires) {
>
> Which is awfully similar to the logic that was fixed by 512ac999.  Is it
> possible that we are just not ever returning runtime back to the cfs_b
> because of the runtime_expires comparison here?

The issue that patch fixed was that the old conditional was backwards.
Also, lowering min_cfs_rq_runtime to 0 fixes your testcase, so runtime
is being returned to the cfs_b.
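
For reference, here is a simplified user-space model of the return path
being discussed. It is a sketch, not the kernel function: the model_*
names are hypothetical, locking and the slack timer are left out, and
min_cfs_rq_runtime is passed in as a parameter so the effect of setting
it to 0 is easy to see.

```c
#include <assert.h>
#include <stdint.h>

#define RUNTIME_INF UINT64_MAX

/* Hypothetical stand-ins for the global pool and one per-cpu runqueue. */
struct model_cfs_b {
	uint64_t quota;
	int64_t runtime;
	uint64_t runtime_expires;
};

struct model_cfs_rq {
	int64_t runtime_remaining;
	uint64_t runtime_expires;
};

/* Return everything above min_rt to the pool if the expirations match. */
static void model_return_runtime(struct model_cfs_b *b,
				 struct model_cfs_rq *rq, int64_t min_rt)
{
	int64_t slack = rq->runtime_remaining - min_rt;

	assert(min_rt >= 0);
	if (slack <= 0)
		return;

	if (b->quota != RUNTIME_INF &&
	    rq->runtime_expires == b->runtime_expires)
		b->runtime += slack;

	/* The slack is dropped locally even when it was not valid to return. */
	rq->runtime_remaining -= slack;
}
```

In this model a runqueue holding 5 ms returns 4 ms when min_rt is 1 ms
and keeps 1 ms locally; with min_rt at 0 nothing stays behind.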

>
>> Removing expiration means that in the worst case period and quota can be
>> effectively twice what the user specified, but only on very particular
>> workloads.
> I'm only removing expiration of slices that have already been assigned
> to individual cfs_rq.  My understanding is that there is at most one
> cfs_rq per cpu, and each of those can have at most one slice of
> available runtime.  So the worst case burst is slice_ms * cpus.  Please
> help me understand how you get to twice user specified quota and period
> as it's not obvious to me (I've only been looking at this for a few
> months).

The effect is this significant because slice_ms * cpus is roughly 100%
of the quota, so yes, it's roughly the same thing. Unfortunately, if
there are more spare cpus on the system, just doubling quota and period
(keeping the same ratio) would not fix your issue, whereas removing
expiration does fix it, while also potentially having that doubling
effect.
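
A small worked example of that bound, as a sketch only. The helpers
carried_over_ns() and worst_case_usage_ns() are hypothetical, and the
model assumes each cpu can carry at most one unexpired slice into the
next period and that all of it gets used there.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_MSEC 1000000LL

/* Unexpired runtime that could carry into the next period (assumed bound). */
static int64_t carried_over_ns(int64_t slice_ns, int ncpus)
{
	assert(slice_ns >= 0 && ncpus >= 0);
	return slice_ns * ncpus;
}

/* Worst-case usage in one period: fresh quota plus carried-over slices. */
static int64_t worst_case_usage_ns(int64_t quota_ns, int64_t slice_ns,
				   int ncpus)
{
	return quota_ns + carried_over_ns(slice_ns, ncpus);
}
```

With illustrative numbers of 40 ms quota, a 5 ms slice and 8 cpus, the
carried-over runtime equals the quota, so one period could see 80 ms of
usage, i.e. twice what was configured.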

>
>> I think we should at least think about instead lowering
>> min_cfs_rq_runtime to some smaller value
> Do you mean lower than 1ms?

Yes

Thread overview: 40+ messages
2019-05-17 19:30 [PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu slices Dave Chiluk
2019-05-23 18:44 ` [PATCH v2 0/1] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices Dave Chiluk
2019-05-23 18:44   ` [PATCH v2 1/1] " Dave Chiluk
2019-05-23 21:01     ` Peter Oskolkov
2019-05-24 14:32       ` Phil Auld
2019-05-24 15:14         ` Dave Chiluk
2019-05-24 15:59           ` Phil Auld
2019-05-24 16:28           ` Peter Oskolkov
2019-05-24 21:35             ` Dave Chiluk
2019-05-24 22:07               ` Peter Oskolkov
2019-05-28 22:25                 ` Dave Chiluk
2019-05-24  8:55     ` Peter Zijlstra
2019-05-29 19:08 ` [PATCH v3 0/1] " Dave Chiluk
2019-05-29 19:08   ` [PATCH v3 1/1] " Dave Chiluk
2019-05-29 19:28     ` Phil Auld
2019-05-29 19:50     ` bsegall
2019-05-29 21:05     ` bsegall
2019-05-30 17:53       ` Dave Chiluk
2019-05-30 20:44         ` bsegall [this message]
     [not found] ` <1561391404-14450-1-git-send-email-chiluk+linux@indeed.com>
2019-06-24 15:50   ` [PATCH v4 1/1] sched/fair: Return all runtime when cfs_b has very little remaining Dave Chiluk
2019-06-24 17:33     ` bsegall
2019-06-26 22:10       ` Dave Chiluk
2019-06-27 20:18         ` bsegall
2019-06-27 19:09 ` [PATCH] sched/fair: Fix low cpu usage with high throttling by removing expiration of cpu-local slices Dave Chiluk
2019-06-27 19:49 ` [PATCH v5 0/1] " Dave Chiluk
2019-06-27 19:49   ` [PATCH v5 1/1] " Dave Chiluk
2019-07-01 20:15     ` bsegall
2019-07-11  9:51       ` Peter Zijlstra
2019-07-11 17:46         ` bsegall
     [not found]           ` <CAC=E7cV4sO50NpYOZ06n_BkZTcBqf1KQp83prc+oave3ircBrw@mail.gmail.com>
2019-07-12 18:01             ` bsegall
2019-07-12 22:09             ` bsegall
2019-07-15 15:44               ` Dave Chiluk
2019-07-16 19:58     ` bsegall
2019-07-23 16:44 ` [PATCH v6 0/1] " Dave Chiluk
2019-07-23 16:44   ` [PATCH v6 1/1] " Dave Chiluk
2019-07-23 17:13     ` Phil Auld
2019-07-23 22:12       ` Dave Chiluk
2019-07-23 23:26         ` Phil Auld
2019-07-26 18:14       ` Peter Zijlstra
2019-08-08 10:53     ` [tip:sched/core] " tip-bot for Dave Chiluk
