linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Huaixin Chang <changhuaixin@linux.alibaba.com>
To: changhuaixin@linux.alibaba.com
Cc: bsegall@google.com, dietmar.eggemann@arm.com,
	juri.lelli@redhat.com, khlebnikov@yandex-team.ru,
	linux-kernel@vger.kernel.org, mgorman@suse.de, mingo@redhat.com,
	odin@uged.al, odin@ugedal.com, pauld@redhead.com,
	peterz@infradead.org, pjt@google.com, rostedt@goodmis.org,
	shanpeic@linux.alibaba.com, tj@kernel.org,
	vincent.guittot@linaro.org, xiyou.wangcong@gmail.com
Subject: [PATCH v4 0/4] sched/fair: Burstable CFS bandwidth controller
Date: Tue, 16 Mar 2021 12:49:27 +0800	[thread overview]
Message-ID: <20210316044931.39733-1-changhuaixin@linux.alibaba.com> (raw)

Changelog:

v4:
- Adjust assignments in tg_set_cfs_bandwidth(), saving unnecessary
  assignemnts when quota == RUNTIME_INF.
- Getting rid of sysctl_sched_cfs_bw_burst_onset_percent, as there seems
  no justification for both controlling start bandwidth and a percent
  way.
- Comment improvement in sched_cfs_period_timer() shifts on explaining
  why max_overrun shifting to 0 is a problem.
- Rename previous_runtime to runtime_at_period_start.
- Add cgroup2 interface and documentation.
- Getting rid of exposing current_bw as there are not enough
  justification and the updating problem.
- Add justification on cpu.stat change in the changelog.
- Rebase upon v5.12-rc3.
- Correct SoB chain.
- Several indentation fixes.
- Adjust quota in schbench test from 700000 to 600000.

v3:
- Fix another issue reported by test robot.
- Update docs as Randy Dunlap suggested.
Link:
https://lore.kernel.org/lkml/20210120122715.29493-1-changhuaixin@linux.alibaba.com/

v2:
- Fix an issue reported by test robot.
- Rewriting docs. Appreciate any further suggestions or help.
Link:
https://lore.kernel.org/lkml/20210121110453.18899-1-changhuaixin@linux.alibaba.com/

v1 Link:
https://lore.kernel.org/lkml/20201217074620.58338-1-changhuaixin@linux.alibaba.com/

The CFS bandwidth controller limits CPU requests of a task group to
quota during each period. However, parallel workloads might be bursty
so that they get throttled. And they are latency sensitive at the same
time so that throttling them is undesired.

Scaling up period and quota allows greater burst capacity. But it might
cause longer stuck till next refill. We introduce "burst" to allow
accumulating unused quota from previous periods, and to be assigned when
a task group requests more CPU than quota during a specific period. Thus
allowing CPU time requests as long as the average requested CPU time is
below quota on the long run. The maximum accumulation is capped by burst
and is set 0 by default, thus the traditional behaviour remains.

A huge drop of 99th tail latency from more than 500ms to 27ms is seen for
real java workloads when using burst. Similar drops are seen when
testing with schbench too:

	echo $$ > /sys/fs/cgroup/cpu/test/cgroup.procs
	echo 600000 > /sys/fs/cgroup/cpu/test/cpu.cfs_quota_us
	echo 100000 > /sys/fs/cgroup/cpu/test/cpu.cfs_period_us
	echo 400000 > /sys/fs/cgroup/cpu/test/cpu.cfs_burst_us

	# The average CPU usage is around 500%, which is 200ms CPU time
	# every 40ms.
	./schbench -m 1 -t 30 -r 10 -c 10000 -R 500

	Without burst:

	Latency percentiles (usec)
	50.0000th: 7
	75.0000th: 8
	90.0000th: 9
	95.0000th: 10
	*99.0000th: 933
	99.5000th: 981
	99.9000th: 3068
	min=0, max=20054
	rps: 498.31 p95 (usec) 10 p99 (usec) 933 p95/cputime 0.10% p99/cputime 9.33%

	With burst:

	Latency percentiles (usec)
	50.0000th: 7
	75.0000th: 8
	90.0000th: 9
	95.0000th: 9
	*99.0000th: 12
	99.5000th: 13
	99.9000th: 19
	min=0, max=406
	rps: 498.36 p95 (usec) 9 p99 (usec) 12 p95/cputime 0.09% p99/cputime 0.12%

How much workloads with benefit from burstable CFS bandwidth control
depends on how bursty and how latency sensitive they are.

Previously, Cong Wang and Konstantin Khlebnikov proposed similar
feature:
https://lore.kernel.org/lkml/20180522062017.5193-1-xiyou.wangcong@gmail.com/
https://lore.kernel.org/lkml/157476581065.5793.4518979877345136813.stgit@buzz/

This time we present more latency statistics and handle overflow while
accumulating.

Huaixin Chang (4):
  sched/fair: Introduce primitives for CFS bandwidth burst
  sched/fair: Make CFS bandwidth controller burstable
  sched/fair: Add cfs bandwidth burst statistics
  sched/fair: Add document for burstable CFS bandwidth control

 Documentation/scheduler/sched-bwc.rst |  49 +++++++++++--
 include/linux/sched/sysctl.h          |   2 +
 kernel/sched/core.c                   | 126 +++++++++++++++++++++++++++++-----
 kernel/sched/fair.c                   |  58 +++++++++++++---
 kernel/sched/sched.h                  |   9 ++-
 kernel/sysctl.c                       |  18 +++++
 6 files changed, 232 insertions(+), 30 deletions(-)

-- 
2.14.4.44.g2045bb6


             reply	other threads:[~2021-03-16  4:51 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-16  4:49 Huaixin Chang [this message]
2021-03-16  4:49 ` [PATCH v4 1/4] sched/fair: Introduce primitives for CFS bandwidth burst Huaixin Chang
2021-03-16  9:27   ` Peter Zijlstra
2021-03-16  9:41   ` Peter Zijlstra
2021-03-16  9:54   ` Peter Zijlstra
2021-03-17  7:16     ` changhuaixin
2021-03-17  8:06       ` Peter Zijlstra
2021-03-18  1:26         ` changhuaixin
2021-03-18 12:59           ` Phil Auld
2021-03-18 15:10             ` Peter Zijlstra
2021-04-19  8:18               ` changhuaixin
2021-03-19 12:51             ` changhuaixin
2021-03-18 15:05           ` Peter Zijlstra
2021-03-19 12:39             ` changhuaixin
2021-03-20  2:06               ` changhuaixin
2021-05-12 12:41             ` changhuaixin
2021-03-16 10:40   ` Peter Zijlstra
2021-03-16  4:49 ` [PATCH v4 2/4] sched/fair: Make CFS bandwidth controller burstable Huaixin Chang
2021-03-16  9:52   ` Peter Zijlstra
2021-03-16  4:49 ` [PATCH v4 3/4] sched/fair: Add cfs bandwidth burst statistics Huaixin Chang
2021-03-16  4:49 ` [PATCH v4 4/4] sched/fair: Add document for burstable CFS bandwidth control Huaixin Chang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210316044931.39733-1-changhuaixin@linux.alibaba.com \
    --to=changhuaixin@linux.alibaba.com \
    --cc=bsegall@google.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=juri.lelli@redhat.com \
    --cc=khlebnikov@yandex-team.ru \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=odin@uged.al \
    --cc=odin@ugedal.com \
    --cc=pauld@redhead.com \
    --cc=peterz@infradead.org \
    --cc=pjt@google.com \
    --cc=rostedt@goodmis.org \
    --cc=shanpeic@linux.alibaba.com \
    --cc=tj@kernel.org \
    --cc=vincent.guittot@linaro.org \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).