From: Andrea Righi <righi.andrea@gmail.com>
To: Tejun Heo <tj@kernel.org>, Li Zefan <lizefan@huawei.com>,
Johannes Weiner <hannes@cmpxchg.org>
Cc: Jens Axboe <axboe@kernel.dk>, Vivek Goyal <vgoyal@redhat.com>,
Josef Bacik <josef@toxicpanda.com>,
Dennis Zhou <dennis@kernel.org>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: [RFC PATCH 0/3] cgroup: fsio throttle controller
Date: Fri, 18 Jan 2019 11:31:24 +0100
Message-ID: <20190118103127.325-1-righi.andrea@gmail.com>

This is a redesign of my old cgroup-io-throttle controller:
https://lwn.net/Articles/330531/
I'm resuming this old patch to point out a problem that I think is still
not solved completely.

= Problem =

The io.max controller works really well at limiting synchronous I/O
(READs), but many I/O requests are initiated outside the context of the
process that is ultimately responsible for their creation (e.g.,
WRITEs).
Throttling at the block layer is in some cases too late, and we may end
up slowing down processes that are not responsible for the I/O that is
being processed at that level.

= Proposed solution =

The main idea of this controller is to split I/O measurement and I/O
throttling: I/O is measured at the block layer for READs and at the page
cache (dirty pages) for WRITEs, and processes are limited while they're
generating I/O at the VFS level, based on the measured I/O.

= Example =

Here's a trivial example: create 2 cgroups, set an io.max limit of
10 MB/s on each, run a write-intensive workload in both, and after a
while run "sync" from the root cgroup.
# cat /proc/self/cgroup
0::/cg1
# fio --rw=write --bs=1M --size=32M --numjobs=16 --name=seeker --time_based --runtime=30
# cat /proc/self/cgroup
0::/cg2
# fio --rw=write --bs=1M --size=32M --numjobs=16 --name=seeker --time_based --runtime=30
- io.max controller:
# echo "259:0 rbps=10485760 wbps=10485760" > /sys/fs/cgroup/unified/cg1/io.max
# echo "259:0 rbps=10485760 wbps=10485760" > /sys/fs/cgroup/unified/cg2/io.max
# cat /proc/self/cgroup
0::/
# time sync
real 0m51,241s
user 0m0,000s
sys 0m0,113s
Ideally "sync" should complete almost immediately, because the root
cgroup is unlimited and it's not doing any I/O at all; instead, with
io.max it's blocked for more than 50 seconds, because writeback is
throttled to satisfy the io.max limits.
- fsio controller:
# echo "259:0 10 10" > /sys/fs/cgroup/unified/cg1/fsio.max_mbs
# echo "259:0 10 10" > /sys/fs/cgroup/unified/cg2/fsio.max_mbs
[you can find details about the syntax in the documentation patch]
# cat /proc/self/cgroup
0::/
# time sync
real 0m0,146s
user 0m0,003s
sys 0m0,001s

= Questions =

Q: Do we need another controller?
A: Probably not. I think it would be better to integrate this policy (or
something similar) into the current blkio controller; this series is just
meant to highlight the problem and gather ideas on how to address it.
Q: What about proportional limits / latency?
A: It should be trivial to add latency-based limits if we integrate this into
the current I/O controller. Proportional limits (weights), on the other
hand, are strictly related to I/O scheduling, and since this controller
doesn't touch I/O dispatching policies, they are not trivial to implement
(bandwidth limiting is definitely more straightforward).
Q: Applying delays at the VFS layer doesn't prevent I/O spikes during
writeback, right?
A: Correct, the tradeoff here is to tolerate I/O bursts during writeback to
avoid priority inversion problems in the system.

Andrea Righi (3):
fsio-throttle: documentation
fsio-throttle: controller infrastructure
fsio-throttle: instrumentation
Documentation/cgroup-v1/fsio-throttle.txt | 142 +++++++++
block/blk-core.c | 10 +
include/linux/cgroup_subsys.h | 4 +
include/linux/fsio-throttle.h | 43 +++
include/linux/writeback.h | 7 +-
init/Kconfig | 11 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/fsio-throttle.c | 501 ++++++++++++++++++++++++++++++
mm/filemap.c | 20 +-
mm/page-writeback.c | 14 +-
10 files changed, 749 insertions(+), 4 deletions(-)