linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Andrea Righi <righi.andrea@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>, Tejun Heo <tj@kernel.org>
Cc: Li Zefan <lizefan@huawei.com>,
	Paolo Valente <paolo.valente@linaro.org>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jens Axboe <axboe@kernel.dk>, Vivek Goyal <vgoyal@redhat.com>,
	Dennis Zhou <dennis@kernel.org>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH 0/3] blkcg: sync() isolation
Date: Tue, 19 Feb 2019 16:27:09 +0100	[thread overview]
Message-ID: <20190219152712.9855-1-righi.andrea@gmail.com> (raw)

= Problem =

When sync() is executed from a high-priority cgroup, the process is forced to
wait the completion of the entire outstanding writeback I/O, even the I/O that
was originally generated by low-priority cgroups potentially.

This may cause massive latencies to random processes (even those running in the
root cgroup) that shouldn't be I/O-throttled at all, similarly to a classic
priority inversion problem.

This topic has been previously discussed here:
https://patchwork.kernel.org/patch/10804489/

[ Thanks to Josef for the suggestions ]

= Solution =

Here's a slightly more detailed description of the solution, as suggested by
Josef and Tejun (let me know if I misunderstood or missed anything):

 - track the submitter of wb work (when issuing sync()) and the cgroup that
   originally dirtied any inode, then use this information to determine the
   proper "sync() domain" and decide if the I/O speed needs to be boosted or
   not in order to prevent priority-inversion problems

 - by default when sync() is issued, all the outstanding writeback I/O is
   boosted to maximum speed to prevent priority inversion problems

 - if sync() is issued by the same throttled cgroup that generated the dirty
   pages, the corresponding writeback I/O is still throttled normally

 - add a new flag to cgroups (io.sync_isolation) that would make sync()'ers in
   that cgroup only be allowed to write out dirty pages that belong to its
   cgroup

= Test =

Here's a trivial example to trigger the problem:

 - create 2 cgroups: cg1 and cg2

 # mkdir /sys/fs/cgroup/unified/cg1
 # mkdir /sys/fs/cgroup/unified/cg2

 - set an I/O limit of 1MB/s on cg1/io.ma:

 # echo "8:0 rbps=1048576 wbps=1048576" > /sys/fs/cgroup/unified/cg1/io.max

 - run a write-intensive workload in cg1

 # cat /proc/self/cgroup
 0::/cg1
 # fio --rw=write --bs=1M --size=32M --numjobs=16 --name=writer --time_based --runtime=30

 - run sync in cg2 and measure time

== Vanilla kernel ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real	9m32,618s
 user	0m0,000s
 sys	0m0,018s

Ideally "sync" should complete almost immediately, because cg2 is unlimited and
it's not doing any I/O at all. Instead, the entire system is totally sluggish,
waiting for the throttled writeback I/O to complete, and it also triggers many
hung task timeout warnings.

== With this patch set applied and io.sync_isolation=0 (default) ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync
 real	0m2,044s
 user	0m0,009s
 sys	0m0,000s

[ Time range goes from 2s to 4s ]

== With this patch set applied and io.sync_isolation=1 ==

 # cat /proc/self/cgroup
 0::/cg2

 # time sync

 real	0m0,768s
 user	0m0,001s
 sys	0m0,008s

[ Time range goes from 0.7s to 1.6s ]

Andrea Righi (3):
  blkcg: prevent priority inversion problem during sync()
  blkcg: introduce io.sync_isolation
  blkcg: implement sync() isolation

 Documentation/admin-guide/cgroup-v2.rst |   9 +++
 block/blk-cgroup.c                      | 120 ++++++++++++++++++++++++++++++++
 block/blk-throttle.c                    |  48 ++++++++++++-
 fs/fs-writeback.c                       |  57 ++++++++++++++-
 fs/inode.c                              |   1 +
 fs/sync.c                               |   8 ++-
 include/linux/backing-dev-defs.h        |   2 +
 include/linux/blk-cgroup.h              |  52 ++++++++++++++
 include/linux/fs.h                      |   4 ++
 mm/backing-dev.c                        |   2 +
 mm/page-writeback.c                     |   1 +
 11 files changed, 297 insertions(+), 7 deletions(-)


             reply	other threads:[~2019-02-19 15:27 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-19 15:27 Andrea Righi [this message]
2019-02-19 15:27 ` [PATCH 1/3] blkcg: prevent priority inversion problem during sync() Andrea Righi
2019-02-19 15:27 ` [PATCH 2/3] blkcg: introduce io.sync_isolation Andrea Righi
2019-02-19 15:27 ` [PATCH 3/3] blkcg: implement sync() isolation Andrea Righi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190219152712.9855-1-righi.andrea@gmail.com \
    --to=righi.andrea@gmail.com \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=dennis@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan@huawei.com \
    --cc=paolo.valente@linaro.org \
    --cc=tj@kernel.org \
    --cc=vgoyal@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).