From: Tejun Heo <tj@kernel.org>
To: axboe@kernel.dk
Cc: linux-kernel@vger.kernel.org, jack@suse.cz, hch@infradead.org,
hannes@cmpxchg.org, linux-fsdevel@vger.kernel.org,
vgoyal@redhat.com, lizefan@huawei.com, cgroups@vger.kernel.org,
linux-mm@kvack.org, mhocko@suse.cz, clm@fb.com,
fengguang.wu@intel.com, david@fromorbit.com, gthelen@google.com,
khlebnikov@yandex-team.ru
Subject: [PATCHSET 1/3 v4 block/for-4.2/core] writeback: cgroup writeback support
Date: Fri, 22 May 2015 17:13:14 -0400
Message-ID: <1432329245-5844-1-git-send-email-tj@kernel.org>
Hello,
This is v4 of the cgroup writeback support patchset. Changes from the
last take[L] are:
* b9ea25152e56 ("page_writeback: clean up mess around
cancel_dirty_page()") replaced cancel_dirty_page() with
account_page_cleaned(), which pushed clearing the dirty flag to the
caller. However, changes in this patchset and the following ones
require synchronization between dirty clearing and stat updates,
which is a lot easier with a helper that does both operations.
0001-page_writeback-revive-cancel_dirty_page-in-a-restric.patch is
added to resurrect cancel_dirty_page() in a more restricted form.
* Recent dirtytime changes added wakeup_dirtytime_writeback(), which
needs to be updated to walk through all wb's.
0042-writeback-make-wakeup_dirtytime_writeback-handle-mul.patch is
added.
* Rebased on top of the current block/for-4.2/core.
blkio cgroup (blkcg) is severely crippled in that it can only control
read and direct write IOs. blkcg can't tell which cgroup should be
held responsible for a given writeback IO and charges all of them to
the root cgroup - all normal write traffic ends up in the root cgroup.
Although the problem was identified years ago, it hasn't been solved
yet, mainly because it interacts with so many subsystems.
This patchset finally implements cgroup writeback support so that
writeback of a page is attributed to the corresponding blkcg of the
memcg that the page belongs to.
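As a rough illustration of the attribution described above, the lookup
chain can be modeled in userspace Python. This is a toy sketch, not
kernel code: the tables, the names in them, and blkcg_for_writeback()
are hypothetical, and falling back to the root blkcg for an untracked
page is an assumption made only for the example.

```python
# Toy model of writeback attribution; not kernel code.
# All names below are hypothetical stand-ins for kernel state.
PAGE_TO_MEMCG = {"page-1": "memcg-a", "page-2": "memcg-b"}
MEMCG_TO_BLKCG = {"memcg-a": "blkcg-a", "memcg-b": "blkcg-b"}
ROOT_BLKCG = "blkcg-root"


def blkcg_for_writeback(page):
    """Return the blkcg charged for writing back `page`.

    The page's memcg is looked up first, then the blkcg mapped to that
    memcg. Pages with no known memcg fall back to the root blkcg
    (an assumption of this sketch).
    """
    memcg = PAGE_TO_MEMCG.get(page)
    return MEMCG_TO_BLKCG.get(memcg, ROOT_BLKCG)
```

Without cgroup writeback support, the equivalent of this function
would return the root blkcg for every page.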
Overall design
--------------
* This requires cooperation between memcg and blkcg. Each inode is
assigned to the blkcg mapped to the memcg being dirtied.
* struct bdi_writeback (wb) was always embedded in struct
backing_dev_info (bdi) and the distinction between the two wasn't
clear. This patchset makes wb operate as an independent writeback
execution domain. bdi->wb is still embedded and serves the root
cgroup but there can be other wb's for other cgroups.
* Each wb is associated with a memcg. As memcg is implicitly enabled by
blkcg on the unified hierarchy, this gives a unique wb for each
memcg-blkcg combination. When the memcg-blkcg mapping changes, a new
wb is created and the existing wb is unlinked and drained.
* An inode is associated with the matching wb when it gets dirtied for
the first time and is written back by that wb. A later patchset will
implement dynamic wb switching.
* All writeback operations are made per-wb instead of per-bdi.
bdi-wide operations are split across all member wb's. If some
finite amount needs to be distributed, be it the number of pages to
write back or bdi->min/max_ratio, it's distributed according to the
bandwidth proportion a wb has in the bdi.
* cgroup writeback support adds one pointer to struct inode.
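The bandwidth-proportional split mentioned in the list above can be
sketched in userspace Python. This is only a toy model under stated
assumptions: split_by_bandwidth() and the wb names are hypothetical,
and the rounding and the handling of a bdi with no bandwidth estimate
are choices made for the example, not a description of what the
patches do.

```python
def split_by_bandwidth(total, bandwidths):
    """Split `total` units across wbs in proportion to write bandwidth.

    `bandwidths` maps a hypothetical wb name to its estimated write
    bandwidth. Integer division rounds down, so the shares may sum to
    slightly less than `total`.
    """
    bdi_total = sum(bandwidths.values())
    if bdi_total == 0:
        # No bandwidth estimate yet; this toy model hands out nothing.
        return {name: 0 for name in bandwidths}
    return {name: total * bw // bdi_total for name, bw in bandwidths.items()}


# Distribute 1024 pages of writeback across three wbs on one bdi.
shares = split_by_bandwidth(1024, {"root": 300, "cg-a": 100, "cg-b": 0})
```

Here the root wb has three quarters of the bdi's bandwidth, so it
receives 768 of the 1024 pages, cg-a receives 256, and the idle cg-b
receives none.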
Missing pieces
--------------
* It requires some cooperation from the filesystem and currently only
works with ext2. The changes necessary on the filesystem side are
almost trivial. I'll write up documentation on it.
* blk-throttle works but cfq-iosched isn't ready for writebacks coming
down with different cgroups. cfq-iosched should be updated to have
a writeback ioc per cgroup and route writeback IOs through it.
How to test
-----------
* Boot with kernel option "cgroup__DEVEL__legacy_files_on_dfl".
* umount /sys/fs/cgroup/memory
umount /sys/fs/cgroup/blkio
mkdir /sys/fs/cgroup/unified
mount -t cgroup -o __DEVEL__sane_behavior cgroup /sys/fs/cgroup/unified
echo +blkio > /sys/fs/cgroup/unified/cgroup.subtree_control
* Build the cgroup hierarchy (don't forget to enable blkio using
subtree_control), put processes in cgroups, and run tests on ext2
filesystems using the blkio.throttle.* knobs.
This patchset contains the following 51 patches.
0001-page_writeback-revive-cancel_dirty_page-in-a-restric.patch
0002-memcg-add-per-cgroup-dirty-page-accounting.patch
0003-blkcg-move-block-blk-cgroup.h-to-include-linux-blk-c.patch
0004-update-CONFIG_BLK_CGROUP-dummies-in-include-linux-bl.patch
0005-blkcg-always-create-the-blkcg_gq-for-the-root-blkcg.patch
0006-memcg-add-mem_cgroup_root_css.patch
0007-blkcg-add-blkcg_root_css.patch
0008-cgroup-block-implement-task_get_css-and-use-it-in-bi.patch
0009-blkcg-implement-task_get_blkcg_css.patch
0010-blkcg-implement-bio_associate_blkcg.patch
0011-memcg-implement-mem_cgroup_css_from_page.patch
0012-writeback-move-backing_dev_info-state-into-bdi_write.patch
0013-writeback-move-backing_dev_info-bdi_stat-into-bdi_wr.patch
0014-writeback-move-bandwidth-related-fields-from-backing.patch
0015-writeback-s-bdi-wb-in-mm-page-writeback.c.patch
0016-writeback-move-backing_dev_info-wb_lock-and-worklist.patch
0017-writeback-reorganize-mm-backing-dev.c.patch
0018-writeback-separate-out-include-linux-backing-dev-def.patch
0019-bdi-make-inode_to_bdi-inline.patch
0020-writeback-add-gfp-to-wb_init.patch
0021-bdi-separate-out-congested-state-into-a-separate-str.patch
0022-writeback-add-CONFIG-BDI_CAP-FS-_CGROUP_WRITEBACK.patch
0023-writeback-make-backing_dev_info-host-cgroup-specific.patch
0024-writeback-blkcg-associate-each-blkcg_gq-with-the-cor.patch
0025-writeback-attribute-stats-to-the-matching-per-cgroup.patch
0026-writeback-let-balance_dirty_pages-work-on-the-matchi.patch
0027-writeback-make-congestion-functions-per-bdi_writebac.patch
0028-writeback-blkcg-restructure-blk_-set-clear-_queue_co.patch
0029-writeback-blkcg-propagate-non-root-blkcg-congestion-.patch
0030-writeback-implement-and-use-inode_congested.patch
0031-writeback-implement-WB_has_dirty_io-wb_state-flag.patch
0032-writeback-implement-backing_dev_info-tot_write_bandw.patch
0033-writeback-make-bdi_has_dirty_io-take-multiple-bdi_wr.patch
0034-writeback-don-t-issue-wb_writeback_work-if-clean.patch
0035-writeback-make-bdi-min-max_ratio-handling-cgroup-wri.patch
0036-writeback-implement-bdi_for_each_wb.patch
0037-writeback-remove-bdi_start_writeback.patch
0038-writeback-make-laptop_mode_timer_fn-handle-multiple-.patch
0039-writeback-make-writeback_in_progress-take-bdi_writeb.patch
0040-writeback-make-bdi_start_background_writeback-take-b.patch
0041-writeback-make-wakeup_flusher_threads-handle-multipl.patch
0042-writeback-make-wakeup_dirtytime_writeback-handle-mul.patch
0043-writeback-add-wb_writeback_work-auto_free.patch
0044-writeback-implement-bdi_wait_for_completion.patch
0045-writeback-implement-wb_wait_for_single_work.patch
0046-writeback-restructure-try_writeback_inodes_sb-_nr.patch
0047-writeback-make-writeback-initiation-functions-handle.patch
0048-writeback-dirty-inodes-against-their-matching-cgroup.patch
0049-buffer-writeback-make-__block_write_full_page-honor-.patch
0050-mpage-make-__mpage_writepage-honor-cgroup-writeback.patch
0051-ext2-enable-cgroup-writeback-support.patch
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-writeback-20150522
diffstat follows. Thanks.
Documentation/cgroups/memory.txt | 1
block/bio.c | 35
block/blk-cgroup.c | 124 -
block/blk-cgroup.h | 603 --------
block/blk-core.c | 70 -
block/blk-integrity.c | 1
block/blk-sysfs.c | 3
block/blk-throttle.c | 2
block/bounce.c | 1
block/cfq-iosched.c | 2
block/elevator.c | 2
block/genhd.c | 1
drivers/block/drbd/drbd_int.h | 1
drivers/block/drbd/drbd_main.c | 10
drivers/block/pktcdvd.c | 1
drivers/char/raw.c | 1
drivers/md/bcache/request.c | 1
drivers/md/dm.c | 2
drivers/md/dm.h | 1
drivers/md/md.h | 1
drivers/md/raid1.c | 4
drivers/md/raid10.c | 2
drivers/mtd/devices/block2mtd.c | 1
drivers/staging/lustre/lustre/include/linux/lustre_patchless_compat.h | 4
fs/block_dev.c | 9
fs/buffer.c | 64
fs/ext2/super.c | 2
fs/ext4/extents.c | 1
fs/ext4/mballoc.c | 1
fs/ext4/super.c | 1
fs/f2fs/node.c | 4
fs/f2fs/segment.h | 3
fs/fat/file.c | 1
fs/fat/inode.c | 1
fs/fs-writeback.c | 619 ++++++--
fs/fuse/file.c | 12
fs/gfs2/super.c | 2
fs/hfs/super.c | 1
fs/hfsplus/super.c | 1
fs/inode.c | 1
fs/mpage.c | 2
fs/nfs/filelayout/filelayout.c | 1
fs/nfs/internal.h | 2
fs/nfs/write.c | 3
fs/ocfs2/file.c | 1
fs/reiserfs/super.c | 1
fs/ufs/super.c | 1
fs/xfs/xfs_aops.c | 12
fs/xfs/xfs_file.c | 1
include/linux/backing-dev-defs.h | 188 ++
include/linux/backing-dev.h | 567 +++++---
include/linux/bio.h | 3
include/linux/blk-cgroup.h | 631 +++++++++
include/linux/blkdev.h | 21
include/linux/cgroup.h | 25
include/linux/fs.h | 13
include/linux/memcontrol.h | 10
include/linux/mm.h | 7
include/linux/pagemap.h | 3
include/linux/writeback.h | 25
include/trace/events/writeback.h | 8
init/Kconfig | 5
mm/backing-dev.c | 666 +++++++--
mm/fadvise.c | 2
mm/filemap.c | 31
mm/madvise.c | 1
mm/memcontrol.c | 59
mm/page-writeback.c | 696 +++++-----
mm/readahead.c | 2
mm/rmap.c | 2
mm/truncate.c | 18
mm/vmscan.c | 28
72 files changed, 3054 insertions(+), 1578 deletions(-)
--
tejun
[L] http://lkml.kernel.org/g/1428350318-8215-1-git-send-email-tj@kernel.org