All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
@ 2017-10-10 15:54 Tejun Heo
  2017-10-10 15:54 ` [PATCH 1/5] blkcg: export blkcg_root_css Tejun Heo
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 15:54 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik; +Cc: kernel-team, linux-kernel, linux-btrfs

Hello,

Changes from the last version are

* blkcg_root_css exported to fix build breakage on modular btrfs.

* Use ext4_should_journal_data() test instead of
  EXT4_MOUNT_JOURNAL_DATA.

* Separated out create_bh_bio() and used it to implement
  submit_bh_blkcg_css() as suggested by Jan.

btrfs has different ways to issue metadata IOs and may end up issuing
metadata or otherwise shared IOs from a non-root cgroup, which can
lead to priority inversion and ineffective IO isolation.

This patchset makes sure that btrfs issues all metadata and shared IOs
from the root cgroup by exempting btree_inodes from cgroup writeback
and explicitly associating shared IOs with the root cgroup.

This patchset containst he following three patches

 [PATCH 1/5] blkcg: export blkcg_root_css
 [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode
 [PATCH 3/5] buffer_head: separate out create_bh_bio() from
 [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css()
 [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the

and is also available in the following git branch

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-btrfs-metadata-v2

diffstat follows.  Thanks.

 block/blk-cgroup.c          |    1 +
 fs/block_dev.c              |    3 +--
 fs/btrfs/check-integrity.c  |    2 +-
 fs/btrfs/disk-io.c          |    4 ++++
 fs/btrfs/ioctl.c            |    6 +++++-
 fs/btrfs/super.c            |    1 -
 fs/buffer.c                 |   42 ++++++++++++++++++++++++++++++++++--------
 fs/ext2/inode.c             |    3 ++-
 fs/ext2/super.c             |    1 -
 fs/ext4/inode.c             |    5 ++++-
 fs/ext4/super.c             |    2 --
 include/linux/backing-dev.h |    2 +-
 include/linux/buffer_head.h |    3 +++
 include/linux/fs.h          |    3 ++-
 14 files changed, 58 insertions(+), 20 deletions(-)

--
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] blkcg: export blkcg_root_css
  2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
@ 2017-10-10 15:54 ` Tejun Heo
  2017-10-11  7:50   ` Jan Kara
  2017-10-10 15:54 ` [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 15:54 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo

Export blkcg_root_css so that filesystem modules can use it.

Signed-off-by: Tejun Heo <tj@kernel.org>
---
 block/blk-cgroup.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index d3f56ba..597a457 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -45,6 +45,7 @@ struct blkcg blkcg_root;
 EXPORT_SYMBOL_GPL(blkcg_root);
 
 struct cgroup_subsys_state * const blkcg_root_css = &blkcg_root.css;
+EXPORT_SYMBOL_GPL(blkcg_root_css);
 
 static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB
  2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
  2017-10-10 15:54 ` [PATCH 1/5] blkcg: export blkcg_root_css Tejun Heo
@ 2017-10-10 15:54 ` Tejun Heo
  2017-10-10 15:54 ` [PATCH 3/5] buffer_head: separate out create_bh_bio() from submit_bh_wbc() Tejun Heo
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 15:54 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo,
	Theodore Ts'o, Andreas Dilger, linux-ext4

Currently, filesystem can indiate cgroup writeback support per
superblock; however, depending on the filesystem, especially if inodes
are used to carry metadata, it can be useful to indicate cgroup
writeback support per inode.

This patch replaces the superblock flag SB_I_CGROUPWB with per-inode
S_CGROUPWB, so that cgroup writeback can be enabled selectively.

* block_dev sets the new flag in bdget() when initializing new inode.

* ext2 sets the new flag in ext2_set_inode_flags().

* ext4 sets the new flag in ext4_set_inode_flags() if
!ext4_should_journal_data(inode).  This makes inodes whose data is
journalled due to inode flags to bypass cgroup writeback.  This
behavior change is intended.

* btrfs sets the new flag in btrfs_update_iflags() function.  Note
  that this automatically excludes btree_inode which doesn't use
  btrfs_update_iflags() during initialization.  This behavior change
  is intended.

v2: Use ext4_should_journal_data() as suggested by Jan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: linux-btrfs@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
---
 fs/block_dev.c              | 3 +--
 fs/btrfs/ioctl.c            | 4 +++-
 fs/btrfs/super.c            | 1 -
 fs/ext2/inode.c             | 3 ++-
 fs/ext2/super.c             | 1 -
 fs/ext4/inode.c             | 5 ++++-
 fs/ext4/super.c             | 2 --
 include/linux/backing-dev.h | 2 +-
 include/linux/fs.h          | 3 ++-
 9 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 93d088f..ff9c282 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -800,8 +800,6 @@ static struct dentry *bd_mount(struct file_system_type *fs_type,
 {
 	struct dentry *dent;
 	dent = mount_pseudo(fs_type, "bdev:", &bdev_sops, NULL, BDEVFS_MAGIC);
-	if (!IS_ERR(dent))
-		dent->d_sb->s_iflags |= SB_I_CGROUPWB;
 	return dent;
 }
 
@@ -893,6 +891,7 @@ struct block_device *bdget(dev_t dev)
 		inode->i_mode = S_IFBLK;
 		inode->i_rdev = dev;
 		inode->i_bdev = bdev;
+		inode->i_flags |= S_CGROUPWB;
 		inode->i_data.a_ops = &def_blk_aops;
 		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
 		spin_lock(&bdev_lock);
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 6c7a49f..117cc63 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,9 +150,11 @@ void btrfs_update_iflags(struct inode *inode)
 		new_fl |= S_NOATIME;
 	if (ip->flags & BTRFS_INODE_DIRSYNC)
 		new_fl |= S_DIRSYNC;
+	new_fl |= S_CGROUPWB;
 
 	set_mask_bits(&inode->i_flags,
-		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC,
+		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
+		      S_CGROUPWB,
 		      new_fl);
 }
 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 35a128a..46a1488 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -1136,7 +1136,6 @@ static int btrfs_fill_super(struct super_block *sb,
 	sb->s_flags |= MS_POSIXACL;
 #endif
 	sb->s_flags |= MS_I_VERSION;
-	sb->s_iflags |= SB_I_CGROUPWB;
 
 	err = super_setup_bdi(sb);
 	if (err) {
diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c
index 4dca6f3..3c5d822 100644
--- a/fs/ext2/inode.c
+++ b/fs/ext2/inode.c
@@ -1372,7 +1372,7 @@ void ext2_set_inode_flags(struct inode *inode)
 	unsigned int flags = EXT2_I(inode)->i_flags;
 
 	inode->i_flags &= ~(S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME |
-				S_DIRSYNC | S_DAX);
+				S_DIRSYNC | S_DAX | S_CGROUPWB);
 	if (flags & EXT2_SYNC_FL)
 		inode->i_flags |= S_SYNC;
 	if (flags & EXT2_APPEND_FL)
@@ -1385,6 +1385,7 @@ void ext2_set_inode_flags(struct inode *inode)
 		inode->i_flags |= S_DIRSYNC;
 	if (test_opt(inode->i_sb, DAX) && S_ISREG(inode->i_mode))
 		inode->i_flags |= S_DAX;
+	inode->i_flags |= S_CGROUPWB;
 }
 
 struct inode *ext2_iget (struct super_block *sb, unsigned long ino)
diff --git a/fs/ext2/super.c b/fs/ext2/super.c
index 1458706..e6ba669e 100644
--- a/fs/ext2/super.c
+++ b/fs/ext2/super.c
@@ -922,7 +922,6 @@ static int ext2_fill_super(struct super_block *sb, void *data, int silent)
 	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
 		((EXT2_SB(sb)->s_mount_opt & EXT2_MOUNT_POSIX_ACL) ?
 		 MS_POSIXACL : 0);
-	sb->s_iflags |= SB_I_CGROUPWB;
 
 	if (le32_to_cpu(es->s_rev_level) == EXT2_GOOD_OLD_REV &&
 	    (EXT2_HAS_COMPAT_FEATURE(sb, ~0U) ||
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 31db875..d92c356 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4591,8 +4591,11 @@ void ext4_set_inode_flags(struct inode *inode)
 	    !ext4_should_journal_data(inode) && !ext4_has_inline_data(inode) &&
 	    !ext4_encrypted_inode(inode))
 		new_fl |= S_DAX;
+	if (!ext4_should_journal_data(inode))
+		new_fl |= S_CGROUPWB;
 	inode_set_flags(inode, new_fl,
-			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX);
+			S_SYNC|S_APPEND|S_IMMUTABLE|S_NOATIME|S_DIRSYNC|S_DAX|
+			S_CGROUPWB);
 }
 
 static blkcnt_t ext4_inode_blocks(struct ext4_inode *raw_inode,
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index b104096..44ddf1d 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -3620,8 +3620,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
 		}
 		if (test_opt(sb, DELALLOC))
 			clear_opt(sb, DELALLOC);
-	} else {
-		sb->s_iflags |= SB_I_CGROUPWB;
 	}
 
 	sb->s_flags = (sb->s_flags & ~MS_POSIXACL) |
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 854e1bd..7274de0 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -246,7 +246,7 @@ static inline bool inode_cgwb_enabled(struct inode *inode)
 		cgroup_subsys_on_dfl(io_cgrp_subsys) &&
 		bdi_cap_account_dirty(bdi) &&
 		(bdi->capabilities & BDI_CAP_CGROUP_WRITEBACK) &&
-		(inode->i_sb->s_iflags & SB_I_CGROUPWB);
+		IS_CGROUPWB(inode);
 }
 
 /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 13dab19..be2d8a6 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1308,7 +1308,6 @@ extern int send_sigurg(struct fown_struct *fown);
 #define UMOUNT_UNUSED	0x80000000	/* Flag guaranteed to be unused */
 
 /* sb->s_iflags */
-#define SB_I_CGROUPWB	0x00000001	/* cgroup-aware writeback enabled */
 #define SB_I_NOEXEC	0x00000002	/* Ignore executables on this fs */
 #define SB_I_NODEV	0x00000004	/* Ignore devices on this fs */
 
@@ -1853,6 +1852,7 @@ struct super_operations {
 #else
 #define S_DAX		0	/* Make all the DAX code disappear */
 #endif
+#define S_CGROUPWB	16384	/* Enable cgroup writeback for this inode */
 
 /*
  * Note that nosuid etc flags are inode-specific: setting some file-system
@@ -1892,6 +1892,7 @@ static inline bool sb_rdonly(const struct super_block *sb) { return sb->s_flags
 #define IS_AUTOMOUNT(inode)	((inode)->i_flags & S_AUTOMOUNT)
 #define IS_NOSEC(inode)		((inode)->i_flags & S_NOSEC)
 #define IS_DAX(inode)		((inode)->i_flags & S_DAX)
+#define IS_CGROUPWB(inode)	((inode)->i_flags & S_CGROUPWB)
 
 #define IS_WHITEOUT(inode)	(S_ISCHR(inode->i_mode) && \
 				 (inode)->i_rdev == WHITEOUT_DEV)
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] buffer_head: separate out create_bh_bio() from submit_bh_wbc()
  2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
  2017-10-10 15:54 ` [PATCH 1/5] blkcg: export blkcg_root_css Tejun Heo
  2017-10-10 15:54 ` [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
@ 2017-10-10 15:54 ` Tejun Heo
  2017-10-11  7:54   ` Jan Kara
  2017-10-10 15:54 ` [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css() Tejun Heo
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 15:54 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo

submit_bh_wbc() creates a bio matching the specific @bh and submits it
at the end.  This patch separates out the bio creation part to its own
function, create_bh_bio(), and reimplement submit_bh[_wbc]() using the
function.

As bio can now be manipulated before submitted, we can move out @wbc
handling into submit_bh_wbc() and similarly this will make adding more
submit_bh variants straight-forward.

This patch is pure refactoring and doesn't cause any functional
changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Suggested-by: Jan Kara <jack@suse.cz>
---
 fs/buffer.c | 30 ++++++++++++++++++++++--------
 1 file changed, 22 insertions(+), 8 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 170df85..b4b2169 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3086,8 +3086,8 @@ void guard_bio_eod(int op, struct bio *bio)
 	}
 }
 
-static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
-			 enum rw_hint write_hint, struct writeback_control *wbc)
+struct bio *create_bh_bio(int op, int op_flags, struct buffer_head *bh,
+                          enum rw_hint write_hint)
 {
 	struct bio *bio;
 
@@ -3109,11 +3109,6 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 	 */
 	bio = bio_alloc(GFP_NOIO, 1);
 
-	if (wbc) {
-		wbc_init_bio(wbc, bio);
-		wbc_account_io(wbc, bh->b_page, bh->b_size);
-	}
-
 	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
 	bio_set_dev(bio, bh->b_bdev);
 	bio->bi_write_hint = write_hint;
@@ -3133,13 +3128,32 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 		op_flags |= REQ_PRIO;
 	bio_set_op_attrs(bio, op, op_flags);
 
+	return bio;
+}
+
+static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
+			 enum rw_hint write_hint, struct writeback_control *wbc)
+{
+	struct bio *bio;
+
+	bio = create_bh_bio(op, op_flags, bh, write_hint);
+
+	if (wbc) {
+		wbc_init_bio(wbc, bio);
+		wbc_account_io(wbc, bh->b_page, bh->b_size);
+	}
+
 	submit_bio(bio);
 	return 0;
 }
 
 int submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
-	return submit_bh_wbc(op, op_flags, bh, 0, NULL);
+	struct bio *bio;
+
+	bio = create_bh_bio(op, op_flags, bh, 0);
+	submit_bio(bio);
+	return 0;
 }
 EXPORT_SYMBOL(submit_bh);
 
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css()
  2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
                   ` (2 preceding siblings ...)
  2017-10-10 15:54 ` [PATCH 3/5] buffer_head: separate out create_bh_bio() from submit_bh_wbc() Tejun Heo
@ 2017-10-10 15:54 ` Tejun Heo
  2017-10-11  7:55   ` Jan Kara
  2017-10-10 15:54 ` [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo
  2017-11-29 16:56 ` [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs " Jan Kara
  5 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 15:54 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo

Implement submit_bh_blkcg_css() which will be used to override cgroup
membership on specific buffer_heads.

v2: Reimplemented using create_bh_bio() as suggested by Jan.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Jens Axboe <axboe@kernel.dk>
---
 fs/buffer.c                 | 12 ++++++++++++
 include/linux/buffer_head.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index b4b2169..ed0e473 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -3147,6 +3147,18 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
 	return 0;
 }
 
+int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
+			struct cgroup_subsys_state *blkcg_css)
+{
+	struct bio *bio;
+
+	bio = create_bh_bio(op, op_flags, bh, 0);
+	bio_associate_blkcg(bio, blkcg_css);
+	submit_bio(bio);
+	return 0;
+}
+EXPORT_SYMBOL(submit_bh_blkcg_css);
+
 int submit_bh(int op, int op_flags, struct buffer_head *bh)
 {
 	struct bio *bio;
diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index c8dae55..dca5b3b 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -13,6 +13,7 @@
 #include <linux/pagemap.h>
 #include <linux/wait.h>
 #include <linux/atomic.h>
+#include <linux/cgroup.h>
 
 #ifdef CONFIG_BLOCK
 
@@ -197,6 +198,8 @@ void ll_rw_block(int, int, int, struct buffer_head * bh[]);
 int sync_dirty_buffer(struct buffer_head *bh);
 int __sync_dirty_buffer(struct buffer_head *bh, int op_flags);
 void write_dirty_buffer(struct buffer_head *bh, int op_flags);
+int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
+			struct cgroup_subsys_state *blkcg_css);
 int submit_bh(int, int, struct buffer_head *);
 void write_boundary_block(struct block_device *bdev,
 			sector_t bblock, unsigned blocksize);
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
                   ` (3 preceding siblings ...)
  2017-10-10 15:54 ` [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css() Tejun Heo
@ 2017-10-10 15:54 ` Tejun Heo
  2017-10-10 16:43   ` [PATCH v2 " Tejun Heo
  2017-11-29 16:56 ` [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs " Jan Kara
  5 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 15:54 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik
  Cc: kernel-team, linux-kernel, linux-btrfs, Tejun Heo

Issuing metdata or otherwise shared IOs from !root cgroup can lead to
priority inversion.  This patch ensures that those IOs are always
issued from the root cgroup.

This patch updates btrfs_update_iflags() to not set S_CGROUPWB on
btree_inodes.  This isn't strictly necessary as those inodes don't
call the function during init; however, this serves as documentation
and prevents possible future mistakes.  If this isn't desirable,
please feel free to drop the section.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/check-integrity.c | 2 +-
 fs/btrfs/disk-io.c         | 4 ++++
 fs/btrfs/ioctl.c           | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7d5a9b5..058dea6 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
 	struct btrfsic_dev_state *dev_state;
 
 	if (!btrfsic_is_initialized)
-		return submit_bh(op, op_flags, bh);
+		return submit_bh_blkcg_css(op, op_flags, blkcg_root_css);
 
 	mutex_lock(&btrfsic_mutex);
 	/* since btrfsic_submit_bh() might also be called before
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab84..fe8bbe1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio,
 	int async = check_async_write(bio_flags);
 	blk_status_t ret;
 
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	if (bio_op(bio) != REQ_OP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
@@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device)
 		return;
 
 	bio_reset(bio);
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	bio->bi_end_io = btrfs_end_empty_barrier;
 	bio_set_dev(bio, device->bdev);
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 117cc63..8a7db6c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode)
 		new_fl |= S_NOATIME;
 	if (ip->flags & BTRFS_INODE_DIRSYNC)
 		new_fl |= S_DIRSYNC;
-	new_fl |= S_CGROUPWB;
+	/* btree_inodes are always in the root cgroup */
+	if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID)
+		new_fl |= S_CGROUPWB;
 
 	set_mask_bits(&inode->i_flags,
 		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-10 15:54 ` [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo
@ 2017-10-10 16:43   ` Tejun Heo
  2017-10-10 17:45     ` Liu Bo
  2017-10-11 17:07     ` David Sterba
  0 siblings, 2 replies; 22+ messages in thread
From: Tejun Heo @ 2017-10-10 16:43 UTC (permalink / raw)
  To: jack, axboe, clm, jbacik; +Cc: kernel-team, linux-kernel, linux-btrfs

>From 3bbed8c7747739cda48f592f165e8839da076a3a Mon Sep 17 00:00:00 2001

Issuing metdata or otherwise shared IOs from !root cgroup can lead to
priority inversion.  This patch ensures that those IOs are always
issued from the root cgroup.

This patch updates btrfs_update_iflags() to not set S_CGROUPWB on
btree_inodes.  This isn't strictly necessary as those inodes don't
call the function during init; however, this serves as documentation
and prevents possible future mistakes.  If this isn't desirable,
please feel free to drop the section.

v2: Fixed missing @bh in submit_bh_blkcg_css() call.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/check-integrity.c | 2 +-
 fs/btrfs/disk-io.c         | 4 ++++
 fs/btrfs/ioctl.c           | 4 +++-
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
index 7d5a9b5..d66774e 100644
--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
 	struct btrfsic_dev_state *dev_state;
 
 	if (!btrfsic_is_initialized)
-		return submit_bh(op, op_flags, bh);
+		return submit_bh_blkcg_css(op, op_flags, bh, blkcg_root_css);
 
 	mutex_lock(&btrfsic_mutex);
 	/* since btrfsic_submit_bh() might also be called before
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index dfdab84..fe8bbe1 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio,
 	int async = check_async_write(bio_flags);
 	blk_status_t ret;
 
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	if (bio_op(bio) != REQ_OP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
@@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device)
 		return;
 
 	bio_reset(bio);
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	bio->bi_end_io = btrfs_end_empty_barrier;
 	bio_set_dev(bio, device->bdev);
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
index 117cc63..8a7db6c 100644
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode)
 		new_fl |= S_NOATIME;
 	if (ip->flags & BTRFS_INODE_DIRSYNC)
 		new_fl |= S_DIRSYNC;
-	new_fl |= S_CGROUPWB;
+	/* btree_inodes are always in the root cgroup */
+	if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID)
+		new_fl |= S_CGROUPWB;
 
 	set_mask_bits(&inode->i_flags,
 		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
-- 
2.9.5


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-10 16:43   ` [PATCH v2 " Tejun Heo
@ 2017-10-10 17:45     ` Liu Bo
  2017-10-11 17:07     ` David Sterba
  1 sibling, 0 replies; 22+ messages in thread
From: Liu Bo @ 2017-10-10 17:45 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Tue, Oct 10, 2017 at 09:43:26AM -0700, Tejun Heo wrote:
> From 3bbed8c7747739cda48f592f165e8839da076a3a Mon Sep 17 00:00:00 2001
> 
> Issuing metdata or otherwise shared IOs from !root cgroup can lead to
> priority inversion.  This patch ensures that those IOs are always
> issued from the root cgroup.
> 
> This patch updates btrfs_update_iflags() to not set S_CGROUPWB on
> btree_inodes.  This isn't strictly necessary as those inodes don't
> call the function during init; however, this serves as documentation
> and prevents possible future mistakes.  If this isn't desirable,
> please feel free to drop the section.
> 
> v2: Fixed missing @bh in submit_bh_blkcg_css() call.
>

Looks good.

Reviewed-by: Liu Bo <bo.li.liu@oracle.com>

-liubo

> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Chris Mason <clm@fb.com>
> Cc: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/check-integrity.c | 2 +-
>  fs/btrfs/disk-io.c         | 4 ++++
>  fs/btrfs/ioctl.c           | 4 +++-
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index 7d5a9b5..d66774e 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
>  	struct btrfsic_dev_state *dev_state;
>  
>  	if (!btrfsic_is_initialized)
> -		return submit_bh(op, op_flags, bh);
> +		return submit_bh_blkcg_css(op, op_flags, bh, blkcg_root_css);
>  
>  	mutex_lock(&btrfsic_mutex);
>  	/* since btrfsic_submit_bh() might also be called before
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index dfdab84..fe8bbe1 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio,
>  	int async = check_async_write(bio_flags);
>  	blk_status_t ret;
>  
> +	bio_associate_blkcg(bio, blkcg_root_css);
> +
>  	if (bio_op(bio) != REQ_OP_WRITE) {
>  		/*
>  		 * called for a read, do the setup so that checksum validation
> @@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device)
>  		return;
>  
>  	bio_reset(bio);
> +	bio_associate_blkcg(bio, blkcg_root_css);
> +
>  	bio->bi_end_io = btrfs_end_empty_barrier;
>  	bio_set_dev(bio, device->bdev);
>  	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 117cc63..8a7db6c 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode)
>  		new_fl |= S_NOATIME;
>  	if (ip->flags & BTRFS_INODE_DIRSYNC)
>  		new_fl |= S_DIRSYNC;
> -	new_fl |= S_CGROUPWB;
> +	/* btree_inodes are always in the root cgroup */
> +	if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID)
> +		new_fl |= S_CGROUPWB;
>  
>  	set_mask_bits(&inode->i_flags,
>  		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |
> -- 
> 2.9.5
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] blkcg: export blkcg_root_css
  2017-10-10 15:54 ` [PATCH 1/5] blkcg: export blkcg_root_css Tejun Heo
@ 2017-10-11  7:50   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-11  7:50 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Tue 10-10-17 08:54:37, Tejun Heo wrote:
> Export blkcg_root_css so that filesystem modules can use it.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/blk-cgroup.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
> index d3f56ba..597a457 100644
> --- a/block/blk-cgroup.c
> +++ b/block/blk-cgroup.c
> @@ -45,6 +45,7 @@ struct blkcg blkcg_root;
>  EXPORT_SYMBOL_GPL(blkcg_root);
>  
>  struct cgroup_subsys_state * const blkcg_root_css = &blkcg_root.css;
> +EXPORT_SYMBOL_GPL(blkcg_root_css);
>  
>  static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS];
>  
> -- 
> 2.9.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] buffer_head: separate out create_bh_bio() from submit_bh_wbc()
  2017-10-10 15:54 ` [PATCH 3/5] buffer_head: separate out create_bh_bio() from submit_bh_wbc() Tejun Heo
@ 2017-10-11  7:54   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-11  7:54 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Tue 10-10-17 08:54:39, Tejun Heo wrote:
> submit_bh_wbc() creates a bio matching the specific @bh and submits it
> at the end.  This patch separates out the bio creation part to its own
> function, create_bh_bio(), and reimplement submit_bh[_wbc]() using the
> function.
> 
> As bio can now be manipulated before submitted, we can move out @wbc
> handling into submit_bh_wbc() and similarly this will make adding more
> submit_bh variants straight-forward.
> 
> This patch is pure refactoring and doesn't cause any functional
> changes.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Suggested-by: Jan Kara <jack@suse.cz>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza


> ---
>  fs/buffer.c | 30 ++++++++++++++++++++++--------
>  1 file changed, 22 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index 170df85..b4b2169 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3086,8 +3086,8 @@ void guard_bio_eod(int op, struct bio *bio)
>  	}
>  }
>  
> -static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
> -			 enum rw_hint write_hint, struct writeback_control *wbc)
> +struct bio *create_bh_bio(int op, int op_flags, struct buffer_head *bh,
> +                          enum rw_hint write_hint)
>  {
>  	struct bio *bio;
>  
> @@ -3109,11 +3109,6 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
>  	 */
>  	bio = bio_alloc(GFP_NOIO, 1);
>  
> -	if (wbc) {
> -		wbc_init_bio(wbc, bio);
> -		wbc_account_io(wbc, bh->b_page, bh->b_size);
> -	}
> -
>  	bio->bi_iter.bi_sector = bh->b_blocknr * (bh->b_size >> 9);
>  	bio_set_dev(bio, bh->b_bdev);
>  	bio->bi_write_hint = write_hint;
> @@ -3133,13 +3128,32 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
>  		op_flags |= REQ_PRIO;
>  	bio_set_op_attrs(bio, op, op_flags);
>  
> +	return bio;
> +}
> +
> +static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
> +			 enum rw_hint write_hint, struct writeback_control *wbc)
> +{
> +	struct bio *bio;
> +
> +	bio = create_bh_bio(op, op_flags, bh, write_hint);
> +
> +	if (wbc) {
> +		wbc_init_bio(wbc, bio);
> +		wbc_account_io(wbc, bh->b_page, bh->b_size);
> +	}
> +
>  	submit_bio(bio);
>  	return 0;
>  }
>  
>  int submit_bh(int op, int op_flags, struct buffer_head *bh)
>  {
> -	return submit_bh_wbc(op, op_flags, bh, 0, NULL);
> +	struct bio *bio;
> +
> +	bio = create_bh_bio(op, op_flags, bh, 0);
> +	submit_bio(bio);
> +	return 0;
>  }
>  EXPORT_SYMBOL(submit_bh);
>  
> -- 
> 2.9.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css()
  2017-10-10 15:54 ` [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css() Tejun Heo
@ 2017-10-11  7:55   ` Jan Kara
  0 siblings, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-10-11  7:55 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Tue 10-10-17 08:54:40, Tejun Heo wrote:
> Implement submit_bh_blkcg_css() which will be used to override cgroup
> membership on specific buffer_heads.
> 
> v2: Reimplemented using create_bh_bio() as suggested by Jan.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jens Axboe <axboe@kernel.dk>

Looks good. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  fs/buffer.c                 | 12 ++++++++++++
>  include/linux/buffer_head.h |  3 +++
>  2 files changed, 15 insertions(+)
> 
> diff --git a/fs/buffer.c b/fs/buffer.c
> index b4b2169..ed0e473 100644
> --- a/fs/buffer.c
> +++ b/fs/buffer.c
> @@ -3147,6 +3147,18 @@ static int submit_bh_wbc(int op, int op_flags, struct buffer_head *bh,
>  	return 0;
>  }
>  
> +int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
> +			struct cgroup_subsys_state *blkcg_css)
> +{
> +	struct bio *bio;
> +
> +	bio = create_bh_bio(op, op_flags, bh, 0);
> +	bio_associate_blkcg(bio, blkcg_css);
> +	submit_bio(bio);
> +	return 0;
> +}
> +EXPORT_SYMBOL(submit_bh_blkcg_css);
> +
>  int submit_bh(int op, int op_flags, struct buffer_head *bh)
>  {
>  	struct bio *bio;
> diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
> index c8dae55..dca5b3b 100644
> --- a/include/linux/buffer_head.h
> +++ b/include/linux/buffer_head.h
> @@ -13,6 +13,7 @@
>  #include <linux/pagemap.h>
>  #include <linux/wait.h>
>  #include <linux/atomic.h>
> +#include <linux/cgroup.h>
>  
>  #ifdef CONFIG_BLOCK
>  
> @@ -197,6 +198,8 @@ void ll_rw_block(int, int, int, struct buffer_head * bh[]);
>  int sync_dirty_buffer(struct buffer_head *bh);
>  int __sync_dirty_buffer(struct buffer_head *bh, int op_flags);
>  void write_dirty_buffer(struct buffer_head *bh, int op_flags);
> +int submit_bh_blkcg_css(int op, int op_flags, struct buffer_head *bh,
> +			struct cgroup_subsys_state *blkcg_css);
>  int submit_bh(int, int, struct buffer_head *);
>  void write_boundary_block(struct block_device *bdev,
>  			sector_t bblock, unsigned blocksize);
> -- 
> 2.9.5
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-10 16:43   ` [PATCH v2 " Tejun Heo
  2017-10-10 17:45     ` Liu Bo
@ 2017-10-11 17:07     ` David Sterba
  2017-10-12 15:38       ` Tejun Heo
  2017-10-12 17:06       ` [PATCH v3 " Tejun Heo
  1 sibling, 2 replies; 22+ messages in thread
From: David Sterba @ 2017-10-11 17:07 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Tue, Oct 10, 2017 at 09:43:26AM -0700, Tejun Heo wrote:
> >From 3bbed8c7747739cda48f592f165e8839da076a3a Mon Sep 17 00:00:00 2001
> 
> Issuing metdata or otherwise shared IOs from !root cgroup can lead to
> priority inversion.  This patch ensures that those IOs are always
> issued from the root cgroup.
> 
> This patch updates btrfs_update_iflags() to not set S_CGROUPWB on
> btree_inodes.

The 'btree_inode' is only one, with inode number 1, and represents all
the metadata, so I don't understand what it means in plural.

> This isn't strictly necessary as those inodes don't
> call the function during init; however, this serves as documentation
> and prevents possible future mistakes.  If this isn't desirable,
> please feel free to drop the section.
> 
> v2: Fixed missing @bh in submit_bh_blkcg_css() call.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Chris Mason <clm@fb.com>
> Cc: Josef Bacik <jbacik@fb.com>
> ---
>  fs/btrfs/check-integrity.c | 2 +-
>  fs/btrfs/disk-io.c         | 4 ++++
>  fs/btrfs/ioctl.c           | 4 +++-
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c
> index 7d5a9b5..d66774e 100644
> --- a/fs/btrfs/check-integrity.c
> +++ b/fs/btrfs/check-integrity.c
> @@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_flags, struct buffer_head *bh)
>  	struct btrfsic_dev_state *dev_state;
>  
>  	if (!btrfsic_is_initialized)
> -		return submit_bh(op, op_flags, bh);
> +		return submit_bh_blkcg_css(op, op_flags, bh, blkcg_root_css);
>  
>  	mutex_lock(&btrfsic_mutex);
>  	/* since btrfsic_submit_bh() might also be called before
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index dfdab84..fe8bbe1 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hook(void *private_data, struct bio *bio,
>  	int async = check_async_write(bio_flags);
>  	blk_status_t ret;
>  
> +	bio_associate_blkcg(bio, blkcg_root_css);
> +
>  	if (bio_op(bio) != REQ_OP_WRITE) {
>  		/*
>  		 * called for a read, do the setup so that checksum validation
> @@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs_device *device)
>  		return;
>  
>  	bio_reset(bio);
> +	bio_associate_blkcg(bio, blkcg_root_css);
> +
>  	bio->bi_end_io = btrfs_end_empty_barrier;
>  	bio_set_dev(bio, device->bdev);
>  	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 117cc63..8a7db6c 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -150,7 +150,9 @@ void btrfs_update_iflags(struct inode *inode)
>  		new_fl |= S_NOATIME;
>  	if (ip->flags & BTRFS_INODE_DIRSYNC)
>  		new_fl |= S_DIRSYNC;
> -	new_fl |= S_CGROUPWB;
> +	/* btree_inodes are always in the root cgroup */
> +	if (btrfs_ino(ip) != BTRFS_BTREE_INODE_OBJECTID)
> +		new_fl |= S_CGROUPWB;

The comment is useful, but the condition will be always true, so I don't
see the point.

	/*
	 * The btree_inode will be always in the root cgroup. The cgroup
	 * writeback can be enabled on regular inodes selectively.
	 */
	new_fl |= S_CGROUPWB;

is IMHO enough, based on my reading of patch 2/5 changelog.

>  
>  	set_mask_bits(&inode->i_flags,
>  		      S_SYNC | S_APPEND | S_IMMUTABLE | S_NOATIME | S_DIRSYNC |

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-11 17:07     ` David Sterba
@ 2017-10-12 15:38       ` Tejun Heo
  2017-10-12 17:06       ` [PATCH v3 " Tejun Heo
  1 sibling, 0 replies; 22+ messages in thread
From: Tejun Heo @ 2017-10-12 15:38 UTC (permalink / raw)
  To: dsterba, jack, axboe, clm, jbacik, kernel-team, linux-kernel,
	linux-btrfs

On Wed, Oct 11, 2017 at 07:07:23PM +0200, David Sterba wrote:
> The comment is useful, but the condition will be always true, so I don't
> see the point.
> 
> 	/*
> 	 * The btree_inode will be always in the root cgroup. The cgroup
> 	 * writeback can be enabled on regular inodes selectively.
> 	 */
> 	new_fl |= S_CGROUPWB;
> 
> is IMHO enough, based on my reading of patch 2/5 changelog.

Will update accordingly.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v3 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-11 17:07     ` David Sterba
  2017-10-12 15:38       ` Tejun Heo
@ 2017-10-12 17:06       ` Tejun Heo
  2017-10-12 18:56         ` David Sterba
  1 sibling, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-10-12 17:06 UTC (permalink / raw)
  To: dsterba, jack, axboe, clm, jbacik, kernel-team, linux-kernel,
	linux-btrfs

Issuing metdata or otherwise shared IOs from !root cgroup can lead to
priority inversion.  This patch ensures that those IOs are always
issued from the root cgroup.

v3: Dropped unnecessary btree_inode handling as suggested by David
    Sterba.

v2: Fixed missing @bh in submit_bh_blkcg_css() call.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Cc: David Sterba <dsterba@suse.cz>
Cc: Chris Mason <clm@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
---
 fs/btrfs/check-integrity.c |    2 +-
 fs/btrfs/disk-io.c         |    4 ++++
 fs/btrfs/ioctl.c           |    4 ++++
 3 files changed, 9 insertions(+), 1 deletion(-)

--- a/fs/btrfs/check-integrity.c
+++ b/fs/btrfs/check-integrity.c
@@ -2741,7 +2741,7 @@ int btrfsic_submit_bh(int op, int op_fla
 	struct btrfsic_dev_state *dev_state;
 
 	if (!btrfsic_is_initialized)
-		return submit_bh(op, op_flags, bh);
+		return submit_bh_blkcg_css(op, op_flags, bh, blkcg_root_css);
 
 	mutex_lock(&btrfsic_mutex);
 	/* since btrfsic_submit_bh() might also be called before
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1025,6 +1025,8 @@ static blk_status_t btree_submit_bio_hoo
 	int async = check_async_write(bio_flags);
 	blk_status_t ret;
 
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	if (bio_op(bio) != REQ_OP_WRITE) {
 		/*
 		 * called for a read, do the setup so that checksum validation
@@ -3512,6 +3514,8 @@ static void write_dev_flush(struct btrfs
 		return;
 
 	bio_reset(bio);
+	bio_associate_blkcg(bio, blkcg_root_css);
+
 	bio->bi_end_io = btrfs_end_empty_barrier;
 	bio_set_dev(bio, device->bdev);
 	bio->bi_opf = REQ_OP_WRITE | REQ_SYNC | REQ_PREFLUSH;
--- a/fs/btrfs/ioctl.c
+++ b/fs/btrfs/ioctl.c
@@ -150,6 +150,10 @@ void btrfs_update_iflags(struct inode *i
 		new_fl |= S_NOATIME;
 	if (ip->flags & BTRFS_INODE_DIRSYNC)
 		new_fl |= S_DIRSYNC;
+	/*
+	 * The btree_inode will be always in the root cgroup. The cgroup
+	 * writeback can be enabled on regular inodes selectively.
+	 */
 	new_fl |= S_CGROUPWB;
 
 	set_mask_bits(&inode->i_flags,

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v3 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup
  2017-10-12 17:06       ` [PATCH v3 " Tejun Heo
@ 2017-10-12 18:56         ` David Sterba
  0 siblings, 0 replies; 22+ messages in thread
From: David Sterba @ 2017-10-12 18:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: dsterba, jack, axboe, clm, jbacik, kernel-team, linux-kernel,
	linux-btrfs

On Thu, Oct 12, 2017 at 10:06:28AM -0700, Tejun Heo wrote:
> Issuing metdata or otherwise shared IOs from !root cgroup can lead to
> priority inversion.  This patch ensures that those IOs are always
> issued from the root cgroup.
> 
> v3: Dropped unnecessary btree_inode handling as suggested by David
>     Sterba.
> 
> v2: Fixed missing @bh in submit_bh_blkcg_css() call.
> 
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
> Cc: David Sterba <dsterba@suse.cz>

Acked-by: David Sterba <dsterba@suse.com>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
                   ` (4 preceding siblings ...)
  2017-10-10 15:54 ` [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo
@ 2017-11-29 16:56 ` Jan Kara
  2017-11-29 17:03   ` Tejun Heo
  5 siblings, 1 reply; 22+ messages in thread
From: Jan Kara @ 2017-11-29 16:56 UTC (permalink / raw)
  To: Tejun Heo
  Cc: jack, axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

Hi Tejun,

What has happened with this patch set?

Honza

On Tue 10-10-17 08:54:36, Tejun Heo wrote:
> Changes from the last version are
> 
> * blkcg_root_css exported to fix build breakage on modular btrfs.
> 
> * Use ext4_should_journal_data() test instead of
>   EXT4_MOUNT_JOURNAL_DATA.
> 
> * Separated out create_bh_bio() and used it to implement
>   submit_bh_blkcg_css() as suggested by Jan.
> 
> btrfs has different ways to issue metadata IOs and may end up issuing
> metadata or otherwise shared IOs from a non-root cgroup, which can
> lead to priority inversion and ineffective IO isolation.
> 
> This patchset makes sure that btrfs issues all metadata and shared IOs
> from the root cgroup by exempting btree_inodes from cgroup writeback
> and explicitly associating shared IOs with the root cgroup.
> 
> This patchset containst he following three patches
> 
>  [PATCH 1/5] blkcg: export blkcg_root_css
>  [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode
>  [PATCH 3/5] buffer_head: separate out create_bh_bio() from
>  [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css()
>  [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the
> 
> and is also available in the following git branch
> 
>  git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup.git review-cgroup-btrfs-metadata-v2
> 
> diffstat follows.  Thanks.
> 
>  block/blk-cgroup.c          |    1 +
>  fs/block_dev.c              |    3 +--
>  fs/btrfs/check-integrity.c  |    2 +-
>  fs/btrfs/disk-io.c          |    4 ++++
>  fs/btrfs/ioctl.c            |    6 +++++-
>  fs/btrfs/super.c            |    1 -
>  fs/buffer.c                 |   42 ++++++++++++++++++++++++++++++++++--------
>  fs/ext2/inode.c             |    3 ++-
>  fs/ext2/super.c             |    1 -
>  fs/ext4/inode.c             |    5 ++++-
>  fs/ext4/super.c             |    2 --
>  include/linux/backing-dev.h |    2 +-
>  include/linux/buffer_head.h |    3 +++
>  include/linux/fs.h          |    3 ++-
>  14 files changed, 58 insertions(+), 20 deletions(-)
> 
> --
> tejun
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-11-29 16:56 ` [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs " Jan Kara
@ 2017-11-29 17:03   ` Tejun Heo
  2017-11-29 17:05     ` Tejun Heo
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-11-29 17:03 UTC (permalink / raw)
  To: Jan Kara; +Cc: axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

Hello,

On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> What has happened with this patch set?

No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
can you please route them through the btrfs tree?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-11-29 17:03   ` Tejun Heo
@ 2017-11-29 17:05     ` Tejun Heo
  2017-11-29 18:38       ` Chris Mason
  0 siblings, 1 reply; 22+ messages in thread
From: Tejun Heo @ 2017-11-29 17:05 UTC (permalink / raw)
  To: Jan Kara; +Cc: axboe, clm, jbacik, kernel-team, linux-kernel, linux-btrfs

On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
> Hello,
> 
> On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> > What has happened with this patch set?
> 
> No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
> can you please route them through the btrfs tree?

lol looking at the patchset again, I'm not sure that's obviously the
right tree.  It can either be cgroup, block or btrfs.  If no one
objects, I'll just route them through cgroup.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-11-29 17:05     ` Tejun Heo
@ 2017-11-29 18:38       ` Chris Mason
  2017-11-30 17:23         ` David Sterba
  2017-12-01 10:52         ` Jan Kara
  0 siblings, 2 replies; 22+ messages in thread
From: Chris Mason @ 2017-11-29 18:38 UTC (permalink / raw)
  To: Tejun Heo, Jan Kara; +Cc: axboe, jbacik, kernel-team, linux-kernel, linux-btrfs

On 11/29/2017 12:05 PM, Tejun Heo wrote:
> On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
>>> What has happened with this patch set?
>>
>> No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
>> can you please route them through the btrfs tree?
> 
> lol looking at the patchset again, I'm not sure that's obviously the
> right tree.  It can either be cgroup, block or btrfs.  If no one
> objects, I'll just route them through cgroup.

We'll have to coordinate a bit during the next merge window but I don't 
have a problem with these going in through cgroup.  Dave does this sound 
good to you?

I'd like to include my patch to do all crcs inline (instead of handing 
off to helper threads) when io controls are in place.  By the merge 
window we should have some good data on how much it's all helping.

-chris


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-11-29 18:38       ` Chris Mason
@ 2017-11-30 17:23         ` David Sterba
  2017-11-30 17:34           ` Chris Mason
  2017-12-01 10:52         ` Jan Kara
  1 sibling, 1 reply; 22+ messages in thread
From: David Sterba @ 2017-11-30 17:23 UTC (permalink / raw)
  To: Chris Mason
  Cc: Tejun Heo, Jan Kara, axboe, jbacik, kernel-team, linux-kernel,
	linux-btrfs

On Wed, Nov 29, 2017 at 01:38:26PM -0500, Chris Mason wrote:
> On 11/29/2017 12:05 PM, Tejun Heo wrote:
> > On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
> >> Hello,
> >>
> >> On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> >>> What has happened with this patch set?
> >>
> >> No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
> >> can you please route them through the btrfs tree?
> > 
> > lol looking at the patchset again, I'm not sure that's obviously the
> > right tree.  It can either be cgroup, block or btrfs.  If no one
> > objects, I'll just route them through cgroup.
> 
> We'll have to coordinate a bit during the next merge window but I don't 
> have a problem with these going in through cgroup.  Dave does this sound 
> good to you?

There are only minor changes to btrfs code so cgroup tree would be
better.

> I'd like to include my patch to do all crcs inline (instead of handing 
> off to helper threads) when io controls are in place.  By the merge 
> window we should have some good data on how much it's all helping.

Are there any problems in sight if the inline crc and cgroup chnanges go
separately? I assume there's a runtime dependency, not a code
dependency, so it could be sorted by the right merge order.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-11-30 17:23         ` David Sterba
@ 2017-11-30 17:34           ` Chris Mason
  0 siblings, 0 replies; 22+ messages in thread
From: Chris Mason @ 2017-11-30 17:34 UTC (permalink / raw)
  To: dsterba, Tejun Heo, Jan Kara, axboe, jbacik, kernel-team,
	linux-kernel, linux-btrfs



On 11/30/2017 12:23 PM, David Sterba wrote:
> On Wed, Nov 29, 2017 at 01:38:26PM -0500, Chris Mason wrote:
>> On 11/29/2017 12:05 PM, Tejun Heo wrote:
>>> On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
>>>> Hello,
>>>>
>>>> On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
>>>>> What has happened with this patch set?
>>>>
>>>> No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
>>>> can you please route them through the btrfs tree?
>>>
>>> lol looking at the patchset again, I'm not sure that's obviously the
>>> right tree.  It can either be cgroup, block or btrfs.  If no one
>>> objects, I'll just route them through cgroup.
>>
>> We'll have to coordinate a bit during the next merge window but I don't
>> have a problem with these going in through cgroup.  Dave does this sound
>> good to you?
> 
> There are only minor changes to btrfs code so cgroup tree would be
> better.
> 
>> I'd like to include my patch to do all crcs inline (instead of handing
>> off to helper threads) when io controls are in place.  By the merge
>> window we should have some good data on how much it's all helping.
> 
> Are there any problems in sight if the inline crc and cgroup chnanges go
> separately? I assume there's a runtime dependency, not a code
> dependency, so it could be sorted by the right merge order.
> 

The feature is just more useful with the inline crcs.  Without them we 
end up with kworkers doing both high and low prio submissions and it all 
boils down to the speed of the lowest priority.

-chris


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup
  2017-11-29 18:38       ` Chris Mason
  2017-11-30 17:23         ` David Sterba
@ 2017-12-01 10:52         ` Jan Kara
  1 sibling, 0 replies; 22+ messages in thread
From: Jan Kara @ 2017-12-01 10:52 UTC (permalink / raw)
  To: Chris Mason
  Cc: Tejun Heo, Jan Kara, axboe, jbacik, kernel-team, linux-kernel,
	linux-btrfs

On Wed 29-11-17 13:38:26, Chris Mason wrote:
> On 11/29/2017 12:05 PM, Tejun Heo wrote:
> >On Wed, Nov 29, 2017 at 09:03:30AM -0800, Tejun Heo wrote:
> >>Hello,
> >>
> >>On Wed, Nov 29, 2017 at 05:56:08PM +0100, Jan Kara wrote:
> >>>What has happened with this patch set?
> >>
> >>No idea.  cc'ing Chris directly.  Chris, if the patchset looks good,
> >>can you please route them through the btrfs tree?
> >
> >lol looking at the patchset again, I'm not sure that's obviously the
> >right tree.  It can either be cgroup, block or btrfs.  If no one
> >objects, I'll just route them through cgroup.
> 
> We'll have to coordinate a bit during the next merge window but I don't have
> a problem with these going in through cgroup.  Dave does this sound good to
> you?

Also I was wondering about another thing: How does this play with Josef's
series for metadata writeback (Metadata specific accouting and dirty
writeout)? Would the per-inode selection of cgroup writeback still be
needed when Josef's series is applied since metadata writeback then won't
be associated with any particular mapping anymore?

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-12-01 10:52 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-10-10 15:54 [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs from the root cgroup Tejun Heo
2017-10-10 15:54 ` [PATCH 1/5] blkcg: export blkcg_root_css Tejun Heo
2017-10-11  7:50   ` Jan Kara
2017-10-10 15:54 ` [PATCH 2/5] cgroup, writeback: replace SB_I_CGROUPWB with per-inode S_CGROUPWB Tejun Heo
2017-10-10 15:54 ` [PATCH 3/5] buffer_head: separate out create_bh_bio() from submit_bh_wbc() Tejun Heo
2017-10-11  7:54   ` Jan Kara
2017-10-10 15:54 ` [PATCH 4/5] cgroup, buffer_head: implement submit_bh_blkcg_css() Tejun Heo
2017-10-11  7:55   ` Jan Kara
2017-10-10 15:54 ` [PATCH 5/5] btrfs: ensure that metadata and flush are issued from the root cgroup Tejun Heo
2017-10-10 16:43   ` [PATCH v2 " Tejun Heo
2017-10-10 17:45     ` Liu Bo
2017-10-11 17:07     ` David Sterba
2017-10-12 15:38       ` Tejun Heo
2017-10-12 17:06       ` [PATCH v3 " Tejun Heo
2017-10-12 18:56         ` David Sterba
2017-11-29 16:56 ` [PATCHSET v2] cgroup, writeback, btrfs: make sure btrfs issues metadata IOs " Jan Kara
2017-11-29 17:03   ` Tejun Heo
2017-11-29 17:05     ` Tejun Heo
2017-11-29 18:38       ` Chris Mason
2017-11-30 17:23         ` David Sterba
2017-11-30 17:34           ` Chris Mason
2017-12-01 10:52         ` Jan Kara

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.