Linux-BTRFS Archive on lore.kernel.org
 help / color / Atom feed
* [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write()
@ 2019-12-12  0:07 Dennis Zhou
  2019-12-12  0:07 ` [PATCH 2/2] btrfs: fix compressed write bio attribution Dennis Zhou
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Dennis Zhou @ 2019-12-12  0:07 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Compressed writes happen in the background via kworkers. However, this
causes bios to be attributed to root bypassing any cgroup limits from
the actual writer. We tag the first bio with REQ_CGROUP_PUNT, which will
punt the bio to an appropriate cgroup specific workqueue and attribute
the IO properly. However, if btrfs_submit_compressed_write() creates a
new bio, we don't tag it the same way. Add the appropriate tagging for
subsequent bios.

Fixes: ec39f7696ccfa ("Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios")
Cc: Chris Mason <clm@fb.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/compression.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index ed05e5277399..4ce81571f0cd 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -491,6 +491,10 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
+			if (blkcg_css) {
+				bio->bi_opf |= REQ_CGROUP_PUNT;
+				bio_associate_blkg_from_css(bio, blkcg_css);
+			}
 			bio_add_page(bio, page, PAGE_SIZE, 0);
 		}
 		if (bytes_left < PAGE_SIZE) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 2/2] btrfs: fix compressed write bio attribution
  2019-12-12  0:07 [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Dennis Zhou
@ 2019-12-12  0:07 ` Dennis Zhou
  2019-12-12 15:18   ` David Sterba
  2019-12-12 18:19   ` [PATCH v2 " Dennis Zhou
  2019-12-12  0:15 ` [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Chris Mason
  2019-12-30 15:08 ` David Sterba
  2 siblings, 2 replies; 11+ messages in thread
From: Dennis Zhou @ 2019-12-12  0:07 UTC (permalink / raw)
  To: David Sterba, Chris Mason, Josef Bacik
  Cc: kernel-team, linux-btrfs, Dennis Zhou

Bio attribution is handled at bio_set_dev() as once we have a device, we
have a corresponding request_queue and then can derive the current css.
In special cases, we want to attribute to bio to someone else. This can
be done by calling bio_associate_blkg_from_css(). Btrfs does this for
compressed writeback as they are handled by kworkers which would be of
the root cgroup rather than the cgroup designated by the wbc.

Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
above assumption that we'll have a request_queue when we are doing
association. To fix this, special case passing the bio through just for
btrfs_submit_compressed_write().

Without this, we crash in btrfs/024:
[ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
[ 3052.107013] #PF: supervisor read access in kernel mode
[ 3052.107014] #PF: error_code(0x0000) - not-present page
[ 3052.107015] PGD 0 P4D 0
[ 3052.107021] Oops: 0000 [#1] SMP
[ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
[ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
[ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
[ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
[ 3052.191377] Code: ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 08 48 8b 47 08 65 ff 05 ea 6e 9f 7e <48> 8b a8 10 05 00 00 45 31 c9 45 31 c0 31 d2 31 f6 b9 02 00 00 00
[ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
[ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
[ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
[ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
[ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
[ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
[ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
[ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
[ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3052.293372] PKRU: 55555554
[ 3052.293376] Call Trace:
[ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390
[ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
[ 3052.422401]  btrfs_work_helper+0x246/0x5a0
[ 3052.422408]  process_one_work+0x200/0x570
[ 3052.438601]  ? process_one_work+0x180/0x570
[ 3052.438605]  worker_thread+0x4c/0x3e0
[ 3052.438614]  kthread+0x103/0x140
[ 3052.460735]  ? process_one_work+0x570/0x570
[ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
[ 3052.460744]  ret_from_fork+0x24/0x30

Fixes: 1a41802701ec ("btrfs: drop bio_set_dev where not needed")
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
 fs/btrfs/compression.c | 14 +++++---------
 fs/btrfs/volumes.c     | 18 ++++++++++++++----
 fs/btrfs/volumes.h     |  3 +++
 3 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4ce81571f0cd..67d604fcb606 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -444,11 +444,9 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 	bio->bi_opf = REQ_OP_WRITE | write_flags;
 	bio->bi_private = cb;
 	bio->bi_end_io = end_compressed_bio_write;
-
-	if (blkcg_css) {
+	if (blkcg_css)
 		bio->bi_opf |= REQ_CGROUP_PUNT;
-		bio_associate_blkg_from_css(bio, blkcg_css);
-	}
+
 	refcount_set(&cb->pending_bios, 1);
 
 	/* create and submit bios for the compressed pages */
@@ -481,7 +479,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 				BUG_ON(ret); /* -ENOMEM */
 			}
 
-			ret = btrfs_map_bio(fs_info, bio, 0);
+			ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
 			if (ret) {
 				bio->bi_status = ret;
 				bio_endio(bio);
@@ -491,10 +489,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
-			if (blkcg_css) {
+			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
-				bio_associate_blkg_from_css(bio, blkcg_css);
-			}
 			bio_add_page(bio, page, PAGE_SIZE, 0);
 		}
 		if (bytes_left < PAGE_SIZE) {
@@ -515,7 +511,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		BUG_ON(ret); /* -ENOMEM */
 	}
 
-	ret = btrfs_map_bio(fs_info, bio, 0);
+	ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
 	if (ret) {
 		bio->bi_status = ret;
 		bio_endio(bio);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 66377e678504..c68d93a1aae8 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6240,7 +6240,8 @@ static void btrfs_end_bio(struct bio *bio)
 }
 
 static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
-			      u64 physical, int dev_nr)
+			      u64 physical, int dev_nr,
+			      struct cgroup_subsys_state *blkcg_css)
 {
 	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
 	struct btrfs_fs_info *fs_info = bbio->fs_info;
@@ -6255,6 +6256,8 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
 		(u_long)dev->bdev->bd_dev, rcu_str_deref(dev->name), dev->devid,
 		bio->bi_iter.bi_size);
 	bio_set_dev(bio, dev->bdev);
+	if (blkcg_css)
+		bio_associate_blkg_from_css(bio, blkcg_css);
 
 	btrfs_bio_counter_inc_noblocked(fs_info);
 
@@ -6278,8 +6281,9 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
 	}
 }
 
-blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
-			   int mirror_num)
+blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			     int mirror_num,
+			     struct cgroup_subsys_state *blkcg_css)
 {
 	struct btrfs_device *dev;
 	struct bio *first_bio = bio;
@@ -6348,12 +6352,18 @@ blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			bio = first_bio;
 
 		submit_stripe_bio(bbio, bio, bbio->stripes[dev_nr].physical,
-				  dev_nr);
+				  dev_nr, blkcg_css);
 	}
 	btrfs_bio_counter_dec(fs_info);
 	return BLK_STS_OK;
 }
 
+blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			   int mirror_num)
+{
+	return __btrfs_map_bio(fs_info, bio, mirror_num, NULL);
+}
+
 /*
  * Find a device specified by @devid or @uuid in the list of @fs_devices, or
  * return NULL.
diff --git a/fs/btrfs/volumes.h b/fs/btrfs/volumes.h
index 3c56ef571b00..f5fd5a86bbe9 100644
--- a/fs/btrfs/volumes.h
+++ b/fs/btrfs/volumes.h
@@ -423,6 +423,9 @@ int btrfs_read_sys_array(struct btrfs_fs_info *fs_info);
 int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info);
 int btrfs_alloc_chunk(struct btrfs_trans_handle *trans, u64 type);
 void btrfs_mapping_tree_free(struct extent_map_tree *tree);
+blk_status_t __btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
+			     int mirror_num,
+			     struct cgroup_subsys_state *blkcg_css);
 blk_status_t btrfs_map_bio(struct btrfs_fs_info *fs_info, struct bio *bio,
 			   int mirror_num);
 int btrfs_open_devices(struct btrfs_fs_devices *fs_devices,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write()
  2019-12-12  0:07 [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Dennis Zhou
  2019-12-12  0:07 ` [PATCH 2/2] btrfs: fix compressed write bio attribution Dennis Zhou
@ 2019-12-12  0:15 ` Chris Mason
  2019-12-30 15:08 ` David Sterba
  2 siblings, 0 replies; 11+ messages in thread
From: Chris Mason @ 2019-12-12  0:15 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: David Sterba, Josef Bacik, Kernel Team, linux-btrfs

On 11 Dec 2019, at 19:07, Dennis Zhou wrote:

> Compressed writes happen in the background via kworkers. However, this
> causes bios to be attributed to root bypassing any cgroup limits from
> the actual writer. We tag the first bio with REQ_CGROUP_PUNT, which 
> will
> punt the bio to an appropriate cgroup specific workqueue and attribute
> the IO properly. However, if btrfs_submit_compressed_write() creates a
> new bio, we don't tag it the same way. Add the appropriate tagging for
> subsequent bios.
>
> Fixes: ec39f7696ccfa ("Btrfs: use REQ_CGROUP_PUNT for worker thread 
> submitted bios")
> Cc: Chris Mason <clm@fb.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

Good catch Dennis.  The compression code should end up limiting the size 
of the bio such that we'll never actually fail to bio_add_page(), but 
this is still the right thing to do.

Reviewed-by: Chris Mason <clm@fb.com>

-chris

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] btrfs: fix compressed write bio attribution
  2019-12-12  0:07 ` [PATCH 2/2] btrfs: fix compressed write bio attribution Dennis Zhou
@ 2019-12-12 15:18   ` David Sterba
  2019-12-12 16:19     ` Dennis Zhou
  2019-12-12 18:19   ` [PATCH v2 " Dennis Zhou
  1 sibling, 1 reply; 11+ messages in thread
From: David Sterba @ 2019-12-12 15:18 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, kernel-team, linux-btrfs

On Wed, Dec 11, 2019 at 04:07:07PM -0800, Dennis Zhou wrote:
> Bio attribution is handled at bio_set_dev() as once we have a device, we
> have a corresponding request_queue and then can derive the current css.
> In special cases, we want to attribute to bio to someone else. This can
> be done by calling bio_associate_blkg_from_css(). Btrfs does this for
> compressed writeback as they are handled by kworkers which would be of
> the root cgroup rather than the cgroup designated by the wbc.
> 
> Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> above assumption that we'll have a request_queue when we are doing
> association. To fix this, special case passing the bio through just for
> btrfs_submit_compressed_write().
> 
> Without this, we crash in btrfs/024:
> [ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
> [ 3052.107013] #PF: supervisor read access in kernel mode
> [ 3052.107014] #PF: error_code(0x0000) - not-present page
> [ 3052.107015] PGD 0 P4D 0
> [ 3052.107021] Oops: 0000 [#1] SMP
> [ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
> [ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
> [ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
> [ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
> [ 3052.191377] Code: ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 08 48 8b 47 08 65 ff 05 ea 6e 9f 7e <48> 8b a8 10 05 00 00 45 31 c9 45 31 c0 31 d2 31 f6 b9 02 00 00 00
> [ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
> [ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
> [ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
> [ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
> [ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
> [ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
> [ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
> [ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
> [ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 3052.293372] PKRU: 55555554
> [ 3052.293376] Call Trace:
> [ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390

Isn't it the same crash that Chris Murphy reported?

https://lore.kernel.org/linux-btrfs/CAJCQCtS_7vjBnqeDsedBQJYuE_ap+Xo6D=MXY=rOxf66oJZkrA@mail.gmail.com/

> [ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
> [ 3052.422401]  btrfs_work_helper+0x246/0x5a0
> [ 3052.422408]  process_one_work+0x200/0x570
> [ 3052.438601]  ? process_one_work+0x180/0x570
> [ 3052.438605]  worker_thread+0x4c/0x3e0
> [ 3052.438614]  kthread+0x103/0x140
> [ 3052.460735]  ? process_one_work+0x570/0x570
> [ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
> [ 3052.460744]  ret_from_fork+0x24/0x30
> 
> Fixes: 1a41802701ec ("btrfs: drop bio_set_dev where not needed")
> Cc: David Sterba <dsterba@suse.com>
> Cc: Josef Bacik <josef@toxicpanda.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>
> ---
>  fs/btrfs/compression.c | 14 +++++---------
>  fs/btrfs/volumes.c     | 18 ++++++++++++++----
>  fs/btrfs/volumes.h     |  3 +++
>  3 files changed, 22 insertions(+), 13 deletions(-)
> 
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index 4ce81571f0cd..67d604fcb606 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -444,11 +444,9 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>  	bio->bi_opf = REQ_OP_WRITE | write_flags;
>  	bio->bi_private = cb;
>  	bio->bi_end_io = end_compressed_bio_write;
> -
> -	if (blkcg_css) {
> +	if (blkcg_css)
>  		bio->bi_opf |= REQ_CGROUP_PUNT;
> -		bio_associate_blkg_from_css(bio, blkcg_css);
> -	}
> +
>  	refcount_set(&cb->pending_bios, 1);
>  
>  	/* create and submit bios for the compressed pages */
> @@ -481,7 +479,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>  				BUG_ON(ret); /* -ENOMEM */
>  			}
>  
> -			ret = btrfs_map_bio(fs_info, bio, 0);
> +			ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
>  			if (ret) {
>  				bio->bi_status = ret;
>  				bio_endio(bio);
> @@ -491,10 +489,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>  			bio->bi_opf = REQ_OP_WRITE | write_flags;
>  			bio->bi_private = cb;
>  			bio->bi_end_io = end_compressed_bio_write;
> -			if (blkcg_css) {
> +			if (blkcg_css)
>  				bio->bi_opf |= REQ_CGROUP_PUNT;
> -				bio_associate_blkg_from_css(bio, blkcg_css);
> -			}
>  			bio_add_page(bio, page, PAGE_SIZE, 0);
>  		}
>  		if (bytes_left < PAGE_SIZE) {
> @@ -515,7 +511,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
>  		BUG_ON(ret); /* -ENOMEM */
>  	}
>  
> -	ret = btrfs_map_bio(fs_info, bio, 0);
> +	ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
>  	if (ret) {
>  		bio->bi_status = ret;
>  		bio_endio(bio);
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 66377e678504..c68d93a1aae8 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -6240,7 +6240,8 @@ static void btrfs_end_bio(struct bio *bio)
>  }
>  
>  static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
> -			      u64 physical, int dev_nr)
> +			      u64 physical, int dev_nr,
> +			      struct cgroup_subsys_state *blkcg_css)
>  {
>  	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
>  	struct btrfs_fs_info *fs_info = bbio->fs_info;
> @@ -6255,6 +6256,8 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
>  		(u_long)dev->bdev->bd_dev, rcu_str_deref(dev->name), dev->devid,
>  		bio->bi_iter.bi_size);
>  	bio_set_dev(bio, dev->bdev);
> +	if (blkcg_css)
> +		bio_associate_blkg_from_css(bio, blkcg_css);

At this point we know the bdev is the correct one, but is the blkcg_css
different for each device or is there one for all?

Passing the blkcg_css is one way, one single point where the bio and css
are associated and probably the cleanest. I only don't like the need to
pass the blkcg_css around but that's probably not a big deal than to
forget to set bdev somewhere.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/2] btrfs: fix compressed write bio attribution
  2019-12-12 15:18   ` David Sterba
@ 2019-12-12 16:19     ` Dennis Zhou
  0 siblings, 0 replies; 11+ messages in thread
From: Dennis Zhou @ 2019-12-12 16:19 UTC (permalink / raw)
  To: dsterba, David Sterba, Chris Mason, Josef Bacik, kernel-team,
	linux-btrfs

On Thu, Dec 12, 2019 at 04:18:53PM +0100, David Sterba wrote:
> On Wed, Dec 11, 2019 at 04:07:07PM -0800, Dennis Zhou wrote:
> > Bio attribution is handled at bio_set_dev() as once we have a device, we
> > have a corresponding request_queue and then can derive the current css.
> > In special cases, we want to attribute to bio to someone else. This can
> > be done by calling bio_associate_blkg_from_css(). Btrfs does this for
> > compressed writeback as they are handled by kworkers which would be of
> > the root cgroup rather than the cgroup designated by the wbc.
> > 
> > Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> > early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> > above assumption that we'll have a request_queue when we are doing
> > association. To fix this, special case passing the bio through just for
> > btrfs_submit_compressed_write().
> > 
> > Without this, we crash in btrfs/024:
> > [ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
> > [ 3052.107013] #PF: supervisor read access in kernel mode
> > [ 3052.107014] #PF: error_code(0x0000) - not-present page
> > [ 3052.107015] PGD 0 P4D 0
> > [ 3052.107021] Oops: 0000 [#1] SMP
> > [ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
> > [ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
> > [ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
> > [ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
> > [ 3052.191377] Code: ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 08 48 8b 47 08 65 ff 05 ea 6e 9f 7e <48> 8b a8 10 05 00 00 45 31 c9 45 31 c0 31 d2 31 f6 b9 02 00 00 00
> > [ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
> > [ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
> > [ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
> > [ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
> > [ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
> > [ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
> > [ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
> > [ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
> > [ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 3052.293372] PKRU: 55555554
> > [ 3052.293376] Call Trace:
> > [ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390
> 
> Isn't it the same crash that Chris Murphy reported?
> 
> https://lore.kernel.org/linux-btrfs/CAJCQCtS_7vjBnqeDsedBQJYuE_ap+Xo6D=MXY=rOxf66oJZkrA@mail.gmail.com/
> 

Yeah, looks like the same crash.

> > [ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
> > [ 3052.422401]  btrfs_work_helper+0x246/0x5a0
> > [ 3052.422408]  process_one_work+0x200/0x570
> > [ 3052.438601]  ? process_one_work+0x180/0x570
> > [ 3052.438605]  worker_thread+0x4c/0x3e0
> > [ 3052.438614]  kthread+0x103/0x140
> > [ 3052.460735]  ? process_one_work+0x570/0x570
> > [ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
> > [ 3052.460744]  ret_from_fork+0x24/0x30
> > 
> > Fixes: 1a41802701ec ("btrfs: drop bio_set_dev where not needed")
> > Cc: David Sterba <dsterba@suse.com>
> > Cc: Josef Bacik <josef@toxicpanda.com>
> > Signed-off-by: Dennis Zhou <dennis@kernel.org>
> > ---
> >  fs/btrfs/compression.c | 14 +++++---------
> >  fs/btrfs/volumes.c     | 18 ++++++++++++++----
> >  fs/btrfs/volumes.h     |  3 +++
> >  3 files changed, 22 insertions(+), 13 deletions(-)
> > 
> > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> > index 4ce81571f0cd..67d604fcb606 100644
> > --- a/fs/btrfs/compression.c
> > +++ b/fs/btrfs/compression.c
> > @@ -444,11 +444,9 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  	bio->bi_opf = REQ_OP_WRITE | write_flags;
> >  	bio->bi_private = cb;
> >  	bio->bi_end_io = end_compressed_bio_write;
> > -
> > -	if (blkcg_css) {
> > +	if (blkcg_css)
> >  		bio->bi_opf |= REQ_CGROUP_PUNT;
> > -		bio_associate_blkg_from_css(bio, blkcg_css);
> > -	}
> > +
> >  	refcount_set(&cb->pending_bios, 1);
> >  
> >  	/* create and submit bios for the compressed pages */
> > @@ -481,7 +479,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  				BUG_ON(ret); /* -ENOMEM */
> >  			}
> >  
> > -			ret = btrfs_map_bio(fs_info, bio, 0);
> > +			ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
> >  			if (ret) {
> >  				bio->bi_status = ret;
> >  				bio_endio(bio);
> > @@ -491,10 +489,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  			bio->bi_opf = REQ_OP_WRITE | write_flags;
> >  			bio->bi_private = cb;
> >  			bio->bi_end_io = end_compressed_bio_write;
> > -			if (blkcg_css) {
> > +			if (blkcg_css)
> >  				bio->bi_opf |= REQ_CGROUP_PUNT;
> > -				bio_associate_blkg_from_css(bio, blkcg_css);
> > -			}
> >  			bio_add_page(bio, page, PAGE_SIZE, 0);
> >  		}
> >  		if (bytes_left < PAGE_SIZE) {
> > @@ -515,7 +511,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  		BUG_ON(ret); /* -ENOMEM */
> >  	}
> >  
> > -	ret = btrfs_map_bio(fs_info, bio, 0);
> > +	ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
> >  	if (ret) {
> >  		bio->bi_status = ret;
> >  		bio_endio(bio);
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 66377e678504..c68d93a1aae8 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -6240,7 +6240,8 @@ static void btrfs_end_bio(struct bio *bio)
> >  }
> >  
> >  static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
> > -			      u64 physical, int dev_nr)
> > +			      u64 physical, int dev_nr,
> > +			      struct cgroup_subsys_state *blkcg_css)
> >  {
> >  	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
> >  	struct btrfs_fs_info *fs_info = bbio->fs_info;
> > @@ -6255,6 +6256,8 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
> >  		(u_long)dev->bdev->bd_dev, rcu_str_deref(dev->name), dev->devid,
> >  		bio->bi_iter.bi_size);
> >  	bio_set_dev(bio, dev->bdev);
> > +	if (blkcg_css)
> > +		bio_associate_blkg_from_css(bio, blkcg_css);
> 
> At this point we know the bdev is the correct one, but is the blkcg_css
> different for each device or is there one for all?
> 

It's a single blkcg_css for all devices. It would be different blkgs as
those are the request_queue-blkcg pairs.

> Passing the blkcg_css is one way, one single point where the bio and css
> are associated and probably the cleanest. I only don't like the need to
> pass the blkcg_css around but that's probably not a big deal than to
> forget to set bdev somewhere.

Actually, let me try spinning one other way of doing this and get back
to you in a bit.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH v2 2/2] btrfs: fix compressed write bio attribution
  2019-12-12  0:07 ` [PATCH 2/2] btrfs: fix compressed write bio attribution Dennis Zhou
  2019-12-12 15:18   ` David Sterba
@ 2019-12-12 18:19   ` " Dennis Zhou
  2019-12-13 12:24     ` David Sterba
  1 sibling, 1 reply; 11+ messages in thread
From: Dennis Zhou @ 2019-12-12 18:19 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, kernel-team, linux-btrfs

From a0569aebde08e31e994c92d0b70befb84f7f5563 Mon Sep 17 00:00:00 2001
From: Dennis Zhou <dennis@kernel.org>
Date: Wed, 11 Dec 2019 15:20:15 -0800

Bio attribution is handled at bio_set_dev() as once we have a device, we
have a corresponding request_queue and then can derive the current css.
In special cases, we want to attribute to bio to someone else. This can
be done by calling bio_associate_blkg_from_css() or
kthread_associate_blkcg() depending on the scenario. Btrfs does this for
compressed writeback as they are handled by kworkers, so the latter can
be done here.

Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
above assumption that we'll have a request_queue when we are doing
association. To fix this, switch to using kthread_associate_blkcg().

Without this, we crash in btrfs/024:
[ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
[ 3052.107013] #PF: supervisor read access in kernel mode
[ 3052.107014] #PF: error_code(0x0000) - not-present page
[ 3052.107015] PGD 0 P4D 0
[ 3052.107021] Oops: 0000 [#1] SMP
[ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
[ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
[ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
[ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
[ 3052.191377] Code: ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 08 48 8b 47 08 65 ff 05 ea 6e 9f 7e <48> 8b a8 10 05 00 00 45 31 c9 45 31 c0 31 d2 31 f6 b9 02 00 00 00
[ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
[ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
[ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
[ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
[ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
[ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
[ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
[ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
[ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3052.293372] PKRU: 55555554
[ 3052.293376] Call Trace:
[ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390
[ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
[ 3052.422401]  btrfs_work_helper+0x246/0x5a0
[ 3052.422408]  process_one_work+0x200/0x570
[ 3052.438601]  ? process_one_work+0x180/0x570
[ 3052.438605]  worker_thread+0x4c/0x3e0
[ 3052.438614]  kthread+0x103/0x140
[ 3052.460735]  ? process_one_work+0x570/0x570
[ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
[ 3052.460744]  ret_from_fork+0x24/0x30

Fixes: 1a41802701ec ("btrfs: drop bio_set_dev where not needed")
Cc: David Sterba <dsterba@suse.com>
Cc: Josef Bacik <josef@toxicpanda.com>
Reported-by: Chris Murphy <chris@colorremedies.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
---
v2: rely on kthread_associate_blkcg() instead.

 fs/btrfs/compression.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
index 4ce81571f0cd..de95ad27722f 100644
--- a/fs/btrfs/compression.c
+++ b/fs/btrfs/compression.c
@@ -447,7 +447,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 
 	if (blkcg_css) {
 		bio->bi_opf |= REQ_CGROUP_PUNT;
-		bio_associate_blkg_from_css(bio, blkcg_css);
+		kthread_associate_blkcg(blkcg_css);
 	}
 	refcount_set(&cb->pending_bios, 1);
 
@@ -491,10 +491,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 			bio->bi_opf = REQ_OP_WRITE | write_flags;
 			bio->bi_private = cb;
 			bio->bi_end_io = end_compressed_bio_write;
-			if (blkcg_css) {
+			if (blkcg_css)
 				bio->bi_opf |= REQ_CGROUP_PUNT;
-				bio_associate_blkg_from_css(bio, blkcg_css);
-			}
 			bio_add_page(bio, page, PAGE_SIZE, 0);
 		}
 		if (bytes_left < PAGE_SIZE) {
@@ -521,6 +519,9 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
 		bio_endio(bio);
 	}
 
+	if (blkcg_css)
+		kthread_associate_blkcg(NULL);
+
 	return 0;
 }
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] btrfs: fix compressed write bio attribution
  2019-12-12 18:19   ` [PATCH v2 " Dennis Zhou
@ 2019-12-13 12:24     ` David Sterba
  2019-12-13 22:21       ` Dennis Zhou
  0 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2019-12-13 12:24 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, kernel-team, linux-btrfs

On Thu, Dec 12, 2019 at 10:19:34AM -0800, Dennis Zhou wrote:
> From a0569aebde08e31e994c92d0b70befb84f7f5563 Mon Sep 17 00:00:00 2001
> From: Dennis Zhou <dennis@kernel.org>
> Date: Wed, 11 Dec 2019 15:20:15 -0800
> 
> Bio attribution is handled at bio_set_dev() as once we have a device, we
> have a corresponding request_queue and then can derive the current css.
> In special cases, we want to attribute to bio to someone else. This can
> be done by calling bio_associate_blkg_from_css() or
> kthread_associate_blkcg() depending on the scenario. Btrfs does this for
> compressed writeback as they are handled by kworkers, so the latter can
> be done here.
> 
> Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> above assumption that we'll have a request_queue when we are doing
> association. To fix this, switch to using kthread_associate_blkcg().

Can be kthread_associate_blkcg used also for submit_extent_page that
calls bio_associate_blkg_from_css indirectly when initializing wbc?

2996                 bio_set_dev(bio, bdev);
2997                 wbc_init_bio(wbc, bio);
2998                 wbc_account_cgroup_owner(wbc, page, page_size);

wbc_init_bio:

	if (wbc)
		bio_associate_blkg_from_css();

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] btrfs: fix compressed write bio attribution
  2019-12-13 12:24     ` David Sterba
@ 2019-12-13 22:21       ` Dennis Zhou
  2019-12-17 15:05         ` David Sterba
  0 siblings, 1 reply; 11+ messages in thread
From: Dennis Zhou @ 2019-12-13 22:21 UTC (permalink / raw)
  To: David Sterba
  Cc: Dennis Zhou, David Sterba, Chris Mason, Josef Bacik, kernel-team,
	linux-btrfs

On Fri, Dec 13, 2019 at 01:24:01PM +0100, David Sterba wrote:
> On Thu, Dec 12, 2019 at 10:19:34AM -0800, Dennis Zhou wrote:
> > From a0569aebde08e31e994c92d0b70befb84f7f5563 Mon Sep 17 00:00:00 2001
> > From: Dennis Zhou <dennis@kernel.org>
> > Date: Wed, 11 Dec 2019 15:20:15 -0800
> > 
> > Bio attribution is handled at bio_set_dev() as once we have a device, we
> > have a corresponding request_queue and then can derive the current css.
> > In special cases, we want to attribute to bio to someone else. This can
> > be done by calling bio_associate_blkg_from_css() or
> > kthread_associate_blkcg() depending on the scenario. Btrfs does this for
> > compressed writeback as they are handled by kworkers, so the latter can
> > be done here.
> > 
> > Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> > early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> > above assumption that we'll have a request_queue when we are doing
> > association. To fix this, switch to using kthread_associate_blkcg().
> 
> Can be kthread_associate_blkcg used also for submit_extent_page that
> calls bio_associate_blkg_from_css indirectly when initializing wbc?
> 
> 2996                 bio_set_dev(bio, bdev);
> 2997                 wbc_init_bio(wbc, bio);
> 2998                 wbc_account_cgroup_owner(wbc, page, page_size);
> 
> wbc_init_bio:
> 
> 	if (wbc)
> 		bio_associate_blkg_from_css();

Correct me if I'm wrong, but I don't think submit_extent_page() is only
called from kthread contexts. So, we wouldn't be able to rely on
kthread_associate_blkcg().

I can think about how to make wbc better for association in general, but
it's a percpu decrement and increment so it shouldn't really be much in
overhead.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] btrfs: fix compressed write bio attribution
  2019-12-13 22:21       ` Dennis Zhou
@ 2019-12-17 15:05         ` David Sterba
  2019-12-17 18:44           ` Dennis Zhou
  0 siblings, 1 reply; 11+ messages in thread
From: David Sterba @ 2019-12-17 15:05 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, kernel-team, linux-btrfs

On Fri, Dec 13, 2019 at 02:21:49PM -0800, Dennis Zhou wrote:
> On Fri, Dec 13, 2019 at 01:24:01PM +0100, David Sterba wrote:
> > On Thu, Dec 12, 2019 at 10:19:34AM -0800, Dennis Zhou wrote:
> > > From a0569aebde08e31e994c92d0b70befb84f7f5563 Mon Sep 17 00:00:00 2001
> > > From: Dennis Zhou <dennis@kernel.org>
> > > Date: Wed, 11 Dec 2019 15:20:15 -0800
> > > 
> > > Bio attribution is handled at bio_set_dev() as once we have a device, we
> > > have a corresponding request_queue and then can derive the current css.
> > > In special cases, we want to attribute to bio to someone else. This can
> > > be done by calling bio_associate_blkg_from_css() or
> > > kthread_associate_blkcg() depending on the scenario. Btrfs does this for
> > > compressed writeback as they are handled by kworkers, so the latter can
> > > be done here.
> > > 
> > > Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> > > early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> > > above assumption that we'll have a request_queue when we are doing
> > > association. To fix this, switch to using kthread_associate_blkcg().
> > 
> > Can be kthread_associate_blkcg used also for submit_extent_page that
> > calls bio_associate_blkg_from_css indirectly when initializing wbc?
> > 
> > 2996                 bio_set_dev(bio, bdev);
> > 2997                 wbc_init_bio(wbc, bio);
> > 2998                 wbc_account_cgroup_owner(wbc, page, page_size);
> > 
> > wbc_init_bio:
> > 
> > 	if (wbc)
> > 		bio_associate_blkg_from_css();
> 
> Correct me if I'm wrong, but I don't think submit_extent_page() is only
> called from kthread contexts. So, we wouldn't be able to rely on
> kthread_associate_blkcg().

Yeah, the kthread is not guaranteed here.

> I can think about how to make wbc better for association in general, but
> it's a percpu decrement and increment so it shouldn't really be much in
> overhead.

Performance is not my concern here, the addition of bios and blkcg
association is new and there were some integration bugs where I
independently removed early bdev association while the blkg relied on
that. I'm looking for ways to make it less error prone and the kthread
association looks exactly like that so I was curious if it's possible to
use it everywhere. If not, the bdev needs to be found from other
available data.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2 2/2] btrfs: fix compressed write bio attribution
  2019-12-17 15:05         ` David Sterba
@ 2019-12-17 18:44           ` Dennis Zhou
  0 siblings, 0 replies; 11+ messages in thread
From: Dennis Zhou @ 2019-12-17 18:44 UTC (permalink / raw)
  To: David Sterba
  Cc: Dennis Zhou, David Sterba, Chris Mason, Josef Bacik, kernel-team,
	linux-btrfs

On Tue, Dec 17, 2019 at 04:05:48PM +0100, David Sterba wrote:
> On Fri, Dec 13, 2019 at 02:21:49PM -0800, Dennis Zhou wrote:
> > On Fri, Dec 13, 2019 at 01:24:01PM +0100, David Sterba wrote:
> > > On Thu, Dec 12, 2019 at 10:19:34AM -0800, Dennis Zhou wrote:
> > > > From a0569aebde08e31e994c92d0b70befb84f7f5563 Mon Sep 17 00:00:00 2001
> > > > From: Dennis Zhou <dennis@kernel.org>
> > > > Date: Wed, 11 Dec 2019 15:20:15 -0800
> > > > 
> > > > Bio attribution is handled at bio_set_dev() as once we have a device, we
> > > > have a corresponding request_queue and then can derive the current css.
> > > > In special cases, we want to attribute to bio to someone else. This can
> > > > be done by calling bio_associate_blkg_from_css() or
> > > > kthread_associate_blkcg() depending on the scenario. Btrfs does this for
> > > > compressed writeback as they are handled by kworkers, so the latter can
> > > > be done here.
> > > > 
> > > > Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> > > > early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> > > > above assumption that we'll have a request_queue when we are doing
> > > > association. To fix this, switch to using kthread_associate_blkcg().
> > > 
> > > Can be kthread_associate_blkcg used also for submit_extent_page that
> > > calls bio_associate_blkg_from_css indirectly when initializing wbc?
> > > 
> > > 2996                 bio_set_dev(bio, bdev);
> > > 2997                 wbc_init_bio(wbc, bio);
> > > 2998                 wbc_account_cgroup_owner(wbc, page, page_size);
> > > 
> > > wbc_init_bio:
> > > 
> > > 	if (wbc)
> > > 		bio_associate_blkg_from_css();
> > 
> > Correct me if I'm wrong, but I don't think submit_extent_page() is only
> > called from kthread contexts. So, we wouldn't be able to rely on
> > kthread_associate_blkcg().
> 
> Yeah, the kthread is not guaranteed here.
> 
> > I can think about how to make wbc better for association in general, but
> > it's a percpu decrement and increment so it shouldn't really be much in
> > overhead.
> 
> Performance is not my concern here, the addition of bios and blkcg
> association is new and there were some integration bugs where I
> independently removed early bdev association while the blkg relied on
> that. I'm looking for ways to make it less error prone and the kthread
> association looks exactly like that so I was curious if it's possible to
> use it everywhere. If not, the bdev needs to be found from other
> available data.

Yeah. At the time, going through bio_set_dev() was a way to guarantee we
weren't missing an association with a blk-cgroup. This simplified
auditing and prevented newer use cases from missing it. However, I do
agree it's quite error prone.. I'll put it on my list and see if I can
come up with something better.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write()
  2019-12-12  0:07 [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Dennis Zhou
  2019-12-12  0:07 ` [PATCH 2/2] btrfs: fix compressed write bio attribution Dennis Zhou
  2019-12-12  0:15 ` [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Chris Mason
@ 2019-12-30 15:08 ` David Sterba
  2 siblings, 0 replies; 11+ messages in thread
From: David Sterba @ 2019-12-30 15:08 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: David Sterba, Chris Mason, Josef Bacik, kernel-team, linux-btrfs

On Wed, Dec 11, 2019 at 04:07:06PM -0800, Dennis Zhou wrote:
> Compressed writes happen in the background via kworkers. However, this
> causes bios to be attributed to root bypassing any cgroup limits from
> the actual writer. We tag the first bio with REQ_CGROUP_PUNT, which will
> punt the bio to an appropriate cgroup specific workqueue and attribute
> the IO properly. However, if btrfs_submit_compressed_write() creates a
> new bio, we don't tag it the same way. Add the appropriate tagging for
> subsequent bios.
> 
> Fixes: ec39f7696ccfa ("Btrfs: use REQ_CGROUP_PUNT for worker thread submitted bios")
> Cc: Chris Mason <clm@fb.com>
> Signed-off-by: Dennis Zhou <dennis@kernel.org>

1 and 2 queued for 5.5-rc, thanks.

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, back to index

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-12  0:07 [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Dennis Zhou
2019-12-12  0:07 ` [PATCH 2/2] btrfs: fix compressed write bio attribution Dennis Zhou
2019-12-12 15:18   ` David Sterba
2019-12-12 16:19     ` Dennis Zhou
2019-12-12 18:19   ` [PATCH v2 " Dennis Zhou
2019-12-13 12:24     ` David Sterba
2019-12-13 22:21       ` Dennis Zhou
2019-12-17 15:05         ` David Sterba
2019-12-17 18:44           ` Dennis Zhou
2019-12-12  0:15 ` [PATCH 1/2] btrfs: punt all bios created in btrfs_submit_compressed_write() Chris Mason
2019-12-30 15:08 ` David Sterba

Linux-BTRFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-btrfs/0 linux-btrfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-btrfs linux-btrfs/ https://lore.kernel.org/linux-btrfs \
		linux-btrfs@vger.kernel.org
	public-inbox-index linux-btrfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-btrfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git