* [PATCH 0/2] btrfs: qgroup: detect and fix leaked data reserved space @ 2020-06-07 7:25 Qu Wenruo 2020-06-07 7:25 ` [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page Qu Wenruo 2020-06-07 7:25 ` [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time Qu Wenruo 0 siblings, 2 replies; 12+ messages in thread From: Qu Wenruo @ 2020-06-07 7:25 UTC (permalink / raw) To: linux-btrfs There is an internal report complaining that qgroup is only half of the limit, but they still get EDQUOT errors. With some extra debugging patch added, it turns out that even fsstress with 15 steps can sometimes cause qgroup reserved data space to leak. This patch set is going to: - Fix the reserved data space leakage Mostly caused by missing btrfs_qgroup_free_data() call in release page. As I thought a dirty page either goes through finish_ordered_io(), or get invalidated directly. But due to the designed of delayed finish_ordered_io(), we can still get dirty page get released directly. - Add extra safenet to catch qgroup reserved space leakage. The existing test case btrfs/022 can already catch the bug pretty reliably. I will add a specific case for fstests if needed. Qu Wenruo (2): btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page btrfs: qgroup: catch reserved space leakage at unmount time fs/btrfs/disk-io.c | 6 ++++++ fs/btrfs/extent_io.c | 34 +++++++++++++++++++++++++++------- fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/qgroup.h | 2 +- 4 files changed, 77 insertions(+), 8 deletions(-) -- 2.26.2 ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page 2020-06-07 7:25 [PATCH 0/2] btrfs: qgroup: detect and fix leaked data reserved space Qu Wenruo @ 2020-06-07 7:25 ` Qu Wenruo 2020-06-08 15:17 ` Josef Bacik 2020-06-07 7:25 ` [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time Qu Wenruo 1 sibling, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2020-06-07 7:25 UTC (permalink / raw) To: linux-btrfs [BUG] The following simple workload from fsstress can lead to qgroup reserved data space leakage: 0/0: creat f0 x:0 0 0 0/0: creat add id=0,parent=-1 0/1: write f0[259 1 0 0 0 0] [600030,27288] 0 0/4: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 64 627318] return 25, fallback to stat() 0/4: dwrite f0[259 1 0 0 64 627318] [610304,106496] 0 This would cause btrfs qgroup to leak 20480 bytes for data reserved space. If btrfs qgroup limit is enabled, such leakage can lead to unexpected early EDQUOT and unusable space. [CAUSE] When doing direct IO, kernel will try to writeback existing buffered page cache, then invalidate them: iomap_dio_rw() |- filemap_write_and_wait_range(); |- invalidate_inode_pages2_range(); However for btrfs, the bi_end_io hook doesn't finish all its heavy work right after bio ends. In fact, it delays its work further: submit_extent_page(end_io_func=end_bio_extent_writepage); end_bio_extent_writepage() |- btrfs_writepage_endio_finish_ordered() |- btrfs_init_work(finish_ordered_fn); <<< Work queue execution >>> finish_ordered_fn() |- btrfs_finish_ordered_io(); |- Clear qgroup bits This means, when filemap_write_and_wait_range() returns, btrfs_finish_ordered_io() is not ensured to be executed, thus the qgroup bits for related range is not cleared. Now into how the leakage happens, this will only focus on the overlapping part of buffered and direct IO part. 1. After buffered write The inode had the following range with QGROUP_RESERVED bit: 596 616K |///////////////| Qgroup reserved data space: 20K 2. Writeback part for range [596K, 616K) Write back finished, but btrfs_finish_ordered_io() not get called yet. So we still have: 596K 616K |///////////////| Qgroup reserved data space: 20K 3. Pages for range [596K, 616K) get released This will clear all qgroup bits, but don't update the reserved data space. So we have: 596K 616K | | Qgroup reserved data space: 20K That number doesn't match with the qgroup bit range anymore. 4. Dio prepare space for range [596K, 700K) Qgroup reserved data space for that range, we got: 596K 616K 700K |///////////////|///////////////////////| Qgroup reserved data space: 20K + 104K = 124K 5. btrfs_finish_ordered_range() get executed for range [596K, 616K) Qgroup free reserved space for that range, we got: 596K 616K 700K | |///////////////////////| We need to free that range of reserved space. Qgroup reserved data space: 124K - 20K = 104K 6. btrfs_finish_ordered_range() get executed for range [596K, 700K) However qgroup bit for range [596K, 616K) is already cleared in previous step, so we only free 84K for qgroup reserved space. 596K 616K 700K | | | We need to free that range of reserved space. Qgroup reserved data space: 104K - 84K = 20K Now there is no way to release that 20K unless disabling qgroup or unmount the fs. [FIX] This patch will fix the problem by calling btrfs_qgroup_free_data() when a page is released. So that even a dirty page is released, its qgroup reserved data space will get freed along with it. Fixes: f695fdcef83a ("btrfs: qgroup: Introduce functions to release/free qgroup reserve data space") Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/extent_io.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index c59e07360083..f86e11d571ea 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -24,6 +24,7 @@ #include "rcu-string.h" #include "backref.h" #include "disk-io.h" +#include "qgroup.h" static struct kmem_cache *extent_state_cache; static struct kmem_cache *extent_buffer_cache; @@ -4476,21 +4477,40 @@ static int try_release_extent_state(struct extent_io_tree *tree, if (test_range_bit(tree, start, end, EXTENT_LOCKED, 0, NULL)) { ret = 0; } else { + int tmp; + + /* + * When releasepage is called, some page can still be dirty and + * has qgroup reserved bit. + * + * This is caused by delayed endio work, the real endio work, + * finish_ordered_io(), is queued into another workqueue in + * bio endio function. + * Thus even writeback finishes, we do not clear dirty and + * qgroup bits immediately. + * + * So here we still need to clear qgroup bits at release page + * time, or we may leak qgroup reserved data space. + */ + tmp = btrfs_qgroup_free_data(page->mapping->host, NULL, start, + PAGE_SIZE); + if (tmp < 0) + ret = 0; + /* - * at this point we can safely clear everything except the - * locked bit and the nodatasum bit + * At this point we can safely clear everything except the + * locked bit and the nodatasum bit. */ - ret = __clear_extent_bit(tree, start, end, + tmp = __clear_extent_bit(tree, start, end, ~(EXTENT_LOCKED | EXTENT_NODATASUM), 0, 0, NULL, mask, NULL); - /* if clear_extent_bit failed for enomem reasons, + /* + * If clear_extent_bit failed for enomem reasons, * we can't allow the release to continue. */ - if (ret < 0) + if (tmp < 0) ret = 0; - else - ret = 1; } return ret; } -- 2.26.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page 2020-06-07 7:25 ` [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page Qu Wenruo @ 2020-06-08 15:17 ` Josef Bacik 2020-06-09 1:01 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Josef Bacik @ 2020-06-08 15:17 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 6/7/20 3:25 AM, Qu Wenruo wrote: > [BUG] > The following simple workload from fsstress can lead to qgroup reserved > data space leakage: > 0/0: creat f0 x:0 0 0 > 0/0: creat add id=0,parent=-1 > 0/1: write f0[259 1 0 0 0 0] [600030,27288] 0 > 0/4: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 64 627318] return 25, fallback to stat() > 0/4: dwrite f0[259 1 0 0 64 627318] [610304,106496] 0 > > This would cause btrfs qgroup to leak 20480 bytes for data reserved > space. > If btrfs qgroup limit is enabled, such leakage can lead to unexpected > early EDQUOT and unusable space. > > [CAUSE] > When doing direct IO, kernel will try to writeback existing buffered > page cache, then invalidate them: > iomap_dio_rw() > |- filemap_write_and_wait_range(); > |- invalidate_inode_pages2_range(); > > However for btrfs, the bi_end_io hook doesn't finish all its heavy work > right after bio ends. > In fact, it delays its work further: > submit_extent_page(end_io_func=end_bio_extent_writepage); > end_bio_extent_writepage() > |- btrfs_writepage_endio_finish_ordered() > |- btrfs_init_work(finish_ordered_fn); > > <<< Work queue execution >>> > finish_ordered_fn() > |- btrfs_finish_ordered_io(); > |- Clear qgroup bits > > This means, when filemap_write_and_wait_range() returns, > btrfs_finish_ordered_io() is not ensured to be executed, thus the > qgroup bits for related range is not cleared. > > Now into how the leakage happens, this will only focus on the > overlapping part of buffered and direct IO part. > > 1. After buffered write > The inode had the following range with QGROUP_RESERVED bit: > 596 616K > |///////////////| > Qgroup reserved data space: 20K > > 2. Writeback part for range [596K, 616K) > Write back finished, but btrfs_finish_ordered_io() not get called > yet. > So we still have: > 596K 616K > |///////////////| > Qgroup reserved data space: 20K > > 3. Pages for range [596K, 616K) get released > This will clear all qgroup bits, but don't update the reserved data > space. > So we have: > 596K 616K > | | > Qgroup reserved data space: 20K > That number doesn't match with the qgroup bit range anymore. > > 4. Dio prepare space for range [596K, 700K) > Qgroup reserved data space for that range, we got: > 596K 616K 700K > |///////////////|///////////////////////| > Qgroup reserved data space: 20K + 104K = 124K > > 5. btrfs_finish_ordered_range() get executed for range [596K, 616K) > Qgroup free reserved space for that range, we got: > 596K 616K 700K > | |///////////////////////| > We need to free that range of reserved space. > Qgroup reserved data space: 124K - 20K = 104K > > 6. btrfs_finish_ordered_range() get executed for range [596K, 700K) > However qgroup bit for range [596K, 616K) is already cleared in > previous step, so we only free 84K for qgroup reserved space. > 596K 616K 700K > | | | > We need to free that range of reserved space. > Qgroup reserved data space: 104K - 84K = 20K > > Now there is no way to release that 20K unless disabling qgroup or > unmount the fs. > > [FIX] > This patch will fix the problem by calling btrfs_qgroup_free_data() when > a page is released. > > So that even a dirty page is released, its qgroup reserved data space > will get freed along with it. > > Fixes: f695fdcef83a ("btrfs: qgroup: Introduce functions to release/free qgroup reserve data space") > Signed-off-by: Qu Wenruo <wqu@suse.com> This seems backwards to me, and not in keeping with the actual lifetime of the changes. At the point that the ordered extent is created it is now in charge of the qgroup reservation, so it should be the ultimate arbiter of what is done with that qgroup reservation. So fix try_release_extent_state to not remove EXTENT_QGROUP_RESERVED, because it's going to get dropped elsewhere. Thanks, Josef ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page 2020-06-08 15:17 ` Josef Bacik @ 2020-06-09 1:01 ` Qu Wenruo 0 siblings, 0 replies; 12+ messages in thread From: Qu Wenruo @ 2020-06-09 1:01 UTC (permalink / raw) To: Josef Bacik, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 4926 bytes --] On 2020/6/8 下午11:17, Josef Bacik wrote: > On 6/7/20 3:25 AM, Qu Wenruo wrote: >> [BUG] >> The following simple workload from fsstress can lead to qgroup reserved >> data space leakage: >> 0/0: creat f0 x:0 0 0 >> 0/0: creat add id=0,parent=-1 >> 0/1: write f0[259 1 0 0 0 0] [600030,27288] 0 >> 0/4: dwrite - xfsctl(XFS_IOC_DIOINFO) f0[259 1 0 0 64 627318] >> return 25, fallback to stat() >> 0/4: dwrite f0[259 1 0 0 64 627318] [610304,106496] 0 >> >> This would cause btrfs qgroup to leak 20480 bytes for data reserved >> space. >> If btrfs qgroup limit is enabled, such leakage can lead to unexpected >> early EDQUOT and unusable space. >> >> [CAUSE] >> When doing direct IO, kernel will try to writeback existing buffered >> page cache, then invalidate them: >> iomap_dio_rw() >> |- filemap_write_and_wait_range(); >> |- invalidate_inode_pages2_range(); >> >> However for btrfs, the bi_end_io hook doesn't finish all its heavy work >> right after bio ends. >> In fact, it delays its work further: >> submit_extent_page(end_io_func=end_bio_extent_writepage); >> end_bio_extent_writepage() >> |- btrfs_writepage_endio_finish_ordered() >> |- btrfs_init_work(finish_ordered_fn); >> >> <<< Work queue execution >>> >> finish_ordered_fn() >> |- btrfs_finish_ordered_io(); >> |- Clear qgroup bits >> >> This means, when filemap_write_and_wait_range() returns, >> btrfs_finish_ordered_io() is not ensured to be executed, thus the >> qgroup bits for related range is not cleared. >> >> Now into how the leakage happens, this will only focus on the >> overlapping part of buffered and direct IO part. >> >> 1. After buffered write >> The inode had the following range with QGROUP_RESERVED bit: >> 596 616K >> |///////////////| >> Qgroup reserved data space: 20K >> >> 2. Writeback part for range [596K, 616K) >> Write back finished, but btrfs_finish_ordered_io() not get called >> yet. >> So we still have: >> 596K 616K >> |///////////////| >> Qgroup reserved data space: 20K >> >> 3. Pages for range [596K, 616K) get released >> This will clear all qgroup bits, but don't update the reserved data >> space. >> So we have: >> 596K 616K >> | | >> Qgroup reserved data space: 20K >> That number doesn't match with the qgroup bit range anymore. >> >> 4. Dio prepare space for range [596K, 700K) >> Qgroup reserved data space for that range, we got: >> 596K 616K 700K >> |///////////////|///////////////////////| >> Qgroup reserved data space: 20K + 104K = 124K >> >> 5. btrfs_finish_ordered_range() get executed for range [596K, 616K) >> Qgroup free reserved space for that range, we got: >> 596K 616K 700K >> | |///////////////////////| >> We need to free that range of reserved space. >> Qgroup reserved data space: 124K - 20K = 104K >> >> 6. btrfs_finish_ordered_range() get executed for range [596K, 700K) >> However qgroup bit for range [596K, 616K) is already cleared in >> previous step, so we only free 84K for qgroup reserved space. >> 596K 616K 700K >> | | | >> We need to free that range of reserved space. >> Qgroup reserved data space: 104K - 84K = 20K >> >> Now there is no way to release that 20K unless disabling qgroup or >> unmount the fs. >> >> [FIX] >> This patch will fix the problem by calling btrfs_qgroup_free_data() when >> a page is released. >> >> So that even a dirty page is released, its qgroup reserved data space >> will get freed along with it. >> >> Fixes: f695fdcef83a ("btrfs: qgroup: Introduce functions to >> release/free qgroup reserve data space") >> Signed-off-by: Qu Wenruo <wqu@suse.com> > > This seems backwards to me, and not in keeping with the actual lifetime > of the changes. At the point that the ordered extent is created it is > now in charge of the qgroup reservation, so it should be the ultimate > arbiter of what is done with that qgroup reservation. So fix > try_release_extent_state to not remove EXTENT_QGROUP_RESERVED, because > it's going to get dropped elsewhere. Thanks, Indeed, doing the qgroup rsv work in ordered extent looks more reasonable. Although that change would make a lot of timing completely different, and won't go as smooth in the first run, it still looks like a more proper fix. Thanks for the advice, Qu > > Josef [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-07 7:25 [PATCH 0/2] btrfs: qgroup: detect and fix leaked data reserved space Qu Wenruo 2020-06-07 7:25 ` [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page Qu Wenruo @ 2020-06-07 7:25 ` Qu Wenruo 2020-06-08 6:58 ` Nikolay Borisov ` (2 more replies) 1 sibling, 3 replies; 12+ messages in thread From: Qu Wenruo @ 2020-06-07 7:25 UTC (permalink / raw) To: linux-btrfs Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/disk-io.c | 6 ++++++ fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/qgroup.h | 2 +- 3 files changed, 50 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index f8ec2d8606fd..48d047e64461 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) ASSERT(list_empty(&fs_info->delayed_iputs)); set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); + if (btrfs_qgroup_has_leak(fs_info)) { + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); + btrfs_err(fs_info, "qgroup reserved space leaked\n"); + } + btrfs_free_qgroup_config(fs_info); ASSERT(list_empty(&fs_info->delalloc_roots)); diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c index 5bd4089ad0e1..3fccf2ffdcf1 100644 --- a/fs/btrfs/qgroup.c +++ b/fs/btrfs/qgroup.c @@ -505,6 +505,49 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) return ret < 0 ? ret : 0; } +static u64 btrfs_qgroup_subvolid(u64 qgroupid) +{ + return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); +} +/* + * Get called for close_ctree() when quota is still enabled. + * This verifies we don't leak some reserved space. + * + * Return false if no reserved space is left. + * Return true if some reserved space is leaked. + */ +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info) +{ + struct btrfs_qgroup *qgroup; + struct rb_node *node; + bool ret = false; + + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) + return ret; + /* + * Since we're unmounting, there is no race and no need to grab + * qgroup lock. + * And here we don't go post order to provide a more user friendly + * sorted result. + */ + for (node = rb_first(&fs_info->qgroup_tree); node; node = rb_next(node)) { + int i; + + qgroup = rb_entry(node, struct btrfs_qgroup, node); + for (i = 0; i < BTRFS_QGROUP_RSV_LAST; i++) { + if (qgroup->rsv.values[i]) { + ret = true; + btrfs_warn(fs_info, + "qgroup %llu/%llu has unreleased space, type=%d rsv=%llu", + btrfs_qgroup_level(qgroup->qgroupid), + btrfs_qgroup_subvolid(qgroup->qgroupid), + i, qgroup->rsv.values[i]); + } + } + } + return ret; +} + /* * This is called from close_ctree() or open_ctree() or btrfs_quota_disable(), * first two are in single-threaded paths.And for the third one, we have set diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h index 1bc654459469..e3e9f9df8320 100644 --- a/fs/btrfs/qgroup.h +++ b/fs/btrfs/qgroup.h @@ -415,5 +415,5 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, struct btrfs_root *root, struct extent_buffer *eb); void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); - +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info); #endif -- 2.26.2 ^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-07 7:25 ` [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time Qu Wenruo @ 2020-06-08 6:58 ` Nikolay Borisov 2020-06-08 7:22 ` Qu Wenruo 2020-06-08 7:20 ` Michał Mirosław 2020-06-09 18:46 ` David Sterba 2 siblings, 1 reply; 12+ messages in thread From: Nikolay Borisov @ 2020-06-08 6:58 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs On 7.06.20 г. 10:25 ч., Qu Wenruo wrote: > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 6 ++++++ > fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/qgroup.h | 2 +- > 3 files changed, 50 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index f8ec2d8606fd..48d047e64461 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > ASSERT(list_empty(&fs_info->delayed_iputs)); > set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); > > + if (btrfs_qgroup_has_leak(fs_info)) { > + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), > + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); > + btrfs_err(fs_info, "qgroup reserved space leaked\n"); I don't think the message from the WARN() brings any value, it's simply duplicated by the btrfs_err. IMO it's more concise to do: WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); btrfs_err(qgroup reserved space leaked); without losing any information whatsoever. > + } Is it safe calling this code here? workqueues are being destroyed after it in btrfs_stop_all_workers so it's possible that they have some lingering work which in turn might cause false positive in this check ? > + > btrfs_free_qgroup_config(fs_info); > ASSERT(list_empty(&fs_info->delalloc_roots)); > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index 5bd4089ad0e1..3fccf2ffdcf1 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -505,6 +505,49 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) > return ret < 0 ? ret : 0; > } > > +static u64 btrfs_qgroup_subvolid(u64 qgroupid) > +{ > + return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); > +} > +/* > + * Get called for close_ctree() when quota is still enabled. > + * This verifies we don't leak some reserved space. > + * > + * Return false if no reserved space is left. > + * Return true if some reserved space is leaked. > + */ > +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info) > +{ > + struct btrfs_qgroup *qgroup; nit:This variable is used only in the loop below just define it there to reduce its scope. > + struct rb_node *node; > + bool ret = false; > + > + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) > + return ret; > + /* > + * Since we're unmounting, there is no race and no need to grab > + * qgroup lock. > + * And here we don't go post order to provide a more user friendly > + * sorted result. > + */ > + for (node = rb_first(&fs_info->qgroup_tree); node; node = rb_next(node)) { > + int i; > + > + qgroup = rb_entry(node, struct btrfs_qgroup, node); > + for (i = 0; i < BTRFS_QGROUP_RSV_LAST; i++) { > + if (qgroup->rsv.values[i]) { > + ret = true; > + btrfs_warn(fs_info, > + "qgroup %llu/%llu has unreleased space, type=%d rsv=%llu", > + btrfs_qgroup_level(qgroup->qgroupid), > + btrfs_qgroup_subvolid(qgroup->qgroupid), > + i, qgroup->rsv.values[i]); > + } > + } > + } > + return ret; > +} > + > /* > * This is called from close_ctree() or open_ctree() or btrfs_quota_disable(), > * first two are in single-threaded paths.And for the third one, we have set > diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h > index 1bc654459469..e3e9f9df8320 100644 > --- a/fs/btrfs/qgroup.h > +++ b/fs/btrfs/qgroup.h > @@ -415,5 +415,5 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, > int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, > struct btrfs_root *root, struct extent_buffer *eb); > void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); > - > +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info); > #endif > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-08 6:58 ` Nikolay Borisov @ 2020-06-08 7:22 ` Qu Wenruo 0 siblings, 0 replies; 12+ messages in thread From: Qu Wenruo @ 2020-06-08 7:22 UTC (permalink / raw) To: Nikolay Borisov, Qu Wenruo, linux-btrfs On 2020/6/8 下午2:58, Nikolay Borisov wrote: > > > On 7.06.20 г. 10:25 ч., Qu Wenruo wrote: >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 6 ++++++ >> fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ >> fs/btrfs/qgroup.h | 2 +- >> 3 files changed, 50 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index f8ec2d8606fd..48d047e64461 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) >> ASSERT(list_empty(&fs_info->delayed_iputs)); >> set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); >> >> + if (btrfs_qgroup_has_leak(fs_info)) { >> + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), >> + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); >> + btrfs_err(fs_info, "qgroup reserved space leaked\n"); > I don't think the message from the WARN() brings any value, it's simply > duplicated by the btrfs_err. IMO it's more concise to do: > > WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); > btrfs_err(qgroup reserved space leaked); > > without losing any information whatsoever. Makes sense, also another cleanup item for existing similar cases. > >> + } > > Is it safe calling this code here? workqueues are being destroyed after > it in btrfs_stop_all_workers so it's possible that they have some > lingering work which in turn might cause false positive in this check ? The safety here is as safe as other calls in close_ctree(), we expect no new trans started nor existing running trans (finish ordered io call), and no dirty pages (invalidate/release page call) So at this timing there should be nothing to modify qgroup and we're safe. Or did I miss something for the close_ctree() context? >> + >> btrfs_free_qgroup_config(fs_info); >> ASSERT(list_empty(&fs_info->delalloc_roots)); >> >> diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c >> index 5bd4089ad0e1..3fccf2ffdcf1 100644 >> --- a/fs/btrfs/qgroup.c >> +++ b/fs/btrfs/qgroup.c >> @@ -505,6 +505,49 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) >> return ret < 0 ? ret : 0; >> } >> >> +static u64 btrfs_qgroup_subvolid(u64 qgroupid) >> +{ >> + return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); >> +} >> +/* >> + * Get called for close_ctree() when quota is still enabled. >> + * This verifies we don't leak some reserved space. >> + * >> + * Return false if no reserved space is left. >> + * Return true if some reserved space is leaked. >> + */ >> +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info) >> +{ >> + struct btrfs_qgroup *qgroup; > > nit:This variable is used only in the loop below just define it there to > reduce its scope. Sure. Thanks, Qu > >> + struct rb_node *node; >> + bool ret = false; >> + >> + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) >> + return ret; >> + /* >> + * Since we're unmounting, there is no race and no need to grab >> + * qgroup lock. >> + * And here we don't go post order to provide a more user friendly >> + * sorted result. >> + */ >> + for (node = rb_first(&fs_info->qgroup_tree); node; node = rb_next(node)) { >> + int i; >> + >> + qgroup = rb_entry(node, struct btrfs_qgroup, node); >> + for (i = 0; i < BTRFS_QGROUP_RSV_LAST; i++) { >> + if (qgroup->rsv.values[i]) { >> + ret = true; >> + btrfs_warn(fs_info, >> + "qgroup %llu/%llu has unreleased space, type=%d rsv=%llu", >> + btrfs_qgroup_level(qgroup->qgroupid), >> + btrfs_qgroup_subvolid(qgroup->qgroupid), >> + i, qgroup->rsv.values[i]); >> + } >> + } >> + } >> + return ret; >> +} >> + >> /* >> * This is called from close_ctree() or open_ctree() or btrfs_quota_disable(), >> * first two are in single-threaded paths.And for the third one, we have set >> diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h >> index 1bc654459469..e3e9f9df8320 100644 >> --- a/fs/btrfs/qgroup.h >> +++ b/fs/btrfs/qgroup.h >> @@ -415,5 +415,5 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, >> int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, >> struct btrfs_root *root, struct extent_buffer *eb); >> void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); >> - >> +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info); >> #endif >> ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-07 7:25 ` [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time Qu Wenruo 2020-06-08 6:58 ` Nikolay Borisov @ 2020-06-08 7:20 ` Michał Mirosław 2020-06-08 7:24 ` Qu Wenruo 2020-06-09 18:46 ` David Sterba 2 siblings, 1 reply; 12+ messages in thread From: Michał Mirosław @ 2020-06-08 7:20 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Sun, Jun 07, 2020 at 03:25:12PM +0800, Qu Wenruo wrote: > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 6 ++++++ > fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/qgroup.h | 2 +- > 3 files changed, 50 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index f8ec2d8606fd..48d047e64461 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > ASSERT(list_empty(&fs_info->delayed_iputs)); > set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); > > + if (btrfs_qgroup_has_leak(fs_info)) { > + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), > + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); > + btrfs_err(fs_info, "qgroup reserved space leaked\n"); > + } This looks like debugging aid, so: if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) btrfs_check_qgroup_leak(fs_info); would be more readable (WARN() pushed to the function). Best Regards, Michał Mirosław ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-08 7:20 ` Michał Mirosław @ 2020-06-08 7:24 ` Qu Wenruo 2020-06-08 7:44 ` Michał Mirosław 0 siblings, 1 reply; 12+ messages in thread From: Qu Wenruo @ 2020-06-08 7:24 UTC (permalink / raw) To: Michał Mirosław, Qu Wenruo; +Cc: linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 1359 bytes --] On 2020/6/8 下午3:20, Michał Mirosław wrote: > On Sun, Jun 07, 2020 at 03:25:12PM +0800, Qu Wenruo wrote: >> Signed-off-by: Qu Wenruo <wqu@suse.com> >> --- >> fs/btrfs/disk-io.c | 6 ++++++ >> fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ >> fs/btrfs/qgroup.h | 2 +- >> 3 files changed, 50 insertions(+), 1 deletion(-) >> >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >> index f8ec2d8606fd..48d047e64461 100644 >> --- a/fs/btrfs/disk-io.c >> +++ b/fs/btrfs/disk-io.c >> @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) >> ASSERT(list_empty(&fs_info->delayed_iputs)); >> set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); >> >> + if (btrfs_qgroup_has_leak(fs_info)) { >> + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), >> + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); >> + btrfs_err(fs_info, "qgroup reserved space leaked\n"); >> + } > > This looks like debugging aid, so: > > if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) > btrfs_check_qgroup_leak(fs_info); > > would be more readable (WARN() pushed to the function). We want to check to be executed even on production system, but just less noisy (no kernel backtrace dump). Just like tree-checker and EXTENT_QUOTA_RESERVED check. Thanks, Qu > > Best Regards, > Michał Mirosław > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-08 7:24 ` Qu Wenruo @ 2020-06-08 7:44 ` Michał Mirosław 2020-06-08 9:37 ` Qu Wenruo 0 siblings, 1 reply; 12+ messages in thread From: Michał Mirosław @ 2020-06-08 7:44 UTC (permalink / raw) To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs On Mon, Jun 08, 2020 at 03:24:10PM +0800, Qu Wenruo wrote: > On 2020/6/8 下午3:20, Michał Mirosław wrote: > > On Sun, Jun 07, 2020 at 03:25:12PM +0800, Qu Wenruo wrote: > >> Signed-off-by: Qu Wenruo <wqu@suse.com> > >> --- > >> fs/btrfs/disk-io.c | 6 ++++++ > >> fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > >> fs/btrfs/qgroup.h | 2 +- > >> 3 files changed, 50 insertions(+), 1 deletion(-) > >> > >> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > >> index f8ec2d8606fd..48d047e64461 100644 > >> --- a/fs/btrfs/disk-io.c > >> +++ b/fs/btrfs/disk-io.c > >> @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > >> ASSERT(list_empty(&fs_info->delayed_iputs)); > >> set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); > >> > >> + if (btrfs_qgroup_has_leak(fs_info)) { > >> + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), > >> + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); > >> + btrfs_err(fs_info, "qgroup reserved space leaked\n"); > >> + } > > > > This looks like debugging aid, so: > > > > if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) > > btrfs_check_qgroup_leak(fs_info); > > > > would be more readable (WARN() pushed to the function). > > We want to check to be executed even on production system, but just less > noisy (no kernel backtrace dump). > Just like tree-checker and EXTENT_QUOTA_RESERVED check. In that case I suggest: btrfs_err(...); WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); as I expect people look for messages before the Oops for more information. Best Regards, Michał Mirosław ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-08 7:44 ` Michał Mirosław @ 2020-06-08 9:37 ` Qu Wenruo 0 siblings, 0 replies; 12+ messages in thread From: Qu Wenruo @ 2020-06-08 9:37 UTC (permalink / raw) To: Michał Mirosław, Qu Wenruo; +Cc: linux-btrfs On 2020/6/8 下午3:44, Michał Mirosław wrote: > On Mon, Jun 08, 2020 at 03:24:10PM +0800, Qu Wenruo wrote: >> On 2020/6/8 下午3:20, Michał Mirosław wrote: >>> On Sun, Jun 07, 2020 at 03:25:12PM +0800, Qu Wenruo wrote: >>>> Signed-off-by: Qu Wenruo <wqu@suse.com> >>>> --- >>>> fs/btrfs/disk-io.c | 6 ++++++ >>>> fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ >>>> fs/btrfs/qgroup.h | 2 +- >>>> 3 files changed, 50 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c >>>> index f8ec2d8606fd..48d047e64461 100644 >>>> --- a/fs/btrfs/disk-io.c >>>> +++ b/fs/btrfs/disk-io.c >>>> @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) >>>> ASSERT(list_empty(&fs_info->delayed_iputs)); >>>> set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); >>>> >>>> + if (btrfs_qgroup_has_leak(fs_info)) { >>>> + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), >>>> + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); >>>> + btrfs_err(fs_info, "qgroup reserved space leaked\n"); >>>> + } >>> >>> This looks like debugging aid, so: >>> >>> if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) >>> btrfs_check_qgroup_leak(fs_info); >>> >>> would be more readable (WARN() pushed to the function). >> >> We want to check to be executed even on production system, but just less >> noisy (no kernel backtrace dump). >> Just like tree-checker and EXTENT_QUOTA_RESERVED check. > > In that case I suggest: > > btrfs_err(...); > WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG)); > > as I expect people look for messages before the Oops for more information. Yep, exactly what Nik suggested and what I would do in next version. Thanks, Qu > > Best Regards, > Michał Mirosław > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time 2020-06-07 7:25 ` [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time Qu Wenruo 2020-06-08 6:58 ` Nikolay Borisov 2020-06-08 7:20 ` Michał Mirosław @ 2020-06-09 18:46 ` David Sterba 2 siblings, 0 replies; 12+ messages in thread From: David Sterba @ 2020-06-09 18:46 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Sun, Jun 07, 2020 at 03:25:12PM +0800, Qu Wenruo wrote: Please write some changelog. > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/disk-io.c | 6 ++++++ > fs/btrfs/qgroup.c | 43 +++++++++++++++++++++++++++++++++++++++++++ > fs/btrfs/qgroup.h | 2 +- > 3 files changed, 50 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c > index f8ec2d8606fd..48d047e64461 100644 > --- a/fs/btrfs/disk-io.c > +++ b/fs/btrfs/disk-io.c > @@ -4058,6 +4058,12 @@ void __cold close_ctree(struct btrfs_fs_info *fs_info) > ASSERT(list_empty(&fs_info->delayed_iputs)); > set_bit(BTRFS_FS_CLOSING_DONE, &fs_info->flags); > > + if (btrfs_qgroup_has_leak(fs_info)) { > + WARN(IS_ENABLED(CONFIG_BTRFS_DEBUG), > + KERN_ERR "BTRFS: qgroup reserved space leaked\n"); > + btrfs_err(fs_info, "qgroup reserved space leaked\n"); No newline in the btrfs_err strings. > + } > + > btrfs_free_qgroup_config(fs_info); > ASSERT(list_empty(&fs_info->delalloc_roots)); > > diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c > index 5bd4089ad0e1..3fccf2ffdcf1 100644 > --- a/fs/btrfs/qgroup.c > +++ b/fs/btrfs/qgroup.c > @@ -505,6 +505,49 @@ int btrfs_read_qgroup_config(struct btrfs_fs_info *fs_info) > return ret < 0 ? ret : 0; > } > > +static u64 btrfs_qgroup_subvolid(u64 qgroupid) > +{ > + return (qgroupid & ((1ULL << BTRFS_QGROUP_LEVEL_SHIFT) - 1)); > +} Missing newline. > +/* > + * Get called for close_ctree() when quota is still enabled. > + * This verifies we don't leak some reserved space. > + * > + * Return false if no reserved space is left. > + * Return true if some reserved space is leaked. > + */ > +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info) I think we've been naming such functions with 'check' eg. btrfs_check_quota_leak, and return true/false. > +{ > + struct btrfs_qgroup *qgroup; > + struct rb_node *node; > + bool ret = false; > + > + if (!test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags)) > + return ret; > + /* > + * Since we're unmounting, there is no race and no need to grab > + * qgroup lock. > + * And here we don't go post order to provide a more user friendly > + * sorted result. > + */ > + for (node = rb_first(&fs_info->qgroup_tree); node; node = rb_next(node)) { > + int i; > + > + qgroup = rb_entry(node, struct btrfs_qgroup, node); > + for (i = 0; i < BTRFS_QGROUP_RSV_LAST; i++) { > + if (qgroup->rsv.values[i]) { > + ret = true; > + btrfs_warn(fs_info, > + "qgroup %llu/%llu has unreleased space, type=%d rsv=%llu", If this could be potentially noisy, the ratelimited version would be more suitable. The message wording sounds ok, as it points to the qgroups and there's one global message printed from close_ctree that says it's a 'leak'. > + btrfs_qgroup_level(qgroup->qgroupid), > + btrfs_qgroup_subvolid(qgroup->qgroupid), > + i, qgroup->rsv.values[i]); > + } > + } > + } > + return ret; > +} > + > /* > * This is called from close_ctree() or open_ctree() or btrfs_quota_disable(), > * first two are in single-threaded paths.And for the third one, we have set > diff --git a/fs/btrfs/qgroup.h b/fs/btrfs/qgroup.h > index 1bc654459469..e3e9f9df8320 100644 > --- a/fs/btrfs/qgroup.h > +++ b/fs/btrfs/qgroup.h > @@ -415,5 +415,5 @@ int btrfs_qgroup_add_swapped_blocks(struct btrfs_trans_handle *trans, > int btrfs_qgroup_trace_subtree_after_cow(struct btrfs_trans_handle *trans, > struct btrfs_root *root, struct extent_buffer *eb); > void btrfs_qgroup_destroy_extent_records(struct btrfs_transaction *trans); > - > +bool btrfs_qgroup_has_leak(struct btrfs_fs_info *fs_info); So when I'm pointing out newlines, please keep one between the text and the last #endif. Thanks. > #endif > -- > 2.26.2 ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2020-06-09 18:46 UTC | newest] Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2020-06-07 7:25 [PATCH 0/2] btrfs: qgroup: detect and fix leaked data reserved space Qu Wenruo 2020-06-07 7:25 ` [PATCH 1/2] btrfs: extent_io: fix qgroup reserved data space leakage when releasing a page Qu Wenruo 2020-06-08 15:17 ` Josef Bacik 2020-06-09 1:01 ` Qu Wenruo 2020-06-07 7:25 ` [PATCH 2/2] btrfs: qgroup: catch reserved space leakage at unmount time Qu Wenruo 2020-06-08 6:58 ` Nikolay Borisov 2020-06-08 7:22 ` Qu Wenruo 2020-06-08 7:20 ` Michał Mirosław 2020-06-08 7:24 ` Qu Wenruo 2020-06-08 7:44 ` Michał Mirosław 2020-06-08 9:37 ` Qu Wenruo 2020-06-09 18:46 ` David Sterba
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.