* [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports @ 2019-12-11 5:00 Qu Wenruo 2019-12-11 5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo ` (3 more replies) 0 siblings, 4 replies; 18+ messages in thread From: Qu Wenruo @ 2019-12-11 5:00 UTC (permalink / raw) To: linux-btrfs Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots"), reloc tree lifespan is extended. Although we always set root->reloc_root to NULL before we drop the reloc tree, but that's not multi-core safe since we have no proper memory barrier to ensure other cores can see the same root->reloc_root. The proper root fix should be some proper root refcount, and make btrfs_drop_snapshot() to wait for all other root owner to release the root before dropping it. But for now, let's just check the DEAD_RELOC_ROOT bit before accessing root->reloc_root. Qu Wenruo (3): btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan fs/btrfs/relocation.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) -- 2.24.0 ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan 2019-12-11 5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo @ 2019-12-11 5:00 ` Qu Wenruo 2019-12-11 14:53 ` Josef Bacik 2019-12-11 5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo ` (2 subsequent siblings) 3 siblings, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2019-12-11 5:00 UTC (permalink / raw) To: linux-btrfs; +Cc: Zygo Blaxell [BUG] When running a workload with balance start/cancel, snapshot creation and deletion, and fsstress, we can get the following KASAN report: ================================================================== BUG: KASAN: use-after-free in should_ignore_root+0x54/0xb0 [btrfs] Read of size 8 at addr ffff888146e340f0 by task btrfs/1216 CPU: 6 PID: 1216 Comm: btrfs Tainted: G O 5.5.0-rc1-custom+ #40 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0xc2/0x11a print_address_description.constprop.0+0x20/0x210 __kasan_report.cold+0x1b/0x41 kasan_report+0x12/0x20 check_memory_region+0x13c/0x1b0 __asan_loadN+0xf/0x20 should_ignore_root+0x54/0xb0 [btrfs] build_backref_tree+0x11af/0x2280 [btrfs] relocate_tree_blocks+0x391/0xb80 [btrfs] relocate_block_group+0x3e5/0xa00 [btrfs] btrfs_relocate_block_group+0x240/0x4d0 [btrfs] btrfs_relocate_chunk+0x53/0xf0 [btrfs] btrfs_balance+0xc91/0x1840 [btrfs] btrfs_ioctl_balance+0x416/0x4e0 [btrfs] btrfs_ioctl+0x8af/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x43/0x50 do_syscall_64+0x79/0xe0 entry_SYSCALL_64_after_hwframe+0x49/0xbe [CAUSE] When should_ignore_root() accessing root->reloc_root, the reloc_root can be being dropped, thus root->reloc_root can be unreliable. [FIX] Check DEAD_RELOC_ROOT bit before accessing root->reloc_root. Furthermore in the context of should_ignore_root(), if the root has already gone through tree merge, we don't need to trace that root anymore. Thus we need to return 1, for root with dead reloc root. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index d897a8e5e430..bb41b981e493 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -525,6 +525,10 @@ static int should_ignore_root(struct btrfs_root *root) if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state)) return 0; + /* This root has been merged with its reloc tree, so we can ignore it */ + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) + return 1; + reloc_root = root->reloc_root; if (!reloc_root) return 0; -- 2.24.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan 2019-12-11 5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo @ 2019-12-11 14:53 ` Josef Bacik 0 siblings, 0 replies; 18+ messages in thread From: Josef Bacik @ 2019-12-11 14:53 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell On 12/11/19 12:00 AM, Qu Wenruo wrote: > [BUG] > When running a workload with balance start/cancel, snapshot creation and > deletion, and fsstress, we can get the following KASAN report: > > ================================================================== > BUG: KASAN: use-after-free in should_ignore_root+0x54/0xb0 [btrfs] > Read of size 8 at addr ffff888146e340f0 by task btrfs/1216 > > CPU: 6 PID: 1216 Comm: btrfs Tainted: G O 5.5.0-rc1-custom+ #40 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > Call Trace: > dump_stack+0xc2/0x11a > print_address_description.constprop.0+0x20/0x210 > __kasan_report.cold+0x1b/0x41 > kasan_report+0x12/0x20 > check_memory_region+0x13c/0x1b0 > __asan_loadN+0xf/0x20 > should_ignore_root+0x54/0xb0 [btrfs] > build_backref_tree+0x11af/0x2280 [btrfs] > relocate_tree_blocks+0x391/0xb80 [btrfs] > relocate_block_group+0x3e5/0xa00 [btrfs] > btrfs_relocate_block_group+0x240/0x4d0 [btrfs] > btrfs_relocate_chunk+0x53/0xf0 [btrfs] > btrfs_balance+0xc91/0x1840 [btrfs] > btrfs_ioctl_balance+0x416/0x4e0 [btrfs] > btrfs_ioctl+0x8af/0x3e60 [btrfs] > do_vfs_ioctl+0x831/0xb10 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x43/0x50 > do_syscall_64+0x79/0xe0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > [CAUSE] > When should_ignore_root() accessing root->reloc_root, the reloc_root can > be being dropped, thus root->reloc_root can be unreliable. > > [FIX] > Check DEAD_RELOC_ROOT bit before accessing root->reloc_root. > > Furthermore in the context of should_ignore_root(), if the root has > already gone through tree merge, we don't need to trace that root > anymore. > Thus we need to return 1, for root with dead reloc root. > > Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") > Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan 2019-12-11 5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo 2019-12-11 5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo @ 2019-12-11 5:00 ` Qu Wenruo 2019-12-11 14:55 ` Josef Bacik 2019-12-11 5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo 2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba 3 siblings, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2019-12-11 5:00 UTC (permalink / raw) To: linux-btrfs; +Cc: Zygo Blaxell [BUG] When running workload with balance start/cancel, snapshot creation/deletion and fsstress, we can hit the following KASAN report: ================================================================== BUG: KASAN: use-after-free in create_reloc_root+0x9f/0x460 [btrfs] Read of size 8 at addr ffff8881571741f0 by task btrfs/3539 CPU: 6 PID: 3539 Comm: btrfs Tainted: G O 5.5.0-rc1-custom+ #40 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0xc2/0x11a print_address_description.constprop.0+0x20/0x210 __kasan_report.cold+0x1b/0x41 kasan_report+0x12/0x20 __asan_load8+0x54/0x90 create_reloc_root+0x9f/0x460 [btrfs] btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs] create_pending_snapshot+0xa9b/0x15f0 [btrfs] create_pending_snapshots+0x111/0x140 [btrfs] btrfs_commit_transaction+0x7a6/0x1360 [btrfs] btrfs_mksubvol+0x915/0x960 [btrfs] btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] btrfs_ioctl+0x241b/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x43/0x50 do_syscall_64+0x79/0xe0 entry_SYSCALL_64_after_hwframe+0x49/0xbe [CAUSE] This is another case where root->reloc_root is accessed without checking if the reloc root is already dead. [FIX] Also check DEAD_RELOC_TREE bit before accessing root->reloc_root. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index bb41b981e493..619ccb183515 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4755,7 +4755,14 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans, struct reloc_control *rc = root->fs_info->reloc_ctl; int ret; - if (!root->reloc_root || !rc) + /* + * We don't need to use reloc tree if: + * - No reloc tree + * - Relocation not running + * - Reloc tree already merged + */ + if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, + &root->state)) return 0; rc = root->fs_info->reloc_ctl; -- 2.24.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan 2019-12-11 5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo @ 2019-12-11 14:55 ` Josef Bacik 2019-12-11 15:15 ` David Sterba 0 siblings, 1 reply; 18+ messages in thread From: Josef Bacik @ 2019-12-11 14:55 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell On 12/11/19 12:00 AM, Qu Wenruo wrote: > [BUG] > When running workload with balance start/cancel, snapshot > creation/deletion and fsstress, we can hit the following KASAN report: > > ================================================================== > BUG: KASAN: use-after-free in create_reloc_root+0x9f/0x460 [btrfs] > Read of size 8 at addr ffff8881571741f0 by task btrfs/3539 > > CPU: 6 PID: 3539 Comm: btrfs Tainted: G O 5.5.0-rc1-custom+ #40 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > Call Trace: > dump_stack+0xc2/0x11a > print_address_description.constprop.0+0x20/0x210 > __kasan_report.cold+0x1b/0x41 > kasan_report+0x12/0x20 > __asan_load8+0x54/0x90 > create_reloc_root+0x9f/0x460 [btrfs] > btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs] > create_pending_snapshot+0xa9b/0x15f0 [btrfs] > create_pending_snapshots+0x111/0x140 [btrfs] > btrfs_commit_transaction+0x7a6/0x1360 [btrfs] > btrfs_mksubvol+0x915/0x960 [btrfs] > btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] > btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] > btrfs_ioctl+0x241b/0x3e60 [btrfs] > do_vfs_ioctl+0x831/0xb10 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x43/0x50 > do_syscall_64+0x79/0xe0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > [CAUSE] > This is another case where root->reloc_root is accessed without checking > if the reloc root is already dead. > > [FIX] > Also check DEAD_RELOC_TREE bit before accessing root->reloc_root. > > Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/relocation.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index bb41b981e493..619ccb183515 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -4755,7 +4755,14 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans, > struct reloc_control *rc = root->fs_info->reloc_ctl; > int ret; > > - if (!root->reloc_root || !rc) > + /* > + * We don't need to use reloc tree if: > + * - No reloc tree > + * - Relocation not running > + * - Reloc tree already merged > + */ > + if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, > + &root->state)) This is awkward formatting, can we move the test_bit() to the first thing we check so it's less weird? Then you can add Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan 2019-12-11 14:55 ` Josef Bacik @ 2019-12-11 15:15 ` David Sterba 0 siblings, 0 replies; 18+ messages in thread From: David Sterba @ 2019-12-11 15:15 UTC (permalink / raw) To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, Zygo Blaxell On Wed, Dec 11, 2019 at 09:55:04AM -0500, Josef Bacik wrote: > > + /* > > + * We don't need to use reloc tree if: > > + * - No reloc tree > > + * - Relocation not running > > + * - Reloc tree already merged > > + */ > > + if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, > > + &root->state)) > > This is awkward formatting, can we move the test_bit() to the first thing we > check so it's less weird? Then you can add I had the same thought, will move the test_bit on the next line. ^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan 2019-12-11 5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo 2019-12-11 5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo 2019-12-11 5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo @ 2019-12-11 5:00 ` Qu Wenruo 2019-12-11 14:55 ` Josef Bacik 2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba 3 siblings, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2019-12-11 5:00 UTC (permalink / raw) To: linux-btrfs; +Cc: Zygo Blaxell [BUG] When running workload with balance start/cancel, snapshot creation/deletion and fsstress, we can hit the following KASAN report: ================================================================== BUG: KASAN: use-after-free in btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs] Read of size 4 at addr ffff888157140100 by task btrfs/1822 CPU: 2 PID: 1822 Comm: btrfs Tainted: G O 5.5.0-rc1-custom+ #40 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0xc2/0x11a print_address_description.constprop.0+0x20/0x210 __kasan_report.cold+0x1b/0x41 kasan_report+0x12/0x20 check_memory_region+0x13c/0x1b0 __asan_loadN+0xf/0x20 btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs] create_pending_snapshot+0x209/0x15f0 [btrfs] create_pending_snapshots+0x111/0x140 [btrfs] btrfs_commit_transaction+0x7a6/0x1360 [btrfs] btrfs_mksubvol+0x915/0x960 [btrfs] btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] btrfs_ioctl+0x241b/0x3e60 [btrfs] do_vfs_ioctl+0x831/0xb10 ksys_ioctl+0x67/0x90 __x64_sys_ioctl+0x43/0x50 do_syscall_64+0x79/0xe0 entry_SYSCALL_64_after_hwframe+0x49/0xbe [CAUSE] Again, we try to access root->reloc_root without verify if that reloc root is already dead (being freed). [FIX] Check if the root has dead reloc root before accessing it. Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") Signed-off-by: Qu Wenruo <wqu@suse.com> --- fs/btrfs/relocation.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index 619ccb183515..2d2cd1596ec9 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -4721,7 +4721,8 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending, struct btrfs_root *root = pending->root; struct reloc_control *rc = root->fs_info->reloc_ctl; - if (!root->reloc_root || !rc) + if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, + &root->state)) return; if (!rc->merge_reloc_tree) -- 2.24.0 ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan 2019-12-11 5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo @ 2019-12-11 14:55 ` Josef Bacik 0 siblings, 0 replies; 18+ messages in thread From: Josef Bacik @ 2019-12-11 14:55 UTC (permalink / raw) To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell On 12/11/19 12:00 AM, Qu Wenruo wrote: > [BUG] > When running workload with balance start/cancel, snapshot > creation/deletion and fsstress, we can hit the following KASAN report: > ================================================================== > BUG: KASAN: use-after-free in btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs] > Read of size 4 at addr ffff888157140100 by task btrfs/1822 > > CPU: 2 PID: 1822 Comm: btrfs Tainted: G O 5.5.0-rc1-custom+ #40 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 > Call Trace: > dump_stack+0xc2/0x11a > print_address_description.constprop.0+0x20/0x210 > __kasan_report.cold+0x1b/0x41 > kasan_report+0x12/0x20 > check_memory_region+0x13c/0x1b0 > __asan_loadN+0xf/0x20 > btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs] > create_pending_snapshot+0x209/0x15f0 [btrfs] > create_pending_snapshots+0x111/0x140 [btrfs] > btrfs_commit_transaction+0x7a6/0x1360 [btrfs] > btrfs_mksubvol+0x915/0x960 [btrfs] > btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs] > btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs] > btrfs_ioctl+0x241b/0x3e60 [btrfs] > do_vfs_ioctl+0x831/0xb10 > ksys_ioctl+0x67/0x90 > __x64_sys_ioctl+0x43/0x50 > do_syscall_64+0x79/0xe0 > entry_SYSCALL_64_after_hwframe+0x49/0xbe > > [CAUSE] > Again, we try to access root->reloc_root without verify if that reloc > root is already dead (being freed). > > [FIX] > Check if the root has dead reloc root before accessing it. > > Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org> > Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots") > Signed-off-by: Qu Wenruo <wqu@suse.com> > --- > fs/btrfs/relocation.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index 619ccb183515..2d2cd1596ec9 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -4721,7 +4721,8 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending, > struct btrfs_root *root = pending->root; > struct reloc_control *rc = root->fs_info->reloc_ctl; > > - if (!root->reloc_root || !rc) > + if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, > + &root->state)) > return; > Same comment here as the other one, you can add Reviewed-by: Josef Bacik <josef@toxicpanda.com> Once it's fixed. Thanks, Josef ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2019-12-11 5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo ` (2 preceding siblings ...) 2019-12-11 5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo @ 2019-12-11 15:34 ` David Sterba 2019-12-12 0:39 ` Qu Wenruo 3 siblings, 1 reply; 18+ messages in thread From: David Sterba @ 2019-12-11 15:34 UTC (permalink / raw) To: Qu Wenruo; +Cc: linux-btrfs On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote: > Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree > deletion after merge_reloc_roots"), reloc tree lifespan is extended. > > Although we always set root->reloc_root to NULL before we drop the reloc > tree, but that's not multi-core safe since we have no proper memory > barrier to ensure other cores can see the same root->reloc_root. > > The proper root fix should be some proper root refcount, and make > btrfs_drop_snapshot() to wait for all other root owner to release the > root before dropping it. This would block cleaning deleted subvolumes, no? We can skip the dead tree (and add it back to the list) in that can and not wait. The cleaner thread is able to process the list repeatedly. > But for now, let's just check the DEAD_RELOC_ROOT bit before accessing > root->reloc_root. Ok, the bit is safe way to sync that as long as the correct order of setting/clearing is done. The ops are atomic wrt to the value itself but need barriers around as they're simple atomic ops (not RMW, according to Documentation/atomic_bitops.txt) and there's no outer synchronization. Check: smp_mb__before_atomic(); if (test_bit() ...) return; Set: set_bit() smp_mb__after_atomic(); (delete reloc_root) reloc_root = NULL Clearing of the bit is done when there are not potential other users so that part does not need the barrier (I think). The checking part could use a helper so we don't have barriers scattered around code. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba @ 2019-12-12 0:39 ` Qu Wenruo 2019-12-12 14:28 ` David Sterba 2020-01-03 15:52 ` David Sterba 0 siblings, 2 replies; 18+ messages in thread From: Qu Wenruo @ 2019-12-12 0:39 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2780 bytes --] On 2019/12/11 下午11:34, David Sterba wrote: > On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote: >> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree >> deletion after merge_reloc_roots"), reloc tree lifespan is extended. >> >> Although we always set root->reloc_root to NULL before we drop the reloc >> tree, but that's not multi-core safe since we have no proper memory >> barrier to ensure other cores can see the same root->reloc_root. >> >> The proper root fix should be some proper root refcount, and make >> btrfs_drop_snapshot() to wait for all other root owner to release the >> root before dropping it. > > This would block cleaning deleted subvolumes, no? We can skip the dead > tree (and add it back to the list) in that can and not wait. The > cleaner thread is able to process the list repeatedly. What I mean is: - For consumer (reading root->reloc_root) spin_lock(&root->reloc_lock); if (!root->reloc_root) { spin_unlock(&root->reloc_lock); return NULL } refcount_inc(&root->reloc_root->refcount); return(root->reloc_root); spin_unlock(&root->reloc_lock); And of cource, release it after grabbing reloc_root. - For cleaner grab reloc_root just like consumer. retry: wait_event(refcount_read(&root->reloc_root->ref_count) == 1); spin_lock(&root->reloc_lock); if (&root->reloc_root->ref_count != 1){ spin_unlock(); goto retry; } root->reloc_root = NULL; spin_unlock(&root->reloc_lock); /* Now we're the only owner, delete the root */ > >> But for now, let's just check the DEAD_RELOC_ROOT bit before accessing >> root->reloc_root. > > Ok, the bit is safe way to sync that as long as the correct order of > setting/clearing is done. The ops are atomic wrt to the value itself > but need barriers around as they're simple atomic ops (not RMW, > according to Documentation/atomic_bitops.txt) and there's no outer > synchronization. > > Check: > > smp_mb__before_atomic(); > if (test_bit() ...) > return; > > Set: > > set_bit() > smp_mb__after_atomic(); > (delete reloc_root) > reloc_root = NULL > > Clearing of the bit is done when there are not potential other users so > that part does not need the barrier (I think). > > The checking part could use a helper so we don't have barriers scattered > around code. > I'm still not confident enough for the "reloc_root = NULL" assignment and "reloc_root == NULL" test. But since the set_bit()/test_bit() is safe, and it happens before we modify reloc_root, it's safer and is what we used in this quick fix. Still, I'm really looking forward to Josef's root refcount work, that should be the real fix for all the problems. Thanks, Qu [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2019-12-12 0:39 ` Qu Wenruo @ 2019-12-12 14:28 ` David Sterba 2020-01-03 15:52 ` David Sterba 1 sibling, 0 replies; 18+ messages in thread From: David Sterba @ 2019-12-12 14:28 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote: > > > On 2019/12/11 下午11:34, David Sterba wrote: > > On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote: > >> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree > >> deletion after merge_reloc_roots"), reloc tree lifespan is extended. > >> > >> Although we always set root->reloc_root to NULL before we drop the reloc > >> tree, but that's not multi-core safe since we have no proper memory > >> barrier to ensure other cores can see the same root->reloc_root. > >> > >> The proper root fix should be some proper root refcount, and make > >> btrfs_drop_snapshot() to wait for all other root owner to release the > >> root before dropping it. > > > > This would block cleaning deleted subvolumes, no? We can skip the dead > > tree (and add it back to the list) in that can and not wait. The > > cleaner thread is able to process the list repeatedly. > > What I mean is: > - For consumer (reading root->reloc_root) > spin_lock(&root->reloc_lock); > if (!root->reloc_root) { > spin_unlock(&root->reloc_lock); > return NULL > } > refcount_inc(&root->reloc_root->refcount); > return(root->reloc_root); > spin_unlock(&root->reloc_lock); > > And of cource, release it after grabbing reloc_root. > > - For cleaner > grab reloc_root just like consumer. > retry: > wait_event(refcount_read(&root->reloc_root->ref_count) == 1); > spin_lock(&root->reloc_lock); > if (&root->reloc_root->ref_count != 1){ > spin_unlock(); goto retry; > } > root->reloc_root = NULL; > spin_unlock(&root->reloc_lock); > /* Now we're the only owner, delete the root */ The spinlock should be safe as well, do you mean to take it to verify that reloc_root is valid everywhere? > > Clearing of the bit is done when there are not potential other users so > > that part does not need the barrier (I think). > > > > The checking part could use a helper so we don't have barriers scattered > > around code. > > > I'm still not confident enough for the "reloc_root = NULL" assignment > and "reloc_root == NULL" test. > > But since the set_bit()/test_bit() is safe, and it happens before we > modify reloc_root, it's safer and is what we used in this quick fix. > > Still, I'm really looking forward to Josef's root refcount work, that > should be the real fix for all the problems. That's a huge series and unsuitable for backports to stable, we need something like your patches first. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2019-12-12 0:39 ` Qu Wenruo 2019-12-12 14:28 ` David Sterba @ 2020-01-03 15:52 ` David Sterba 2020-01-03 16:15 ` David Sterba 2020-01-04 1:32 ` Qu Wenruo 1 sibling, 2 replies; 18+ messages in thread From: David Sterba @ 2020-01-03 15:52 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs We need to get this series moving because the bug affects a few stable versions. On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote: > On 2019/12/11 下午11:34, David Sterba wrote: > > On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote: > >> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree > >> deletion after merge_reloc_roots"), reloc tree lifespan is extended. > >> > >> Although we always set root->reloc_root to NULL before we drop the reloc > >> tree, but that's not multi-core safe since we have no proper memory > >> barrier to ensure other cores can see the same root->reloc_root. > >> > >> The proper root fix should be some proper root refcount, and make > >> btrfs_drop_snapshot() to wait for all other root owner to release the > >> root before dropping it. > > > > This would block cleaning deleted subvolumes, no? We can skip the dead > > tree (and add it back to the list) in that can and not wait. The > > cleaner thread is able to process the list repeatedly. > > What I mean is: > - For consumer (reading root->reloc_root) > spin_lock(&root->reloc_lock); > if (!root->reloc_root) { > spin_unlock(&root->reloc_lock); > return NULL > } > refcount_inc(&root->reloc_root->refcount); > return(root->reloc_root); > spin_unlock(&root->reloc_lock); > > And of cource, release it after grabbing reloc_root. > > - For cleaner > grab reloc_root just like consumer. > retry: > wait_event(refcount_read(&root->reloc_root->ref_count) == 1); > spin_lock(&root->reloc_lock); > if (&root->reloc_root->ref_count != 1){ > spin_unlock(); goto retry; > } > root->reloc_root = NULL; > spin_unlock(&root->reloc_lock); > /* Now we're the only owner, delete the root */ So it's one bit vs refcount and a lock. For the backports I'd go with the bit, but this needs the barriers as mentioned in my previous reply. Can you please update the patches? ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2020-01-03 15:52 ` David Sterba @ 2020-01-03 16:15 ` David Sterba 2020-01-04 9:37 ` Qu Wenruo 2020-01-06 7:04 ` Qu Wenruo 2020-01-04 1:32 ` Qu Wenruo 1 sibling, 2 replies; 18+ messages in thread From: David Sterba @ 2020-01-03 16:15 UTC (permalink / raw) To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote: > So it's one bit vs refcount and a lock. For the backports I'd go with > the bit, but this needs the barriers as mentioned in my previous reply. > Can you please update the patches? The idea is in the diff below (compile tested only). I found one more case that was not addressed by your patches, it's in btrfs_update_reloc_root. Given that the type of the fix is the same, I'd rather do that in one patch. The reported stack traces are more or less the same. diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c index af4dd49a71c7..aeba3a7506e1 100644 --- a/fs/btrfs/relocation.c +++ b/fs/btrfs/relocation.c @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans, return 1; } +static bool have_reloc_root(struct btrfs_root *root) +{ + smp_mb__before_atomic(); + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) + return false; + if (!root->reloc_root) + return false; + return true; +} static int should_ignore_root(struct btrfs_root *root) { @@ -525,9 +534,9 @@ static int should_ignore_root(struct btrfs_root *root) if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state)) return 0; - reloc_root = root->reloc_root; - if (!reloc_root) + if (!have_reloc_root(root)) return 0; + reloc_root = root->reloc_root; if (btrfs_root_last_snapshot(&reloc_root->root_item) == root->fs_info->running_transaction->transid - 1) @@ -1439,6 +1448,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, * The subvolume has reloc tree but the swap is finished, no need to * create/update the dead reloc tree */ + smp_mb__before_atomic(); if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) return 0; @@ -1478,8 +1488,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, struct btrfs_root_item *root_item; int ret; - if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) || - !root->reloc_root) + if (!have_reloc_root(root)) goto out; reloc_root = root->reloc_root; @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, if (fs_info->reloc_ctl->merge_reloc_tree && btrfs_root_refs(root_item) == 0) { set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); + smp_mb__after_atomic(); __del_reloc_root(reloc_root); } @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc) if (ret2 < 0 && !ret) ret = ret2; } + smp_mb__before_atomic(); clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); btrfs_put_fs_root(root); } else { @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending, struct btrfs_root *root = pending->root; struct reloc_control *rc = root->fs_info->reloc_ctl; - if (!root->reloc_root || !rc) + if (!rc || !have_reloc_root(root)) return; if (!rc->merge_reloc_tree) @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans, struct reloc_control *rc = root->fs_info->reloc_ctl; int ret; - if (!root->reloc_root || !rc) + if (!rc || !have_reloc_root(root)) return 0; rc = root->fs_info->reloc_ctl; ^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2020-01-03 16:15 ` David Sterba @ 2020-01-04 9:37 ` Qu Wenruo 2020-01-04 13:18 ` Qu Wenruo 2020-01-06 7:04 ` Qu Wenruo 1 sibling, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2020-01-04 9:37 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 4060 bytes --] On 2020/1/4 上午12:15, David Sterba wrote: > On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote: >> So it's one bit vs refcount and a lock. For the backports I'd go with >> the bit, but this needs the barriers as mentioned in my previous reply. >> Can you please update the patches? > > The idea is in the diff below (compile tested only). I found one more > case that was not addressed by your patches, it's in > btrfs_update_reloc_root. But fix in btrfs_update_reloc_root() is already included in commit d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots"). Or would you mind to share more details about the missing check? > > Given that the type of the fix is the same, I'd rather do that in one > patch. The reported stack traces are more or less the same. To merge them into patch set is no problem, and should make backports a little easier. But I still didn't understand the barrier part. If we're relying on that bit operation before accessing reloc_root, it should be safe enough, even without memory barrier. Would you please explain a little more? Thanks, Qu > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index af4dd49a71c7..aeba3a7506e1 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans, > return 1; > } > > +static bool have_reloc_root(struct btrfs_root *root) > +{ > + smp_mb__before_atomic(); > + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) > + return false; > + if (!root->reloc_root) > + return false; > + return true; > +} > > static int should_ignore_root(struct btrfs_root *root) > { > @@ -525,9 +534,9 @@ static int should_ignore_root(struct btrfs_root *root) > if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state)) > return 0; > > - reloc_root = root->reloc_root; > - if (!reloc_root) > + if (!have_reloc_root(root)) > return 0; > + reloc_root = root->reloc_root; > > if (btrfs_root_last_snapshot(&reloc_root->root_item) == > root->fs_info->running_transaction->transid - 1) > @@ -1439,6 +1448,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, > * The subvolume has reloc tree but the swap is finished, no need to > * create/update the dead reloc tree > */ > + smp_mb__before_atomic(); > if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) > return 0; > > @@ -1478,8 +1488,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, > struct btrfs_root_item *root_item; > int ret; > > - if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) || > - !root->reloc_root) > + if (!have_reloc_root(root)) > goto out; > > reloc_root = root->reloc_root; > @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, > if (fs_info->reloc_ctl->merge_reloc_tree && > btrfs_root_refs(root_item) == 0) { > set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); > + smp_mb__after_atomic(); > __del_reloc_root(reloc_root); > } > > @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc) > if (ret2 < 0 && !ret) > ret = ret2; > } > + smp_mb__before_atomic(); > clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); > btrfs_put_fs_root(root); > } else { > @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending, > struct btrfs_root *root = pending->root; > struct reloc_control *rc = root->fs_info->reloc_ctl; > > - if (!root->reloc_root || !rc) > + if (!rc || !have_reloc_root(root)) > return; > > if (!rc->merge_reloc_tree) > @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans, > struct reloc_control *rc = root->fs_info->reloc_ctl; > int ret; > > - if (!root->reloc_root || !rc) > + if (!rc || !have_reloc_root(root)) > return 0; > > rc = root->fs_info->reloc_ctl; > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2020-01-04 9:37 ` Qu Wenruo @ 2020-01-04 13:18 ` Qu Wenruo 0 siblings, 0 replies; 18+ messages in thread From: Qu Wenruo @ 2020-01-04 13:18 UTC (permalink / raw) To: Qu Wenruo, dsterba, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 4417 bytes --] On 2020/1/4 下午5:37, Qu Wenruo wrote: > > > On 2020/1/4 上午12:15, David Sterba wrote: >> On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote: >>> So it's one bit vs refcount and a lock. For the backports I'd go with >>> the bit, but this needs the barriers as mentioned in my previous reply. >>> Can you please update the patches? >> >> The idea is in the diff below (compile tested only). I found one more >> case that was not addressed by your patches, it's in >> btrfs_update_reloc_root. > > But fix in btrfs_update_reloc_root() is already included in commit > d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after > merge_reloc_roots"). > > Or would you mind to share more details about the missing check? >> >> Given that the type of the fix is the same, I'd rather do that in one >> patch. The reported stack traces are more or less the same. > > To merge them into patch set is no problem, and should make backports a > little easier. > > But I still didn't understand the barrier part. > If we're relying on that bit operation before accessing reloc_root, it > should be safe enough, even without memory barrier. My bad, set_bit() and test_bit() themselves doesn't imply memory barrier. (I always thought the opposite) I'll put memory barriers in next version. Thanks, Qu > > Would you please explain a little more? > > Thanks, > Qu >> >> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c >> index af4dd49a71c7..aeba3a7506e1 100644 >> --- a/fs/btrfs/relocation.c >> +++ b/fs/btrfs/relocation.c >> @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans, >> return 1; >> } >> >> +static bool have_reloc_root(struct btrfs_root *root) >> +{ >> + smp_mb__before_atomic(); >> + if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) >> + return false; >> + if (!root->reloc_root) >> + return false; >> + return true; >> +} >> >> static int should_ignore_root(struct btrfs_root *root) >> { >> @@ -525,9 +534,9 @@ static int should_ignore_root(struct btrfs_root *root) >> if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state)) >> return 0; >> >> - reloc_root = root->reloc_root; >> - if (!reloc_root) >> + if (!have_reloc_root(root)) >> return 0; >> + reloc_root = root->reloc_root; >> >> if (btrfs_root_last_snapshot(&reloc_root->root_item) == >> root->fs_info->running_transaction->transid - 1) >> @@ -1439,6 +1448,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans, >> * The subvolume has reloc tree but the swap is finished, no need to >> * create/update the dead reloc tree >> */ >> + smp_mb__before_atomic(); >> if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state)) >> return 0; >> >> @@ -1478,8 +1488,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, >> struct btrfs_root_item *root_item; >> int ret; >> >> - if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) || >> - !root->reloc_root) >> + if (!have_reloc_root(root)) >> goto out; >> >> reloc_root = root->reloc_root; >> @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, >> if (fs_info->reloc_ctl->merge_reloc_tree && >> btrfs_root_refs(root_item) == 0) { >> set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); >> + smp_mb__after_atomic(); >> __del_reloc_root(reloc_root); >> } >> >> @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc) >> if (ret2 < 0 && !ret) >> ret = ret2; >> } >> + smp_mb__before_atomic(); >> clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); >> btrfs_put_fs_root(root); >> } else { >> @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending, >> struct btrfs_root *root = pending->root; >> struct reloc_control *rc = root->fs_info->reloc_ctl; >> >> - if (!root->reloc_root || !rc) >> + if (!rc || !have_reloc_root(root)) >> return; >> >> if (!rc->merge_reloc_tree) >> @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans, >> struct reloc_control *rc = root->fs_info->reloc_ctl; >> int ret; >> >> - if (!root->reloc_root || !rc) >> + if (!rc || !have_reloc_root(root)) >> return 0; >> >> rc = root->fs_info->reloc_ctl; >> > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2020-01-03 16:15 ` David Sterba 2020-01-04 9:37 ` Qu Wenruo @ 2020-01-06 7:04 ` Qu Wenruo 2020-01-06 18:23 ` David Sterba 1 sibling, 1 reply; 18+ messages in thread From: Qu Wenruo @ 2020-01-06 7:04 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2843 bytes --] On 2020/1/4 上午12:15, David Sterba wrote: > On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote: >> So it's one bit vs refcount and a lock. For the backports I'd go with >> the bit, but this needs the barriers as mentioned in my previous reply. >> Can you please update the patches? > > The idea is in the diff below (compile tested only). I found one more > case that was not addressed by your patches, it's in > btrfs_update_reloc_root. > > Given that the type of the fix is the same, I'd rather do that in one > patch. The reported stack traces are more or less the same. > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > index af4dd49a71c7..aeba3a7506e1 100644 > --- a/fs/btrfs/relocation.c > +++ b/fs/btrfs/relocation.c > @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans, > return 1; > } > > +static bool have_reloc_root(struct btrfs_root *root) > +{ > + smp_mb__before_atomic(); Mind to explain why the before_atomic() is needed? Is it just paired with smp_mb__after_atomic() for the set_bit()/clear_bit() part? > reloc_root = root->reloc_root; > @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, > if (fs_info->reloc_ctl->merge_reloc_tree && > btrfs_root_refs(root_item) == 0) { > set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); > + smp_mb__after_atomic(); I get the point here, to make sure all other users see this bit change. > __del_reloc_root(reloc_root); Interestingly in that function we immediately triggers spin_lock() which implies memory barrier. (Not an excuse to skip memory barrier anyway) > } > > @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc) > if (ret2 < 0 && !ret) > ret = ret2; > } > + smp_mb__before_atomic(); > clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); I guess this should be a smp_mb__after_atomic(); > btrfs_put_fs_root(root); And btrfs_put_fs_root() triggers a release memory ordering. So it looks memory order is not completely screwed up before, completely by pure luck... Thanks, Qu > } else { > @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending, > struct btrfs_root *root = pending->root; > struct reloc_control *rc = root->fs_info->reloc_ctl; > > - if (!root->reloc_root || !rc) > + if (!rc || !have_reloc_root(root)) > return; > > if (!rc->merge_reloc_tree) > @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans, > struct reloc_control *rc = root->fs_info->reloc_ctl; > int ret; > > - if (!root->reloc_root || !rc) > + if (!rc || !have_reloc_root(root)) > return 0; > > rc = root->fs_info->reloc_ctl; > [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 520 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2020-01-06 7:04 ` Qu Wenruo @ 2020-01-06 18:23 ` David Sterba 0 siblings, 0 replies; 18+ messages in thread From: David Sterba @ 2020-01-06 18:23 UTC (permalink / raw) To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs On Mon, Jan 06, 2020 at 03:04:32PM +0800, Qu Wenruo wrote: > On 2020/1/4 上午12:15, David Sterba wrote: > > On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote: > >> So it's one bit vs refcount and a lock. For the backports I'd go with > >> the bit, but this needs the barriers as mentioned in my previous reply. > >> Can you please update the patches? > > > > The idea is in the diff below (compile tested only). I found one more > > case that was not addressed by your patches, it's in > > btrfs_update_reloc_root. > > > > Given that the type of the fix is the same, I'd rather do that in one > > patch. The reported stack traces are more or less the same. > > > > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c > > index af4dd49a71c7..aeba3a7506e1 100644 > > --- a/fs/btrfs/relocation.c > > +++ b/fs/btrfs/relocation.c > > @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans, > > return 1; > > } > > > > +static bool have_reloc_root(struct btrfs_root *root) > > +{ > > + smp_mb__before_atomic(); > > Mind to explain why the before_atomic() is needed? > > Is it just paired with smp_mb__after_atomic() for the > set_bit()/clear_bit() part? Yes. The reading part of a barrier must flush any pending state, then read it. > > reloc_root = root->reloc_root; > > @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans, > > if (fs_info->reloc_ctl->merge_reloc_tree && > > btrfs_root_refs(root_item) == 0) { > > set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); > > + smp_mb__after_atomic(); > > I get the point here, to make sure all other users see this bit change. > > > __del_reloc_root(reloc_root); > > Interestingly in that function we immediately triggers spin_lock() which > implies memory barrier. > (Not an excuse to skip memory barrier anyway) Beware that spin_lock and spin_unlock are only half barriers. Full barrier is implied by unlock/lock sequence. > > > } > > > > @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc) > > if (ret2 < 0 && !ret) > > ret = ret2; > > } > > + smp_mb__before_atomic(); > > clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state); > > I guess this should be a smp_mb__after_atomic(); No, we want everything that happens before the clear bit to be stored before the bit is cleared. IOW cleared bit must not be seen before all the previous updates are done. > > > btrfs_put_fs_root(root); > > And btrfs_put_fs_root() triggers a release memory ordering. But it's too late. > So it looks memory order is not completely screwed up before, completely > by pure luck... Well, no :) ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports 2020-01-03 15:52 ` David Sterba 2020-01-03 16:15 ` David Sterba @ 2020-01-04 1:32 ` Qu Wenruo 1 sibling, 0 replies; 18+ messages in thread From: Qu Wenruo @ 2020-01-04 1:32 UTC (permalink / raw) To: dsterba, Qu Wenruo, linux-btrfs [-- Attachment #1.1: Type: text/plain, Size: 2115 bytes --] On 2020/1/3 下午11:52, David Sterba wrote: > We need to get this series moving because the bug affects a few stable > versions. > > On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote: >> On 2019/12/11 下午11:34, David Sterba wrote: >>> On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote: >>>> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree >>>> deletion after merge_reloc_roots"), reloc tree lifespan is extended. >>>> >>>> Although we always set root->reloc_root to NULL before we drop the reloc >>>> tree, but that's not multi-core safe since we have no proper memory >>>> barrier to ensure other cores can see the same root->reloc_root. >>>> >>>> The proper root fix should be some proper root refcount, and make >>>> btrfs_drop_snapshot() to wait for all other root owner to release the >>>> root before dropping it. >>> >>> This would block cleaning deleted subvolumes, no? We can skip the dead >>> tree (and add it back to the list) in that can and not wait. The >>> cleaner thread is able to process the list repeatedly. >> >> What I mean is: >> - For consumer (reading root->reloc_root) >> spin_lock(&root->reloc_lock); >> if (!root->reloc_root) { >> spin_unlock(&root->reloc_lock); >> return NULL >> } >> refcount_inc(&root->reloc_root->refcount); >> return(root->reloc_root); >> spin_unlock(&root->reloc_lock); >> >> And of cource, release it after grabbing reloc_root. >> >> - For cleaner >> grab reloc_root just like consumer. >> retry: >> wait_event(refcount_read(&root->reloc_root->ref_count) == 1); >> spin_lock(&root->reloc_lock); >> if (&root->reloc_root->ref_count != 1){ >> spin_unlock(); goto retry; >> } >> root->reloc_root = NULL; >> spin_unlock(&root->reloc_lock); >> /* Now we're the only owner, delete the root */ > > So it's one bit vs refcount and a lock. For the backports I'd go with > the bit, but this needs the barriers as mentioned in my previous reply. > Can you please update the patches? > Sure, I'll update them soon. Thanks, Qu [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 488 bytes --] ^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2020-01-06 18:23 UTC | newest] Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-12-11 5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo 2019-12-11 5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo 2019-12-11 14:53 ` Josef Bacik 2019-12-11 5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo 2019-12-11 14:55 ` Josef Bacik 2019-12-11 15:15 ` David Sterba 2019-12-11 5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo 2019-12-11 14:55 ` Josef Bacik 2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba 2019-12-12 0:39 ` Qu Wenruo 2019-12-12 14:28 ` David Sterba 2020-01-03 15:52 ` David Sterba 2020-01-03 16:15 ` David Sterba 2020-01-04 9:37 ` Qu Wenruo 2020-01-04 13:18 ` Qu Wenruo 2020-01-06 7:04 ` Qu Wenruo 2020-01-06 18:23 ` David Sterba 2020-01-04 1:32 ` Qu Wenruo
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.