All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
@ 2019-12-11  5:00 Qu Wenruo
  2019-12-11  5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Qu Wenruo @ 2019-12-11  5:00 UTC (permalink / raw)
  To: linux-btrfs

Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
deletion after merge_reloc_roots"), reloc tree lifespan is extended.

Although we always set root->reloc_root to NULL before we drop the reloc
tree, but that's not multi-core safe since we have no proper memory
barrier to ensure other cores can see the same root->reloc_root.

The proper root fix should be some proper root refcount, and make
btrfs_drop_snapshot() to wait for all other root owner to release the
root before dropping it.

But for now, let's just check the DEAD_RELOC_ROOT bit before accessing
root->reloc_root.

Qu Wenruo (3):
  btrfs: relocation: Fix a KASAN use-after-free bug due to extended
    reloc tree lifespan
  btrfs: relocation: Fix KASAN report on create_reloc_tree due to
    extended reloc tree lifepsan
  btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot()
    due to extended reloc root lifespan

 fs/btrfs/relocation.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

-- 
2.24.0


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan
  2019-12-11  5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo
@ 2019-12-11  5:00 ` Qu Wenruo
  2019-12-11 14:53   ` Josef Bacik
  2019-12-11  5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2019-12-11  5:00 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell

[BUG]
When running a workload with balance start/cancel, snapshot creation and
deletion, and fsstress, we can get the following KASAN report:

  ==================================================================
  BUG: KASAN: use-after-free in should_ignore_root+0x54/0xb0 [btrfs]
  Read of size 8 at addr ffff888146e340f0 by task btrfs/1216

  CPU: 6 PID: 1216 Comm: btrfs Tainted: G           O      5.5.0-rc1-custom+ #40
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack+0xc2/0x11a
   print_address_description.constprop.0+0x20/0x210
   __kasan_report.cold+0x1b/0x41
   kasan_report+0x12/0x20
   check_memory_region+0x13c/0x1b0
   __asan_loadN+0xf/0x20
   should_ignore_root+0x54/0xb0 [btrfs]
   build_backref_tree+0x11af/0x2280 [btrfs]
   relocate_tree_blocks+0x391/0xb80 [btrfs]
   relocate_block_group+0x3e5/0xa00 [btrfs]
   btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
   btrfs_relocate_chunk+0x53/0xf0 [btrfs]
   btrfs_balance+0xc91/0x1840 [btrfs]
   btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
   btrfs_ioctl+0x8af/0x3e60 [btrfs]
   do_vfs_ioctl+0x831/0xb10
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x43/0x50
   do_syscall_64+0x79/0xe0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
When should_ignore_root() accessing root->reloc_root, the reloc_root can
be being dropped, thus root->reloc_root can be unreliable.

[FIX]
Check DEAD_RELOC_ROOT bit before accessing root->reloc_root.

Furthermore in the context of should_ignore_root(), if the root has
already gone through tree merge, we don't need to trace that root
anymore.
Thus we need to return 1, for root with dead reloc root.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d897a8e5e430..bb41b981e493 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -525,6 +525,10 @@ static int should_ignore_root(struct btrfs_root *root)
 	if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 		return 0;
 
+	/* This root has been merged with its reloc tree, so we can ignore it */
+	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
+		return 1;
+
 	reloc_root = root->reloc_root;
 	if (!reloc_root)
 		return 0;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan
  2019-12-11  5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo
  2019-12-11  5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo
@ 2019-12-11  5:00 ` Qu Wenruo
  2019-12-11 14:55   ` Josef Bacik
  2019-12-11  5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo
  2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba
  3 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2019-12-11  5:00 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell

[BUG]
When running workload with balance start/cancel, snapshot
creation/deletion and fsstress, we can hit the following KASAN report:

  ==================================================================
  BUG: KASAN: use-after-free in create_reloc_root+0x9f/0x460 [btrfs]
  Read of size 8 at addr ffff8881571741f0 by task btrfs/3539

  CPU: 6 PID: 3539 Comm: btrfs Tainted: G           O      5.5.0-rc1-custom+ #40
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack+0xc2/0x11a
   print_address_description.constprop.0+0x20/0x210
   __kasan_report.cold+0x1b/0x41
   kasan_report+0x12/0x20
   __asan_load8+0x54/0x90
   create_reloc_root+0x9f/0x460 [btrfs]
   btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
   create_pending_snapshot+0xa9b/0x15f0 [btrfs]
   create_pending_snapshots+0x111/0x140 [btrfs]
   btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
   btrfs_mksubvol+0x915/0x960 [btrfs]
   btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
   btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
   btrfs_ioctl+0x241b/0x3e60 [btrfs]
   do_vfs_ioctl+0x831/0xb10
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x43/0x50
   do_syscall_64+0x79/0xe0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
This is another case where root->reloc_root is accessed without checking
if the reloc root is already dead.

[FIX]
Also check DEAD_RELOC_TREE bit before accessing root->reloc_root.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index bb41b981e493..619ccb183515 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4755,7 +4755,14 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 	struct reloc_control *rc = root->fs_info->reloc_ctl;
 	int ret;
 
-	if (!root->reloc_root || !rc)
+	/*
+	 * We don't need to use reloc tree if:
+	 * - No reloc tree
+	 * - Relocation not running
+	 * - Reloc tree already merged
+	 */
+	if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE,
+				&root->state))
 		return 0;
 
 	rc = root->fs_info->reloc_ctl;
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan
  2019-12-11  5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo
  2019-12-11  5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo
  2019-12-11  5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo
@ 2019-12-11  5:00 ` Qu Wenruo
  2019-12-11 14:55   ` Josef Bacik
  2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba
  3 siblings, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2019-12-11  5:00 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Zygo Blaxell

[BUG]
When running workload with balance start/cancel, snapshot
creation/deletion and fsstress, we can hit the following KASAN report:
  ==================================================================
  BUG: KASAN: use-after-free in btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
  Read of size 4 at addr ffff888157140100 by task btrfs/1822

  CPU: 2 PID: 1822 Comm: btrfs Tainted: G           O      5.5.0-rc1-custom+ #40
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
  Call Trace:
   dump_stack+0xc2/0x11a
   print_address_description.constprop.0+0x20/0x210
   __kasan_report.cold+0x1b/0x41
   kasan_report+0x12/0x20
   check_memory_region+0x13c/0x1b0
   __asan_loadN+0xf/0x20
   btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
   create_pending_snapshot+0x209/0x15f0 [btrfs]
   create_pending_snapshots+0x111/0x140 [btrfs]
   btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
   btrfs_mksubvol+0x915/0x960 [btrfs]
   btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
   btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
   btrfs_ioctl+0x241b/0x3e60 [btrfs]
   do_vfs_ioctl+0x831/0xb10
   ksys_ioctl+0x67/0x90
   __x64_sys_ioctl+0x43/0x50
   do_syscall_64+0x79/0xe0
   entry_SYSCALL_64_after_hwframe+0x49/0xbe

[CAUSE]
Again, we try to access root->reloc_root without verify if that reloc
root is already dead (being freed).

[FIX]
Check if the root has dead reloc root before accessing it.

Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/relocation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 619ccb183515..2d2cd1596ec9 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4721,7 +4721,8 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
 	struct btrfs_root *root = pending->root;
 	struct reloc_control *rc = root->fs_info->reloc_ctl;
 
-	if (!root->reloc_root || !rc)
+	if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE,
+				&root->state))
 		return;
 
 	if (!rc->merge_reloc_tree)
-- 
2.24.0


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan
  2019-12-11  5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo
@ 2019-12-11 14:53   ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-11 14:53 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell

On 12/11/19 12:00 AM, Qu Wenruo wrote:
> [BUG]
> When running a workload with balance start/cancel, snapshot creation and
> deletion, and fsstress, we can get the following KASAN report:
> 
>    ==================================================================
>    BUG: KASAN: use-after-free in should_ignore_root+0x54/0xb0 [btrfs]
>    Read of size 8 at addr ffff888146e340f0 by task btrfs/1216
> 
>    CPU: 6 PID: 1216 Comm: btrfs Tainted: G           O      5.5.0-rc1-custom+ #40
>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>    Call Trace:
>     dump_stack+0xc2/0x11a
>     print_address_description.constprop.0+0x20/0x210
>     __kasan_report.cold+0x1b/0x41
>     kasan_report+0x12/0x20
>     check_memory_region+0x13c/0x1b0
>     __asan_loadN+0xf/0x20
>     should_ignore_root+0x54/0xb0 [btrfs]
>     build_backref_tree+0x11af/0x2280 [btrfs]
>     relocate_tree_blocks+0x391/0xb80 [btrfs]
>     relocate_block_group+0x3e5/0xa00 [btrfs]
>     btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
>     btrfs_relocate_chunk+0x53/0xf0 [btrfs]
>     btrfs_balance+0xc91/0x1840 [btrfs]
>     btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
>     btrfs_ioctl+0x8af/0x3e60 [btrfs]
>     do_vfs_ioctl+0x831/0xb10
>     ksys_ioctl+0x67/0x90
>     __x64_sys_ioctl+0x43/0x50
>     do_syscall_64+0x79/0xe0
>     entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [CAUSE]
> When should_ignore_root() accessing root->reloc_root, the reloc_root can
> be being dropped, thus root->reloc_root can be unreliable.
> 
> [FIX]
> Check DEAD_RELOC_ROOT bit before accessing root->reloc_root.
> 
> Furthermore in the context of should_ignore_root(), if the root has
> already gone through tree merge, we don't need to trace that root
> anymore.
> Thus we need to return 1, for root with dead reloc root.
> 
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
> Signed-off-by: Qu Wenruo <wqu@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan
  2019-12-11  5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo
@ 2019-12-11 14:55   ` Josef Bacik
  2019-12-11 15:15     ` David Sterba
  0 siblings, 1 reply; 18+ messages in thread
From: Josef Bacik @ 2019-12-11 14:55 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell

On 12/11/19 12:00 AM, Qu Wenruo wrote:
> [BUG]
> When running workload with balance start/cancel, snapshot
> creation/deletion and fsstress, we can hit the following KASAN report:
> 
>    ==================================================================
>    BUG: KASAN: use-after-free in create_reloc_root+0x9f/0x460 [btrfs]
>    Read of size 8 at addr ffff8881571741f0 by task btrfs/3539
> 
>    CPU: 6 PID: 3539 Comm: btrfs Tainted: G           O      5.5.0-rc1-custom+ #40
>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>    Call Trace:
>     dump_stack+0xc2/0x11a
>     print_address_description.constprop.0+0x20/0x210
>     __kasan_report.cold+0x1b/0x41
>     kasan_report+0x12/0x20
>     __asan_load8+0x54/0x90
>     create_reloc_root+0x9f/0x460 [btrfs]
>     btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
>     create_pending_snapshot+0xa9b/0x15f0 [btrfs]
>     create_pending_snapshots+0x111/0x140 [btrfs]
>     btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
>     btrfs_mksubvol+0x915/0x960 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
>     btrfs_ioctl+0x241b/0x3e60 [btrfs]
>     do_vfs_ioctl+0x831/0xb10
>     ksys_ioctl+0x67/0x90
>     __x64_sys_ioctl+0x43/0x50
>     do_syscall_64+0x79/0xe0
>     entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [CAUSE]
> This is another case where root->reloc_root is accessed without checking
> if the reloc root is already dead.
> 
> [FIX]
> Also check DEAD_RELOC_TREE bit before accessing root->reloc_root.
> 
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/relocation.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index bb41b981e493..619ccb183515 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4755,7 +4755,14 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
>   	struct reloc_control *rc = root->fs_info->reloc_ctl;
>   	int ret;
>   
> -	if (!root->reloc_root || !rc)
> +	/*
> +	 * We don't need to use reloc tree if:
> +	 * - No reloc tree
> +	 * - Relocation not running
> +	 * - Reloc tree already merged
> +	 */
> +	if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE,
> +				&root->state))

This is awkward formatting, can we move the test_bit() to the first thing we 
check so it's less weird?  Then you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan
  2019-12-11  5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo
@ 2019-12-11 14:55   ` Josef Bacik
  0 siblings, 0 replies; 18+ messages in thread
From: Josef Bacik @ 2019-12-11 14:55 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Zygo Blaxell

On 12/11/19 12:00 AM, Qu Wenruo wrote:
> [BUG]
> When running workload with balance start/cancel, snapshot
> creation/deletion and fsstress, we can hit the following KASAN report:
>    ==================================================================
>    BUG: KASAN: use-after-free in btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
>    Read of size 4 at addr ffff888157140100 by task btrfs/1822
> 
>    CPU: 2 PID: 1822 Comm: btrfs Tainted: G           O      5.5.0-rc1-custom+ #40
>    Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
>    Call Trace:
>     dump_stack+0xc2/0x11a
>     print_address_description.constprop.0+0x20/0x210
>     __kasan_report.cold+0x1b/0x41
>     kasan_report+0x12/0x20
>     check_memory_region+0x13c/0x1b0
>     __asan_loadN+0xf/0x20
>     btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
>     create_pending_snapshot+0x209/0x15f0 [btrfs]
>     create_pending_snapshots+0x111/0x140 [btrfs]
>     btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
>     btrfs_mksubvol+0x915/0x960 [btrfs]
>     btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
>     btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
>     btrfs_ioctl+0x241b/0x3e60 [btrfs]
>     do_vfs_ioctl+0x831/0xb10
>     ksys_ioctl+0x67/0x90
>     __x64_sys_ioctl+0x43/0x50
>     do_syscall_64+0x79/0xe0
>     entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> [CAUSE]
> Again, we try to access root->reloc_root without verify if that reloc
> root is already dead (being freed).
> 
> [FIX]
> Check if the root has dead reloc root before accessing it.
> 
> Reported-by: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
> Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>   fs/btrfs/relocation.c | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 619ccb183515..2d2cd1596ec9 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -4721,7 +4721,8 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
>   	struct btrfs_root *root = pending->root;
>   	struct reloc_control *rc = root->fs_info->reloc_ctl;
>   
> -	if (!root->reloc_root || !rc)
> +	if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE,
> +				&root->state))
>   		return;
>  

Same comment here as the other one, you can add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Once it's fixed.  Thanks,

Josef

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan
  2019-12-11 14:55   ` Josef Bacik
@ 2019-12-11 15:15     ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2019-12-11 15:15 UTC (permalink / raw)
  To: Josef Bacik; +Cc: Qu Wenruo, linux-btrfs, Zygo Blaxell

On Wed, Dec 11, 2019 at 09:55:04AM -0500, Josef Bacik wrote:
> > +	/*
> > +	 * We don't need to use reloc tree if:
> > +	 * - No reloc tree
> > +	 * - Relocation not running
> > +	 * - Reloc tree already merged
> > +	 */
> > +	if (!root->reloc_root || !rc || test_bit(BTRFS_ROOT_DEAD_RELOC_TREE,
> > +				&root->state))
> 
> This is awkward formatting, can we move the test_bit() to the first thing we 
> check so it's less weird?  Then you can add

I had the same thought, will move the test_bit on the next line.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2019-12-11  5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo
                   ` (2 preceding siblings ...)
  2019-12-11  5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo
@ 2019-12-11 15:34 ` David Sterba
  2019-12-12  0:39   ` Qu Wenruo
  3 siblings, 1 reply; 18+ messages in thread
From: David Sterba @ 2019-12-11 15:34 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs

On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote:
> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
> deletion after merge_reloc_roots"), reloc tree lifespan is extended.
> 
> Although we always set root->reloc_root to NULL before we drop the reloc
> tree, but that's not multi-core safe since we have no proper memory
> barrier to ensure other cores can see the same root->reloc_root.
> 
> The proper root fix should be some proper root refcount, and make
> btrfs_drop_snapshot() to wait for all other root owner to release the
> root before dropping it.

This would block cleaning deleted subvolumes, no? We can skip the dead
tree (and add it back to the list) in that can and not wait. The
cleaner thread is able to process the list repeatedly.

> But for now, let's just check the DEAD_RELOC_ROOT bit before accessing
> root->reloc_root.

Ok, the bit is safe way to sync that as long as the correct order of
setting/clearing is done.  The ops are atomic wrt to the value itself
but need barriers around as they're simple atomic ops (not RMW,
according to Documentation/atomic_bitops.txt) and there's no outer
synchronization.

Check:

	smp_mb__before_atomic();
	if (test_bit() ...)
		return;

Set:

	set_bit()
	smp_mb__after_atomic();
	(delete reloc_root)
	reloc_root = NULL

Clearing of the bit is done when there are not potential other users so
that part does not need the barrier (I think).

The checking part could use a helper so we don't have barriers scattered
around code.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba
@ 2019-12-12  0:39   ` Qu Wenruo
  2019-12-12 14:28     ` David Sterba
  2020-01-03 15:52     ` David Sterba
  0 siblings, 2 replies; 18+ messages in thread
From: Qu Wenruo @ 2019-12-12  0:39 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2780 bytes --]



On 2019/12/11 下午11:34, David Sterba wrote:
> On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote:
>> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
>> deletion after merge_reloc_roots"), reloc tree lifespan is extended.
>>
>> Although we always set root->reloc_root to NULL before we drop the reloc
>> tree, but that's not multi-core safe since we have no proper memory
>> barrier to ensure other cores can see the same root->reloc_root.
>>
>> The proper root fix should be some proper root refcount, and make
>> btrfs_drop_snapshot() to wait for all other root owner to release the
>> root before dropping it.
> 
> This would block cleaning deleted subvolumes, no? We can skip the dead
> tree (and add it back to the list) in that can and not wait. The
> cleaner thread is able to process the list repeatedly.

What I mean is:
- For consumer (reading root->reloc_root)
  spin_lock(&root->reloc_lock);
  if (!root->reloc_root) {
      spin_unlock(&root->reloc_lock);
      return NULL
  }
  refcount_inc(&root->reloc_root->refcount);
  return(root->reloc_root);
  spin_unlock(&root->reloc_lock);

  And of cource, release it after grabbing reloc_root.

- For cleaner
  grab reloc_root just like consumer.
retry:
  wait_event(refcount_read(&root->reloc_root->ref_count) == 1);
  spin_lock(&root->reloc_lock);
  if (&root->reloc_root->ref_count != 1){
      spin_unlock(); goto retry;
  }
  root->reloc_root = NULL;
  spin_unlock(&root->reloc_lock);
  /* Now we're the only owner, delete the root */


> 
>> But for now, let's just check the DEAD_RELOC_ROOT bit before accessing
>> root->reloc_root.
> 
> Ok, the bit is safe way to sync that as long as the correct order of
> setting/clearing is done.  The ops are atomic wrt to the value itself
> but need barriers around as they're simple atomic ops (not RMW,
> according to Documentation/atomic_bitops.txt) and there's no outer
> synchronization.
> 
> Check:
> 
> 	smp_mb__before_atomic();
> 	if (test_bit() ...)
> 		return;
> 
> Set:
> 
> 	set_bit()
> 	smp_mb__after_atomic();
> 	(delete reloc_root)
> 	reloc_root = NULL
> 
> Clearing of the bit is done when there are not potential other users so
> that part does not need the barrier (I think).
> 
> The checking part could use a helper so we don't have barriers scattered
> around code.
> 
I'm still not confident enough for the "reloc_root = NULL" assignment
and "reloc_root == NULL" test.

But since the set_bit()/test_bit() is safe, and it happens before we
modify reloc_root, it's safer and is what we used in this quick fix.

Still, I'm really looking forward to Josef's root refcount work, that
should be the real fix for all the problems.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2019-12-12  0:39   ` Qu Wenruo
@ 2019-12-12 14:28     ` David Sterba
  2020-01-03 15:52     ` David Sterba
  1 sibling, 0 replies; 18+ messages in thread
From: David Sterba @ 2019-12-12 14:28 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote:
> 
> 
> On 2019/12/11 下午11:34, David Sterba wrote:
> > On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote:
> >> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
> >> deletion after merge_reloc_roots"), reloc tree lifespan is extended.
> >>
> >> Although we always set root->reloc_root to NULL before we drop the reloc
> >> tree, but that's not multi-core safe since we have no proper memory
> >> barrier to ensure other cores can see the same root->reloc_root.
> >>
> >> The proper root fix should be some proper root refcount, and make
> >> btrfs_drop_snapshot() to wait for all other root owner to release the
> >> root before dropping it.
> > 
> > This would block cleaning deleted subvolumes, no? We can skip the dead
> > tree (and add it back to the list) in that can and not wait. The
> > cleaner thread is able to process the list repeatedly.
> 
> What I mean is:
> - For consumer (reading root->reloc_root)
>   spin_lock(&root->reloc_lock);
>   if (!root->reloc_root) {
>       spin_unlock(&root->reloc_lock);
>       return NULL
>   }
>   refcount_inc(&root->reloc_root->refcount);
>   return(root->reloc_root);
>   spin_unlock(&root->reloc_lock);
> 
>   And of cource, release it after grabbing reloc_root.
> 
> - For cleaner
>   grab reloc_root just like consumer.
> retry:
>   wait_event(refcount_read(&root->reloc_root->ref_count) == 1);
>   spin_lock(&root->reloc_lock);
>   if (&root->reloc_root->ref_count != 1){
>       spin_unlock(); goto retry;
>   }
>   root->reloc_root = NULL;
>   spin_unlock(&root->reloc_lock);
>   /* Now we're the only owner, delete the root */

The spinlock should be safe as well, do you mean to take it to verify
that reloc_root is valid everywhere?

> > Clearing of the bit is done when there are not potential other users so
> > that part does not need the barrier (I think).
> > 
> > The checking part could use a helper so we don't have barriers scattered
> > around code.
> > 
> I'm still not confident enough for the "reloc_root = NULL" assignment
> and "reloc_root == NULL" test.
> 
> But since the set_bit()/test_bit() is safe, and it happens before we
> modify reloc_root, it's safer and is what we used in this quick fix.
> 
> Still, I'm really looking forward to Josef's root refcount work, that
> should be the real fix for all the problems.

That's a huge series and unsuitable for backports to stable, we need
something like your patches first.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2019-12-12  0:39   ` Qu Wenruo
  2019-12-12 14:28     ` David Sterba
@ 2020-01-03 15:52     ` David Sterba
  2020-01-03 16:15       ` David Sterba
  2020-01-04  1:32       ` Qu Wenruo
  1 sibling, 2 replies; 18+ messages in thread
From: David Sterba @ 2020-01-03 15:52 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

We need to get this series moving because the bug affects a few stable
versions.

On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote:
> On 2019/12/11 下午11:34, David Sterba wrote:
> > On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote:
> >> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
> >> deletion after merge_reloc_roots"), reloc tree lifespan is extended.
> >>
> >> Although we always set root->reloc_root to NULL before we drop the reloc
> >> tree, but that's not multi-core safe since we have no proper memory
> >> barrier to ensure other cores can see the same root->reloc_root.
> >>
> >> The proper root fix should be some proper root refcount, and make
> >> btrfs_drop_snapshot() to wait for all other root owner to release the
> >> root before dropping it.
> > 
> > This would block cleaning deleted subvolumes, no? We can skip the dead
> > tree (and add it back to the list) in that can and not wait. The
> > cleaner thread is able to process the list repeatedly.
> 
> What I mean is:
> - For consumer (reading root->reloc_root)
>   spin_lock(&root->reloc_lock);
>   if (!root->reloc_root) {
>       spin_unlock(&root->reloc_lock);
>       return NULL
>   }
>   refcount_inc(&root->reloc_root->refcount);
>   return(root->reloc_root);
>   spin_unlock(&root->reloc_lock);
> 
>   And of cource, release it after grabbing reloc_root.
> 
> - For cleaner
>   grab reloc_root just like consumer.
> retry:
>   wait_event(refcount_read(&root->reloc_root->ref_count) == 1);
>   spin_lock(&root->reloc_lock);
>   if (&root->reloc_root->ref_count != 1){
>       spin_unlock(); goto retry;
>   }
>   root->reloc_root = NULL;
>   spin_unlock(&root->reloc_lock);
>   /* Now we're the only owner, delete the root */

So it's one bit vs refcount and a lock. For the backports I'd go with
the bit, but this needs the barriers as mentioned in my previous reply.
Can you please update the patches?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2020-01-03 15:52     ` David Sterba
@ 2020-01-03 16:15       ` David Sterba
  2020-01-04  9:37         ` Qu Wenruo
  2020-01-06  7:04         ` Qu Wenruo
  2020-01-04  1:32       ` Qu Wenruo
  1 sibling, 2 replies; 18+ messages in thread
From: David Sterba @ 2020-01-03 16:15 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, linux-btrfs

On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote:
> So it's one bit vs refcount and a lock. For the backports I'd go with
> the bit, but this needs the barriers as mentioned in my previous reply.
> Can you please update the patches?

The idea is in the diff below (compile tested only). I found one more
case that was not addressed by your patches, it's in
btrfs_update_reloc_root.

Given that the type of the fix is the same, I'd rather do that in one
patch. The reported stack traces are more or less the same.

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index af4dd49a71c7..aeba3a7506e1 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
 	return 1;
 }
 
+static bool have_reloc_root(struct btrfs_root *root)
+{
+	smp_mb__before_atomic();
+	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
+		return false;
+	if (!root->reloc_root)
+		return false;
+	return true;
+}
 
 static int should_ignore_root(struct btrfs_root *root)
 {
@@ -525,9 +534,9 @@ static int should_ignore_root(struct btrfs_root *root)
 	if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
 		return 0;
 
-	reloc_root = root->reloc_root;
-	if (!reloc_root)
+	if (!have_reloc_root(root))
 		return 0;
+	reloc_root = root->reloc_root;
 
 	if (btrfs_root_last_snapshot(&reloc_root->root_item) ==
 	    root->fs_info->running_transaction->transid - 1)
@@ -1439,6 +1448,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 	 * The subvolume has reloc tree but the swap is finished, no need to
 	 * create/update the dead reloc tree
 	 */
+	smp_mb__before_atomic();
 	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
 		return 0;
 
@@ -1478,8 +1488,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 	struct btrfs_root_item *root_item;
 	int ret;
 
-	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
-	    !root->reloc_root)
+	if (!have_reloc_root(root))
 		goto out;
 
 	reloc_root = root->reloc_root;
@@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 	if (fs_info->reloc_ctl->merge_reloc_tree &&
 	    btrfs_root_refs(root_item) == 0) {
 		set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
+		smp_mb__after_atomic();
 		__del_reloc_root(reloc_root);
 	}
 
@@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
 				if (ret2 < 0 && !ret)
 					ret = ret2;
 			}
+			smp_mb__before_atomic();
 			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
 			btrfs_put_fs_root(root);
 		} else {
@@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
 	struct btrfs_root *root = pending->root;
 	struct reloc_control *rc = root->fs_info->reloc_ctl;
 
-	if (!root->reloc_root || !rc)
+	if (!rc || !have_reloc_root(root))
 		return;
 
 	if (!rc->merge_reloc_tree)
@@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
 	struct reloc_control *rc = root->fs_info->reloc_ctl;
 	int ret;
 
-	if (!root->reloc_root || !rc)
+	if (!rc || !have_reloc_root(root))
 		return 0;
 
 	rc = root->fs_info->reloc_ctl;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2020-01-03 15:52     ` David Sterba
  2020-01-03 16:15       ` David Sterba
@ 2020-01-04  1:32       ` Qu Wenruo
  1 sibling, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2020-01-04  1:32 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2115 bytes --]



On 2020/1/3 下午11:52, David Sterba wrote:
> We need to get this series moving because the bug affects a few stable
> versions.
> 
> On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote:
>> On 2019/12/11 下午11:34, David Sterba wrote:
>>> On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote:
>>>> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
>>>> deletion after merge_reloc_roots"), reloc tree lifespan is extended.
>>>>
>>>> Although we always set root->reloc_root to NULL before we drop the reloc
>>>> tree, but that's not multi-core safe since we have no proper memory
>>>> barrier to ensure other cores can see the same root->reloc_root.
>>>>
>>>> The proper root fix should be some proper root refcount, and make
>>>> btrfs_drop_snapshot() to wait for all other root owner to release the
>>>> root before dropping it.
>>>
>>> This would block cleaning deleted subvolumes, no? We can skip the dead
>>> tree (and add it back to the list) in that can and not wait. The
>>> cleaner thread is able to process the list repeatedly.
>>
>> What I mean is:
>> - For consumer (reading root->reloc_root)
>>   spin_lock(&root->reloc_lock);
>>   if (!root->reloc_root) {
>>       spin_unlock(&root->reloc_lock);
>>       return NULL
>>   }
>>   refcount_inc(&root->reloc_root->refcount);
>>   return(root->reloc_root);
>>   spin_unlock(&root->reloc_lock);
>>
>>   And of cource, release it after grabbing reloc_root.
>>
>> - For cleaner
>>   grab reloc_root just like consumer.
>> retry:
>>   wait_event(refcount_read(&root->reloc_root->ref_count) == 1);
>>   spin_lock(&root->reloc_lock);
>>   if (&root->reloc_root->ref_count != 1){
>>       spin_unlock(); goto retry;
>>   }
>>   root->reloc_root = NULL;
>>   spin_unlock(&root->reloc_lock);
>>   /* Now we're the only owner, delete the root */
> 
> So it's one bit vs refcount and a lock. For the backports I'd go with
> the bit, but this needs the barriers as mentioned in my previous reply.
> Can you please update the patches?
> 
Sure, I'll update them soon.

Thanks,
Qu


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2020-01-03 16:15       ` David Sterba
@ 2020-01-04  9:37         ` Qu Wenruo
  2020-01-04 13:18           ` Qu Wenruo
  2020-01-06  7:04         ` Qu Wenruo
  1 sibling, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2020-01-04  9:37 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4060 bytes --]



On 2020/1/4 上午12:15, David Sterba wrote:
> On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote:
>> So it's one bit vs refcount and a lock. For the backports I'd go with
>> the bit, but this needs the barriers as mentioned in my previous reply.
>> Can you please update the patches?
> 
> The idea is in the diff below (compile tested only). I found one more
> case that was not addressed by your patches, it's in
> btrfs_update_reloc_root.

But fix in btrfs_update_reloc_root() is already included in commit
d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after
merge_reloc_roots").

Or would you mind to share more details about the missing check?
> 
> Given that the type of the fix is the same, I'd rather do that in one
> patch. The reported stack traces are more or less the same.

To merge them into patch set is no problem, and should make backports a
little easier.

But I still didn't understand the barrier part.
If we're relying on that bit operation before accessing reloc_root, it
should be safe enough, even without memory barrier.

Would you please explain a little more?

Thanks,
Qu
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index af4dd49a71c7..aeba3a7506e1 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
>  	return 1;
>  }
>  
> +static bool have_reloc_root(struct btrfs_root *root)
> +{
> +	smp_mb__before_atomic();
> +	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
> +		return false;
> +	if (!root->reloc_root)
> +		return false;
> +	return true;
> +}
>  
>  static int should_ignore_root(struct btrfs_root *root)
>  {
> @@ -525,9 +534,9 @@ static int should_ignore_root(struct btrfs_root *root)
>  	if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>  		return 0;
>  
> -	reloc_root = root->reloc_root;
> -	if (!reloc_root)
> +	if (!have_reloc_root(root))
>  		return 0;
> +	reloc_root = root->reloc_root;
>  
>  	if (btrfs_root_last_snapshot(&reloc_root->root_item) ==
>  	    root->fs_info->running_transaction->transid - 1)
> @@ -1439,6 +1448,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
>  	 * The subvolume has reloc tree but the swap is finished, no need to
>  	 * create/update the dead reloc tree
>  	 */
> +	smp_mb__before_atomic();
>  	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>  		return 0;
>  
> @@ -1478,8 +1488,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>  	struct btrfs_root_item *root_item;
>  	int ret;
>  
> -	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
> -	    !root->reloc_root)
> +	if (!have_reloc_root(root))
>  		goto out;
>  
>  	reloc_root = root->reloc_root;
> @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>  	if (fs_info->reloc_ctl->merge_reloc_tree &&
>  	    btrfs_root_refs(root_item) == 0) {
>  		set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
> +		smp_mb__after_atomic();
>  		__del_reloc_root(reloc_root);
>  	}
>  
> @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
>  				if (ret2 < 0 && !ret)
>  					ret = ret2;
>  			}
> +			smp_mb__before_atomic();
>  			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>  			btrfs_put_fs_root(root);
>  		} else {
> @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
>  	struct btrfs_root *root = pending->root;
>  	struct reloc_control *rc = root->fs_info->reloc_ctl;
>  
> -	if (!root->reloc_root || !rc)
> +	if (!rc || !have_reloc_root(root))
>  		return;
>  
>  	if (!rc->merge_reloc_tree)
> @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
>  	struct reloc_control *rc = root->fs_info->reloc_ctl;
>  	int ret;
>  
> -	if (!root->reloc_root || !rc)
> +	if (!rc || !have_reloc_root(root))
>  		return 0;
>  
>  	rc = root->fs_info->reloc_ctl;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2020-01-04  9:37         ` Qu Wenruo
@ 2020-01-04 13:18           ` Qu Wenruo
  0 siblings, 0 replies; 18+ messages in thread
From: Qu Wenruo @ 2020-01-04 13:18 UTC (permalink / raw)
  To: Qu Wenruo, dsterba, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 4417 bytes --]



On 2020/1/4 下午5:37, Qu Wenruo wrote:
> 
> 
> On 2020/1/4 上午12:15, David Sterba wrote:
>> On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote:
>>> So it's one bit vs refcount and a lock. For the backports I'd go with
>>> the bit, but this needs the barriers as mentioned in my previous reply.
>>> Can you please update the patches?
>>
>> The idea is in the diff below (compile tested only). I found one more
>> case that was not addressed by your patches, it's in
>> btrfs_update_reloc_root.
> 
> But fix in btrfs_update_reloc_root() is already included in commit
> d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after
> merge_reloc_roots").
> 
> Or would you mind to share more details about the missing check?
>>
>> Given that the type of the fix is the same, I'd rather do that in one
>> patch. The reported stack traces are more or less the same.
> 
> To merge them into patch set is no problem, and should make backports a
> little easier.
> 
> But I still didn't understand the barrier part.
> If we're relying on that bit operation before accessing reloc_root, it
> should be safe enough, even without memory barrier.

My bad, set_bit() and test_bit() themselves doesn't imply memory
barrier. (I always thought the opposite)

I'll put memory barriers in next version.

Thanks,
Qu

> 
> Would you please explain a little more?
> 
> Thanks,
> Qu
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index af4dd49a71c7..aeba3a7506e1 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
>>  	return 1;
>>  }
>>  
>> +static bool have_reloc_root(struct btrfs_root *root)
>> +{
>> +	smp_mb__before_atomic();
>> +	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>> +		return false;
>> +	if (!root->reloc_root)
>> +		return false;
>> +	return true;
>> +}
>>  
>>  static int should_ignore_root(struct btrfs_root *root)
>>  {
>> @@ -525,9 +534,9 @@ static int should_ignore_root(struct btrfs_root *root)
>>  	if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>  		return 0;
>>  
>> -	reloc_root = root->reloc_root;
>> -	if (!reloc_root)
>> +	if (!have_reloc_root(root))
>>  		return 0;
>> +	reloc_root = root->reloc_root;
>>  
>>  	if (btrfs_root_last_snapshot(&reloc_root->root_item) ==
>>  	    root->fs_info->running_transaction->transid - 1)
>> @@ -1439,6 +1448,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
>>  	 * The subvolume has reloc tree but the swap is finished, no need to
>>  	 * create/update the dead reloc tree
>>  	 */
>> +	smp_mb__before_atomic();
>>  	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>  		return 0;
>>  
>> @@ -1478,8 +1488,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>>  	struct btrfs_root_item *root_item;
>>  	int ret;
>>  
>> -	if (test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) ||
>> -	    !root->reloc_root)
>> +	if (!have_reloc_root(root))
>>  		goto out;
>>  
>>  	reloc_root = root->reloc_root;
>> @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>>  	if (fs_info->reloc_ctl->merge_reloc_tree &&
>>  	    btrfs_root_refs(root_item) == 0) {
>>  		set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>> +		smp_mb__after_atomic();
>>  		__del_reloc_root(reloc_root);
>>  	}
>>  
>> @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
>>  				if (ret2 < 0 && !ret)
>>  					ret = ret2;
>>  			}
>> +			smp_mb__before_atomic();
>>  			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
>>  			btrfs_put_fs_root(root);
>>  		} else {
>> @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
>>  	struct btrfs_root *root = pending->root;
>>  	struct reloc_control *rc = root->fs_info->reloc_ctl;
>>  
>> -	if (!root->reloc_root || !rc)
>> +	if (!rc || !have_reloc_root(root))
>>  		return;
>>  
>>  	if (!rc->merge_reloc_tree)
>> @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
>>  	struct reloc_control *rc = root->fs_info->reloc_ctl;
>>  	int ret;
>>  
>> -	if (!root->reloc_root || !rc)
>> +	if (!rc || !have_reloc_root(root))
>>  		return 0;
>>  
>>  	rc = root->fs_info->reloc_ctl;
>>
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2020-01-03 16:15       ` David Sterba
  2020-01-04  9:37         ` Qu Wenruo
@ 2020-01-06  7:04         ` Qu Wenruo
  2020-01-06 18:23           ` David Sterba
  1 sibling, 1 reply; 18+ messages in thread
From: Qu Wenruo @ 2020-01-06  7:04 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 2843 bytes --]

On 2020/1/4 上午12:15, David Sterba wrote:
> On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote:
>> So it's one bit vs refcount and a lock. For the backports I'd go with
>> the bit, but this needs the barriers as mentioned in my previous reply.
>> Can you please update the patches?
> 
> The idea is in the diff below (compile tested only). I found one more
> case that was not addressed by your patches, it's in
> btrfs_update_reloc_root.
> 
> Given that the type of the fix is the same, I'd rather do that in one
> patch. The reported stack traces are more or less the same.
> 
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index af4dd49a71c7..aeba3a7506e1 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
>  	return 1;
>  }
>  
> +static bool have_reloc_root(struct btrfs_root *root)
> +{
> +	smp_mb__before_atomic();

Mind to explain why the before_atomic() is needed?

Is it just paired with smp_mb__after_atomic() for the
set_bit()/clear_bit() part?

>  	reloc_root = root->reloc_root;
> @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
>  	if (fs_info->reloc_ctl->merge_reloc_tree &&
>  	    btrfs_root_refs(root_item) == 0) {
>  		set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
> +		smp_mb__after_atomic();

I get the point here, to make sure all other users see this bit change.

>  		__del_reloc_root(reloc_root);

Interestingly in that function we immediately triggers spin_lock() which
implies memory barrier.
(Not an excuse to skip memory barrier anyway)

>  	}
>  
> @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
>  				if (ret2 < 0 && !ret)
>  					ret = ret2;
>  			}
> +			smp_mb__before_atomic();
>  			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);

I guess this should be a smp_mb__after_atomic();

>  			btrfs_put_fs_root(root);

And btrfs_put_fs_root() triggers a release memory ordering.

So it looks memory order is not completely screwed up before, completely
by pure luck...

Thanks,
Qu

>  		} else {
> @@ -4730,7 +4741,7 @@ void btrfs_reloc_pre_snapshot(struct btrfs_pending_snapshot *pending,
>  	struct btrfs_root *root = pending->root;
>  	struct reloc_control *rc = root->fs_info->reloc_ctl;
>  
> -	if (!root->reloc_root || !rc)
> +	if (!rc || !have_reloc_root(root))
>  		return;
>  
>  	if (!rc->merge_reloc_tree)
> @@ -4764,7 +4775,7 @@ int btrfs_reloc_post_snapshot(struct btrfs_trans_handle *trans,
>  	struct reloc_control *rc = root->fs_info->reloc_ctl;
>  	int ret;
>  
> -	if (!root->reloc_root || !rc)
> +	if (!rc || !have_reloc_root(root))
>  		return 0;
>  
>  	rc = root->fs_info->reloc_ctl;
> 


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 520 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports
  2020-01-06  7:04         ` Qu Wenruo
@ 2020-01-06 18:23           ` David Sterba
  0 siblings, 0 replies; 18+ messages in thread
From: David Sterba @ 2020-01-06 18:23 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: dsterba, Qu Wenruo, linux-btrfs

On Mon, Jan 06, 2020 at 03:04:32PM +0800, Qu Wenruo wrote:
> On 2020/1/4 上午12:15, David Sterba wrote:
> > On Fri, Jan 03, 2020 at 04:52:59PM +0100, David Sterba wrote:
> >> So it's one bit vs refcount and a lock. For the backports I'd go with
> >> the bit, but this needs the barriers as mentioned in my previous reply.
> >> Can you please update the patches?
> > 
> > The idea is in the diff below (compile tested only). I found one more
> > case that was not addressed by your patches, it's in
> > btrfs_update_reloc_root.
> > 
> > Given that the type of the fix is the same, I'd rather do that in one
> > patch. The reported stack traces are more or less the same.
> > 
> > diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> > index af4dd49a71c7..aeba3a7506e1 100644
> > --- a/fs/btrfs/relocation.c
> > +++ b/fs/btrfs/relocation.c
> > @@ -517,6 +517,15 @@ static int update_backref_cache(struct btrfs_trans_handle *trans,
> >  	return 1;
> >  }
> >  
> > +static bool have_reloc_root(struct btrfs_root *root)
> > +{
> > +	smp_mb__before_atomic();
> 
> Mind to explain why the before_atomic() is needed?
> 
> Is it just paired with smp_mb__after_atomic() for the
> set_bit()/clear_bit() part?

Yes. The reading part of a barrier must flush any pending state, then
read it.

> >  	reloc_root = root->reloc_root;
> > @@ -1489,6 +1498,7 @@ int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
> >  	if (fs_info->reloc_ctl->merge_reloc_tree &&
> >  	    btrfs_root_refs(root_item) == 0) {
> >  		set_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
> > +		smp_mb__after_atomic();
> 
> I get the point here, to make sure all other users see this bit change.
> 
> >  		__del_reloc_root(reloc_root);
> 
> Interestingly in that function we immediately triggers spin_lock() which
> implies memory barrier.
> (Not an excuse to skip memory barrier anyway)

Beware that spin_lock and spin_unlock are only half barriers. Full
barrier is implied by unlock/lock sequence.

> 
> >  	}
> >  
> > @@ -2201,6 +2211,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
> >  				if (ret2 < 0 && !ret)
> >  					ret = ret2;
> >  			}
> > +			smp_mb__before_atomic();
> >  			clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
> 
> I guess this should be a smp_mb__after_atomic();

No, we want everything that happens before the clear bit to be stored
before the bit is cleared. IOW cleared bit must not be seen before all
the previous updates are done.

> 
> >  			btrfs_put_fs_root(root);
> 
> And btrfs_put_fs_root() triggers a release memory ordering.

But it's too late.

> So it looks memory order is not completely screwed up before, completely
> by pure luck...

Well, no :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-01-06 18:23 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-11  5:00 [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports Qu Wenruo
2019-12-11  5:00 ` [PATCH 1/3] btrfs: relocation: Fix a KASAN use-after-free bug due to extended reloc tree lifespan Qu Wenruo
2019-12-11 14:53   ` Josef Bacik
2019-12-11  5:00 ` [PATCH 2/3] btrfs: relocation: Fix KASAN report on create_reloc_tree due to extended reloc tree lifepsan Qu Wenruo
2019-12-11 14:55   ` Josef Bacik
2019-12-11 15:15     ` David Sterba
2019-12-11  5:00 ` [PATCH 3/3] btrfs: relocation: Fix a KASAN report on btrfs_reloc_pre_snapshot() due to extended reloc root lifespan Qu Wenruo
2019-12-11 14:55   ` Josef Bacik
2019-12-11 15:34 ` [PATCH 0/3] btrfs: fixes for relocation to avoid KASAN reports David Sterba
2019-12-12  0:39   ` Qu Wenruo
2019-12-12 14:28     ` David Sterba
2020-01-03 15:52     ` David Sterba
2020-01-03 16:15       ` David Sterba
2020-01-04  9:37         ` Qu Wenruo
2020-01-04 13:18           ` Qu Wenruo
2020-01-06  7:04         ` Qu Wenruo
2020-01-06 18:23           ` David Sterba
2020-01-04  1:32       ` Qu Wenruo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.