From: Qu WenRuo <wqu@suse.com>
To: Nikolay Borisov <nborisov@suse.com>,
	"linux-btrfs@vger.kernel.org" <linux-btrfs@vger.kernel.org>
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	David Sterba <DSterba@suse.com>
Subject: Re: [PATCH v2] btrfs: relocation: Fix KASAN reports caused by extended reloc tree lifespan
Date: Wed, 8 Jan 2020 12:36:20 +0000
Message-ID: <b75dc3bc-b8b6-4f28-d9d0-f2e2a4e46a1e@suse.com>
In-Reply-To: <5cfff64e-0843-12ae-1ffc-37016552073d@suse.com>



On 2020/1/8 8:28 PM, Nikolay Borisov wrote:
> 
> 
> On 8.01.20 at 7:12, Qu Wenruo wrote:
>> [BUG]
>> There are several different KASAN reports for balance + snapshot
>> workloads.
>> Involved call paths include:
>>
>>    should_ignore_root+0x54/0xb0 [btrfs]
>>    build_backref_tree+0x11af/0x2280 [btrfs]
>>    relocate_tree_blocks+0x391/0xb80 [btrfs]
>>    relocate_block_group+0x3e5/0xa00 [btrfs]
>>    btrfs_relocate_block_group+0x240/0x4d0 [btrfs]
>>    btrfs_relocate_chunk+0x53/0xf0 [btrfs]
>>    btrfs_balance+0xc91/0x1840 [btrfs]
>>    btrfs_ioctl_balance+0x416/0x4e0 [btrfs]
>>    btrfs_ioctl+0x8af/0x3e60 [btrfs]
>>    do_vfs_ioctl+0x831/0xb10
>>    ksys_ioctl+0x67/0x90
>>    __x64_sys_ioctl+0x43/0x50
>>    do_syscall_64+0x79/0xe0
>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>>    create_reloc_root+0x9f/0x460 [btrfs]
>>    btrfs_reloc_post_snapshot+0xff/0x6c0 [btrfs]
>>    create_pending_snapshot+0xa9b/0x15f0 [btrfs]
>>    create_pending_snapshots+0x111/0x140 [btrfs]
>>    btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
>>    btrfs_mksubvol+0x915/0x960 [btrfs]
>>    btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
>>    btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
>>    btrfs_ioctl+0x241b/0x3e60 [btrfs]
>>    do_vfs_ioctl+0x831/0xb10
>>    ksys_ioctl+0x67/0x90
>>    __x64_sys_ioctl+0x43/0x50
>>    do_syscall_64+0x79/0xe0
>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>>    btrfs_reloc_pre_snapshot+0x85/0xc0 [btrfs]
>>    create_pending_snapshot+0x209/0x15f0 [btrfs]
>>    create_pending_snapshots+0x111/0x140 [btrfs]
>>    btrfs_commit_transaction+0x7a6/0x1360 [btrfs]
>>    btrfs_mksubvol+0x915/0x960 [btrfs]
>>    btrfs_ioctl_snap_create_transid+0x1d5/0x1e0 [btrfs]
>>    btrfs_ioctl_snap_create_v2+0x1d3/0x270 [btrfs]
>>    btrfs_ioctl+0x241b/0x3e60 [btrfs]
>>    do_vfs_ioctl+0x831/0xb10
>>    ksys_ioctl+0x67/0x90
>>    __x64_sys_ioctl+0x43/0x50
>>    do_syscall_64+0x79/0xe0
>>    entry_SYSCALL_64_after_hwframe+0x49/0xbe
>>
>> [CAUSE]
>> All these call sites are only relying on root->reloc_root, which can
>> undergo btrfs_drop_snapshot(), and since we don't have real refcount
> 
> what do you mean by "root->reloc_root can undergo btrfs_drop_snapshot" ?

I mean that some caller gets root->reloc_root and uses it, while that
same root->reloc_root soon gets dropped by btrfs_drop_snapshot().

> 
>> based protection to reloc roots, we can reach already dropped reloc
>> root, triggering KASAN.
> what's the relationship between not having a refcount protection and
> reaching reloc roots, perhaps you could expand the explanation?

If we had proper refcount protection, we could wait until we're the
last holder of reloc_root before calling btrfs_drop_snapshot().

To me, that would be the correct solution, while this patch is just a
quick and maybe dirty fix, mostly for backporting.
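
To illustrate what "wait until we're the last holder" could look like,
here is a minimal userspace sketch using C11 atomics. It is not the
btrfs code and not part of this patch: struct reloc_root_model and the
reloc_root_get()/reloc_root_put() helpers are hypothetical names, and
the "dropped" flag only stands in for calling btrfs_drop_snapshot().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model, not the real btrfs structures. */
struct reloc_root_model {
	atomic_int refs;	/* number of holders, including the owner */
	bool dropped;		/* set by the final put; models the drop */
};

static void reloc_root_init(struct reloc_root_model *rr)
{
	atomic_init(&rr->refs, 1);	/* the owner holds the first ref */
	rr->dropped = false;
}

/* Take a reference. Fails once the count has already reached zero. */
static bool reloc_root_get(struct reloc_root_model *rr)
{
	int old = atomic_load(&rr->refs);

	while (old > 0) {
		/* On failure, 'old' is reloaded and re-checked. */
		if (atomic_compare_exchange_weak(&rr->refs, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Release a reference. Only the last holder performs the drop.
 * Returns true if this call dropped the object.
 */
static bool reloc_root_put(struct reloc_root_model *rr)
{
	if (atomic_fetch_sub(&rr->refs, 1) == 1) {
		rr->dropped = true;	/* stands in for btrfs_drop_snapshot() */
		return true;
	}
	return false;
}
```

With something along these lines, a caller that took a reference before
the owner released its own would delay the drop until its put, instead
of touching an already dropped reloc root.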

> 
>>
>> [FIX]
>> To avoid such access to unstable root->reloc_root, we should check
>> BTRFS_ROOT_DEAD_RELOC_TREE bit first.
>>
>> This patch introduces a new wrapper, have_reloc_root(), to do the proper
>> check for most callers who don't distinguish merged reloc tree and no
>> reloc tree.
>>
>> The only exception is should_ignore_root(), as merged reloc tree can be
>> ignored, while no reloc tree shouldn't.
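
As a rough illustration of the check described in the [FIX] section
above, here is a userspace model. It is only a sketch under
assumptions: struct root_model, MODEL_DEAD_RELOC_TREE and the *_model
helpers are made-up stand-ins, and the real code uses test_bit() on
the root's state rather than a plain mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the BTRFS_ROOT_DEAD_RELOC_TREE state bit (assumption). */
#define MODEL_DEAD_RELOC_TREE	(1UL << 0)

/* Hypothetical model of the two fields the check looks at. */
struct root_model {
	unsigned long state;
	void *reloc_root;
};

/*
 * For callers that treat "merged reloc tree" and "no reloc tree" alike:
 * check the dead bit first and only then look at the pointer.
 */
static bool have_reloc_root_model(const struct root_model *root)
{
	if (root->state & MODEL_DEAD_RELOC_TREE)
		return false;
	return root->reloc_root != NULL;
}

/* For a caller like should_ignore_root() that must tell the cases apart. */
enum reloc_status_model { RELOC_NONE, RELOC_MERGED, RELOC_LIVE };

static enum reloc_status_model reloc_status(const struct root_model *root)
{
	if (root->state & MODEL_DEAD_RELOC_TREE)
		return RELOC_MERGED;	/* merged reloc tree: can be ignored */
	return root->reloc_root ? RELOC_LIVE : RELOC_NONE;
}
```

The point of the ordering is that a merged reloc tree is reported as
absent (or as RELOC_MERGED) without the pointer ever being followed.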
>>
>> [CRITICAL SECTION ANALYSIS]
>> Although test_bit()/set_bit()/clear_bit() don't imply a barrier, the
>> DEAD_RELOC_TREE bit has extra help from the transaction as a higher-level
>> barrier. The lifespans of root::reloc_root and the DEAD_RELOC_TREE bit are:
>>
>>     NULL: reloc_root is NULL         PTR: reloc_root is not NULL
>>     0: DEAD_RELOC_ROOT bit not set   DEAD: DEAD_RELOC_ROOT bit set
>>
>>     (NULL, 0)    Initial state                  __
>>       |                                         /\ Section A
>>     btrfs_init_reloc_root()                     \/
>>       |                                         __
>>     (PTR, 0)     reloc_root initialized         /\
>>       |                                         |
>>     btrfs_update_reloc_root()                   |  Section B
>>       |                                         |
>>     (PTR, DEAD)  reloc_root has been merged     \/
>>       |                                         __
>>     === btrfs_commit_transaction() ====================
>>       |                                         /\
>>     clean_dirty_subvols()                       |
>>       |                                         |  Section C
>>     (NULL, DEAD) reloc_root cleanup starts      \/
>>       |                                         __
>>     btrfs_drop_snapshot()                       /\
>>       |                                         |  Section D
>>     (NULL, 0)    Back to initial state          \/
>>
>> Very have_reloc_root() or test_bit(DEAD_RELOC_ROOT) caller has hold a
> 
>  ^^ Perhaps you meant: Every caller of have_reloc_root() or
> test_bit(DEAD_RELOC_ROOT) holds a transaction handle which ensures
> modifications in those functions are limited to a single transaction?

Yep, I mean *E*very.
It looks like I need to replace the switch of my 'e' key...
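
For reference, the lifespan in the quoted diagram can be written down
as a tiny state machine. This is only a userspace sketch of the
diagram, not kernel code: the struct, the enum and reloc_state_step()
are hypothetical names, and each event merely stands for the function
named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* (reloc_root, DEAD bit) pair from the diagram; hypothetical model. */
struct reloc_state_model {
	bool has_ptr;	/* PTR (true) vs NULL (false) */
	bool dead;	/* DEAD (true) vs 0 (false) */
};

enum reloc_event_model {
	EV_INIT_RELOC_ROOT,	/* btrfs_init_reloc_root() */
	EV_UPDATE_RELOC_ROOT,	/* btrfs_update_reloc_root() */
	EV_CLEAN_DIRTY_SUBVOLS,	/* clean_dirty_subvols() */
	EV_DROP_SNAPSHOT,	/* btrfs_drop_snapshot() */
};

/* Apply one event; return false if the diagram has no such transition. */
static bool reloc_state_step(struct reloc_state_model *s,
			     enum reloc_event_model ev)
{
	switch (ev) {
	case EV_INIT_RELOC_ROOT:	/* (NULL, 0) -> (PTR, 0) */
		if (s->has_ptr || s->dead)
			return false;
		s->has_ptr = true;
		return true;
	case EV_UPDATE_RELOC_ROOT:	/* (PTR, 0) -> (PTR, DEAD) */
		if (!s->has_ptr || s->dead)
			return false;
		s->dead = true;
		return true;
	case EV_CLEAN_DIRTY_SUBVOLS:	/* (PTR, DEAD) -> (NULL, DEAD) */
		if (!s->has_ptr || !s->dead)
			return false;
		s->has_ptr = false;
		return true;
	case EV_DROP_SNAPSHOT:		/* (NULL, DEAD) -> (NULL, 0) */
		if (s->has_ptr || !s->dead)
			return false;
		s->dead = false;
		return true;
	}
	return false;
}
```

Walking the four events in order returns to (NULL, 0), and in this
model the pointer is only non-NULL with the bit clear in the
(PTR, 0) state.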

Thanks,
Qu
