From: Sasha Levin <sashal@kernel.org>
To: gregkh@linuxfoundation.org
Cc: fdmanana@suse.com, dsterba@suse.com, stable@vger.kernel.org
Subject: Re: FAILED: patch "[PATCH] btrfs: fix race between block group removal and block group" failed to apply to 5.7-stable tree
Date: Thu, 2 Jul 2020 13:03:16 -0400
Message-ID: <20200702170316.GB2722994@sasha-vm>
In-Reply-To: <1593428502197204@kroah.com>

On Mon, Jun 29, 2020 at 01:01:42PM +0200, gregkh@linuxfoundation.org wrote:
>
>The patch below does not apply to the 5.7-stable tree.
>If someone wants it applied there, or to any other stable or longterm
>tree, then please email the backport, including the original git commit
>id to <stable@vger.kernel.org>.
>
>thanks,
>
>greg k-h
>
>------------------ original commit in Linus's tree ------------------
>
>From ffcb9d44572afbaf8fa6dbf5115bff6dab7b299e Mon Sep 17 00:00:00 2001
>From: Filipe Manana <fdmanana@suse.com>
>Date: Mon, 1 Jun 2020 19:12:19 +0100
>Subject: [PATCH] btrfs: fix race between block group removal and block group
> creation
>
>There is a race between block group removal and block group creation
>when the removal is completed by a task running fitrim or scrub. When
>this happens we end up failing the block group creation with an error
>-EEXIST since we attempt to insert a duplicate block group item key
>in the extent tree. That results in a transaction abort.
>
>The race happens like this:
>
>1) Task A is doing a fitrim, and at btrfs_trim_block_group() it freezes
>   block group X with btrfs_freeze_block_group() (until very recently
>   that was named btrfs_get_block_group_trimming());
>
>2) Task B starts removing block group X, either because it's now unused
>   or due to relocation for example. So at btrfs_remove_block_group(),
>   while holding the chunk mutex and the block group's lock, it sets
>   the 'removed' flag of the block group and it sets the local variable
>   'remove_em' to false, because the block group is currently frozen
>   (its 'frozen' counter is > 0, until very recently this counter was
>   named 'trimming');
>
>3) Task B unlocks the block group and the chunk mutex;
>
>4) Task A is done trimming the block group and unfreezes the block group
>   by calling btrfs_unfreeze_block_group() (until very recently this was
>   named btrfs_put_block_group_trimming()). In this function we lock the
>   block group and set the local variable 'cleanup' to true because we
>   were able to decrement the block group's 'frozen' counter down to 0 and
>   the flag 'removed' is set in the block group.
>
>   Since 'cleanup' is set to true, it locks the chunk mutex and removes
>   the extent mapping representing the block group from the mapping tree;
>
>5) Task C allocates a new block group Y and picks X's logical address
>   as the logical address for Y. X was the block group with the highest
>   logical address, so the block group that is now the last one in the
>   fs mapping tree ends at an offset equal to X's logical address (this
>   logical address selection is done at volumes.c:find_next_chunk()).
>
>   At this point the new block group Y does not yet have its item added
>   to the extent tree (nor the corresponding device extent items and
>   chunk item in the device and chunk trees). The new group Y is added to
>   the list of pending block groups in the transaction handle;
>
>6) Before task B proceeds to remove the block group item for block
>   group X from the extent tree, which has a key matching:
>
>   (X logical offset, BTRFS_BLOCK_GROUP_ITEM_KEY, length)
>
>   task C while ending its transaction handle calls
>   btrfs_create_pending_block_groups(), which finds block group Y and
>   tries to insert the block group item for Y into the extent tree, which
>   fails with -EEXIST since its logical offset is the same as X's and
>   task B has not yet deleted X's key from the extent tree.
>   This failure results in a transaction abort, producing a stack like
>   the following:
>
>------------[ cut here ]------------
> BTRFS: Transaction aborted (error -17)
> WARNING: CPU: 2 PID: 19736 at fs/btrfs/block-group.c:2074 btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs]
> Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
> CPU: 2 PID: 19736 Comm: fsstress Tainted: G        W         5.6.0-rc7-btrfs-next-58 #5
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
> RIP: 0010:btrfs_create_pending_block_groups+0x1eb/0x260 [btrfs]
> Code: ff ff ff 48 8b 55 50 f0 48 (...)
> RSP: 0018:ffffa4160a1c7d58 EFLAGS: 00010286
> RAX: 0000000000000000 RBX: ffff961581909d98 RCX: 0000000000000000
> RDX: 0000000000000001 RSI: ffffffffb3d63990 RDI: 0000000000000001
> RBP: ffff9614f3356a58 R08: 0000000000000000 R09: 0000000000000001
> R10: ffff9615b65b0040 R11: 0000000000000000 R12: ffff961581909c10
> R13: ffff9615b0c32000 R14: ffff9614f3356ab0 R15: ffff9614be779000
> FS:  00007f2ce2841e80(0000) GS:ffff9615bae00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000555f18780000 CR3: 0000000131d34005 CR4: 00000000003606e0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> Call Trace:
>  btrfs_start_dirty_block_groups+0x398/0x4e0 [btrfs]
>  btrfs_commit_transaction+0xd0/0xc50 [btrfs]
>  ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
>  ? __ia32_sys_fdatasync+0x20/0x20
>  iterate_supers+0xdb/0x180
>  ksys_sync+0x60/0xb0
>  __ia32_sys_sync+0xa/0x10
>  do_syscall_64+0x5c/0x280
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> RIP: 0033:0x7f2ce1d4d5b7
> Code: 83 c4 08 48 3d 01 (...)
> RSP: 002b:00007ffd8b558c58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
> RAX: ffffffffffffffda RBX: 000000000000002c RCX: 00007f2ce1d4d5b7
> RDX: 00000000ffffffff RSI: 00000000186ba07b RDI: 000000000000002c
> RBP: 0000555f17b9e520 R08: 0000000000000012 R09: 000000000000ce00
> R10: 0000000000000078 R11: 0000000000000202 R12: 0000000000000032
> R13: 0000000051eb851f R14: 00007ffd8b558cd0 R15: 0000555f1798ec20
> irq event stamp: 0
> hardirqs last  enabled at (0): [<0000000000000000>] 0x0
> hardirqs last disabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
> softirqs last  enabled at (0): [<ffffffffb2abdedf>] copy_process+0x74f/0x2020
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> ---[ end trace bd7c03622e0b0a9c ]---
>
>Fix this simply by making btrfs_remove_block_group() remove the block
>group's item from the extent tree before it flags the block group as
>removed. Also remove the block group's free space entries from the free
>space tree before flagging the block group as removed, to avoid a similar race
>with adding and removing free space entries for the free space tree.
>
>Fixes: 04216820fe83d5 ("Btrfs: fix race between fs trimming and block group remove/allocation")
>CC: stable@vger.kernel.org # 4.4+
>Signed-off-by: Filipe Manana <fdmanana@suse.com>
>Signed-off-by: David Sterba <dsterba@suse.com>

I've backported it to 5.7 by also taking 7357623a7f4b ("btrfs:
block-group: refactor how we delete one block group item"), but older
branches require a more complex backport which I didn't attempt.

-- 
Thanks,
Sasha