[PATCH 0/3] btrfs: fix problem with balance recovery and snap delete

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/3] btrfs: fix problem with balance recovery and snap delete
@ 2022-02-16 19:06 Josef Bacik
  2022-02-16 19:06 ` [PATCH 1/3] btrfs: clean deleted snapshots on mount Josef Bacik
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Josef Bacik @ 2022-02-16 19:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

Hello,

I found a problem where we'll try to add refs to blocks that no longer have
references because we started a relocation while we had a half deleted snapshot
in the file system.  This only occurs on a clean mount with a pending snapshot
delete in place.  I saw this in production but lost the file system before I
could test my patch.  However I reproduced it locally with some error injection.
Without my patch we'd fail to run delayed refs with the warning in the commit
message in the first patch, with my patch we mount and can use the file system.

The other two patches are just cleanups that i noticed while messing with this
code.  Thanks,

Josef

Josef Bacik (3):
  btrfs: clean deleted snapshots on mount
  btrfs: use btrfs_fs_info for deleting snapshots and cleaner
  btrfs: pass btrfs_fs_info to btrfs_recover_relocation

 fs/btrfs/ctree.h       |  2 +-
 fs/btrfs/disk-io.c     | 22 +++++++++++++++++-----
 fs/btrfs/relocation.c  |  5 ++---
 fs/btrfs/transaction.c |  4 ++--
 fs/btrfs/transaction.h |  2 +-
 5 files changed, 23 insertions(+), 12 deletions(-)

-- 
2.26.3

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/3] btrfs: clean deleted snapshots on mount
  2022-02-16 19:06 [PATCH 0/3] btrfs: fix problem with balance recovery and snap delete Josef Bacik
@ 2022-02-16 19:06 ` Josef Bacik
  2022-02-17 11:53   ` Filipe Manana
  2022-02-16 19:06 ` [PATCH 2/3] btrfs: use btrfs_fs_info for deleting snapshots and cleaner Josef Bacik
  2022-02-16 19:06 ` [PATCH 3/3] btrfs: pass btrfs_fs_info to btrfs_recover_relocation Josef Bacik
  2 siblings, 1 reply; 5+ messages in thread
From: Josef Bacik @ 2022-02-16 19:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We hit a bug with a recovering relocation on mount for one of our file
systems in production.  I reproduced this locally by injecting errors
into snapshot delete with balance running at the same time.  This
presented as an error while looking up an extent item

------------[ cut here ]------------
WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
RIP: 0010:lookup_inline_extent_backref+0x647/0x680
RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 insert_inline_extent_backref+0x46/0xd0
 __btrfs_inc_extent_ref.isra.0+0x5f/0x200
 ? btrfs_merge_delayed_refs+0x164/0x190
 __btrfs_run_delayed_refs+0x561/0xfa0
 ? btrfs_search_slot+0x7b4/0xb30
 ? btrfs_update_root+0x1a9/0x2c0
 btrfs_run_delayed_refs+0x73/0x1f0
 ? btrfs_update_root+0x1a9/0x2c0
 btrfs_commit_transaction+0x50/0xa50
 ? btrfs_update_reloc_root+0x122/0x220
 prepare_to_merge+0x29f/0x320
 relocate_block_group+0x2b8/0x550
 btrfs_relocate_block_group+0x1a6/0x350
 btrfs_relocate_chunk+0x27/0xe0
 btrfs_balance+0x777/0xe60
 balance_kthread+0x35/0x50
 ? btrfs_balance+0xe60/0xe60
 kthread+0x16b/0x190
 ? set_kthread_struct+0x40/0x40
 ret_from_fork+0x22/0x30
 </TASK>
---[ end trace 7ebc95131709d2b0 ]---

Normally snapshot deletion and relocation are excluded from running at
the same time by the fs_info->cleaner_mutex.  However if we had a
pending balance waiting to get the ->cleaner_mutex, and a snapshot
deletion was running, and then the box crashed, we would come up in a
state where we have a half deleted snapshot.

Again, in the normal case the snapshot deletion needs to complete before
relocation can start, but in this case relocation could very well start
before the snapshot deletion completes, as we simply add the root to the
dead roots list and wait for the next time the cleaner runs to clean up
the snapshot.

Fix this by simply cleaning all of the deleted snapshots at mount time,
before we start messing with relocation.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index b6a81c39d5f4..ae8c201070f2 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3380,6 +3380,19 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
 	up_read(&fs_info->cleanup_work_sem);
 
 	mutex_lock(&fs_info->cleaner_mutex);
+	/*
+	 * If there are any DEAD_TREE's we must recover them here, otherwise we
+	 * could re-start relocation and attempt to relocate blocks that are
+	 * within a half-deleted snapshot.  Under normal operations we can't run
+	 * relocation and snapshot delete at the same time, however if we had a
+	 * snapshot deletion happening prior to this mount there's no way to
+	 * guarantee that the deletion will start before we re-start (or a user
+	 * starts) the relocation.  So do the cleanup here in order to prevent
+	 * problems.
+	 */
+	while (btrfs_clean_one_deleted_snapshot(fs_info->tree_root))
+		cond_resched();
+
 	ret = btrfs_recover_relocation(fs_info->tree_root);
 	mutex_unlock(&fs_info->cleaner_mutex);
 	if (ret < 0) {
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/3] btrfs: use btrfs_fs_info for deleting snapshots and cleaner
  2022-02-16 19:06 [PATCH 0/3] btrfs: fix problem with balance recovery and snap delete Josef Bacik
  2022-02-16 19:06 ` [PATCH 1/3] btrfs: clean deleted snapshots on mount Josef Bacik
@ 2022-02-16 19:06 ` Josef Bacik
  2022-02-16 19:06 ` [PATCH 3/3] btrfs: pass btrfs_fs_info to btrfs_recover_relocation Josef Bacik
  2 siblings, 0 replies; 5+ messages in thread
From: Josef Bacik @ 2022-02-16 19:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We're passing a root around here, but we only really need the fs_info,
so fix up btrfs_clean_one_deleted_snapshot() to take an fs_info instead,
and then fix up all the callers appropriately.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/disk-io.c     | 9 ++++-----
 fs/btrfs/transaction.c | 4 ++--
 fs/btrfs/transaction.h | 2 +-
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index ae8c201070f2..4303b996ad2f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -1947,8 +1947,7 @@ static void end_workqueue_fn(struct btrfs_work *work)
 
 static int cleaner_kthread(void *arg)
 {
-	struct btrfs_root *root = arg;
-	struct btrfs_fs_info *fs_info = root->fs_info;
+	struct btrfs_fs_info *fs_info = (struct btrfs_fs_info *)arg;
 	int again;
 
 	while (1) {
@@ -1981,7 +1980,7 @@ static int cleaner_kthread(void *arg)
 
 		btrfs_run_delayed_iputs(fs_info);
 
-		again = btrfs_clean_one_deleted_snapshot(root);
+		again = btrfs_clean_one_deleted_snapshot(fs_info);
 		mutex_unlock(&fs_info->cleaner_mutex);
 
 		/*
@@ -3390,7 +3389,7 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
 	 * starts) the relocation.  So do the cleanup here in order to prevent
 	 * problems.
 	 */
-	while (btrfs_clean_one_deleted_snapshot(fs_info->tree_root))
+	while (btrfs_clean_one_deleted_snapshot(fs_info))
 		cond_resched();
 
 	ret = btrfs_recover_relocation(fs_info->tree_root);
@@ -3819,7 +3818,7 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_device
 		goto fail_sysfs;
 	}
 
-	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
+	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, fs_info,
 					       "btrfs-cleaner");
 	if (IS_ERR(fs_info->cleaner_kthread))
 		goto fail_sysfs;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index c6e550fa4d55..b467570ae58b 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -2444,10 +2444,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans)
  * because btrfs_commit_super will poke cleaner thread and it will process it a
  * few seconds later.
  */
-int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root)
+int btrfs_clean_one_deleted_snapshot(struct btrfs_fs_info *fs_info)
 {
+	struct btrfs_root *root;
 	int ret;
-	struct btrfs_fs_info *fs_info = root->fs_info;
 
 	spin_lock(&fs_info->trans_lock);
 	if (list_empty(&fs_info->dead_roots)) {
diff --git a/fs/btrfs/transaction.h b/fs/btrfs/transaction.h
index 9402d8d94484..399efc674f24 100644
--- a/fs/btrfs/transaction.h
+++ b/fs/btrfs/transaction.h
@@ -216,7 +216,7 @@ int btrfs_wait_for_commit(struct btrfs_fs_info *fs_info, u64 transid);
 
 void btrfs_add_dead_root(struct btrfs_root *root);
 int btrfs_defrag_root(struct btrfs_root *root);
-int btrfs_clean_one_deleted_snapshot(struct btrfs_root *root);
+int btrfs_clean_one_deleted_snapshot(struct btrfs_fs_info *fs_info);
 int btrfs_commit_transaction(struct btrfs_trans_handle *trans);
 void btrfs_commit_transaction_async(struct btrfs_trans_handle *trans);
 int btrfs_end_transaction_throttle(struct btrfs_trans_handle *trans);
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/3] btrfs: pass btrfs_fs_info to btrfs_recover_relocation
  2022-02-16 19:06 [PATCH 0/3] btrfs: fix problem with balance recovery and snap delete Josef Bacik
  2022-02-16 19:06 ` [PATCH 1/3] btrfs: clean deleted snapshots on mount Josef Bacik
  2022-02-16 19:06 ` [PATCH 2/3] btrfs: use btrfs_fs_info for deleting snapshots and cleaner Josef Bacik
@ 2022-02-16 19:06 ` Josef Bacik
  2 siblings, 0 replies; 5+ messages in thread
From: Josef Bacik @ 2022-02-16 19:06 UTC (permalink / raw)
  To: linux-btrfs, kernel-team

We don't need a root here, we just need the btrfs_fs_info, we can just
get the specific roots we need from fs_info.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/ctree.h      | 2 +-
 fs/btrfs/disk-io.c    | 2 +-
 fs/btrfs/relocation.c | 5 ++---
 3 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index f870d893d13b..0740d434bee5 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -3832,7 +3832,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root);
 int btrfs_update_reloc_root(struct btrfs_trans_handle *trans,
 			    struct btrfs_root *root);
-int btrfs_recover_relocation(struct btrfs_root *root);
+int btrfs_recover_relocation(struct btrfs_fs_info *fs_info);
 int btrfs_reloc_clone_csums(struct btrfs_inode *inode, u64 file_pos, u64 len);
 int btrfs_reloc_cow_block(struct btrfs_trans_handle *trans,
 			  struct btrfs_root *root, struct extent_buffer *buf,
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 4303b996ad2f..08c260a1e948 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3392,7 +3392,7 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
 	while (btrfs_clean_one_deleted_snapshot(fs_info))
 		cond_resched();
 
-	ret = btrfs_recover_relocation(fs_info->tree_root);
+	ret = btrfs_recover_relocation(fs_info);
 	mutex_unlock(&fs_info->cleaner_mutex);
 	if (ret < 0) {
 		btrfs_warn(fs_info, "failed to recover relocation: %d", ret);
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index f5465197996d..b51a11e72b64 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -4110,9 +4110,8 @@ static noinline_for_stack int mark_garbage_root(struct btrfs_root *root)
  * this function resumes merging reloc trees with corresponding fs trees.
  * this is important for keeping the sharing of tree blocks
  */
-int btrfs_recover_relocation(struct btrfs_root *root)
+int btrfs_recover_relocation(struct btrfs_fs_info *fs_info)
 {
-	struct btrfs_fs_info *fs_info = root->fs_info;
 	LIST_HEAD(reloc_roots);
 	struct btrfs_key key;
 	struct btrfs_root *fs_root;
@@ -4153,7 +4152,7 @@ int btrfs_recover_relocation(struct btrfs_root *root)
 		    key.type != BTRFS_ROOT_ITEM_KEY)
 			break;
 
-		reloc_root = btrfs_read_tree_root(root, &key);
+		reloc_root = btrfs_read_tree_root(fs_info->tree_root, &key);
 		if (IS_ERR(reloc_root)) {
 			err = PTR_ERR(reloc_root);
 			goto out;
-- 
2.26.3


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/3] btrfs: clean deleted snapshots on mount
  2022-02-16 19:06 ` [PATCH 1/3] btrfs: clean deleted snapshots on mount Josef Bacik
@ 2022-02-17 11:53   ` Filipe Manana
  0 siblings, 0 replies; 5+ messages in thread
From: Filipe Manana @ 2022-02-17 11:53 UTC (permalink / raw)
  To: Josef Bacik; +Cc: linux-btrfs, kernel-team

On Wed, Feb 16, 2022 at 02:06:53PM -0500, Josef Bacik wrote:
> We hit a bug with a recovering relocation on mount for one of our file
> systems in production.  I reproduced this locally by injecting errors
> into snapshot delete with balance running at the same time.  This
> presented as an error while looking up an extent item
> 
> ------------[ cut here ]------------
> WARNING: CPU: 5 PID: 1501 at fs/btrfs/extent-tree.c:866 lookup_inline_extent_backref+0x647/0x680
> CPU: 5 PID: 1501 Comm: btrfs-balance Not tainted 5.16.0-rc8+ #8
> RIP: 0010:lookup_inline_extent_backref+0x647/0x680
> RSP: 0018:ffffae0a023ab960 EFLAGS: 00010202
> RAX: 0000000000000001 RBX: 0000000000000000 RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 000000000000000c RDI: 0000000000000000
> RBP: ffff943fd2a39b60 R08: 0000000000000000 R09: 0000000000000001
> R10: 0001434088152de0 R11: 0000000000000000 R12: 0000000001d05000
> R13: ffff943fd2a39b60 R14: ffff943fdb96f2a0 R15: ffff9442fc923000
> FS:  0000000000000000(0000) GS:ffff944e9eb40000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1157b1fca8 CR3: 000000010f092000 CR4: 0000000000350ee0
> Call Trace:
>  <TASK>
>  insert_inline_extent_backref+0x46/0xd0
>  __btrfs_inc_extent_ref.isra.0+0x5f/0x200
>  ? btrfs_merge_delayed_refs+0x164/0x190
>  __btrfs_run_delayed_refs+0x561/0xfa0
>  ? btrfs_search_slot+0x7b4/0xb30
>  ? btrfs_update_root+0x1a9/0x2c0
>  btrfs_run_delayed_refs+0x73/0x1f0
>  ? btrfs_update_root+0x1a9/0x2c0
>  btrfs_commit_transaction+0x50/0xa50
>  ? btrfs_update_reloc_root+0x122/0x220
>  prepare_to_merge+0x29f/0x320
>  relocate_block_group+0x2b8/0x550
>  btrfs_relocate_block_group+0x1a6/0x350
>  btrfs_relocate_chunk+0x27/0xe0
>  btrfs_balance+0x777/0xe60
>  balance_kthread+0x35/0x50
>  ? btrfs_balance+0xe60/0xe60
>  kthread+0x16b/0x190
>  ? set_kthread_struct+0x40/0x40
>  ret_from_fork+0x22/0x30
>  </TASK>
> ---[ end trace 7ebc95131709d2b0 ]---
> 
> Normally snapshot deletion and relocation are excluded from running at
> the same time by the fs_info->cleaner_mutex.  However if we had a
> pending balance waiting to get the ->cleaner_mutex, and a snapshot
> deletion was running, and then the box crashed, we would come up in a
> state where we have a half deleted snapshot.
> 
> Again, in the normal case the snapshot deletion needs to complete before
> relocation can start, but in this case relocation could very well start
> before the snapshot deletion completes, as we simply add the root to the
> dead roots list and wait for the next time the cleaner runs to clean up
> the snapshot.
> 
> Fix this by simply cleaning all of the deleted snapshots at mount time,
> before we start messing with relocation.
> 
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>  fs/btrfs/disk-io.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index b6a81c39d5f4..ae8c201070f2 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -3380,6 +3380,19 @@ int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info)
>  	up_read(&fs_info->cleanup_work_sem);
>  
>  	mutex_lock(&fs_info->cleaner_mutex);
> +	/*
> +	 * If there are any DEAD_TREE's we must recover them here, otherwise we
> +	 * could re-start relocation and attempt to relocate blocks that are
> +	 * within a half-deleted snapshot.  Under normal operations we can't run
> +	 * relocation and snapshot delete at the same time, however if we had a
> +	 * snapshot deletion happening prior to this mount there's no way to
> +	 * guarantee that the deletion will start before we re-start (or a user
> +	 * starts) the relocation.  So do the cleanup here in order to prevent
> +	 * problems.
> +	 */
> +	while (btrfs_clean_one_deleted_snapshot(fs_info->tree_root))
> +		cond_resched();

Unless I missed something subtle, this deletes the snapshots synchronously at
mount time, meaning that if we have many and/or large snapshots to delete,
this will slowdown a mount for a long time. And this will happen even if
the above scenario did not happen.

We already have people often report slow mount times (several minutes) likely
due to large extent trees and many block groups, so synchronously deleting
snapshots will make things a lot more slower. Such an increase in mount time
is bad for user experience.

Can we do this asynchronously, still ensuring that balance is not resumed before
all snapshots are deleted? Or at the very least do this synchonously only if we
a have relocation to resume - as this is right now, it will do the snapshot
deletion even if we have no relocation to resume.

Thanks.


> +
>  	ret = btrfs_recover_relocation(fs_info->tree_root);
>  	mutex_unlock(&fs_info->cleaner_mutex);
>  	if (ret < 0) {
> -- 
> 2.26.3
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2022-02-17 11:53 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16 19:06 [PATCH 0/3] btrfs: fix problem with balance recovery and snap delete Josef Bacik
2022-02-16 19:06 ` [PATCH 1/3] btrfs: clean deleted snapshots on mount Josef Bacik
2022-02-17 11:53   ` Filipe Manana
2022-02-16 19:06 ` [PATCH 2/3] btrfs: use btrfs_fs_info for deleting snapshots and cleaner Josef Bacik
2022-02-16 19:06 ` [PATCH 3/3] btrfs: pass btrfs_fs_info to btrfs_recover_relocation Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.