linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2] btrfs: fix root drop key inconsistent when drop subvolume/snapshot fails
@ 2021-08-04  2:49 robbieko
  2021-08-04 11:56 ` Filipe Manana
  0 siblings, 1 reply; 4+ messages in thread
From: robbieko @ 2021-08-04  2:49 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Robbie Ko

From: Robbie Ko <robbieko@synology.com>

When walk down/up tree fails, we did not aborting the transaction,
nor did modify the root drop key, but the refs of some tree blocks
may have been removed in the transaction.

Therefore when we retry to delete the subvolume in the future,
we will fail due to the fact that some references were deleted
in the previous attempt.

------------[ cut here ]------------
WARNING: at fs/btrfs/extent-tree.c:898 btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]()
CPU: 2 PID: 11618 Comm: btrfs-cleaner Tainted: P
Hardware name: Synology Inc. RS3617xs Series/Type2 - Board Product Name1, BIOS M.017 2019/10/16
ffffffff814c2246 ffffffff81036536 ffff88024a911d08 ffff880262de45b0
ffff8802448b5f20 ffff88024a9c1ad8 0000000000000000 ffffffffa08eb05a
000008f84e72c000 0000000000000000 0000000000000001 0000000100000000
Call Trace:
[<ffffffff814c2246>] ? dump_stack+0xc/0x15
[<ffffffff81036536>] ? warn_slowpath_common+0x56/0x70
[<ffffffffa08eb05a>] ? btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]
[<ffffffffa08ee558>] ? do_walk_down+0x128/0x750 [btrfs]
[<ffffffffa08ebab4>] ? walk_down_proc+0x314/0x330 [btrfs]
[<ffffffffa08eec42>] ? walk_down_tree+0xc2/0xf0 [btrfs]
[<ffffffffa08f2bce>] ? btrfs_drop_snapshot+0x40e/0x9a0 [btrfs]
[<ffffffffa09096db>] ? btrfs_clean_one_deleted_snapshot+0xab/0xe0 [btrfs]
[<ffffffffa08fe970>] ? cleaner_kthread+0x280/0x320 [btrfs]
[<ffffffff81052eaf>] ? kthread+0xaf/0xc0
[<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
[<ffffffff814c8c0d>] ? ret_from_fork+0x5d/0xb0
[<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
------------[ end trace ]-----------
BTRFS error (device dm-1): Missing references.
BTRFS: error (device dm-1) in btrfs_drop_snapshot:9557: errno=-5 IO failure

By not aborting the transaction, every future attempt to delete the
subvolume fails and we end up never freeing all the extents used by
the subvolume/snapshot.

By aborting the transaction we have a least the possibility to
succeeded after unmounting and mounting again the filesystem.

So we fix this problem by aborting the transaction when the walk down/up
tree fails, which is a safer approach.

In addition, we also added the initialization of drop_progress and
drop_level.

Signed-off-by: Robbie Ko <robbieko@synology.com>
---
 fs/btrfs/extent-tree.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 268ce58d4569..032a257fdb65 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -5539,10 +5539,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 		path->locks[level] = BTRFS_WRITE_LOCK;
 		memset(&wc->update_progress, 0,
 		       sizeof(wc->update_progress));
+		memset(&wc->drop_progress, 0,
+		       sizeof(wc->drop_progress));
 	} else {
 		btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
 		memcpy(&wc->update_progress, &key,
 		       sizeof(wc->update_progress));
+		memcpy(&wc->drop_progress, &key,
+		       sizeof(wc->drop_progress));
 
 		level = btrfs_root_drop_level(root_item);
 		BUG_ON(level == 0);
@@ -5588,6 +5592,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 
 	wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
 	wc->level = level;
+	wc->drop_level = level;
 	wc->shared_level = -1;
 	wc->stage = DROP_REFERENCE;
 	wc->update_ref = update_ref;
@@ -5659,8 +5664,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
 		}
 	}
 	btrfs_release_path(path);
-	if (err)
+	if (err) {
+		btrfs_abort_transaction(trans, err);
 		goto out_end_trans;
+	}
 
 	ret = btrfs_del_root(trans, &root->root_key);
 	if (ret) {
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] btrfs: fix root drop key inconsistent when drop subvolume/snapshot fails
  2021-08-04  2:49 [PATCH v2] btrfs: fix root drop key inconsistent when drop subvolume/snapshot fails robbieko
@ 2021-08-04 11:56 ` Filipe Manana
  2021-08-05  3:02   ` robbieko
  0 siblings, 1 reply; 4+ messages in thread
From: Filipe Manana @ 2021-08-04 11:56 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Wed, Aug 4, 2021 at 4:45 AM robbieko <robbieko@synology.com> wrote:
>
> From: Robbie Ko <robbieko@synology.com>
>
> When walk down/up tree fails, we did not aborting the transaction,
> nor did modify the root drop key, but the refs of some tree blocks
> may have been removed in the transaction.
>
> Therefore when we retry to delete the subvolume in the future,
> we will fail due to the fact that some references were deleted
> in the previous attempt.
>
> ------------[ cut here ]------------
> WARNING: at fs/btrfs/extent-tree.c:898 btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]()
> CPU: 2 PID: 11618 Comm: btrfs-cleaner Tainted: P
> Hardware name: Synology Inc. RS3617xs Series/Type2 - Board Product Name1, BIOS M.017 2019/10/16
> ffffffff814c2246 ffffffff81036536 ffff88024a911d08 ffff880262de45b0
> ffff8802448b5f20 ffff88024a9c1ad8 0000000000000000 ffffffffa08eb05a
> 000008f84e72c000 0000000000000000 0000000000000001 0000000100000000
> Call Trace:
> [<ffffffff814c2246>] ? dump_stack+0xc/0x15
> [<ffffffff81036536>] ? warn_slowpath_common+0x56/0x70
> [<ffffffffa08eb05a>] ? btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]
> [<ffffffffa08ee558>] ? do_walk_down+0x128/0x750 [btrfs]
> [<ffffffffa08ebab4>] ? walk_down_proc+0x314/0x330 [btrfs]
> [<ffffffffa08eec42>] ? walk_down_tree+0xc2/0xf0 [btrfs]
> [<ffffffffa08f2bce>] ? btrfs_drop_snapshot+0x40e/0x9a0 [btrfs]
> [<ffffffffa09096db>] ? btrfs_clean_one_deleted_snapshot+0xab/0xe0 [btrfs]
> [<ffffffffa08fe970>] ? cleaner_kthread+0x280/0x320 [btrfs]
> [<ffffffff81052eaf>] ? kthread+0xaf/0xc0
> [<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
> [<ffffffff814c8c0d>] ? ret_from_fork+0x5d/0xb0
> [<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
> ------------[ end trace ]-----------
> BTRFS error (device dm-1): Missing references.
> BTRFS: error (device dm-1) in btrfs_drop_snapshot:9557: errno=-5 IO failure
>
> By not aborting the transaction, every future attempt to delete the
> subvolume fails and we end up never freeing all the extents used by
> the subvolume/snapshot.
>
> By aborting the transaction we have a least the possibility to
> succeeded after unmounting and mounting again the filesystem.
>
> So we fix this problem by aborting the transaction when the walk down/up
> tree fails, which is a safer approach.

Ok, this still misses the explanation on why we can't solve the
problem by simply updating the drop_progress and drop_level when we
get an error from walk_up_tree() or walk_down_tree().

>
> In addition, we also added the initialization of drop_progress and
> drop_level.

So, this about initializing drop_progress and drop_level seems like a
separate change.
How is this related to the original problem?

It completely lacks an explanation about why it's needed, how it
relates to the original problem.

If it's unrelated, then it should go into a separate patch with a
proper explanation in its changelog.

>
> Signed-off-by: Robbie Ko <robbieko@synology.com>
> ---
>  fs/btrfs/extent-tree.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 268ce58d4569..032a257fdb65 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -5539,10 +5539,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>                 path->locks[level] = BTRFS_WRITE_LOCK;
>                 memset(&wc->update_progress, 0,
>                        sizeof(wc->update_progress));
> +               memset(&wc->drop_progress, 0,
> +                      sizeof(wc->drop_progress));

I don't see why this is needed.
wc was allocated with kzalloc(), how can wc->drop_progress not be
already zeroed here?

Also, walk_up_tree() and walk_down_tree() never read from
wc->drop_progress. We use wc->drop_progress only for storing after
walk up/walk down and then copy it into the root item for persistence.

>         } else {
>                 btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
>                 memcpy(&wc->update_progress, &key,
>                        sizeof(wc->update_progress));
> +               memcpy(&wc->drop_progress, &key,
> +                      sizeof(wc->drop_progress));

Why is this needed?
Except when starting the subvolume/snapshot drop, before entering the
loop with the walk up and down logic, we never read wc->drop_progress.

wc->drop_progress is meant only for storing into the root_item after
each walk up/walk down iteration, we never read from it during the
walks.

>
>                 level = btrfs_root_drop_level(root_item);
>                 BUG_ON(level == 0);
> @@ -5588,6 +5592,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>
>         wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
>         wc->level = level;
> +       wc->drop_level = level;

I don't understand why this is needed too.
We never read from wc->drop_level during the walk up and walk down,
it's only used to store a level during/after each walk up and walk
down iteration, and if the walks succeeded we
set wc->drop_level to wc->level and then copy it into the root item.

So please provide an explanation on why these initializations are
needed and how they relate to the original problem.
If they are not related to the original problem, then move them into a
separate patch, with all the proper explanations in the changelog.

Thanks.


>         wc->shared_level = -1;
>         wc->stage = DROP_REFERENCE;
>         wc->update_ref = update_ref;
> @@ -5659,8 +5664,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>                 }
>         }
>         btrfs_release_path(path);
> -       if (err)
> +       if (err) {
> +               btrfs_abort_transaction(trans, err);
>                 goto out_end_trans;
> +       }
>
>         ret = btrfs_del_root(trans, &root->root_key);
>         if (ret) {
> --
> 2.17.1
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] btrfs: fix root drop key inconsistent when drop subvolume/snapshot fails
  2021-08-04 11:56 ` Filipe Manana
@ 2021-08-05  3:02   ` robbieko
  2021-08-05  8:57     ` Filipe Manana
  0 siblings, 1 reply; 4+ messages in thread
From: robbieko @ 2021-08-05  3:02 UTC (permalink / raw)
  To: fdmanana; +Cc: linux-btrfs

Filipe Manana 於 2021/8/4 下午 07:56 寫道:
> On Wed, Aug 4, 2021 at 4:45 AM robbieko <robbieko@synology.com> wrote:
>> From: Robbie Ko <robbieko@synology.com>
>>
>> When walk down/up tree fails, we did not aborting the transaction,
>> nor did modify the root drop key, but the refs of some tree blocks
>> may have been removed in the transaction.
>>
>> Therefore when we retry to delete the subvolume in the future,
>> we will fail due to the fact that some references were deleted
>> in the previous attempt.
>>
>> ------------[ cut here ]------------
>> WARNING: at fs/btrfs/extent-tree.c:898 btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]()
>> CPU: 2 PID: 11618 Comm: btrfs-cleaner Tainted: P
>> Hardware name: Synology Inc. RS3617xs Series/Type2 - Board Product Name1, BIOS M.017 2019/10/16
>> ffffffff814c2246 ffffffff81036536 ffff88024a911d08 ffff880262de45b0
>> ffff8802448b5f20 ffff88024a9c1ad8 0000000000000000 ffffffffa08eb05a
>> 000008f84e72c000 0000000000000000 0000000000000001 0000000100000000
>> Call Trace:
>> [<ffffffff814c2246>] ? dump_stack+0xc/0x15
>> [<ffffffff81036536>] ? warn_slowpath_common+0x56/0x70
>> [<ffffffffa08eb05a>] ? btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]
>> [<ffffffffa08ee558>] ? do_walk_down+0x128/0x750 [btrfs]
>> [<ffffffffa08ebab4>] ? walk_down_proc+0x314/0x330 [btrfs]
>> [<ffffffffa08eec42>] ? walk_down_tree+0xc2/0xf0 [btrfs]
>> [<ffffffffa08f2bce>] ? btrfs_drop_snapshot+0x40e/0x9a0 [btrfs]
>> [<ffffffffa09096db>] ? btrfs_clean_one_deleted_snapshot+0xab/0xe0 [btrfs]
>> [<ffffffffa08fe970>] ? cleaner_kthread+0x280/0x320 [btrfs]
>> [<ffffffff81052eaf>] ? kthread+0xaf/0xc0
>> [<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
>> [<ffffffff814c8c0d>] ? ret_from_fork+0x5d/0xb0
>> [<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
>> ------------[ end trace ]-----------
>> BTRFS error (device dm-1): Missing references.
>> BTRFS: error (device dm-1) in btrfs_drop_snapshot:9557: errno=-5 IO failure
>>
>> By not aborting the transaction, every future attempt to delete the
>> subvolume fails and we end up never freeing all the extents used by
>> the subvolume/snapshot.
>>
>> By aborting the transaction we have a least the possibility to
>> succeeded after unmounting and mounting again the filesystem.
>>
>> So we fix this problem by aborting the transaction when the walk down/up
>> tree fails, which is a safer approach.
> Ok, this still misses the explanation on why we can't solve the
> problem by simply updating the drop_progress and drop_level when we
> get an error from walk_up_tree() or walk_down_tree().

I don't fully understand the walk up/down tree part, so I am not sure if 
drop_progress and drop_level are correct. So it’s safer for me to abort 
the transaction.


>> In addition, we also added the initialization of drop_progress and
>> drop_level.
> So, this about initializing drop_progress and drop_level seems like a
> separate change.
> How is this related to the original problem?
>
> It completely lacks an explanation about why it's needed, how it
> relates to the original problem.
>
> If it's unrelated, then it should go into a separate patch with a
> proper explanation in its changelog.

This part is really not related to the original problem, but I think 
there is a potential risk, I will separate it into another patch.


>> Signed-off-by: Robbie Ko <robbieko@synology.com>
>> ---
>>   fs/btrfs/extent-tree.c | 9 ++++++++-
>>   1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 268ce58d4569..032a257fdb65 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -5539,10 +5539,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>>                  path->locks[level] = BTRFS_WRITE_LOCK;
>>                  memset(&wc->update_progress, 0,
>>                         sizeof(wc->update_progress));
>> +               memset(&wc->drop_progress, 0,
>> +                      sizeof(wc->drop_progress));
> I don't see why this is needed.
> wc was allocated with kzalloc(), how can wc->drop_progress not be
> already zeroed here?

Yes, it's should all be zero.
But I think the wording of update_progress should be consistent.

>
> Also, walk_up_tree() and walk_down_tree() never read from
> wc->drop_progress. We use wc->drop_progress only for storing after
> walk up/walk down and then copy it into the root item for persistence.
>
>>          } else {
>>                  btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
>>                  memcpy(&wc->update_progress, &key,
>>                         sizeof(wc->update_progress));
>> +               memcpy(&wc->drop_progress, &key,
>> +                      sizeof(wc->drop_progress));
> Why is this needed?
> Except when starting the subvolume/snapshot drop, before entering the
> loop with the walk up and down logic, we never read wc->drop_progress.
>
> wc->drop_progress is meant only for storing into the root_item after
> each walk up/walk down iteration, we never read from it during the
> walks.

Indeed, drop_progress is only used to storing into the root item after 
each walk up/down iteration. But this value is not updated every time, 
when the stage is UPDATE_BACKREF, we may update 0 into to the root item, 
resulting in inconsistent drop_progress.

So I think we must initialize drop_progress and drop_level.

>>                  level = btrfs_root_drop_level(root_item);
>>                  BUG_ON(level == 0);
>> @@ -5588,6 +5592,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>>
>>          wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
>>          wc->level = level;
>> +       wc->drop_level = level;
> I don't understand why this is needed too.
> We never read from wc->drop_level during the walk up and walk down,
> it's only used to store a level during/after each walk up and walk
> down iteration, and if the walks succeeded we
> set wc->drop_level to wc->level and then copy it into the root item.

This is not always true.
If the stage is UPDATE_BACKREF, drop_level and drop_progress will not be updated,
so the root item may write 0, resulting in inconsistent drop_progress.


>
> So please provide an explanation on why these initializations are
> needed and how they relate to the original problem.
> If they are not related to the original problem, then move them into a
> separate patch, with all the proper explanations in the changelog.
>
> Thanks.
>
>
>>          wc->shared_level = -1;
>>          wc->stage = DROP_REFERENCE;
>>          wc->update_ref = update_ref;
>> @@ -5659,8 +5664,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
>>                  }
>>          }
>>          btrfs_release_path(path);
>> -       if (err)
>> +       if (err) {
>> +               btrfs_abort_transaction(trans, err);
>>                  goto out_end_trans;
>> +       }
>>
>>          ret = btrfs_del_root(trans, &root->root_key);
>>          if (ret) {
>> --
>> 2.17.1
>>
>


-- 
Avast 防毒軟體已檢查此封電子郵件的病毒。
https://www.avast.com/antivirus


^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v2] btrfs: fix root drop key inconsistent when drop subvolume/snapshot fails
  2021-08-05  3:02   ` robbieko
@ 2021-08-05  8:57     ` Filipe Manana
  0 siblings, 0 replies; 4+ messages in thread
From: Filipe Manana @ 2021-08-05  8:57 UTC (permalink / raw)
  To: robbieko; +Cc: linux-btrfs

On Thu, Aug 5, 2021 at 4:00 AM robbieko <robbieko@synology.com> wrote:
>
> Filipe Manana 於 2021/8/4 下午 07:56 寫道:
> > On Wed, Aug 4, 2021 at 4:45 AM robbieko <robbieko@synology.com> wrote:
> >> From: Robbie Ko <robbieko@synology.com>
> >>
> >> When walk down/up tree fails, we did not aborting the transaction,
> >> nor did modify the root drop key, but the refs of some tree blocks
> >> may have been removed in the transaction.
> >>
> >> Therefore when we retry to delete the subvolume in the future,
> >> we will fail due to the fact that some references were deleted
> >> in the previous attempt.
> >>
> >> ------------[ cut here ]------------
> >> WARNING: at fs/btrfs/extent-tree.c:898 btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]()
> >> CPU: 2 PID: 11618 Comm: btrfs-cleaner Tainted: P
> >> Hardware name: Synology Inc. RS3617xs Series/Type2 - Board Product Name1, BIOS M.017 2019/10/16
> >> ffffffff814c2246 ffffffff81036536 ffff88024a911d08 ffff880262de45b0
> >> ffff8802448b5f20 ffff88024a9c1ad8 0000000000000000 ffffffffa08eb05a
> >> 000008f84e72c000 0000000000000000 0000000000000001 0000000100000000
> >> Call Trace:
> >> [<ffffffff814c2246>] ? dump_stack+0xc/0x15
> >> [<ffffffff81036536>] ? warn_slowpath_common+0x56/0x70
> >> [<ffffffffa08eb05a>] ? btrfs_lookup_extent_info+0x40a/0x4c0 [btrfs]
> >> [<ffffffffa08ee558>] ? do_walk_down+0x128/0x750 [btrfs]
> >> [<ffffffffa08ebab4>] ? walk_down_proc+0x314/0x330 [btrfs]
> >> [<ffffffffa08eec42>] ? walk_down_tree+0xc2/0xf0 [btrfs]
> >> [<ffffffffa08f2bce>] ? btrfs_drop_snapshot+0x40e/0x9a0 [btrfs]
> >> [<ffffffffa09096db>] ? btrfs_clean_one_deleted_snapshot+0xab/0xe0 [btrfs]
> >> [<ffffffffa08fe970>] ? cleaner_kthread+0x280/0x320 [btrfs]
> >> [<ffffffff81052eaf>] ? kthread+0xaf/0xc0
> >> [<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
> >> [<ffffffff814c8c0d>] ? ret_from_fork+0x5d/0xb0
> >> [<ffffffff81052e00>] ? kthread_create_on_node+0x110/0x110
> >> ------------[ end trace ]-----------
> >> BTRFS error (device dm-1): Missing references.
> >> BTRFS: error (device dm-1) in btrfs_drop_snapshot:9557: errno=-5 IO failure
> >>
> >> By not aborting the transaction, every future attempt to delete the
> >> subvolume fails and we end up never freeing all the extents used by
> >> the subvolume/snapshot.
> >>
> >> By aborting the transaction we have a least the possibility to
> >> succeeded after unmounting and mounting again the filesystem.
> >>
> >> So we fix this problem by aborting the transaction when the walk down/up
> >> tree fails, which is a safer approach.
> > Ok, this still misses the explanation on why we can't solve the
> > problem by simply updating the drop_progress and drop_level when we
> > get an error from walk_up_tree() or walk_down_tree().
>
> I don't fully understand the walk up/down tree part, so I am not sure if
> drop_progress and drop_level are correct. So it’s safer for me to abort
> the transaction.

Then it seems a better understanding is needed.
I would like to know exactly why updating the drop_progress and
drop_level, and the root item, even when the walk up/down fails would
not work. And have it mentioned in the changelog.

When attempting to delete again a root, we have checks that first
verify if backrefs still exist for an extent.
They were added by:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=78c52d9eb6b7ac899bcd5a681aeff7c971c22934

The problem you describe sounds more like it's missing that type of
check somewhere.

>
>
> >> In addition, we also added the initialization of drop_progress and
> >> drop_level.
> > So, this about initializing drop_progress and drop_level seems like a
> > separate change.
> > How is this related to the original problem?
> >
> > It completely lacks an explanation about why it's needed, how it
> > relates to the original problem.
> >
> > If it's unrelated, then it should go into a separate patch with a
> > proper explanation in its changelog.
>
> This part is really not related to the original problem, but I think
> there is a potential risk, I will separate it into another patch.
>
>
> >> Signed-off-by: Robbie Ko <robbieko@synology.com>
> >> ---
> >>   fs/btrfs/extent-tree.c | 9 ++++++++-
> >>   1 file changed, 8 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> >> index 268ce58d4569..032a257fdb65 100644
> >> --- a/fs/btrfs/extent-tree.c
> >> +++ b/fs/btrfs/extent-tree.c
> >> @@ -5539,10 +5539,14 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >>                  path->locks[level] = BTRFS_WRITE_LOCK;
> >>                  memset(&wc->update_progress, 0,
> >>                         sizeof(wc->update_progress));
> >> +               memset(&wc->drop_progress, 0,
> >> +                      sizeof(wc->drop_progress));
> > I don't see why this is needed.
> > wc was allocated with kzalloc(), how can wc->drop_progress not be
> > already zeroed here?
>
> Yes, it's should all be zero.
> But I think the wording of update_progress should be consistent.

What do you mean by "wording of update_progress"?

>
> >
> > Also, walk_up_tree() and walk_down_tree() never read from
> > wc->drop_progress. We use wc->drop_progress only for storing after
> > walk up/walk down and then copy it into the root item for persistence.
> >
> >>          } else {
> >>                  btrfs_disk_key_to_cpu(&key, &root_item->drop_progress);
> >>                  memcpy(&wc->update_progress, &key,
> >>                         sizeof(wc->update_progress));
> >> +               memcpy(&wc->drop_progress, &key,
> >> +                      sizeof(wc->drop_progress));
> > Why is this needed?
> > Except when starting the subvolume/snapshot drop, before entering the
> > loop with the walk up and down logic, we never read wc->drop_progress.
> >
> > wc->drop_progress is meant only for storing into the root_item after
> > each walk up/walk down iteration, we never read from it during the
> > walks.
>
> Indeed, drop_progress is only used to storing into the root item after
> each walk up/down iteration. But this value is not updated every time,
> when the stage is UPDATE_BACKREF, we may update 0 into to the root item,
> resulting in inconsistent drop_progress.

I still don't see how that could happen.
We always start in the DROP_REFERENCE stage, and then we update
drop_progress based on wc->level, path->nodes[wc->level] and
path->slots[wc->level].
All these are properly initialized.

>
> So I think we must initialize drop_progress and drop_level.
>
> >>                  level = btrfs_root_drop_level(root_item);
> >>                  BUG_ON(level == 0);
> >> @@ -5588,6 +5592,7 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >>
> >>          wc->restarted = test_bit(BTRFS_ROOT_DEAD_TREE, &root->state);
> >>          wc->level = level;
> >> +       wc->drop_level = level;
> > I don't understand why this is needed too.
> > We never read from wc->drop_level during the walk up and walk down,
> > it's only used to store a level during/after each walk up and walk
> > down iteration, and if the walks succeeded we
> > set wc->drop_level to wc->level and then copy it into the root item.
>
> This is not always true.
> If the stage is UPDATE_BACKREF, drop_level and drop_progress will not be updated,
> so the root item may write 0, resulting in inconsistent drop_progress.

Same as before, we always start with DROP_REFERENCE stage, which
guarantees drop_progress and drop_level end up properly
initialized/setup.
So I still don't see a problem.

I think you need to show an exact sequence of steps that lead to a problem.

Thanks.

>
>
> >
> > So please provide an explanation on why these initializations are
> > needed and how they relate to the original problem.
> > If they are not related to the original problem, then move them into a
> > separate patch, with all the proper explanations in the changelog.
> >
> > Thanks.
> >
> >
> >>          wc->shared_level = -1;
> >>          wc->stage = DROP_REFERENCE;
> >>          wc->update_ref = update_ref;
> >> @@ -5659,8 +5664,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root, int update_ref, int for_reloc)
> >>                  }
> >>          }
> >>          btrfs_release_path(path);
> >> -       if (err)
> >> +       if (err) {
> >> +               btrfs_abort_transaction(trans, err);
> >>                  goto out_end_trans;
> >> +       }
> >>
> >>          ret = btrfs_del_root(trans, &root->root_key);
> >>          if (ret) {
> >> --
> >> 2.17.1
> >>
> >
>
>
> --
> Avast 防毒軟體已檢查此封電子郵件的病毒。
> https://www.avast.com/antivirus
>


-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2021-08-05  8:57 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-04  2:49 [PATCH v2] btrfs: fix root drop key inconsistent when drop subvolume/snapshot fails robbieko
2021-08-04 11:56 ` Filipe Manana
2021-08-05  3:02   ` robbieko
2021-08-05  8:57     ` Filipe Manana

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).