All of lore.kernel.org
 help / color / mirror / Atom feed
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Anand Jain <anand.jain@oracle.com>, Qu Wenruo <wqu@suse.com>,
	linux-btrfs@vger.kernel.org
Cc: "Luca Béla Palkovics" <luca.bela.palkovics@gmail.com>
Subject: Re: [PATCH v2] btrfs: repair super block num_devices automatically
Date: Mon, 28 Feb 2022 17:21:08 +0800	[thread overview]
Message-ID: <f8c0610e-466b-256b-347f-d10c517023ae@gmx.com> (raw)
In-Reply-To: <82df81f6-74a1-b77d-c4e9-48d20ab1bd68@oracle.com>



On 2022/2/28 17:01, Anand Jain wrote:
> On 28/02/2022 16:54, Qu Wenruo wrote:
>>
>>
>> On 2022/2/28 16:00, Anand Jain wrote:
>>> On 28/02/2022 15:05, Qu Wenruo wrote:
>>>> [BUG]
>>>> There is a report that a btrfs has a bad super block num devices.
>>>>
>>>> This makes btrfs to reject the fs completely.
>>>>
>>>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
>>>> num_devices 2 found here
>>>>    BTRFS error (device sdd3): failed to read chunk tree: -22
>>>>    BTRFS error (device sdd3): open_ctree failed
>>>>
>>>> [CAUSE]
>>>> During btrfs device removal, chunk tree and super block num devs are
>>>> updated in two different transactions:
>>>>
>>>>    btrfs_rm_device()
>>>>    |- btrfs_rm_dev_item(device)
>>>>    |  |- trans = btrfs_start_transaction()
>>>>    |  |  Now we got transaction X
>>>>    |  |
>>>>    |  |- btrfs_del_item()
>>>>    |  |  Now device item is removed from chunk tree
>>>>    |  |
>>>>    |  |- btrfs_commit_transaction()
>>>>    |     Transaction X got committed, super num devs untouched,
>>>>    |     but device item removed from chunk tree.
>>>>    |     (AKA, super num devs is already incorrect)
>>>>    |
>>>>    |- cur_devices->num_devices--;
>>>>    |- cur_devices->total_devices--;
>>>>    |- btrfs_set_super_num_devices()
>>>>       All those operations are not in transaction X, thus it will
>>>>       only be written back to disk in next transaction.
>>>>
>>>> So after the transaction X in btrfs_rm_dev_item() committed, but before
>>>> transaction X+1 (which can be minutes away), a power loss happen, then
>>>> we got the super num mismatch.
>>>>
>>>
>
>
>>> The cause part is much better now. So why not also update the super
>>> num_devices in the same transaction?
>>
>> A lot of other things like total_rw_bytes.
>>
>> Not to mention, even we got a fix, it will be another patch.
>>
>> Since the handling of such mismatch is needed to handle older kernels
>> anyway.
>
>   Ok.
>
>
>>>> [FIX]
>>>> Make the super_num_devices check less strict, converting it from a hard
>>>> error to a warning, and reset the value to a correct one for the
>>>> current
>>>> or next transaction commitment.
>>>
>>> So that we can leave the part where we identify and report num_devices
>>> miss-match as it is.
>>
>> I didn't get your point.
>> What do you want to get from this patch?
>>
>> Isn't this already the behavior of this patch?
>
>   Let me clarify - we don't need this patch if we fix the actual bug as
> above. IMO.

Big NO NO.

The damage is already done, we must be responsible for whatever damage
we caused, especially the damage has already reached disk.

Just fixing the cause and call it a day is definitely not a responsible way.

Especially when the damage is done, you have no way to mount it, just
like the reporter.

You dare to say the same thing to the end user?

>
>>>
>>> Thanks, Anand
>>>
>>>
>>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>>> Link:
>>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>>>>
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> Changelog:
>>>> v2:
>>>> - Add a proper reason on why this mismatch can happen
>>>>    No code change.
>>>> ---
>>>>   fs/btrfs/volumes.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 74c8024d8f96..d0ba3ff21920 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info
>>>> *fs_info)
>>>>        * do another round of validation checks.
>>>>        */
>>>>       if (total_dev != fs_info->fs_devices->total_devices) {
>>>> -        btrfs_err(fs_info,
>>>> -       "super_num_devices %llu mismatch with num_devices %llu found
>>>> here",
>>>> +        btrfs_warn(fs_info,
>>>> +       "super_num_devices %llu mismatch with num_devices %llu found
>>>> here, will be repaired on next transaction commitment",
>>>>                 btrfs_super_num_devices(fs_info->super_copy),
>>>>                 total_dev);
>>>> -        ret = -EINVAL;
>>>> -        goto error;
>>>> +        fs_info->fs_devices->total_devices = total_dev;
>>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>>>       }
>>>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
>>>>           fs_info->fs_devices->total_rw_bytes) {
>>>
>

  reply	other threads:[~2022-02-28  9:21 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-28  7:05 [PATCH v2] btrfs: repair super block num_devices automatically Qu Wenruo
2022-02-28  8:00 ` Anand Jain
2022-02-28  8:54   ` Qu Wenruo
2022-02-28  9:01     ` Anand Jain
2022-02-28  9:21       ` Qu Wenruo [this message]
2022-02-28 12:51         ` Anand Jain
2022-02-28 13:03           ` Qu Wenruo
2022-02-28 13:13             ` Luca Béla Palkovics
2022-02-28 13:29               ` Qu Wenruo
2022-04-20 15:42           ` David Sterba
2022-04-25 17:07             ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f8c0610e-466b-256b-347f-d10c517023ae@gmx.com \
    --to=quwenruo.btrfs@gmx.com \
    --cc=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=luca.bela.palkovics@gmail.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.