All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anand Jain <anand.jain@oracle.com>
To: Qu Wenruo <wqu@suse.com>, linux-btrfs@vger.kernel.org
Cc: "Luca Béla Palkovics" <luca.bela.palkovics@gmail.com>
Subject: Re: [PATCH v2] btrfs: repair super block num_devices automatically
Date: Mon, 28 Feb 2022 16:00:11 +0800	[thread overview]
Message-ID: <ebdb0e67-0e9e-4ca6-1d2e-4dd2a7a7c103@oracle.com> (raw)
In-Reply-To: <732d0a86c3624bc96df3cca05512edac40efc75d.1646031785.git.wqu@suse.com>

On 28/02/2022 15:05, Qu Wenruo wrote:
> [BUG]
> There is a report that a btrfs has a bad super block num devices.
> 
> This makes btrfs to reject the fs completely.
> 
>    BTRFS error (device sdd3): super_num_devices 3 mismatch with num_devices 2 found here
>    BTRFS error (device sdd3): failed to read chunk tree: -22
>    BTRFS error (device sdd3): open_ctree failed
> 
> [CAUSE]
> During btrfs device removal, chunk tree and super block num devs are
> updated in two different transactions:
> 
>    btrfs_rm_device()
>    |- btrfs_rm_dev_item(device)
>    |  |- trans = btrfs_start_transaction()
>    |  |  Now we got transaction X
>    |  |
>    |  |- btrfs_del_item()
>    |  |  Now device item is removed from chunk tree
>    |  |
>    |  |- btrfs_commit_transaction()
>    |     Transaction X got committed, super num devs untouched,
>    |     but device item removed from chunk tree.
>    |     (AKA, super num devs is already incorrect)
>    |
>    |- cur_devices->num_devices--;
>    |- cur_devices->total_devices--;
>    |- btrfs_set_super_num_devices()
>       All those operations are not in transaction X, thus it will
>       only be written back to disk in next transaction.
> 
> So after the transaction X in btrfs_rm_dev_item() committed, but before
> transaction X+1 (which can be minutes away), a power loss happen, then
> we got the super num mismatch.
> 

The cause part is much better now. So why not also update the super
num_devices in the same transaction?


> [FIX]
> Make the super_num_devices check less strict, converting it from a hard
> error to a warning, and reset the value to a correct one for the current
> or next transaction commitment.

So that we can leave the part where we identify and report num_devices
miss-match as it is.

Thanks, Anand


> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Add a proper reason on why this mismatch can happen
>    No code change.
> ---
>   fs/btrfs/volumes.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 74c8024d8f96..d0ba3ff21920 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>   	 * do another round of validation checks.
>   	 */
>   	if (total_dev != fs_info->fs_devices->total_devices) {
> -		btrfs_err(fs_info,
> -	   "super_num_devices %llu mismatch with num_devices %llu found here",
> +		btrfs_warn(fs_info,
> +	   "super_num_devices %llu mismatch with num_devices %llu found here, will be repaired on next transaction commitment",
>   			  btrfs_super_num_devices(fs_info->super_copy),
>   			  total_dev);
> -		ret = -EINVAL;
> -		goto error;
> +		fs_info->fs_devices->total_devices = total_dev;
> +		btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>   	}
>   	if (btrfs_super_total_bytes(fs_info->super_copy) <
>   	    fs_info->fs_devices->total_rw_bytes) {


  reply	other threads:[~2022-02-28  8:00 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-28  7:05 [PATCH v2] btrfs: repair super block num_devices automatically Qu Wenruo
2022-02-28  8:00 ` Anand Jain [this message]
2022-02-28  8:54   ` Qu Wenruo
2022-02-28  9:01     ` Anand Jain
2022-02-28  9:21       ` Qu Wenruo
2022-02-28 12:51         ` Anand Jain
2022-02-28 13:03           ` Qu Wenruo
2022-02-28 13:13             ` Luca Béla Palkovics
2022-02-28 13:29               ` Qu Wenruo
2022-04-20 15:42           ` David Sterba
2022-04-25 17:07             ` Anand Jain

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ebdb0e67-0e9e-4ca6-1d2e-4dd2a7a7c103@oracle.com \
    --to=anand.jain@oracle.com \
    --cc=linux-btrfs@vger.kernel.org \
    --cc=luca.bela.palkovics@gmail.com \
    --cc=wqu@suse.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.