All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] btrfs: repair super block num_devices automatically
@ 2022-02-28  7:05 Qu Wenruo
  2022-02-28  8:00 ` Anand Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2022-02-28  7:05 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Luca Béla Palkovics

[BUG]
There is a report that a btrfs has a bad super block num devices.

This makes btrfs to reject the fs completely.

  BTRFS error (device sdd3): super_num_devices 3 mismatch with num_devices 2 found here
  BTRFS error (device sdd3): failed to read chunk tree: -22
  BTRFS error (device sdd3): open_ctree failed

[CAUSE]
During btrfs device removal, chunk tree and super block num devs are
updated in two different transactions:

  btrfs_rm_device()
  |- btrfs_rm_dev_item(device)
  |  |- trans = btrfs_start_transaction()
  |  |  Now we got transaction X
  |  |
  |  |- btrfs_del_item()
  |  |  Now device item is removed from chunk tree
  |  |
  |  |- btrfs_commit_transaction()
  |     Transaction X got committed, super num devs untouched,
  |     but device item removed from chunk tree.
  |     (AKA, super num devs is already incorrect)
  |
  |- cur_devices->num_devices--;
  |- cur_devices->total_devices--;
  |- btrfs_set_super_num_devices()
     All those operations are not in transaction X, thus it will
     only be written back to disk in next transaction.

So after the transaction X in btrfs_rm_dev_item() committed, but before
transaction X+1 (which can be minutes away), a power loss happen, then
we got the super num mismatch.

[FIX]
Make the super_num_devices check less strict, converting it from a hard
error to a warning, and reset the value to a correct one for the current
or next transaction commitment.

Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
Changelog:
v2:
- Add a proper reason on why this mismatch can happen
  No code change.
---
 fs/btrfs/volumes.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 74c8024d8f96..d0ba3ff21920 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
 	 * do another round of validation checks.
 	 */
 	if (total_dev != fs_info->fs_devices->total_devices) {
-		btrfs_err(fs_info,
-	   "super_num_devices %llu mismatch with num_devices %llu found here",
+		btrfs_warn(fs_info,
+	   "super_num_devices %llu mismatch with num_devices %llu found here, will be repaired on next transaction commitment",
 			  btrfs_super_num_devices(fs_info->super_copy),
 			  total_dev);
-		ret = -EINVAL;
-		goto error;
+		fs_info->fs_devices->total_devices = total_dev;
+		btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
 	}
 	if (btrfs_super_total_bytes(fs_info->super_copy) <
 	    fs_info->fs_devices->total_rw_bytes) {
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28  7:05 [PATCH v2] btrfs: repair super block num_devices automatically Qu Wenruo
@ 2022-02-28  8:00 ` Anand Jain
  2022-02-28  8:54   ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Anand Jain @ 2022-02-28  8:00 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs; +Cc: Luca Béla Palkovics

On 28/02/2022 15:05, Qu Wenruo wrote:
> [BUG]
> There is a report that a btrfs has a bad super block num devices.
> 
> This makes btrfs to reject the fs completely.
> 
>    BTRFS error (device sdd3): super_num_devices 3 mismatch with num_devices 2 found here
>    BTRFS error (device sdd3): failed to read chunk tree: -22
>    BTRFS error (device sdd3): open_ctree failed
> 
> [CAUSE]
> During btrfs device removal, chunk tree and super block num devs are
> updated in two different transactions:
> 
>    btrfs_rm_device()
>    |- btrfs_rm_dev_item(device)
>    |  |- trans = btrfs_start_transaction()
>    |  |  Now we got transaction X
>    |  |
>    |  |- btrfs_del_item()
>    |  |  Now device item is removed from chunk tree
>    |  |
>    |  |- btrfs_commit_transaction()
>    |     Transaction X got committed, super num devs untouched,
>    |     but device item removed from chunk tree.
>    |     (AKA, super num devs is already incorrect)
>    |
>    |- cur_devices->num_devices--;
>    |- cur_devices->total_devices--;
>    |- btrfs_set_super_num_devices()
>       All those operations are not in transaction X, thus it will
>       only be written back to disk in next transaction.
> 
> So after the transaction X in btrfs_rm_dev_item() committed, but before
> transaction X+1 (which can be minutes away), a power loss happen, then
> we got the super num mismatch.
> 

The cause part is much better now. So why not also update the super
num_devices in the same transaction?


> [FIX]
> Make the super_num_devices check less strict, converting it from a hard
> error to a warning, and reset the value to a correct one for the current
> or next transaction commitment.

So that we can leave the part where we identify and report num_devices
miss-match as it is.

Thanks, Anand


> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
> Link: https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
> Changelog:
> v2:
> - Add a proper reason on why this mismatch can happen
>    No code change.
> ---
>   fs/btrfs/volumes.c | 8 ++++----
>   1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index 74c8024d8f96..d0ba3ff21920 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info *fs_info)
>   	 * do another round of validation checks.
>   	 */
>   	if (total_dev != fs_info->fs_devices->total_devices) {
> -		btrfs_err(fs_info,
> -	   "super_num_devices %llu mismatch with num_devices %llu found here",
> +		btrfs_warn(fs_info,
> +	   "super_num_devices %llu mismatch with num_devices %llu found here, will be repaired on next transaction commitment",
>   			  btrfs_super_num_devices(fs_info->super_copy),
>   			  total_dev);
> -		ret = -EINVAL;
> -		goto error;
> +		fs_info->fs_devices->total_devices = total_dev;
> +		btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>   	}
>   	if (btrfs_super_total_bytes(fs_info->super_copy) <
>   	    fs_info->fs_devices->total_rw_bytes) {


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28  8:00 ` Anand Jain
@ 2022-02-28  8:54   ` Qu Wenruo
  2022-02-28  9:01     ` Anand Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2022-02-28  8:54 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs; +Cc: Luca Béla Palkovics



On 2022/2/28 16:00, Anand Jain wrote:
> On 28/02/2022 15:05, Qu Wenruo wrote:
>> [BUG]
>> There is a report that a btrfs has a bad super block num devices.
>>
>> This makes btrfs to reject the fs completely.
>>
>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
>> num_devices 2 found here
>>    BTRFS error (device sdd3): failed to read chunk tree: -22
>>    BTRFS error (device sdd3): open_ctree failed
>>
>> [CAUSE]
>> During btrfs device removal, chunk tree and super block num devs are
>> updated in two different transactions:
>>
>>    btrfs_rm_device()
>>    |- btrfs_rm_dev_item(device)
>>    |  |- trans = btrfs_start_transaction()
>>    |  |  Now we got transaction X
>>    |  |
>>    |  |- btrfs_del_item()
>>    |  |  Now device item is removed from chunk tree
>>    |  |
>>    |  |- btrfs_commit_transaction()
>>    |     Transaction X got committed, super num devs untouched,
>>    |     but device item removed from chunk tree.
>>    |     (AKA, super num devs is already incorrect)
>>    |
>>    |- cur_devices->num_devices--;
>>    |- cur_devices->total_devices--;
>>    |- btrfs_set_super_num_devices()
>>       All those operations are not in transaction X, thus it will
>>       only be written back to disk in next transaction.
>>
>> So after the transaction X in btrfs_rm_dev_item() committed, but before
>> transaction X+1 (which can be minutes away), a power loss happen, then
>> we got the super num mismatch.
>>
>
> The cause part is much better now. So why not also update the super
> num_devices in the same transaction?

A lot of other things like total_rw_bytes.

Not to mention, even we got a fix, it will be another patch.

Since the handling of such mismatch is needed to handle older kernels
anyway.

>
>
>> [FIX]
>> Make the super_num_devices check less strict, converting it from a hard
>> error to a warning, and reset the value to a correct one for the current
>> or next transaction commitment.
>
> So that we can leave the part where we identify and report num_devices
> miss-match as it is.

I didn't get your point.
What do you want to get from this patch?

Isn't this already the behavior of this patch?

>
> Thanks, Anand
>
>
>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>> Link:
>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>> Changelog:
>> v2:
>> - Add a proper reason on why this mismatch can happen
>>    No code change.
>> ---
>>   fs/btrfs/volumes.c | 8 ++++----
>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index 74c8024d8f96..d0ba3ff21920 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info
>> *fs_info)
>>        * do another round of validation checks.
>>        */
>>       if (total_dev != fs_info->fs_devices->total_devices) {
>> -        btrfs_err(fs_info,
>> -       "super_num_devices %llu mismatch with num_devices %llu found
>> here",
>> +        btrfs_warn(fs_info,
>> +       "super_num_devices %llu mismatch with num_devices %llu found
>> here, will be repaired on next transaction commitment",
>>                 btrfs_super_num_devices(fs_info->super_copy),
>>                 total_dev);
>> -        ret = -EINVAL;
>> -        goto error;
>> +        fs_info->fs_devices->total_devices = total_dev;
>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>       }
>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
>>           fs_info->fs_devices->total_rw_bytes) {
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28  8:54   ` Qu Wenruo
@ 2022-02-28  9:01     ` Anand Jain
  2022-02-28  9:21       ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Anand Jain @ 2022-02-28  9:01 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo, linux-btrfs; +Cc: Luca Béla Palkovics

On 28/02/2022 16:54, Qu Wenruo wrote:
> 
> 
> On 2022/2/28 16:00, Anand Jain wrote:
>> On 28/02/2022 15:05, Qu Wenruo wrote:
>>> [BUG]
>>> There is a report that a btrfs has a bad super block num devices.
>>>
>>> This makes btrfs to reject the fs completely.
>>>
>>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
>>> num_devices 2 found here
>>>    BTRFS error (device sdd3): failed to read chunk tree: -22
>>>    BTRFS error (device sdd3): open_ctree failed
>>>
>>> [CAUSE]
>>> During btrfs device removal, chunk tree and super block num devs are
>>> updated in two different transactions:
>>>
>>>    btrfs_rm_device()
>>>    |- btrfs_rm_dev_item(device)
>>>    |  |- trans = btrfs_start_transaction()
>>>    |  |  Now we got transaction X
>>>    |  |
>>>    |  |- btrfs_del_item()
>>>    |  |  Now device item is removed from chunk tree
>>>    |  |
>>>    |  |- btrfs_commit_transaction()
>>>    |     Transaction X got committed, super num devs untouched,
>>>    |     but device item removed from chunk tree.
>>>    |     (AKA, super num devs is already incorrect)
>>>    |
>>>    |- cur_devices->num_devices--;
>>>    |- cur_devices->total_devices--;
>>>    |- btrfs_set_super_num_devices()
>>>       All those operations are not in transaction X, thus it will
>>>       only be written back to disk in next transaction.
>>>
>>> So after the transaction X in btrfs_rm_dev_item() committed, but before
>>> transaction X+1 (which can be minutes away), a power loss happen, then
>>> we got the super num mismatch.
>>>
>>


>> The cause part is much better now. So why not also update the super
>> num_devices in the same transaction?
> 
> A lot of other things like total_rw_bytes.
> 
> Not to mention, even we got a fix, it will be another patch.
> 
> Since the handling of such mismatch is needed to handle older kernels
> anyway.

  Ok.


>>> [FIX]
>>> Make the super_num_devices check less strict, converting it from a hard
>>> error to a warning, and reset the value to a correct one for the current
>>> or next transaction commitment.
>>
>> So that we can leave the part where we identify and report num_devices
>> miss-match as it is.
> 
> I didn't get your point.
> What do you want to get from this patch?
> 
> Isn't this already the behavior of this patch?

  Let me clarify - we don't need this patch if we fix the actual bug as 
above. IMO.

>>
>> Thanks, Anand
>>
>>
>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>> Link:
>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/ 
>>>
>>>
>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>> ---
>>> Changelog:
>>> v2:
>>> - Add a proper reason on why this mismatch can happen
>>>    No code change.
>>> ---
>>>   fs/btrfs/volumes.c | 8 ++++----
>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>> index 74c8024d8f96..d0ba3ff21920 100644
>>> --- a/fs/btrfs/volumes.c
>>> +++ b/fs/btrfs/volumes.c
>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info
>>> *fs_info)
>>>        * do another round of validation checks.
>>>        */
>>>       if (total_dev != fs_info->fs_devices->total_devices) {
>>> -        btrfs_err(fs_info,
>>> -       "super_num_devices %llu mismatch with num_devices %llu found
>>> here",
>>> +        btrfs_warn(fs_info,
>>> +       "super_num_devices %llu mismatch with num_devices %llu found
>>> here, will be repaired on next transaction commitment",
>>>                 btrfs_super_num_devices(fs_info->super_copy),
>>>                 total_dev);
>>> -        ret = -EINVAL;
>>> -        goto error;
>>> +        fs_info->fs_devices->total_devices = total_dev;
>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>>       }
>>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
>>>           fs_info->fs_devices->total_rw_bytes) {
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28  9:01     ` Anand Jain
@ 2022-02-28  9:21       ` Qu Wenruo
  2022-02-28 12:51         ` Anand Jain
  0 siblings, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2022-02-28  9:21 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo, linux-btrfs; +Cc: Luca Béla Palkovics



On 2022/2/28 17:01, Anand Jain wrote:
> On 28/02/2022 16:54, Qu Wenruo wrote:
>>
>>
>> On 2022/2/28 16:00, Anand Jain wrote:
>>> On 28/02/2022 15:05, Qu Wenruo wrote:
>>>> [BUG]
>>>> There is a report that a btrfs has a bad super block num devices.
>>>>
>>>> This makes btrfs to reject the fs completely.
>>>>
>>>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
>>>> num_devices 2 found here
>>>>    BTRFS error (device sdd3): failed to read chunk tree: -22
>>>>    BTRFS error (device sdd3): open_ctree failed
>>>>
>>>> [CAUSE]
>>>> During btrfs device removal, chunk tree and super block num devs are
>>>> updated in two different transactions:
>>>>
>>>>    btrfs_rm_device()
>>>>    |- btrfs_rm_dev_item(device)
>>>>    |  |- trans = btrfs_start_transaction()
>>>>    |  |  Now we got transaction X
>>>>    |  |
>>>>    |  |- btrfs_del_item()
>>>>    |  |  Now device item is removed from chunk tree
>>>>    |  |
>>>>    |  |- btrfs_commit_transaction()
>>>>    |     Transaction X got committed, super num devs untouched,
>>>>    |     but device item removed from chunk tree.
>>>>    |     (AKA, super num devs is already incorrect)
>>>>    |
>>>>    |- cur_devices->num_devices--;
>>>>    |- cur_devices->total_devices--;
>>>>    |- btrfs_set_super_num_devices()
>>>>       All those operations are not in transaction X, thus it will
>>>>       only be written back to disk in next transaction.
>>>>
>>>> So after the transaction X in btrfs_rm_dev_item() committed, but before
>>>> transaction X+1 (which can be minutes away), a power loss happen, then
>>>> we got the super num mismatch.
>>>>
>>>
>
>
>>> The cause part is much better now. So why not also update the super
>>> num_devices in the same transaction?
>>
>> A lot of other things like total_rw_bytes.
>>
>> Not to mention, even we got a fix, it will be another patch.
>>
>> Since the handling of such mismatch is needed to handle older kernels
>> anyway.
>
>   Ok.
>
>
>>>> [FIX]
>>>> Make the super_num_devices check less strict, converting it from a hard
>>>> error to a warning, and reset the value to a correct one for the
>>>> current
>>>> or next transaction commitment.
>>>
>>> So that we can leave the part where we identify and report num_devices
>>> miss-match as it is.
>>
>> I didn't get your point.
>> What do you want to get from this patch?
>>
>> Isn't this already the behavior of this patch?
>
>   Let me clarify - we don't need this patch if we fix the actual bug as
> above. IMO.

Big NO NO.

The damage is already done, we must be responsible for whatever damage
we caused, especially the damage has already reached disk.

Just fixing the cause and call it a day is definitely not a responsible way.

Especially when the damage is done, you have no way to mount it, just
like the reporter.

You dare to say the same thing to the end user?

>
>>>
>>> Thanks, Anand
>>>
>>>
>>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>>> Link:
>>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>>>>
>>>>
>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>> ---
>>>> Changelog:
>>>> v2:
>>>> - Add a proper reason on why this mismatch can happen
>>>>    No code change.
>>>> ---
>>>>   fs/btrfs/volumes.c | 8 ++++----
>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>> index 74c8024d8f96..d0ba3ff21920 100644
>>>> --- a/fs/btrfs/volumes.c
>>>> +++ b/fs/btrfs/volumes.c
>>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info
>>>> *fs_info)
>>>>        * do another round of validation checks.
>>>>        */
>>>>       if (total_dev != fs_info->fs_devices->total_devices) {
>>>> -        btrfs_err(fs_info,
>>>> -       "super_num_devices %llu mismatch with num_devices %llu found
>>>> here",
>>>> +        btrfs_warn(fs_info,
>>>> +       "super_num_devices %llu mismatch with num_devices %llu found
>>>> here, will be repaired on next transaction commitment",
>>>>                 btrfs_super_num_devices(fs_info->super_copy),
>>>>                 total_dev);
>>>> -        ret = -EINVAL;
>>>> -        goto error;
>>>> +        fs_info->fs_devices->total_devices = total_dev;
>>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>>>       }
>>>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
>>>>           fs_info->fs_devices->total_rw_bytes) {
>>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28  9:21       ` Qu Wenruo
@ 2022-02-28 12:51         ` Anand Jain
  2022-02-28 13:03           ` Qu Wenruo
  2022-04-20 15:42           ` David Sterba
  0 siblings, 2 replies; 11+ messages in thread
From: Anand Jain @ 2022-02-28 12:51 UTC (permalink / raw)
  To: Qu Wenruo, Qu Wenruo; +Cc: Luca Béla Palkovics, linux-btrfs

On 28/02/2022 17:21, Qu Wenruo wrote:
> 
> 
> On 2022/2/28 17:01, Anand Jain wrote:
>> On 28/02/2022 16:54, Qu Wenruo wrote:
>>>
>>>
>>> On 2022/2/28 16:00, Anand Jain wrote:
>>>> On 28/02/2022 15:05, Qu Wenruo wrote:
>>>>> [BUG]
>>>>> There is a report that a btrfs has a bad super block num devices.
>>>>>
>>>>> This makes btrfs to reject the fs completely.
>>>>>
>>>>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
>>>>> num_devices 2 found here
>>>>>    BTRFS error (device sdd3): failed to read chunk tree: -22
>>>>>    BTRFS error (device sdd3): open_ctree failed
>>>>>
>>>>> [CAUSE]
>>>>> During btrfs device removal, chunk tree and super block num devs are
>>>>> updated in two different transactions:
>>>>>
>>>>>    btrfs_rm_device()
>>>>>    |- btrfs_rm_dev_item(device)
>>>>>    |  |- trans = btrfs_start_transaction()
>>>>>    |  |  Now we got transaction X
>>>>>    |  |
>>>>>    |  |- btrfs_del_item()
>>>>>    |  |  Now device item is removed from chunk tree
>>>>>    |  |
>>>>>    |  |- btrfs_commit_transaction()
>>>>>    |     Transaction X got committed, super num devs untouched,
>>>>>    |     but device item removed from chunk tree.
>>>>>    |     (AKA, super num devs is already incorrect)
>>>>>    |
>>>>>    |- cur_devices->num_devices--;
>>>>>    |- cur_devices->total_devices--;
>>>>>    |- btrfs_set_super_num_devices()
>>>>>       All those operations are not in transaction X, thus it will
>>>>>       only be written back to disk in next transaction.
>>>>>
>>>>> So after the transaction X in btrfs_rm_dev_item() committed, but 
>>>>> before
>>>>> transaction X+1 (which can be minutes away), a power loss happen, then
>>>>> we got the super num mismatch.
>>>>>
>>>>
>>
>>
>>>> The cause part is much better now. So why not also update the super
>>>> num_devices in the same transaction?
>>>
>>> A lot of other things like total_rw_bytes.
>>>
>>> Not to mention, even we got a fix, it will be another patch.
>>>
>>> Since the handling of such mismatch is needed to handle older kernels
>>> anyway.
>>
>>   Ok.
>>
>>
>>>>> [FIX]
>>>>> Make the super_num_devices check less strict, converting it from a 
>>>>> hard
>>>>> error to a warning, and reset the value to a correct one for the
>>>>> current
>>>>> or next transaction commitment.
>>>>
>>>> So that we can leave the part where we identify and report num_devices
>>>> miss-match as it is.
>>>
>>> I didn't get your point.
>>> What do you want to get from this patch?
>>>
>>> Isn't this already the behavior of this patch?
>>
>>   Let me clarify - we don't need this patch if we fix the actual bug as
>> above. IMO.
> 
> Big NO NO.
> 
> The damage is already done, we must be responsible for whatever damage
> we caused, especially the damage has already reached disk.
> 
> Just fixing the cause and call it a day is definitely not a responsible 
> way.
> 
> Especially when the damage is done, you have no way to mount it, just
> like the reporter.
> 
> You dare to say the same thing to the end user?

You have a btrfs-progs patch to recover from that situation. Right?
Plus, I suppose you are sending a kernel patch for the actual bug
which is causing this corruption. No?

This patch is the reporter side fix. I don't encourage fixing the
reporter because a similar corruption might happen for reasons unknown
yet. For example, raid1 split-brain? Which is not yet completely
analyzed and test-cased yet.

In my POV.

Thanks, Anand



>>>> Thanks, Anand
>>>>
>>>>
>>>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>>>> Link:
>>>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/ 
>>>>>
>>>>>
>>>>>
>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>> ---
>>>>> Changelog:
>>>>> v2:
>>>>> - Add a proper reason on why this mismatch can happen
>>>>>    No code change.
>>>>> ---
>>>>>   fs/btrfs/volumes.c | 8 ++++----
>>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>> index 74c8024d8f96..d0ba3ff21920 100644
>>>>> --- a/fs/btrfs/volumes.c
>>>>> +++ b/fs/btrfs/volumes.c
>>>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct btrfs_fs_info
>>>>> *fs_info)
>>>>>        * do another round of validation checks.
>>>>>        */
>>>>>       if (total_dev != fs_info->fs_devices->total_devices) {
>>>>> -        btrfs_err(fs_info,
>>>>> -       "super_num_devices %llu mismatch with num_devices %llu found
>>>>> here",
>>>>> +        btrfs_warn(fs_info,
>>>>> +       "super_num_devices %llu mismatch with num_devices %llu found
>>>>> here, will be repaired on next transaction commitment",
>>>>>                 btrfs_super_num_devices(fs_info->super_copy),
>>>>>                 total_dev);
>>>>> -        ret = -EINVAL;
>>>>> -        goto error;
>>>>> +        fs_info->fs_devices->total_devices = total_dev;
>>>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>>>>       }
>>>>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
>>>>>           fs_info->fs_devices->total_rw_bytes) {
>>>>
>>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28 12:51         ` Anand Jain
@ 2022-02-28 13:03           ` Qu Wenruo
  2022-02-28 13:13             ` Luca Béla Palkovics
  2022-04-20 15:42           ` David Sterba
  1 sibling, 1 reply; 11+ messages in thread
From: Qu Wenruo @ 2022-02-28 13:03 UTC (permalink / raw)
  To: Anand Jain, Qu Wenruo; +Cc: Luca Béla Palkovics, linux-btrfs



On 2022/2/28 20:51, Anand Jain wrote:
> On 28/02/2022 17:21, Qu Wenruo wrote:
>>
>>
>> On 2022/2/28 17:01, Anand Jain wrote:
>>> On 28/02/2022 16:54, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2022/2/28 16:00, Anand Jain wrote:
>>>>> On 28/02/2022 15:05, Qu Wenruo wrote:
>>>>>> [BUG]
>>>>>> There is a report that a btrfs has a bad super block num devices.
>>>>>>
>>>>>> This makes btrfs to reject the fs completely.
>>>>>>
>>>>>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
>>>>>> num_devices 2 found here
>>>>>>    BTRFS error (device sdd3): failed to read chunk tree: -22
>>>>>>    BTRFS error (device sdd3): open_ctree failed
>>>>>>
>>>>>> [CAUSE]
>>>>>> During btrfs device removal, chunk tree and super block num devs are
>>>>>> updated in two different transactions:
>>>>>>
>>>>>>    btrfs_rm_device()
>>>>>>    |- btrfs_rm_dev_item(device)
>>>>>>    |  |- trans = btrfs_start_transaction()
>>>>>>    |  |  Now we got transaction X
>>>>>>    |  |
>>>>>>    |  |- btrfs_del_item()
>>>>>>    |  |  Now device item is removed from chunk tree
>>>>>>    |  |
>>>>>>    |  |- btrfs_commit_transaction()
>>>>>>    |     Transaction X got committed, super num devs untouched,
>>>>>>    |     but device item removed from chunk tree.
>>>>>>    |     (AKA, super num devs is already incorrect)
>>>>>>    |
>>>>>>    |- cur_devices->num_devices--;
>>>>>>    |- cur_devices->total_devices--;
>>>>>>    |- btrfs_set_super_num_devices()
>>>>>>       All those operations are not in transaction X, thus it will
>>>>>>       only be written back to disk in next transaction.
>>>>>>
>>>>>> So after the transaction X in btrfs_rm_dev_item() committed, but
>>>>>> before
>>>>>> transaction X+1 (which can be minutes away), a power loss happen,
>>>>>> then
>>>>>> we got the super num mismatch.
>>>>>>
>>>>>
>>>
>>>
>>>>> The cause part is much better now. So why not also update the super
>>>>> num_devices in the same transaction?
>>>>
>>>> A lot of other things like total_rw_bytes.
>>>>
>>>> Not to mention, even we got a fix, it will be another patch.
>>>>
>>>> Since the handling of such mismatch is needed to handle older kernels
>>>> anyway.
>>>
>>>   Ok.
>>>
>>>
>>>>>> [FIX]
>>>>>> Make the super_num_devices check less strict, converting it from a
>>>>>> hard
>>>>>> error to a warning, and reset the value to a correct one for the
>>>>>> current
>>>>>> or next transaction commitment.
>>>>>
>>>>> So that we can leave the part where we identify and report num_devices
>>>>> miss-match as it is.
>>>>
>>>> I didn't get your point.
>>>> What do you want to get from this patch?
>>>>
>>>> Isn't this already the behavior of this patch?
>>>
>>>   Let me clarify - we don't need this patch if we fix the actual bug as
>>> above. IMO.
>>
>> Big NO NO.
>>
>> The damage is already done, we must be responsible for whatever damage
>> we caused, especially the damage has already reached disk.
>>
>> Just fixing the cause and call it a day is definitely not a
>> responsible way.
>>
>> Especially when the damage is done, you have no way to mount it, just
>> like the reporter.
>>
>> You dare to say the same thing to the end user?
>
> You have a btrfs-progs patch to recover from that situation. Right?
> Plus, I suppose you are sending a kernel patch for the actual bug
> which is causing this corruption. No?

Not yet. It can be more complex.
Feel free to try to fix it properly.

>
> This patch is the reporter side fix. I don't encourage fixing the
> reporter because a similar corruption might happen for reasons unknown
> yet. For example, raid1 split-brain? Which is not yet completely
> analyzed and test-cased yet.

No matter whatever the reason, you still can't deny the fact that, if
just super num dev mismatches, there is no need to reject the full fs.

We have tree-checker for each chunk leave, and rw bytes check already.
Even the possible bit flip check for num devs.

The super num devs check doesn't make much sense except causing more
hassles to the end uesr.

The report is just giving us a chance to review if the behavior is
really helpful.
To me, NO.

>
> In my POV.
>
> Thanks, Anand
>
>
>
>>>>> Thanks, Anand
>>>>>
>>>>>
>>>>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>>>>> Link:
>>>>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>>> ---
>>>>>> Changelog:
>>>>>> v2:
>>>>>> - Add a proper reason on why this mismatch can happen
>>>>>>    No code change.
>>>>>> ---
>>>>>>   fs/btrfs/volumes.c | 8 ++++----
>>>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>> index 74c8024d8f96..d0ba3ff21920 100644
>>>>>> --- a/fs/btrfs/volumes.c
>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct
>>>>>> btrfs_fs_info
>>>>>> *fs_info)
>>>>>>        * do another round of validation checks.
>>>>>>        */
>>>>>>       if (total_dev != fs_info->fs_devices->total_devices) {
>>>>>> -        btrfs_err(fs_info,
>>>>>> -       "super_num_devices %llu mismatch with num_devices %llu found
>>>>>> here",
>>>>>> +        btrfs_warn(fs_info,
>>>>>> +       "super_num_devices %llu mismatch with num_devices %llu found
>>>>>> here, will be repaired on next transaction commitment",
>>>>>>                 btrfs_super_num_devices(fs_info->super_copy),
>>>>>>                 total_dev);
>>>>>> -        ret = -EINVAL;
>>>>>> -        goto error;
>>>>>> +        fs_info->fs_devices->total_devices = total_dev;
>>>>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>>>>>       }
>>>>>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
>>>>>>           fs_info->fs_devices->total_rw_bytes) {
>>>>>
>>>
>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28 13:03           ` Qu Wenruo
@ 2022-02-28 13:13             ` Luca Béla Palkovics
  2022-02-28 13:29               ` Qu Wenruo
  0 siblings, 1 reply; 11+ messages in thread
From: Luca Béla Palkovics @ 2022-02-28 13:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Anand Jain, Qu Wenruo, linux-btrfs

Am Mo., 28. Feb. 2022 um 14:03 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>
>
>
> On 2022/2/28 20:51, Anand Jain wrote:
> > On 28/02/2022 17:21, Qu Wenruo wrote:
> >>
> >>
> >> On 2022/2/28 17:01, Anand Jain wrote:
> >>> On 28/02/2022 16:54, Qu Wenruo wrote:
> >>>>
> >>>>
> >>>> On 2022/2/28 16:00, Anand Jain wrote:
> >>>>> On 28/02/2022 15:05, Qu Wenruo wrote:
> >>>>>> [BUG]
> >>>>>> There is a report that a btrfs has a bad super block num devices.
> >>>>>>
> >>>>>> This makes btrfs to reject the fs completely.
> >>>>>>
> >>>>>>    BTRFS error (device sdd3): super_num_devices 3 mismatch with
> >>>>>> num_devices 2 found here
> >>>>>>    BTRFS error (device sdd3): failed to read chunk tree: -22
> >>>>>>    BTRFS error (device sdd3): open_ctree failed
> >>>>>>
> >>>>>> [CAUSE]
> >>>>>> During btrfs device removal, chunk tree and super block num devs are
> >>>>>> updated in two different transactions:
> >>>>>>
> >>>>>>    btrfs_rm_device()
> >>>>>>    |- btrfs_rm_dev_item(device)
> >>>>>>    |  |- trans = btrfs_start_transaction()
> >>>>>>    |  |  Now we got transaction X
> >>>>>>    |  |
> >>>>>>    |  |- btrfs_del_item()
> >>>>>>    |  |  Now device item is removed from chunk tree
> >>>>>>    |  |
> >>>>>>    |  |- btrfs_commit_transaction()
> >>>>>>    |     Transaction X got committed, super num devs untouched,
> >>>>>>    |     but device item removed from chunk tree.
> >>>>>>    |     (AKA, super num devs is already incorrect)
> >>>>>>    |
> >>>>>>    |- cur_devices->num_devices--;
> >>>>>>    |- cur_devices->total_devices--;
> >>>>>>    |- btrfs_set_super_num_devices()
> >>>>>>       All those operations are not in transaction X, thus it will
> >>>>>>       only be written back to disk in next transaction.
> >>>>>>
> >>>>>> So after the transaction X in btrfs_rm_dev_item() committed, but
> >>>>>> before
> >>>>>> transaction X+1 (which can be minutes away), a power loss happen,
> >>>>>> then
> >>>>>> we got the super num mismatch.
> >>>>>>
> >>>>>
> >>>
> >>>
> >>>>> The cause part is much better now. So why not also update the super
> >>>>> num_devices in the same transaction?
> >>>>
> >>>> A lot of other things like total_rw_bytes.
> >>>>
> >>>> Not to mention, even we got a fix, it will be another patch.
> >>>>
> >>>> Since the handling of such mismatch is needed to handle older kernels
> >>>> anyway.
> >>>
> >>>   Ok.
> >>>
> >>>
> >>>>>> [FIX]
> >>>>>> Make the super_num_devices check less strict, converting it from a
> >>>>>> hard
> >>>>>> error to a warning, and reset the value to a correct one for the
> >>>>>> current
> >>>>>> or next transaction commitment.
> >>>>>
> >>>>> So that we can leave the part where we identify and report num_devices
> >>>>> miss-match as it is.
> >>>>
> >>>> I didn't get your point.
> >>>> What do you want to get from this patch?
> >>>>
> >>>> Isn't this already the behavior of this patch?
> >>>
> >>>   Let me clarify - we don't need this patch if we fix the actual bug as
> >>> above. IMO.
> >>
> >> Big NO NO.
> >>
> >> The damage is already done, we must be responsible for whatever damage
> >> we caused, especially the damage has already reached disk.
> >>
> >> Just fixing the cause and call it a day is definitely not a
> >> responsible way.
> >>
> >> Especially when the damage is done, you have no way to mount it, just
> >> like the reporter.
> >>
> >> You dare to say the same thing to the end user?
> >
> > You have a btrfs-progs patch to recover from that situation. Right?
> > Plus, I suppose you are sending a kernel patch for the actual bug
> > which is causing this corruption. No?
>
> Not yet. It can be more complex.
> Feel free to try to fix it properly.
>
> >
> > This patch is the reporter side fix. I don't encourage fixing the
> > reporter because a similar corruption might happen for reasons unknown
> > yet. For example, raid1 split-brain? Which is not yet completely
> > analyzed and test-cased yet.
>
> No matter whatever the reason, you still can't deny the fact that, if
> just super num dev mismatches, there is no need to reject the full fs.

As a user .. I expected to be able to mount it with "-o degraded" ..
or/and "-o recovery,ro"

But at the same time .. I don't understand what this "num_devices"
does and/or why it's required...
Or what effects it has on the data-integrity..

It was just very confusing to me because "btrfs check" said everything is fine..

> We have tree-checker for each chunk leave, and rw bytes check already.
> Even the possible bit flip check for num devs.
>
> The super num devs check doesn't make much sense except causing more
> hassles to the end uesr.
>
> The report is just giving us a chance to review if the behavior is
> really helpful.
> To me, NO.
>
> >
> > In my POV.
> >
> > Thanks, Anand
> >
> >
> >
> >>>>> Thanks, Anand
> >>>>>
> >>>>>
> >>>>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
> >>>>>> Link:
> >>>>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
> >>>>>> ---
> >>>>>> Changelog:
> >>>>>> v2:
> >>>>>> - Add a proper reason on why this mismatch can happen
> >>>>>>    No code change.
> >>>>>> ---
> >>>>>>   fs/btrfs/volumes.c | 8 ++++----
> >>>>>>   1 file changed, 4 insertions(+), 4 deletions(-)
> >>>>>>
> >>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>>>>> index 74c8024d8f96..d0ba3ff21920 100644
> >>>>>> --- a/fs/btrfs/volumes.c
> >>>>>> +++ b/fs/btrfs/volumes.c
> >>>>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct
> >>>>>> btrfs_fs_info
> >>>>>> *fs_info)
> >>>>>>        * do another round of validation checks.
> >>>>>>        */
> >>>>>>       if (total_dev != fs_info->fs_devices->total_devices) {
> >>>>>> -        btrfs_err(fs_info,
> >>>>>> -       "super_num_devices %llu mismatch with num_devices %llu found
> >>>>>> here",
> >>>>>> +        btrfs_warn(fs_info,
> >>>>>> +       "super_num_devices %llu mismatch with num_devices %llu found
> >>>>>> here, will be repaired on next transaction commitment",
> >>>>>>                 btrfs_super_num_devices(fs_info->super_copy),
> >>>>>>                 total_dev);
> >>>>>> -        ret = -EINVAL;
> >>>>>> -        goto error;
> >>>>>> +        fs_info->fs_devices->total_devices = total_dev;
> >>>>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
> >>>>>>       }
> >>>>>>       if (btrfs_super_total_bytes(fs_info->super_copy) <
> >>>>>>           fs_info->fs_devices->total_rw_bytes) {
> >>>>>
> >>>
> >

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28 13:13             ` Luca Béla Palkovics
@ 2022-02-28 13:29               ` Qu Wenruo
  0 siblings, 0 replies; 11+ messages in thread
From: Qu Wenruo @ 2022-02-28 13:29 UTC (permalink / raw)
  To: Luca Béla Palkovics; +Cc: Anand Jain, Qu Wenruo, linux-btrfs



On 2022/2/28 21:13, Luca Béla Palkovics wrote:
> Am Mo., 28. Feb. 2022 um 14:03 Uhr schrieb Qu Wenruo <quwenruo.btrfs@gmx.com>:
>>
>>
>>
>> On 2022/2/28 20:51, Anand Jain wrote:
>>> On 28/02/2022 17:21, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2022/2/28 17:01, Anand Jain wrote:
>>>>> On 28/02/2022 16:54, Qu Wenruo wrote:
>>>>>>
>>>>>>
>>>>>> On 2022/2/28 16:00, Anand Jain wrote:
>>>>>>> On 28/02/2022 15:05, Qu Wenruo wrote:
>>>>>>>> [BUG]
>>>>>>>> There is a report that a btrfs has a bad super block num devices.
>>>>>>>>
>>>>>>>> This makes btrfs to reject the fs completely.
>>>>>>>>
>>>>>>>>     BTRFS error (device sdd3): super_num_devices 3 mismatch with
>>>>>>>> num_devices 2 found here
>>>>>>>>     BTRFS error (device sdd3): failed to read chunk tree: -22
>>>>>>>>     BTRFS error (device sdd3): open_ctree failed
>>>>>>>>
>>>>>>>> [CAUSE]
>>>>>>>> During btrfs device removal, chunk tree and super block num devs are
>>>>>>>> updated in two different transactions:
>>>>>>>>
>>>>>>>>     btrfs_rm_device()
>>>>>>>>     |- btrfs_rm_dev_item(device)
>>>>>>>>     |  |- trans = btrfs_start_transaction()
>>>>>>>>     |  |  Now we got transaction X
>>>>>>>>     |  |
>>>>>>>>     |  |- btrfs_del_item()
>>>>>>>>     |  |  Now device item is removed from chunk tree
>>>>>>>>     |  |
>>>>>>>>     |  |- btrfs_commit_transaction()
>>>>>>>>     |     Transaction X got committed, super num devs untouched,
>>>>>>>>     |     but device item removed from chunk tree.
>>>>>>>>     |     (AKA, super num devs is already incorrect)
>>>>>>>>     |
>>>>>>>>     |- cur_devices->num_devices--;
>>>>>>>>     |- cur_devices->total_devices--;
>>>>>>>>     |- btrfs_set_super_num_devices()
>>>>>>>>        All those operations are not in transaction X, thus it will
>>>>>>>>        only be written back to disk in next transaction.
>>>>>>>>
>>>>>>>> So after the transaction X in btrfs_rm_dev_item() committed, but
>>>>>>>> before
>>>>>>>> transaction X+1 (which can be minutes away), a power loss happen,
>>>>>>>> then
>>>>>>>> we got the super num mismatch.
>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>>>> The cause part is much better now. So why not also update the super
>>>>>>> num_devices in the same transaction?
>>>>>>
>>>>>> A lot of other things like total_rw_bytes.
>>>>>>
>>>>>> Not to mention, even we got a fix, it will be another patch.
>>>>>>
>>>>>> Since the handling of such mismatch is needed to handle older kernels
>>>>>> anyway.
>>>>>
>>>>>    Ok.
>>>>>
>>>>>
>>>>>>>> [FIX]
>>>>>>>> Make the super_num_devices check less strict, converting it from a
>>>>>>>> hard
>>>>>>>> error to a warning, and reset the value to a correct one for the
>>>>>>>> current
>>>>>>>> or next transaction commitment.
>>>>>>>
>>>>>>> So that we can leave the part where we identify and report num_devices
>>>>>>> miss-match as it is.
>>>>>>
>>>>>> I didn't get your point.
>>>>>> What do you want to get from this patch?
>>>>>>
>>>>>> Isn't this already the behavior of this patch?
>>>>>
>>>>>    Let me clarify - we don't need this patch if we fix the actual bug as
>>>>> above. IMO.
>>>>
>>>> Big NO NO.
>>>>
>>>> The damage is already done, we must be responsible for whatever damage
>>>> we caused, especially the damage has already reached disk.
>>>>
>>>> Just fixing the cause and call it a day is definitely not a
>>>> responsible way.
>>>>
>>>> Especially when the damage is done, you have no way to mount it, just
>>>> like the reporter.
>>>>
>>>> You dare to say the same thing to the end user?
>>>
>>> You have a btrfs-progs patch to recover from that situation. Right?
>>> Plus, I suppose you are sending a kernel patch for the actual bug
>>> which is causing this corruption. No?
>>
>> Not yet. It can be more complex.
>> Feel free to try to fix it properly.
>>
>>>
>>> This patch is the reporter side fix. I don't encourage fixing the
>>> reporter because a similar corruption might happen for reasons unknown
>>> yet. For example, raid1 split-brain? Which is not yet completely
>>> analyzed and test-cased yet.
>>
>> No matter whatever the reason, you still can't deny the fact that, if
>> just super num dev mismatches, there is no need to reject the full fs.
>
> As a user .. I expected to be able to mount it with "-o degraded" ..
> or/and "-o recovery,ro"

With this patch, you won't need any special mount option.
Since btrfs knows it's just a number mismatch, nothing to worry.

>
> But at the same time .. I don't understand what this "num_devices"
> does and/or why it's required...
> Or what effects it has on the data-integrity..
>
> It was just very confusing to me because "btrfs check" said everything is fine..

That's why there is also another patchset for btrfs-progs, to detect and
repair the problem.
(Although now with this patch, you won't even notice a problem, kernel
will automatically fix the problem, other than rejecting it)

Thanks,
Qu

>
>> We have tree-checker for each chunk leave, and rw bytes check already.
>> Even the possible bit flip check for num devs.
>>
>> The super num devs check doesn't make much sense except causing more
>> hassles to the end uesr.
>>
>> The report is just giving us a chance to review if the behavior is
>> really helpful.
>> To me, NO.
>>
>>>
>>> In my POV.
>>>
>>> Thanks, Anand
>>>
>>>
>>>
>>>>>>> Thanks, Anand
>>>>>>>
>>>>>>>
>>>>>>>> Reported-by: Luca Béla Palkovics <luca.bela.palkovics@gmail.com>
>>>>>>>> Link:
>>>>>>>> https://lore.kernel.org/linux-btrfs/CA+8xDSpvdm_U0QLBAnrH=zqDq_cWCOH5TiV46CKmp3igr44okQ@mail.gmail.com/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>>>>>>>> ---
>>>>>>>> Changelog:
>>>>>>>> v2:
>>>>>>>> - Add a proper reason on why this mismatch can happen
>>>>>>>>     No code change.
>>>>>>>> ---
>>>>>>>>    fs/btrfs/volumes.c | 8 ++++----
>>>>>>>>    1 file changed, 4 insertions(+), 4 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>>>>>>>> index 74c8024d8f96..d0ba3ff21920 100644
>>>>>>>> --- a/fs/btrfs/volumes.c
>>>>>>>> +++ b/fs/btrfs/volumes.c
>>>>>>>> @@ -7682,12 +7682,12 @@ int btrfs_read_chunk_tree(struct
>>>>>>>> btrfs_fs_info
>>>>>>>> *fs_info)
>>>>>>>>         * do another round of validation checks.
>>>>>>>>         */
>>>>>>>>        if (total_dev != fs_info->fs_devices->total_devices) {
>>>>>>>> -        btrfs_err(fs_info,
>>>>>>>> -       "super_num_devices %llu mismatch with num_devices %llu found
>>>>>>>> here",
>>>>>>>> +        btrfs_warn(fs_info,
>>>>>>>> +       "super_num_devices %llu mismatch with num_devices %llu found
>>>>>>>> here, will be repaired on next transaction commitment",
>>>>>>>>                  btrfs_super_num_devices(fs_info->super_copy),
>>>>>>>>                  total_dev);
>>>>>>>> -        ret = -EINVAL;
>>>>>>>> -        goto error;
>>>>>>>> +        fs_info->fs_devices->total_devices = total_dev;
>>>>>>>> +        btrfs_set_super_num_devices(fs_info->super_copy, total_dev);
>>>>>>>>        }
>>>>>>>>        if (btrfs_super_total_bytes(fs_info->super_copy) <
>>>>>>>>            fs_info->fs_devices->total_rw_bytes) {
>>>>>>>
>>>>>
>>>

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-02-28 12:51         ` Anand Jain
  2022-02-28 13:03           ` Qu Wenruo
@ 2022-04-20 15:42           ` David Sterba
  2022-04-25 17:07             ` Anand Jain
  1 sibling, 1 reply; 11+ messages in thread
From: David Sterba @ 2022-04-20 15:42 UTC (permalink / raw)
  To: Anand Jain; +Cc: Qu Wenruo, Qu Wenruo, Luca Béla Palkovics, linux-btrfs

On Mon, Feb 28, 2022 at 08:51:30PM +0800, Anand Jain wrote:
> >>> I didn't get your point.
> >>> What do you want to get from this patch?
> >>>
> >>> Isn't this already the behavior of this patch?
> >>
> >>   Let me clarify - we don't need this patch if we fix the actual bug as
> >> above. IMO.
> > 
> > Big NO NO.
> > 
> > The damage is already done, we must be responsible for whatever damage
> > we caused, especially the damage has already reached disk.
> > 
> > Just fixing the cause and call it a day is definitely not a responsible 
> > way.
> > 
> > Especially when the damage is done, you have no way to mount it, just
> > like the reporter.
> > 
> > You dare to say the same thing to the end user?
> 
> You have a btrfs-progs patch to recover from that situation. Right?
> Plus, I suppose you are sending a kernel patch for the actual bug
> which is causing this corruption. No?

Normally we'd have just the repair code in check, but this is sometimes
difficult to run eg. on a root filesystem, or if the boot fails because
some mount fails and ends up in the rescue environment with limited
options.

> This patch is the reporter side fix. I don't encourage fixing the
> reporter because a similar corruption might happen for reasons unknown
> yet. For example, raid1 split-brain? Which is not yet completely
> analyzed and test-cased yet.

So that's a valid question if the auto-repair should be done or not in
kernel. But an offline repair would do the same thing, read number of
device items and set the num_devices. The decision on kernel side at
mount or by user when running btrfs check has the same amount of
information, so forcing user to do offline repair is a bit less
convenient.

The num_devices is basically a cached value of the number of device
items, used in many places as a shortcut so we don't have to count them
each time. Once we make sure that the numbers are in sync, it's the
correct stat. We know about one scenario where it would get out of sync,
now fixed, so the simple auto repair is IMHO safe.

For the raid1 split brain, I can't tell and if you say it's not analyzed
properly we'd have to rely on other mechanisms to catch the
inconsistency. Eg. a missing physical device would be recognized and
would require degraded mount, but it's unrelated to the value of
num_devices.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH v2] btrfs: repair super block num_devices automatically
  2022-04-20 15:42           ` David Sterba
@ 2022-04-25 17:07             ` Anand Jain
  0 siblings, 0 replies; 11+ messages in thread
From: Anand Jain @ 2022-04-25 17:07 UTC (permalink / raw)
  To: dsterba, Qu Wenruo, Qu Wenruo, Luca Béla Palkovics, linux-btrfs

On 4/20/22 23:42, David Sterba wrote:
> On Mon, Feb 28, 2022 at 08:51:30PM +0800, Anand Jain wrote:
>>>>> I didn't get your point.
>>>>> What do you want to get from this patch?
>>>>>
>>>>> Isn't this already the behavior of this patch?
>>>>
>>>>    Let me clarify - we don't need this patch if we fix the actual bug as
>>>> above. IMO.
>>>
>>> Big NO NO.
>>>
>>> The damage is already done, we must be responsible for whatever damage
>>> we caused, especially the damage has already reached disk.
>>>
>>> Just fixing the cause and call it a day is definitely not a responsible
>>> way.
>>>
>>> Especially when the damage is done, you have no way to mount it, just
>>> like the reporter.
>>>
>>> You dare to say the same thing to the end user?
>>
>> You have a btrfs-progs patch to recover from that situation. Right?
>> Plus, I suppose you are sending a kernel patch for the actual bug
>> which is causing this corruption. No?
> 
> Normally we'd have just the repair code in check, but this is sometimes
> difficult to run eg. on a root filesystem, or if the boot fails because
> some mount fails and ends up in the rescue environment with limited
> options.
> 

  I agree. The following patch has fixed the actual bug in this case:

    [PATCH] btrfs: make device item removal and super block num devices 
update happen in the same transac

>> This patch is the reporter side fix. I don't encourage fixing the
>> reporter because a similar corruption might happen for reasons unknown
>> yet. For example, raid1 split-brain? Which is not yet completely
>> analyzed and test-cased yet.
> 
> So that's a valid question if the auto-repair should be done or not in
> kernel. But an offline repair would do the same thing, read number of
> device items and set the num_devices. The decision on kernel side at
> mount or by user when running btrfs check has the same amount of
> information, so forcing user to do offline repair is a bit less
> convenient.
> 
> The num_devices is basically a cached value of the number of device
> items, used in many places as a shortcut so we don't have to count them
> each time. Once we make sure that the numbers are in sync, it's the
> correct stat. We know about one scenario where it would get out of sync,
> now fixed, so the simple auto repair is IMHO safe.
> 
> For the raid1 split brain, I can't tell and if you say it's not analyzed
> properly we'd have to rely on other mechanisms to catch the
> inconsistency. Eg. a missing physical device would be recognized and
> would require degraded mount, but it's unrelated to the value of
> num_devices.


Consider the following situation:

  RAID1 on d1 d2 d3 mounted on /btrfs

  Run
     $ btrfs dev del d2 /btrfs <==== but encounters a power failure

  At the next mount, let's say the d1 generation is lower than d3.
  We read the root tree from d3 (higher generation).

  Per d3 dev_item count = 2 but, d1 says dev_item count = 3.

  We pick d3's side and fix d1's super_block num_devices _only_, but
  the dev_items count is still 3 in d1.

  reboot; now the d1 generation == d3 generation.
  Let's say mount picks d1, and finds dev_items count = 3 and
  super_block num_device = 2.
  The logic in this patch thinks that the d1 dev_item count is correct
  and reverts the num_device in the super-block back to 3.

So the problem statement here is:
   During mount, a lower generation on a device indicates missing writes.
   But any write to the device fixes the generation number; subsequent
   reboots has no idea about the missing writes.

Thanks, Anand

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-04-25 17:07 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-28  7:05 [PATCH v2] btrfs: repair super block num_devices automatically Qu Wenruo
2022-02-28  8:00 ` Anand Jain
2022-02-28  8:54   ` Qu Wenruo
2022-02-28  9:01     ` Anand Jain
2022-02-28  9:21       ` Qu Wenruo
2022-02-28 12:51         ` Anand Jain
2022-02-28 13:03           ` Qu Wenruo
2022-02-28 13:13             ` Luca Béla Palkovics
2022-02-28 13:29               ` Qu Wenruo
2022-04-20 15:42           ` David Sterba
2022-04-25 17:07             ` Anand Jain

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.