* 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
@ 2022-04-23 18:39 Johannes Kastl
  2022-04-23 23:07 ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Kastl @ 2022-04-23 18:39 UTC (permalink / raw)
  To: linux-btrfs


Good evening,

I need your advice on how to continue with one of my BTRFS RAID1 setups.

The machine and the BTRFS RAID1 were built/created in 2014, as far as I can say. 
It was built using openSUSE Leap, I think 13.X or similar. The machine was then 
constantly upgraded and is now running openSUSE Leap 15.3 with a 5.3.18 kernel 
(detailed infos below).

As one of the HDDs started reporting SMART errors, I just dd'ed each of the 4TB 
disks onto new 8TB disks (and fixed the GPT backup). I did not resize the 
filesystem, so it is still 3.6 TiB like on the old HDDs.
To make sure that the filesystem was working, I issued a btrfsck against each of 
the devices.
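
For readers wondering why a "4TB" disk shows up as roughly 3.6 TiB: this is only a unit difference. A minimal sketch, assuming the byte count that btrfs check reports for devid 2 further down (4000785964544) is the size of one of the old partitions; the variable names are purely illustrative:

```python
# Convert the devid 2 byte count from the btrfs check output into TB and TiB.
TB = 10**12   # decimal terabyte, as printed on the disk label
TIB = 2**40   # binary tebibyte, as printed by btrfs tools

total_bytes = 4000785964544  # devid 2 total_bytes, taken from the check output

size_tb = total_bytes / TB
size_tib = total_bytes / TIB

print(f"{size_tb:.2f} TB is {size_tib:.2f} TiB")
```

The TiB figure matches the 3.64TiB that `btrfs fi show` prints for each device.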

The output of the check is below. The TL;DR was that I should run 'btrfs rescue 
fix-device-size' to fix a "minor" issue.

Unfortunately, running this command fails:

> root dumbo:/root # btrfs rescue fix-device-size /dev/sdc1
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0
> btrfs unable to find ref byte nr 2959295381504 parent 0 root 3  owner 1 offset 0
> transaction.c:168: btrfs_commit_transaction: BUG_ON `ret` triggered, value -5
> btrfs(+0x51f99)[0x55aeeae12f99]
> btrfs(btrfs_commit_transaction+0x193)[0x55aeeae13573]
> btrfs(btrfs_fix_device_size+0x123)[0x55aeeadfe2a3]
> btrfs(btrfs_fix_device_and_super_size+0x6b)[0x55aeeadfe56b]
> btrfs(+0x6ceee)[0x55aeeae2deee]
> btrfs(main+0x8e)[0x55aeeade008e]
> /lib64/libc.so.6(__libc_start_main+0xef)[0x7f907fc0d2bd]
> btrfs(_start+0x2a)[0x55aeeade028a]
> Aborted (core dumped)
> root dumbo:/root #

So, my question is what I should do:

Do I need to run another command to fix this issue?
Can I safely ignore the issue?
Should I copy all of the data to another disk, and create a new BTRFS RAID1 from 
scratch? (Which of course I would like to avoid, if possible...)

Maybe someone can advise me on how to proceed. I am grateful for all of the 
input I get.

If there is other information I should give, please feel free to reach out to me.

Kind Regards
Johannes

#######################################################################
btrfs check output:

> root dumbo:/root # btrfs check -p /dev/sdc1 ;btrfs check -p /dev/sdd1
> Opening filesystem to check...
> Checking filesystem on /dev/sdc1
> UUID: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
> [1/7] checking root items                      (0:03:09 elapsed, 9467877 items checked)
> WARNING: unaligned total_bytes detected for devid 2, have 4000785964544 should be aligned to 4096
> WARNING: this is OK for older kernel, but may cause kernel warning for newer kernels
> WARNING: this can be fixed by 'btrfs rescue fix-device-size'
> [2/7] checking extents                         (0:38:38 elapsed, 6910485 items checked)
> WARNING: minor unaligned/mismatch device size detected
> WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it
> [3/7] checking free space cache                (0:02:26 elapsed, 3730 items checked)
> [4/7] checking fs roots                        (6:43:40 elapsed, 6614818 items checked)
> [5/7] checking csums (without verifying data)  (0:10:36 elapsed, 1419101 items checked)
> [6/7] checking root refs                       (0:00:00 elapsed, 4 items checked)
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 3308275023872 bytes used, no error found
> total csum bytes: 3119386928
> total tree bytes: 113221238784
> total fs tree bytes: 108770082816
> total extent tree bytes: 971456512
> btree space waste bytes: 15308811797
> file data blocks allocated: 3195053785088
>  referenced 3195047018496
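
The alignment warning above can be reproduced by hand. The sketch below only uses the two numbers from the warning (the devid 2 size and the 4096-byte alignment); the rounded-down value is an assumption about what an aligned size could look like, not output from any btrfs tool:

```python
# Check the devid 2 size from the warning against the 4096-byte alignment.
ALIGNMENT = 4096             # "should be aligned to 4096"
total_bytes = 4000785964544  # "have 4000785964544"

remainder = total_bytes % ALIGNMENT
# Assumption: rounding down is one way to obtain an aligned size.
aligned_down = total_bytes - remainder

print(f"remainder: {remainder} bytes")
print(f"aligned down: {aligned_down} bytes")
print(f"512-byte aligned: {total_bytes % 512 == 0}")
```

The size is a whole number of 512-byte sectors but leaves 3584 bytes (seven such sectors) beyond the last 4096-byte boundary, which fits a filesystem created on an older setup.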

#######################################################################

Machine and filesystem details

> $ uname -a
> Linux dumbo 5.3.18-150300.59.60-default #1 SMP Fri Mar 18 18:37:08 UTC 2022 (79e1683) x86_64 x86_64 x86_64 GNU/Linux
> 
> # btrfs --version
> btrfs-progs v4.19.1
> 
> # btrfs fi show
> Label: 'DUMBO_BACKUP_4TB'  uuid: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>         Total devices 2 FS bytes used 3.08TiB
>         devid    1 size 3.64TiB used 3.64TiB path /dev/sdd1
>         devid    2 size 3.64TiB used 3.63TiB path /dev/sdc1
> 
> # btrfs fi df /mnt/DUMBO_BACKUP_4TB/
> Data, RAID1: total=3.36TiB, used=2.97TiB
> Data, DUP: total=13.50MiB, used=2.81MiB
> Data, single: total=1.00GiB, used=0.00B
> System, RAID1: total=32.00MiB, used=560.00KiB
> System, single: total=32.00MiB, used=0.00B
> Metadata, RAID1: total=284.94GiB, used=108.05GiB
> Metadata, DUP: total=512.00MiB, used=64.00KiB
> Metadata, single: total=1.00GiB, used=0.00B
> GlobalReserve, single: total=512.00MiB, used=0.00B

-- 
Johannes Kastl
Linux Consultant & Trainer
Tel.: +49 (0) 151 2372 5802
Mail: kastl@b1-systems.de

B1 Systems GmbH
Osterfeldstraße 7 / 85088 Vohburg
http://www.b1-systems.de
GF: Ralph Dehner
Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537


* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-04-23 18:39 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3 Johannes Kastl
@ 2022-04-23 23:07 ` Qu Wenruo
  2022-04-24  9:10   ` Johannes Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2022-04-23 23:07 UTC (permalink / raw)
  To: Johannes Kastl, linux-btrfs



On 2022/4/24 02:39, Johannes Kastl wrote:
> Good evening,
>
> I need your advice on how to continue with one of my BTRFS RAID1 setups.
>
> The machine and the BTRFS RAID1 were built/created in 2014, as far as I
> can say. It was built using openSUSE Leap, I think 13.X or similar. The
> machine was then constantly upgraded and is now running openSUSE Leap
> 15.3 with a 5.3.18 kernel (detailed infos below).
>
> As one of the HDDs started reporting SMART errors, I just dd'ed each of
> the 4TB disks onto new 8TB disks (and fixed the GPT backup). I did not
> resize the filesystem, so it is still 3.6 TiB like on the old HDDs.
> To make sure that the filesystem was working, I issued a btrfsck against
> each of the devices.

No need to run btrfs check on each device.

Btrfs check will assemble the array automatically (just like kernel),
and check the fs on all involved devices.
Thus no need to run the same check on all devices.

>
> The output of the check is below. The TL;DR was that I should run 'btrfs
> rescue fix-device-size' to fix a "minor" issue.
>
> Unfortunately, running this command fails:
>
>> root dumbo:/root # btrfs rescue fix-device-size /dev/sdc1
>> Unable to find block group for 0
>> Unable to find block group for 0
>> Unable to find block group for 0

This is a unique error message, which can only be triggered when
btrfs-progs fails to find a block group with enough free space.

>> btrfs unable to find ref byte nr 2959295381504 parent 0 root 3  owner
>> 1 offset 0
>> transaction.c:168: btrfs_commit_transaction: BUG_ON `ret` triggered,
>> value -5

So at least no damage done to the good and innocent (but a little old) fs.
>> btrfs(+0x51f99)[0x55aeeae12f99]
>> btrfs(btrfs_commit_transaction+0x193)[0x55aeeae13573]
>> btrfs(btrfs_fix_device_size+0x123)[0x55aeeadfe2a3]
>> btrfs(btrfs_fix_device_and_super_size+0x6b)[0x55aeeadfe56b]
>> btrfs(+0x6ceee)[0x55aeeae2deee]
>> btrfs(main+0x8e)[0x55aeeade008e]
>> /lib64/libc.so.6(__libc_start_main+0xef)[0x7f907fc0d2bd]
>> btrfs(_start+0x2a)[0x55aeeade028a]
>> Aborted (core dumped)
>> root dumbo:/root #
>
> So, my question is what I should do:
>
> Do I need to run another command to fix this issue?

Not really.

But if you want to really remove the warning, please update btrfs-progs
first, to the latest stable version (v5.16.2), and try again.

The progs version involved, v4.19, is a little old, and IIRC we have had
some ENOSPC-related fixes in progs since then. So if the above problem is
caused by a bug producing a false ENOSPC, it should be fixed now.

> Can I safely ignore the issue?

You can ignore it for now.
It's not a big deal and kernel can handle it without problem.

> Should I copy all of the data to another disk, and create a new BTRFS
> RAID1 from scratch? (Which of course I would like to avoid, if possible...)

Definitely no.

Thanks,
Qu
>
> Maybe someone can advise me on how to proceed. I am grateful for all of
> the input I get.
>
> If there is other information I should give, please feel free to reach
> out to me.
>
> Kind Regards
> Johannes
>
> #######################################################################
> btrfs check output:
>
>> root dumbo:/root # btrfs check -p /dev/sdc1 ;btrfs check -p /dev/sdd1
>> Opening filesystem to check...
>> Checking filesystem on /dev/sdc1
>> UUID: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>> [1/7] checking root items                      (0:03:09 elapsed,
>> 9467877 items checked)
>> WARNING: unaligned total_bytes detected for devid 2, have
>> 4000785964544 should be aligned to 4096
>> WARNING: this is OK for older kernel, but may cause kernel warning for
>> newer kernels
>> WARNING: this can be fixed by 'btrfs rescue fix-device-size'
>> [2/7] checking extents                         (0:38:38 elapsed,
>> 6910485 items checked)
>> WARNING: minor unaligned/mismatch device size detected
>> WARNING: recommended to use 'btrfs rescue fix-device-size' to fix it
>> [3/7] checking free space cache                (0:02:26 elapsed, 3730
>> items checked)
>> [4/7] checking fs roots                        (6:43:40 elapsed,
>> 6614818 items checked)
>> [5/7] checking csums (without verifying data)  (0:10:36 elapsed,
>> 1419101 items checked)
>> [6/7] checking root refs                       (0:00:00 elapsed, 4
>> items checked)
>> [7/7] checking quota groups skipped (not enabled on this FS)
>> found 3308275023872 bytes used, no error found
>> total csum bytes: 3119386928
>> total tree bytes: 113221238784
>> total fs tree bytes: 108770082816
>> total extent tree bytes: 971456512
>> btree space waste bytes: 15308811797
>> file data blocks allocated: 3195053785088
>>  referenced 3195047018496
>
> #######################################################################
>
> Machine and filesystem details
>
>> $ uname -a
>> Linux dumbo 5.3.18-150300.59.60-default #1 SMP Fri Mar 18 18:37:08 UTC
>> 2022 (79e1683) x86_64 x86_64 x86_64 GNU/Linux
>>
>> # btrfs --version
>> btrfs-progs v4.19.1
>>
>> # btrfs fi show
>> Label: 'DUMBO_BACKUP_4TB'  uuid: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>>         Total devices 2 FS bytes used 3.08TiB
>>         devid    1 size 3.64TiB used 3.64TiB path /dev/sdd1
>>         devid    2 size 3.64TiB used 3.63TiB path /dev/sdc1
>>
>> # btrfs fi df /mnt/DUMBO_BACKUP_4TB/
>> Data, RAID1: total=3.36TiB, used=2.97TiB
>> Data, DUP: total=13.50MiB, used=2.81MiB
>> Data, single: total=1.00GiB, used=0.00B
>> System, RAID1: total=32.00MiB, used=560.00KiB
>> System, single: total=32.00MiB, used=0.00B
>> Metadata, RAID1: total=284.94GiB, used=108.05GiB
>> Metadata, DUP: total=512.00MiB, used=64.00KiB
>> Metadata, single: total=1.00GiB, used=0.00B
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>

* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-04-23 23:07 ` Qu Wenruo
@ 2022-04-24  9:10   ` Johannes Kastl
  2022-04-24  9:21     ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Kastl @ 2022-04-24  9:10 UTC (permalink / raw)
  To: linux-btrfs; +Cc: Qu Wenruo


Hi Qu,

On 24.04.22 at 01:07 Qu Wenruo wrote:

> No need to run btrfs check on each device.
> 
> Btrfs check will assemble the array automatically (just like kernel),
> and check the fs on all involved devices.
> Thus no need to run the same check on all devices.

OK, good to know. That saves half the time :-)

>> The output of the check is below. The TL;DR was that I should run 'btrfs
>> rescue fix-device-size' to fix a "minor" issue.
>>
>> Unfortunately, running this command fails:
>>
>>> root dumbo:/root # btrfs rescue fix-device-size /dev/sdc1
>>> Unable to find block group for 0
>>> Unable to find block group for 0
>>> Unable to find block group for 0
> 
> This is a unique error message, which can only be triggered when
> btrfs-progs fails to find a block group with enough free space.

So would resizing the filesystem (to 8 TB) work around this "limitation", so
afterwards it could properly fix the device size?

>>> btrfs unable to find ref byte nr 2959295381504 parent 0 root 3  owner
>>> 1 offset 0
>>> transaction.c:168: btrfs_commit_transaction: BUG_ON `ret` triggered,
>>> value -5
> 
> So at least no damage done to the good and innocent (but a little old) fs.

Puuuh, nice to hear that. :-)

>> So, my question is what I should do:
>>
>> Do I need to run another command to fix this issue?
> 
> Not really.
> 
> But if you want to really remove the warning, please update btrfs-progs
> first, to the latest stable version (v5.16.2), and try again.

I'll have a look if I can easily install a newer version of btrfsprogs on this 
machine.

> The progs version involved, v4.19, is a little old, and IIRC we have had
> some ENOSPC-related fixes in progs since then. So if the above problem is
> caused by a bug producing a false ENOSPC, it should be fixed now.

If I can install a newer version, I'll let you know if the bug disappears.

> You can ignore it for now.
> It's not a big deal and kernel can handle it without problem.

That's good.

>> Should I copy all of the data to another disk, and create a new BTRFS
>> RAID1 from scratch? (Which of course I would like to avoid, if possible...)
> 
> Definitely no.

Perfect.

Thanks for your reply! Have a nice day.

Kind Regards,
Johannes


* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-04-24  9:10   ` Johannes Kastl
@ 2022-04-24  9:21     ` Qu Wenruo
  2022-05-18 10:38       ` Johannes Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2022-04-24  9:21 UTC (permalink / raw)
  To: Johannes Kastl, linux-btrfs



On 2022/4/24 17:10, Johannes Kastl wrote:
> Hi Qu,
>
> On 24.04.22 at 01:07 Qu Wenruo wrote:
>
>> No need to run btrfs check on each device.
>>
>> Btrfs check will assemble the array automatically (just like kernel),
>> and check the fs on all involved devices.
>> Thus no need to run the same check on all devices.
>
> OK, good to know. That saves half the time :-)
>
>>> The output of the check is below. The TL;DR was that I should run 'btrfs
>>> rescue fix-device-size' to fix a "minor" issue.
>>>
>>> Unfortunately, running this command fails:
>>>
>>>> root dumbo:/root # btrfs rescue fix-device-size /dev/sdc1
>>>> Unable to find block group for 0
>>>> Unable to find block group for 0
>>>> Unable to find block group for 0
>>
>> This is a unique error message, which can only be triggered when
>> btrfs-progs fails to find a block group with enough free space.
>
> So would resizing the filesystem (to 8 TB) work around this "limitation",
> so afterwards it could properly fix the device size?

I'm not yet sure whether it's a bug in progs causing a false ENOSPC, or
whether there really isn't much space left.

For the former case, no matter how much free space you have, it won't help.

For the latter case, it would definitely help.

Thanks,
Qu

>
>>>> btrfs unable to find ref byte nr 2959295381504 parent 0 root 3  owner
>>>> 1 offset 0
>>>> transaction.c:168: btrfs_commit_transaction: BUG_ON `ret` triggered,
>>>> value -5
>>
>> So at least no damage done to the good and innocent (but a little old)
>> fs.
>
> Puuuh, nice to hear that. :-)
>
>>> So, my question is what I should do:
>>>
>>> Do I need to run another command to fix this issue?
>>
>> Not really.
>>
>> But if you want to really remove the warning, please update btrfs-progs
>> first, to the latest stable version (v5.16.2), and try again.
>
> I'll have a look if I can easily install a newer version of btrfsprogs
> on this machine.
>
>> The progs version involved, v4.19, is a little old, and IIRC we have had
>> some ENOSPC-related fixes in progs since then. So if the above problem is
>> caused by a bug producing a false ENOSPC, it should be fixed now.
>
> If I can install a newer version, I'll let you know if the bug disappears.
>
>> You can ignore it for now.
>> It's not a big deal and kernel can handle it without problem.
>
> That's good.
>
>>> Should I copy all of the data to another disk, and create a new BTRFS
>>> RAID1 from scratch? (Which of course I would like to avoid, if
>>> possible...)
>>
>> Definitely no.
>
> Perfect.
>
> Thanks for your reply! Have a nice day.
>
> Kind Regards,
> Johannes
>

* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-04-24  9:21     ` Qu Wenruo
@ 2022-05-18 10:38       ` Johannes Kastl
  2022-05-18 10:59         ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Kastl @ 2022-05-18 10:38 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


Hi Qu,

TL;DR: took a while until I had all of the data backed up properly and had some 
time to test this. Unfortunately the filesystem is now no longer mountable...

Any ideas?

Johannes

On 24.04.22 at 11:21 Qu Wenruo wrote:
> On 2022/4/24 17:10, Johannes Kastl wrote:

>> So would resizing the filesystem (to 8 TB) work around this "limitation",
>> so afterwards it could properly fix the device size?
> 
> I'm not yet sure whether it's a bug in progs causing a false ENOSPC, or
> whether there really isn't much space left.
> 
> For the former case, no matter how much free space you have, it won't help.
> 
> For the latter case, it would definitely help.

So, I deleted the partitions on both disks and re-created them with the new 
(bigger) size, keeping the start sector and the btrfs signature intact.

I could then resize both disks to the same value successfully. At least, the 
commands ran without errors.

Fixing the device size fails nonetheless (see below). And I can no longer mount 
the filesystem, when I try I find this in the logs:

> [87396.889043] BTRFS error (device sdb1): super_total_bytes 15393162784768 mismatch with fs_devices total_rw_bytes 15393162788864
> [87396.889974] BTRFS error (device sdb1): failed to read chunk tree: -22
> [87396.892741] BTRFS error (device sdb1): open_ctree failed

(Don't get confused by sdb1, this is from a rescue system with only some HDDs 
attached)
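
To put the kernel message in perspective, the two totals can be compared directly. This is a minimal sketch using only the numbers from the log above; splitting the total evenly across two devices is an assumption based on the two equally sized devids in this filesystem:

```python
# Compare the two totals from the "super_total_bytes ... mismatch" message.
super_total_bytes = 15393162784768  # value stored in the superblock
total_rw_bytes = 15393162788864     # sum the kernel computed from the devices

difference = total_rw_bytes - super_total_bytes

# Assumption: two devices of equal size, as in this RAID1.
per_device = total_rw_bytes // 2
per_device_tib = per_device / 2**40

print(f"difference: {difference} bytes")
print(f"per device: {per_device_tib:.2f} TiB")
```

So the superblock total is exactly 4096 bytes short of the sum of two 7 TiB devices, which is a tiny accounting mismatch rather than a sign of lost data.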

Fixing the device-size on Leap 15.3:
> # btrfs filesystem show /mnt/DUMBO_BACKUP_4TB/
> Label: 'DUMBO_BACKUP_4TB'  uuid: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>         Total devices 2 FS bytes used 3.17TiB
>         devid    1 size 7.00TiB used 3.64TiB path /dev/sdd1
>         devid    2 size 7.00TiB used 3.63TiB path /dev/sdc1
> 
> # umount /mnt/DUMBO_BACKUP_4TB
> # btrfs rescue fix-device-size /dev/sdd1
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0
> transaction.c:189: btrfs_commit_transaction: BUG_ON `ret` triggered, value -28
> btrfs(+0x51f99)[0x55edf7a43f99]
> btrfs(+0x525a9)[0x55edf7a445a9]
> btrfs(btrfs_fix_super_size+0x98)[0x55edf7a2f438]
> btrfs(btrfs_fix_device_and_super_size+0x84)[0x55edf7a2f584]
> btrfs(+0x6ceee)[0x55edf7a5eeee]
> btrfs(main+0x8e)[0x55edf7a1108e]
> /lib64/libc.so.6(__libc_start_main+0xef)[0x7f672ad962bd]
> btrfs(_start+0x2a)[0x55edf7a1128a]
> Aborted (core dumped)
> # 

I tested fixing the device size by booting from a Tumbleweed rescue stick, running 
kernel 5.16 with btrfsprogs 5.16. This also fails, but spits out an error 
message that is a little different:

> [...]
> Unable to find block group for 0
> Error: failed to commit current transaction: -28 (No space left on device)
> No device size related problem found
> ERROR: commit_root already set when starting transaction
> extent buffer leak: start ... len 16384

(I had to type this off of the screen)

As the mounting failed with an error related to chunks, I tried the btrfs rescue 
chunk-recover command, but that also aborts and dumps a core, even on Tumbleweed 
with kernel 5.16...

The error messages look something like this:
> Unable to find block group for 0
> Unable to find block group for 0
> Unable to find block group for 0

followed by a "...BUG_ON `ret` triggered, value -28"

So this could all be related to -28 (No space left on device)?
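
For reference, the negative values in these messages are negated errno numbers. The following sketch, which assumes a Linux system and uses a hypothetical helper named `describe`, maps the three values seen in this thread (-5, -22, -28) to their symbolic names with Python's standard `errno` module:

```python
import errno
import os


def describe(code):
    """Return (symbolic name, message) for a negated errno value such as -28."""
    number = -code
    return errno.errorcode[number], os.strerror(number)


# The three values that appear in the logs in this thread.
for code in (-5, -22, -28):
    name, message = describe(code)
    print(f"{code}: {name} ({message})")
```

On Linux this gives EIO for -5, EINVAL for -22 and ENOSPC for -28, so the later failures are indeed all "no space" errors, while the very first run reported an I/O error instead.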


* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-05-18 10:38       ` Johannes Kastl
@ 2022-05-18 10:59         ` Qu Wenruo
  2022-05-20 15:14           ` Johannes Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2022-05-18 10:59 UTC (permalink / raw)
  To: Johannes Kastl, linux-btrfs



On 2022/5/18 18:38, Johannes Kastl wrote:
> Hi Qu,
>
> TL;DR: took a while until I had all of the data backed up properly and
> had some time to test this. Unfortunately the filesystem is now no
> longer mountable...
>
> Any ideas?
>
> Johannes
>
> On 24.04.22 at 11:21 Qu Wenruo wrote:
>> On 2022/4/24 17:10, Johannes Kastl wrote:
>
>>> So would resizing the filesystem (to 8 TB) work around this "limitation",
>>> so afterwards it could properly fix the device size?
>>
>> I'm not yet sure whether it's a bug in progs causing a false ENOSPC, or
>> whether there really isn't much space left.
>>
>> For the former case, no matter how much free space you have, it won't
>> help.
>>
>> For the latter case, it would definitely help.
>
> So, I deleted the partitions on both disks and re-created them with the
> new (bigger) size, keeping the start sector and the btrfs signature intact.
>
> I could then resize both disks to the same value successfully. At least,
> the commands ran without errors.
>
> Fixing the device size fails nonetheless (see below). And I can no
> longer mount the filesystem, when I try I find this in the logs:
>
>> [87396.889043] BTRFS error (device sdb1): super_total_bytes
>> 15393162784768 mismatch with fs_devices total_rw_bytes 15393162788864
>> [87396.889974] BTRFS error (device sdb1): failed to read chunk tree: -22
>> [87396.892741] BTRFS error (device sdb1): open_ctree failed
>
> (Don't get confused by sdb1, this is from a rescue system with only some
> HDDs attached)
>
> Fixing the device-size on Leap 15.3:
>> # btrfs filesystem show /mnt/DUMBO_BACKUP_4TB/
>> Label: 'DUMBO_BACKUP_4TB'  uuid: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>>         Total devices 2 FS bytes used 3.17TiB
>>         devid    1 size 7.00TiB used 3.64TiB path /dev/sdd1
>>         devid    2 size 7.00TiB used 3.63TiB path /dev/sdc1

That's super weird, we have tons of unallocated space.

So definitely something wrong in btrfs-progs.
Normally `btrfs fi usage` would provide more info, but it needs the fs
to be mountable.
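
As a rough illustration of "tons of unallocated space", the per-device numbers can be pulled out of the `btrfs filesystem show` output quoted above. This is only a sketch: the regular expression and the helper name `unallocated_tib` are made up for this example, and it assumes both sizes are printed in TiB as they are here:

```python
import re

# The two device lines from the `btrfs filesystem show` output above.
FI_SHOW = """\
        devid    1 size 7.00TiB used 3.64TiB path /dev/sdd1
        devid    2 size 7.00TiB used 3.63TiB path /dev/sdc1
"""

# Assumption: size and used are both reported in TiB.
DEVICE_LINE = re.compile(
    r"devid\s+\d+\s+size\s+([\d.]+)TiB\s+used\s+([\d.]+)TiB\s+path\s+(\S+)"
)


def unallocated_tib(text):
    """Map each device path to its unallocated space (size minus used) in TiB."""
    result = {}
    for match in DEVICE_LINE.finditer(text):
        size, used, path = match.groups()
        result[path] = round(float(size) - float(used), 2)
    return result


for path, free in unallocated_tib(FI_SHOW).items():
    print(f"{path}: about {free} TiB unallocated")
```

With well over 3 TiB unallocated on each device, a genuine lack of space is hard to reconcile with the ENOSPC errors.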

Can you prepare a build environment for btrfs-progs?

I can update the code to skip the transaction commit so that we won't be
bothered by -ENOSPC at all.

And since we're not really doing any metadata update, we don't really
need any new space.

Once your build environment is prepared, you can fetch this branch,
compile btrfs-progs, and try the compiled `btrfs` command to rescue
the device again.

https://github.com/adam900710/btrfs-progs/tree/dirty_fix

I did some local tests and they showed no problems, but I'm not sure
whether it will work for you.

Thanks,
Qu

>>
>> # umount /mnt/DUMBO_BACKUP_4TB
>> # btrfs rescue fix-device-size /dev/sdd1
>> Unable to find block group for 0
>> Unable to find block group for 0
>> Unable to find block group for 0
>> transaction.c:189: btrfs_commit_transaction: BUG_ON `ret` triggered,
>> value -28
>> btrfs(+0x51f99)[0x55edf7a43f99]
>> btrfs(+0x525a9)[0x55edf7a445a9]
>> btrfs(btrfs_fix_super_size+0x98)[0x55edf7a2f438]
>> btrfs(btrfs_fix_device_and_super_size+0x84)[0x55edf7a2f584]
>> btrfs(+0x6ceee)[0x55edf7a5eeee]
>> btrfs(main+0x8e)[0x55edf7a1108e]
>> /lib64/libc.so.6(__libc_start_main+0xef)[0x7f672ad962bd]
>> btrfs(_start+0x2a)[0x55edf7a1128a]
>> Aborted (core dumped)
>> #
>
> I tested fixing the device size by booting from a Tumbleweed rescue stick,
> running kernel 5.16 with btrfsprogs 5.16. This also fails, but spits out
> an error message that is a little different:
>
>  > [...]
>> Unable to find block group for 0
>> Error: failed to commit current transaction: -28 (No space left on
>> device)
>> No device size related problem found
>> ERROR: commit_root already set when starting transaction
>> extent buffer leak: start ... len 16384
>
> (I had to type this off of the screen)
>
> As the mounting failed with an error related to chunks, I tried the
> btrfs rescue chunk-recover command, but that also aborts and dumps a
> core, even on Tumbleweed with kernel 5.16...
>
> The error messages look something like this:
>  > Unable to find block group for 0
>  > Unable to find block group for 0
>  > Unable to find block group for 0
>
> followed by a "...BUG_ON `ret` triggered, value -28"
>
> So this could all be related to -28 (No space left on device)?
>

* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-05-18 10:59         ` Qu Wenruo
@ 2022-05-20 15:14           ` Johannes Kastl
  2022-05-20 20:21             ` Johannes Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Kastl @ 2022-05-20 15:14 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


Hello Qu,

On 18.05.22 at 12:59 Qu Wenruo wrote:
> On 2022/5/18 18:38, Johannes Kastl wrote:
>> Fixing the device size fails nonetheless (see below). And I can no
>> longer mount the filesystem, when I try I find this in the logs:
>>
>>> [87396.889043] BTRFS error (device sdb1): super_total_bytes
>>> 15393162784768 mismatch with fs_devices total_rw_bytes 15393162788864
>>> [87396.889974] BTRFS error (device sdb1): failed to read chunk tree: -22
>>> [87396.892741] BTRFS error (device sdb1): open_ctree failed
>>
>> (Don't get confused by sdb1, this is from a rescue system with only some
>> HDDs attached)
>>
>> Fixing the device-size on Leap 15.3:
>>> # btrfs filesystem show /mnt/DUMBO_BACKUP_4TB/
>>> Label: 'DUMBO_BACKUP_4TB'  uuid: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>>>         Total devices 2 FS bytes used 3.17TiB
>>>         devid    1 size 7.00TiB used 3.64TiB path /dev/sdd1
>>>         devid    2 size 7.00TiB used 3.63TiB path /dev/sdc1
> 
> That's super weird, we have tons of unallocated space.
> 
> So definitely something wrong in btrfs-progs.
> Normally `btrfs fi usage` would provide more info, but it needs the fs
> to be mountable.
> 
> Can you prepare a build environment for btrfs-progs?
> 
> I can update the code to skip the transaction commit so that we won't be
> bothered by -ENOSPC at all.
> 
> And since we're not really doing any metadata update, we don't really
> need any new space.
> 
> Once your build environment is prepared, you can fetch this branch,
> compile btrfs-progs, and try the compiled `btrfs` command to rescue
> the device again.

Thanks for your help!

> https://github.com/adam900710/btrfs-progs/tree/dirty_fix
> 
> I did some local tests and they showed no problems, but I'm not sure
> whether it will work for you.

I am trying to build this and will test it, hopefully tomorrow. I'll let you 
know what happens...

Kind Regards,
Johannes


* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-05-20 15:14           ` Johannes Kastl
@ 2022-05-20 20:21             ` Johannes Kastl
  2022-05-21  1:10               ` Qu Wenruo
  0 siblings, 1 reply; 10+ messages in thread
From: Johannes Kastl @ 2022-05-20 20:21 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs


Hello Qu,

On 20.05.22 at 17:14 Johannes Kastl wrote:

> I am trying to build this and will test it, hopefully tomorrow. I'll let you 
> know what happens...

I was able to build an RPM for Leap 15.3, based on your branch.

> https://build.opensuse.org/project/show/home:ojkastl_buildservice:btrfs_debugging

I installed it on my Leap 15.3 system, started the fix-device-size and... after 
only a couple of seconds it was done.

No errors, just one line saying that it fixed something.

I could mount the filesystem directly afterwards.

I unmounted and am currently running a btrfs check on the filesystem, based on 
the code from your branch. I hope the filesystem is working again, and I can 
start using it again (tomorrow, the check will take ~8 hours)...

I doubt that this will give valuable input to fix this error in btrfsprogs...

Kind Regards,
Johannes


* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-05-20 20:21             ` Johannes Kastl
@ 2022-05-21  1:10               ` Qu Wenruo
  2022-05-24  7:44                 ` Johannes Kastl
  0 siblings, 1 reply; 10+ messages in thread
From: Qu Wenruo @ 2022-05-21  1:10 UTC (permalink / raw)
  To: Johannes Kastl, linux-btrfs



On 2022/5/21 04:21, Johannes Kastl wrote:
> Hello Qu,
>
> On 20.05.22 at 17:14 Johannes Kastl wrote:
>
>> I am trying to build this and will test it, hopefully tomorrow. I'll
>> let you know what happens...
>
> I was able to build an RPM for Leap 15.3, based on your branch.
>
>> https://build.opensuse.org/project/show/home:ojkastl_buildservice:btrfs_debugging
>>
>
> I installed it on my Leap 15.3 system, started the fix-device-size
> and... after only a couple of seconds it was done.
>
> No errors, just one line saying that it fixed something.
>
> I could mount the filesystem directly afterwards.

Great to know that.

>
> I unmounted and am currently running a btrfs check on the filesystem,
> based on the code from your branch. I hope the filesystem is working
> again, and I can start using it again (tomorrow, the check will take ~8
> hours)...

Hopefully btrfsck reports no errors.

>
> I doubt that this will give valuable input to fix this error in
> btrfsprogs...

At least we know the new fix works.

BTW, would you mind sharing the output of `btrfs fi usage` and `btrfs fi df`
once you can mount the fs?

Thanks,
Qu
>
> Kind Regards,
> Johannes
>


* Re: 'btrfs rescue' command (recommended by btrfs check) fails on old BTRFS RAID1 on (currently) openSUSE Leap 15.3
  2022-05-21  1:10               ` Qu Wenruo
@ 2022-05-24  7:44                 ` Johannes Kastl
  0 siblings, 0 replies; 10+ messages in thread
From: Johannes Kastl @ 2022-05-24  7:44 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs



Hi Qu,

On 21.05.22 at 03:10 Qu Wenruo wrote:

> At least we know the new way to fix it is working.
> 
> BTW, mind to share things like `btrfs fi usage` and `btrfs fi df` when
> you can mount the fs?

Sure, here they are:

>> root dumbo:/root # btrfs filesystem df /mnt/DUMBO_BACKUP_4TB
>> Data, RAID1: total=3.36TiB, used=3.24TiB
>> Data, DUP: total=13.50MiB, used=2.81MiB
>> Data, single: total=1.00GiB, used=0.00B
>> System, RAID1: total=32.00MiB, used=560.00KiB
>> System, single: total=32.00MiB, used=0.00B
>> Metadata, RAID1: total=284.94GiB, used=111.84GiB
>> Metadata, DUP: total=512.00MiB, used=48.00KiB
>> Metadata, single: total=1.00GiB, used=0.00B
>> GlobalReserve, single: total=512.00MiB, used=0.00B
>> root dumbo:/root #

>> root dumbo:/root # btrfs filesystem show /mnt/DUMBO_BACKUP_4TB
>> Label: 'DUMBO_BACKUP_4TB'  uuid: 50651b41-bf33-47e7-8a08-afbc71ba0bf8
>>         Total devices 2 FS bytes used 3.35TiB
>>         devid    1 size 7.00TiB used 3.64TiB path /dev/sdd1
>>         devid    2 size 7.00TiB used 3.63TiB path /dev/sdc1
>>
>> root dumbo:/root #

I am not sure why one device shows 3.64TiB used and the other 3.63TiB.

>> root dumbo:/root # btrfs filesystem usage /mnt/DUMBO_BACKUP_4TB
>> Overall:
>>     Device size:                  14.00TiB
>>     Device allocated:              7.27TiB
>>     Device unallocated:            6.73TiB
>>     Device missing:                  0.00B
>>     Used:                          6.69TiB
>>     Free (estimated):              3.48TiB      (min: 3.48TiB)
>>     Data ratio:                       2.00
>>     Metadata ratio:                   2.00
>>     Global reserve:              512.00MiB      (used: 0.00B)
>>
>> Data,single: Size:1.00GiB, Used:0.00B
>>    /dev/sdd1       1.00GiB
>>
>> Data,RAID1: Size:3.36TiB, Used:3.24TiB
>>    /dev/sdc1       3.36TiB
>>    /dev/sdd1       3.36TiB
>>
>> Data,DUP: Size:13.50MiB, Used:2.81MiB
>>    /dev/sdd1      27.00MiB
>>
>> Metadata,single: Size:1.00GiB, Used:0.00B
>>    /dev/sdd1       1.00GiB
>>
>> Metadata,RAID1: Size:284.94GiB, Used:111.84GiB
>>    /dev/sdc1     284.94GiB
>>    /dev/sdd1     284.94GiB
>>
>> Metadata,DUP: Size:512.00MiB, Used:48.00KiB
>>    /dev/sdd1       1.00GiB
>>
>> System,single: Size:32.00MiB, Used:0.00B
>>    /dev/sdd1      32.00MiB
>>
>> System,RAID1: Size:32.00MiB, Used:560.00KiB
>>    /dev/sdc1      32.00MiB
>>    /dev/sdd1      32.00MiB
>>
>> Unallocated:
>>    /dev/sdc1       3.36TiB
>>    /dev/sdd1       3.36TiB
>> root dumbo:/root #
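
The per-device difference noted above can be cross-checked against the usage output. The following is only a rough sketch in Python, not anything btrfs-progs does itself: the numbers are copied by hand from the `btrfs filesystem usage` listing, and the dictionary layout and function names are made up for illustration. Since the listing rounds its values (3.36TiB in particular), the totals are approximate.

```python
# Per-device chunk allocation, copied by hand from the
# `btrfs filesystem usage` output above. Names are illustrative only.
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

ALLOCATIONS = {
    "/dev/sdc1": {
        "Data,RAID1": 3.36 * TIB,
        "Metadata,RAID1": 284.94 * GIB,
        "System,RAID1": 32 * MIB,
    },
    "/dev/sdd1": {
        "Data,single": 1 * GIB,
        "Data,RAID1": 3.36 * TIB,
        "Data,DUP": 27 * MIB,
        "Metadata,single": 1 * GIB,
        "Metadata,RAID1": 284.94 * GIB,
        "Metadata,DUP": 1 * GIB,
        "System,single": 32 * MIB,
        "System,RAID1": 32 * MIB,
    },
}


def allocated_bytes(device):
    """Sum of all chunk allocations listed for one device."""
    return sum(ALLOCATIONS[device].values())


def non_raid1_bytes(device):
    """Sum of the single and DUP chunks, which live on one device only."""
    return sum(
        size
        for profile, size in ALLOCATIONS[device].items()
        if not profile.endswith("RAID1")
    )


if __name__ == "__main__":
    for dev in sorted(ALLOCATIONS):
        print(
            f"{dev}: allocated ~{allocated_bytes(dev) / TIB:.3f} TiB, "
            f"non-RAID1 chunks {non_raid1_bytes(dev) / MIB:.0f} MiB"
        )
```

Under these assumptions, the only chunks that are not mirrored on both devices are the single and DUP ones on /dev/sdd1, about 3 GiB in total. That is small enough to show up only as a difference in the last displayed digit, which appears consistent with the 3.64TiB versus 3.63TiB reported by `btrfs filesystem show`.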





-- 
Johannes Kastl

