linux-btrfs.vger.kernel.org archive mirror
* "parent transid verify failed" and mount usebackuproot does not seem to work
@ 2020-06-30 19:41 Illia Bobyr
  2020-07-01  1:36 ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Illia Bobyr @ 2020-06-30 19:41 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I have a btrfs with bcache setup that failed during a boot yesterday.
There is one SSD with bcache that is used as a cache for 3 btrfs HDDs.

Reading through a number of discussions, I've decided to ask for advice here.
Should I be running "btrfs check --recover"?

The last messages in the dmesg log are these:

Btrfs loaded, crc32c=crc32c-intel
BTRFS: device label root devid 3 transid 138434 /dev/bcache2 scanned
by btrfs (341)
BTRFS: device label root devid 2 transid 138434 /dev/bcache1 scanned
by btrfs (341)
BTRFS: device label root devid 1 transid 138434 /dev/bcache0 scanned
by btrfs (341)
BTRFS info (device bcache0): disk space caching is enabled
BTRFS info (device bcache0): has skinny extents
BTRFS error (device bcache0): parent transid verify failed on
16984159518720 wanted 138414 found 138207
BTRFS error (device bcache0): parent transid verify failed on
16984159518720 wanted 138414 found 138207
BTRFS error (device bcache0): open_ctree failed

Trying to mount it in the recovery mode does not seem to work:

(initramfs) mount -t btrfs -o ro,usebackuproot /dev/bcache0 /mnt
BTRFS info (device bcache1): trying to use backup root at mount time
BTRFS info (device bcache1): disk space caching is enabled
BTRFS info (device bcache1): has skinny extents
BTRFS error (device bcache1): parent transid verify failed on
16984159518720 wanted 138414 found 138207
BTRFS error (device bcache1): parent transid verify failed on
16984159518720 wanted 138414 found 138207
BTRFS error (device bcache1): parent transid verify failed on
16984173199360 wanted 138433 found 138195
BTRFS error (device bcache1): parent transid verify failed on
16984173199360 wanted 138433 found 138195
BTRFS warning (device bcache1): failed to read tree root
BTRFS error (device bcache1): parent transid verify failed on
16984171298816 wanted 138431 found 131157
BTRFS error (device bcache1): parent transid verify failed on
16984171298816 wanted 138431 found 131157
BTRFS warning (device bcache1): failed to read tree root
BTRFS critical (device bcache1): corrupt leaf: block=16984183013376
slot=36 extent bytenr=11447166291968 len=262144 invalid generation,
have 138434 expect (0, 138433]
BTRFS error (device bcache1): block=16984183013376 read time tree
block corruption detected
BTRFS critical (device bcache1): corrupt leaf: block=16984183013376
slot=36 extent bytenr=11447166291968 len=262144 invalid generation,
have 138434 expect (0, 138433]
BTRFS error (device bcache1): block=16984183013376 read time tree
block corruption detected
BTRFS warning (device bcache1): failed to read tree root
BUG: kernel NULL pointer dereference, address: 000000000000001f
#PF: supervisor read access in kernel mode

<a stack trace follows>

(initramfs) btrfs --version
btrfs-progs v5.4.1

(initramfs) uname -a
Linux (none) 5.6.11-050611-generic #202005061022 SMP Wed May 6 10:27:04
UTC 2020 x86_64 GNU/Linux

(initramfs) btrfs fi show
Label: 'root' uuid: 0a3d051b-72ef-4a5d-8a48-eb0dbb960b56
        Total devices 3 FS bytes used 6.55TiB
        devid    1 size 3.64TiB used 1.62TiB path /dev/bcache1
        devid    2 size 7.28TiB used 5.21TiB path /dev/bcache0
        devid    3 size 12.73TiB used 6.80TiB path /dev/bcache2

I have also tried booting a live ISO with a 5.8.0 kernel and btrfs v5.6.1
from http://defender.exton.net/.
After booting, I tried mounting the bcache device using the same command as above.
The only message in the console was "Killed".
/dev/kmsg on the other hand lists messages very similar to the ones I've
seen in the initramfs environment: https://pastebin.com/Vhy072Mx

P.S. Please CC me, as I am not subscribed.

Thank you,
Illia Bobyr


* Re: "parent transid verify failed" and mount usebackuproot does not seem to work
  2020-06-30 19:41 "parent transid verify failed" and mount usebackuproot does not seem to work Illia Bobyr
@ 2020-07-01  1:36 ` Qu Wenruo
  2020-07-01 10:16   ` Illia Bobyr
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2020-07-01  1:36 UTC (permalink / raw)
  To: Illia Bobyr, linux-btrfs





On 2020/7/1 3:41 AM, Illia Bobyr wrote:
> Hi,
> 
> I have a btrfs with bcache setup that failed during a boot yesterday.
> There is one SSD with bcache that is used as a cache for 3 btrfs HDDs.
> 
> Reading through a number of discussions, I've decided to ask for advice here.
> Should I be running "btrfs check --recover"?
> 
> The last message in the dmesg log is this one:
> 
> Btrfs loaded, crc32c=crc32c-intel
> BTRFS: device label root devid 3 transid 138434 /dev/bcache2 scanned
> by btrfs (341)
> BTRFS: device label root devid 2 transid 138434 /dev/bcache1 scanned
> by btrfs (341)
> BTRFS: device label root devid 1 transid 138434 /dev/bcache0 scanned
> by btrfs (341)
> BTRFS info (device bcache0): disk space caching is enabled
> BTRFS info (device bcache0): has skinny extents
> BTRFS error (device bcache0): parent transid verify failed on
> 16984159518720 wanted 138414 found 138207
> BTRFS error (device bcache0): parent transid verify failed on
> 16984159518720 wanted 138414 found 138207
> BTRFS error (device bcache0): open_ctree failed

Looks like some tree blocks were not written back correctly.

Considering we don't have any known writeback-related bugs with 5.6, I
guess bcache may be involved again?

> 
> Trying to mount it in the recovery mode does not seem to work:
> 
> (initramfs) mount -t btrfs -o ro,usebackuproot /dev/bcache0 /mnt
> BTRFS info (device bcache1): trying to use backup root at mount time
> BTRFS info (device bcache1): disk space caching is enabled
> BTRFS info (device bcache1): has skinny extents
> BTRFS error (device bcache1): parent transid verify failed on
> 16984159518720 wanted 138414 found 138207
> BTRFS error (device bcache1): parent transid verify failed on
> 16984159518720 wanted 138414 found 138207
> BTRFS error (device bcache1): parent transid verify failed on
> 16984173199360 wanted 138433 found 138195
> BTRFS error (device bcache1): parent transid verify failed on
> 16984173199360 wanted 138433 found 138195
> BTRFS warning (device bcache1): failed to read tree root
> BTRFS error (device bcache1): parent transid verify failed on
> 16984171298816 wanted 138431 found 131157
> BTRFS error (device bcache1): parent transid verify failed on
> 16984171298816 wanted 138431 found 131157
> BTRFS warning (device bcache1): failed to read tree root
> BTRFS critical (device bcache1): corrupt leaf: block=16984183013376
> slot=36 extent bytenr=11447166291968 len=262144 invalid generation,
> have 138434 expect (0, 138433]
> BTRFS error (device bcache1): block=16984183013376 read time tree
> block corruption detected
> BTRFS critical (device bcache1): corrupt leaf: block=16984183013376
> slot=36 extent bytenr=11447166291968 len=262144 invalid generation,
> have 138434 expect (0, 138433]
> BTRFS error (device bcache1): block=16984183013376 read time tree
> block corruption detected
> BTRFS warning (device bcache1): failed to read tree root
> BUG: kernel NULL pointer dereference, address: 000000000000001f
> #PF: supervisor read access in kernel mode
> 
> <a stack trace follows>
> 
> (initramfs) btrfs --version
> btrfs-progs v5.4.1
> 
> (initramfs) uname -a
> Linux (none) 5.6.11-050611-generic #202005061022 SMP Wed May 6 10:27:04
> UTC 2020 x86_64 GNU/Linux
> 
> (initramfs) btrfs fi show
> Label: 'root' uuid: 0a3d051b-72ef-4a5d-8a48-eb0dbb960b56
>         Total devices 3 FS bytes used 6.55TiB
>         devid    1 size 3.64TiB used 1.62TiB path /dev/bcache1
>         devid    2 size 7.28TiB used 5.21TiB path /dev/bcache0
>         devid    3 size 12.73TiB used 6.80TiB path /dev/bcache2
> 
> I have tried booting using a live ISO with 5.8.0 kernel and btrfs v5.6.1
> from http://defender.exton.net/.
> After booting tried mounting the bcache using the same command as above.
> The only message in the console was "Killed".
> /dev/kmsg on the other hand lists messages very similar to the ones I've
> seen in the initramfs environment: https://pastebin.com/Vhy072Mx

It looks like there is a chance to recover, as there is a backup root
with a newer generation.

However, tree-checker is rejecting the one with the newer generation.
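
For readers following the log: the "invalid generation, have 138434 expect
(0, 138433]" message describes a range check on a generation number. The
following is a minimal Python sketch of that interval test only, using the
values from the log. It is not the kernel's tree-checker code, and the
function name is made up for illustration.

```python
def generation_in_range(item_generation: int, upper_bound: int) -> bool:
    """Return True if item_generation lies in the interval (0, upper_bound]."""
    return 0 < item_generation <= upper_bound


# Values copied from the "corrupt leaf" message in the log.
have = 138434
upper = 138433

print(generation_in_range(have, upper))    # False: one past the upper bound
print(generation_in_range(upper, upper))   # True: the bound itself is allowed
```

The interval is open at 0 and closed at the upper bound, which is why a
generation exactly one above the bound is enough to reject the leaf.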

The kernel panic is caused by a corner case in the error handling of the
backup root cleanup.
We need to fix it anyway.

In this case, I guess "btrfs ins dump-super -fFa" output would help to
show if it's possible to recover.

Anyway, something looks strange.

Backup roots having a newer generation while the super block is still
old doesn't look correct at all.

Thanks,
Qu
> 
> P.S. Please CC me, as I am not subscribed.
> 
> Thank you,
> Illia Bobyr
> 



* Re: "parent transid verify failed" and mount usebackuproot does not seem to work
  2020-07-01  1:36 ` Qu Wenruo
@ 2020-07-01 10:16   ` Illia Bobyr
  2020-07-01 10:48     ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Illia Bobyr @ 2020-07-01 10:16 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 6/30/2020 6:36 PM, Qu Wenruo wrote:
> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>> Hi,
>>
>> I have a btrfs with bcache setup that failed during a boot yesterday.
>> There is one SSD with bcache that is used as a cache for 3 btrfs HDDs.
>>
>> Reading through a number of discussions, I've decided to ask for advice here.
>> Should I be running "btrfs check --recover"?
>>
>> The last message in the dmesg log is this one:
>>
>> Btrfs loaded, crc32c=crc32c-intel
>> BTRFS: device label root devid 3 transid 138434 /dev/bcache2 scanned
>> by btrfs (341)
>> BTRFS: device label root devid 2 transid 138434 /dev/bcache1 scanned
>> by btrfs (341)
>> BTRFS: device label root devid 1 transid 138434 /dev/bcache0 scanned
>> by btrfs (341)
>> BTRFS info (device bcache0): disk space caching is enabled
>> BTRFS info (device bcache0): has skinny extents
>> BTRFS error (device bcache0): parent transid verify failed on
>> 16984159518720 wanted 138414 found 138207
>> BTRFS error (device bcache0): parent transid verify failed on
>> 16984159518720 wanted 138414 found 138207
>> BTRFS error (device bcache0): open_ctree failed
> Looks like some tree blocks not written back correctly.
>
> Considering we don't have known write back related bugs with 5.6, I
> guess bcache may be involved again?

A bit more detail: the system started to misbehave.
The interactive session reported that the main file system had become read-only.
Then the SSH session disconnected and I could not reconnect.
The system did not seem to reboot correctly after I pressed the reboot
button, so I did a hard reboot.
Now it cannot mount the root partition any more.
>> Trying to mount it in the recovery mode does not seem to work:
>>
>> [...]
>>
>> I have tried booting using a live ISO with 5.8.0 kernel and btrfs v5.6.1
>> from http://defender.exton.net/.
>> After booting tried mounting the bcache using the same command as above.
>> The only message in the console was "Killed".
>> /dev/kmsg on the other hand lists messages very similar to the ones I've
>> seen in the initramfs environment: https://pastebin.com/Vhy072Mx
> It looks like there is a chance to recover, as there is a rootbackup
> with newer generation.
>
> While tree-checker is rejecting the newer generation one.
>
> The kernel panic is caused by some corner error handling with root
> backups cleanups.
> We need to fix it anyway.
>
> In this case, I guess "btrfs ins dump-super -fFa" output would help to
> show if it's possible to recover.

Here is the output: https://pastebin.com/raw/DtJd813y

> Anyway, something looks strange.
>
> The backup roots have a newer generation while the super block is still
> old doesn't look correct at all.

Just in case, here is the output of "btrfs check", as suggested by "A L
<mail@lechevalier.se>".  It does not seem to contain any new information.

parent transid verify failed on 16984014372864 wanted 138350 found 131117
parent transid verify failed on 16984014405632 wanted 138350 found 131127
parent transid verify failed on 16984013406208 wanted 138350 found 131112
parent transid verify failed on 16984075436032 wanted 138384 found 131136
parent transid verify failed on 16984075436032 wanted 138384 found 131136
parent transid verify failed on 16984075436032 wanted 138384 found 131136
Ignoring transid failure
ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent
level=2 child level=0
ERROR: failed to read block groups: Input/output error
ERROR: cannot open file system
Opening filesystem to check...
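
The "wanted"/"found" pairs in output like the above can be tabulated with a
short script. This is a minimal Python sketch, not part of btrfs-progs; the
function name and the regular expression are assumptions made for
illustration. It tolerates the line wrapping seen in the kernel log excerpts.

```python
import re

# Matches both single-line and wrapped "parent transid verify failed" messages.
PATTERN = re.compile(
    r"parent transid verify failed on\s+(\d+)\s+wanted\s+(\d+)\s+found\s+(\d+)"
)


def summarize(log_text: str) -> dict[int, tuple[int, int]]:
    """Map each block bytenr to the (wanted, found) generations reported for it."""
    result = {}
    for match in PATTERN.finditer(log_text):
        bytenr, wanted, found = (int(group) for group in match.groups())
        result[bytenr] = (wanted, found)
    return result


sample = """\
parent transid verify failed on 16984014372864 wanted 138350 found 131117
parent transid verify failed on 16984014405632 wanted 138350 found 131127
parent transid verify failed on 16984013406208 wanted 138350 found 131112
parent transid verify failed on 16984075436032 wanted 138384 found 131136
parent transid verify failed on 16984075436032 wanted 138384 found 131136
"""

for bytenr, (wanted, found) in summarize(sample).items():
    print(f"{bytenr}: wanted {wanted}, found {found}, "
          f"{wanted - found} generations behind")
```

Repeated messages for the same block collapse into one entry, which makes it
easier to see how far behind each stale block is.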

While running the commands, I accidentally ran the following command:

    btrfs inspect-internal dump-super -fFa >/dev/bcache0 2>&1

Effectively overwriting the first 10kb of the partition :(

It seems the superblock starts at 64kb, so I hope this will not
cause any more damage.

P.S. Thanks a lot for your reply Qu Wenruo!

Thank you,
Illia



* Re: "parent transid verify failed" and mount usebackuproot does not seem to work
  2020-07-01 10:16   ` Illia Bobyr
@ 2020-07-01 10:48     ` Qu Wenruo
  2020-07-01 21:36       ` Illia Bobyr
  0 siblings, 1 reply; 9+ messages in thread
From: Qu Wenruo @ 2020-07-01 10:48 UTC (permalink / raw)
  To: Illia Bobyr, linux-btrfs





On 2020/7/1 6:16 PM, Illia Bobyr wrote:
> On 6/30/2020 6:36 PM, Qu Wenruo wrote:
>> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>>> Hi,
>>>
>>> I have a btrfs with bcache setup that failed during a boot yesterday.
>>> There is one SSD with bcache that is used as a cache for 3 btrfs HDDs.
>>>
>>> Reading through a number of discussions, I've decided to ask for advice here.
>>> Should I be running "btrfs check --recover"?
>>>
>>> The last message in the dmesg log is this one:
>>>
>>> Btrfs loaded, crc32c=crc32c-intel
>>> BTRFS: device label root devid 3 transid 138434 /dev/bcache2 scanned
>>> by btrfs (341)
>>> BTRFS: device label root devid 2 transid 138434 /dev/bcache1 scanned
>>> by btrfs (341)
>>> BTRFS: device label root devid 1 transid 138434 /dev/bcache0 scanned
>>> by btrfs (341)
>>> BTRFS info (device bcache0): disk space caching is enabled
>>> BTRFS info (device bcache0): has skinny extents
>>> BTRFS error (device bcache0): parent transid verify failed on
>>> 16984159518720 wanted 138414 found 138207
>>> BTRFS error (device bcache0): parent transid verify failed on
>>> 16984159518720 wanted 138414 found 138207
>>> BTRFS error (device bcache0): open_ctree failed
>> Looks like some tree blocks not written back correctly.
>>
>> Considering we don't have known write back related bugs with 5.6, I
>> guess bcache may be involved again?
> 
> A bit more details: the system started to misbehave.
> Interactive session was saying that the main file system became read/only.

Any dmesg of that RO event?
That would be the most valuable info to help us to locate the bug and
fix it.

I guess something went wrong before that and somehow corrupted the
extent tree, breaking the life-saving COW of metadata and screwing up
everything.

> And then the SSH disconnected and did not reconnect any more.
> It did not seem to reboot correctly after I've pressed the reboot
> button, so I did a hard rebooted.
> And now it could not mount the root partition any more.
>>> Trying to mount it in the recovery mode does not seem to work:
>>>
>>> [...]
>>>
>>> I have tried booting using a live ISO with 5.8.0 kernel and btrfs v5.6.1
>>> from http://defender.exton.net/.
>>> After booting tried mounting the bcache using the same command as above.
>>> The only message in the console was "Killed".
>>> /dev/kmsg on the other hand lists messages very similar to the ones I've
>>> seen in the initramfs environment: https://pastebin.com/Vhy072Mx
>> It looks like there is a chance to recover, as there is a rootbackup
>> with newer generation.
>>
>> While tree-checker is rejecting the newer generation one.
>>
>> The kernel panic is caused by some corner error handling with root
>> backups cleanups.
>> We need to fix it anyway.
>>
>> In this case, I guess "btrfs ins dump-super -fFa" output would help to
>> show if it's possible to recover.
> 
> Here is the output: https://pastebin.com/raw/DtJd813y

OK, the backup root is fine.

So this means the metadata COW is broken, which caused the transid mismatch.

> 
>> Anyway, something looks strange.
>>
>> The backup roots have a newer generation while the super block is still
>> old doesn't look correct at all.
> 
> Just in case, here is the output of "btrfs check", as suggested by "A L
> <mail@lechevalier.se>".  It does not seem to contain any new information.
> 
> parent transid verify failed on 16984014372864 wanted 138350 found 131117
> parent transid verify failed on 16984014405632 wanted 138350 found 131127
> parent transid verify failed on 16984013406208 wanted 138350 found 131112
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> Ignoring transid failure
> ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent
> level=2 child level=0
> ERROR: failed to read block groups: Input/output error

The extent tree is completely screwed up, no wonder the transid error happens.

I don't believe it's reasonably possible to restore the fs to RW status.
The only remaining option is btrfs-restore, then.

> ERROR: cannot open file system
> Opening filesystem to check...
> 
> As I was running the commands I have accidentally run the following command:
> 
>     btrfs inspect-internal dump-super -fFa >/dev/bcache0 2>&1
> 
> Effectively overwriting the first 10kb of the partition :(

That's not a problem at all.
Btrfs reserves the first 1M of the device ([0, 1M)), so as long as you
don't screw up the super block at [64K, 68K) you're completely fine.
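
The reasoning above is an interval-overlap check. A minimal Python sketch,
assuming the accidental write covered roughly the first 10 KiB (the estimate
given in the report) and looking only at the primary super block range named
in this reply. The function name is hypothetical.

```python
KIB = 1024

# Primary super block range as stated in the reply: [64K, 68K).
SUPERBLOCK_START = 64 * KIB
SUPERBLOCK_END = 68 * KIB


def overlaps_superblock(write_start: int, write_len: int) -> bool:
    """Return True if [write_start, write_start + write_len) touches the super block."""
    write_end = write_start + write_len
    return write_start < SUPERBLOCK_END and write_end > SUPERBLOCK_START


# The accidental redirect wrote about 10 KiB starting at offset 0.
print(overlaps_superblock(0, 10 * KIB))         # False
# A hypothetical write that does reach into the super block, for comparison.
print(overlaps_superblock(60 * KIB, 8 * KIB))   # True
```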

Thanks,
Qu
> 
> Seems like the superblock starts at 64kb.  So, I hope, this would not
> cause any more damage.
> 
> P.S. Thanks a lot for your reply Qu Wenruo!
> 
> Thank you,
> Illia
> 



* Re: "parent transid verify failed" and mount usebackuproot does not seem to work
  2020-07-01 10:48     ` Qu Wenruo
@ 2020-07-01 21:36       ` Illia Bobyr
  2020-07-01 23:50         ` Qu Wenruo
  0 siblings, 1 reply; 9+ messages in thread
From: Illia Bobyr @ 2020-07-01 21:36 UTC (permalink / raw)
  To: Qu Wenruo, linux-btrfs

On 7/1/2020 3:48 AM, Qu Wenruo wrote:
> On 2020/7/1 6:16 PM, Illia Bobyr wrote:
>> On 6/30/2020 6:36 PM, Qu Wenruo wrote:
>>> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>>>> [...]
>>> Looks like some tree blocks not written back correctly.
>>>
>>> Considering we don't have known write back related bugs with 5.6, I
>>> guess bcache may be involved again?
>> A bit more details: the system started to misbehave.
>> Interactive session was saying that the main file system became read/only.
> Any dmesg of that RO event?
> That would be the most valuable info to help us to locate the bug and
> fix it.
>
> I guess there is something wrong before that, and by somehow it
> corrupted the extent tree, breaking the life keeping COW of metadata and
> screwed up everything.

After I restore the data, I will check the kernel log to see if
there are any messages in there.
I will post here if I find anything.

>> [...]
>>> In this case, I guess "btrfs ins dump-super -fFa" output would help to
>>> show if it's possible to recover.
>> Here is the output: https://pastebin.com/raw/DtJd813y
> OK, the backup root is fine.
>
> So this means, metadata COW is corrupted, which caused the transid mismatch.
>
>>> Anyway, something looks strange.
>>>
>>> The backup roots have a newer generation while the super block is still
>>> old doesn't look correct at all.
>> Just in case, here is the output of "btrfs check", as suggested by "A L
>> <mail@lechevalier.se>".  It does not seem to contain any new information.
>>
>> parent transid verify failed on 16984014372864 wanted 138350 found 131117
>> parent transid verify failed on 16984014405632 wanted 138350 found 131127
>> parent transid verify failed on 16984013406208 wanted 138350 found 131112
>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>> Ignoring transid failure
>> ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent
>> level=2 child level=0
>> ERROR: failed to read block groups: Input/output error
> Extent tree is completely screwed up, no wonder the transid error happens.
>
> I don't believe it's reasonable possible to restore the fs to RW status.
> The only remaining method left is btrfs-restore then.

There are no more available SATA connections in the system and there is
a lot of data in that FS (~7TB).
I do not immediately have another disk that would be able to hold this much.

At the same time this FS is RAID0.
I wonder if there is a way to first check whether a restore would work if I
disconnect half of the disks, as each half contains all the data.
If it does, I would be able to restore by reusing the space on
one of the mirrors.

I see "-D: Dry run" that can be passed to "btrfs restore", but I guess
it would not really do a full check of the data to make sure that the
restore would really succeed, would it?

Is there a way to perform this kind of check?
Or is "btrfs restore" the only option at the moment?
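
A rough sanity check on the "each half contains all the data" point, using
only the "btrfs fi show" numbers from the first message. This is a sketch
under stated assumptions: the per-device "used" figures are allocated chunk
space while "FS bytes used" is data actually stored, so the ratio is only
indicative. A ratio near 2 would be consistent with two copies of each chunk,
but it does not prove a particular profile.

```python
# Per-device "used" values (TiB) from the "btrfs fi show" output.
device_used_tib = {"bcache1": 1.62, "bcache0": 5.21, "bcache2": 6.80}

# "FS bytes used" (TiB) from the same output.
fs_bytes_used_tib = 6.55

allocated = sum(device_used_tib.values())
ratio = allocated / fs_bytes_used_tib

print(f"allocated on devices: {allocated:.2f} TiB")
print(f"ratio to FS bytes used: {ratio:.2f}")
```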



* Re: "parent transid verify failed" and mount usebackuproot does not seem to work
  2020-07-01 21:36       ` Illia Bobyr
@ 2020-07-01 23:50         ` Qu Wenruo
  0 siblings, 0 replies; 9+ messages in thread
From: Qu Wenruo @ 2020-07-01 23:50 UTC (permalink / raw)
  To: Illia Bobyr, linux-btrfs





On 2020/7/2 5:36 AM, Illia Bobyr wrote:
> On 7/1/2020 3:48 AM, Qu Wenruo wrote:
>> On 2020/7/1 6:16 PM, Illia Bobyr wrote:
>>> On 6/30/2020 6:36 PM, Qu Wenruo wrote:
>>>> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>>>>> [...]
>>>> Looks like some tree blocks not written back correctly.
>>>>
>>>> Considering we don't have known write back related bugs with 5.6, I
>>>> guess bcache may be involved again?
>>> A bit more details: the system started to misbehave.
>>> Interactive session was saying that the main file system became read/only.
>> Any dmesg of that RO event?
>> That would be the most valuable info to help us to locate the bug and
>> fix it.
>>
>> I guess there is something wrong before that, and by somehow it
>> corrupted the extent tree, breaking the life keeping COW of metadata and
>> screwed up everything.
> 
> After I will restore the data, I will check the kernel log to see if
> there are any messages in there.
> Will post here if I will find anything.
> 
>>> [...]
>>>> In this case, I guess "btrfs ins dump-super -fFa" output would help to
>>>> show if it's possible to recover.
>>> Here is the output: https://pastebin.com/raw/DtJd813y
>> OK, the backup root is fine.
>>
>> So this means, metadata COW is corrupted, which caused the transid mismatch.
>>
>>>> Anyway, something looks strange.
>>>>
>>>> The backup roots have a newer generation while the super block is still
>>>> old doesn't look correct at all.
>>> Just in case, here is the output of "btrfs check", as suggested by "A L
>>> <mail@lechevalier.se>".  It does not seem to contain any new information.
>>>
>>> parent transid verify failed on 16984014372864 wanted 138350 found 131117
>>> parent transid verify failed on 16984014405632 wanted 138350 found 131127
>>> parent transid verify failed on 16984013406208 wanted 138350 found 131112
>>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>>> parent transid verify failed on 16984075436032 wanted 138384 found 131136
>>> Ignoring transid failure
>>> ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent
>>> level=2 child level=0
>>> ERROR: failed to read block groups: Input/output error
>> Extent tree is completely screwed up, no wonder the transid error happens.
>>
>> I don't believe it's reasonable possible to restore the fs to RW status.
>> The only remaining method left is btrfs-restore then.
> 
> There are no more available SATA connections in the system and there is
> a lot of data in that FS (~7TB).
> I do not immediately have another disk that would be able to hold this much.
> 
> At the same time this FS is RAID0.
> I wonder if there is a way to first check if restore will work should I
> will disconnect half of the disks, as each half contains all the data.
> And then if it does, I would be able to restore by reusing the space on
> of the mirrors.

Yes, there is.

We have the out-of-tree rescue mount options patchset.
It allows you to mount the fs RO even with the extent tree completely corrupted.

It's in David's misc-next branch already:
https://github.com/kdave/btrfs-devel/tree/misc-next

Then you can try to mount the fs with "-o
ro,rescue=skipbg,rescue=nologreplay" and test what can be
salvaged and what cannot, as if your fs were still alive.

This should be a more flexible solution than btrfs-restore, but
it requires compiling the kernel.

> 
> I see "-D: Dry run" that can be passed to "btrfs restore", but, I guess,
> it would not really do a full check of the data, making sure that the
> restore would really succeed, does it?

It would only check the metadata, but that should cover most of the risks.

Thanks,
Qu
> 
> Is there a way to perform this kind of check?
> Or is "btrfs restore" the only option at the moment?
> 



* Re: "parent transid verify failed" and mount usebackuproot does not seem to work
  2020-06-30 19:26 Illia Bobyr
@ 2020-06-30 19:55 ` Lukas Straub
  0 siblings, 0 replies; 9+ messages in thread
From: Lukas Straub @ 2020-06-30 19:55 UTC (permalink / raw)
  To: illia.bobyr; +Cc: linux-btrfs


Hi,
The developers will need the output of "btrfs check <disk>" to debug the problem further. Also you can use "btrfs restore <disk> <target>" to rescue data off your disks.

Regards,
Lukas Straub


* "parent transid verify failed" and mount usebackuproot does not seem to work
@ 2020-06-30 19:26 Illia Bobyr
  2020-06-30 19:55 ` Lukas Straub
  0 siblings, 1 reply; 9+ messages in thread
From: Illia Bobyr @ 2020-06-30 19:26 UTC (permalink / raw)
  To: linux-btrfs


Thread overview: 9+ messages
2020-06-30 19:41 "parent transid verify failed" and mount usebackuproot does not seem to work Illia Bobyr
2020-07-01  1:36 ` Qu Wenruo
2020-07-01 10:16   ` Illia Bobyr
2020-07-01 10:48     ` Qu Wenruo
2020-07-01 21:36       ` Illia Bobyr
2020-07-01 23:50         ` Qu Wenruo
  -- strict thread matches above, loose matches on Subject: below --
2020-06-30 19:26 Illia Bobyr
2020-06-30 19:55 ` Lukas Straub
2020-06-30  4:24 Illia Bobyr
