Linux-BTRFS Archive on lore.kernel.org
From: Qu Wenruo <quwenruo.btrfs@gmx.com>
To: Illia Bobyr <illia.bobyr@gmail.com>, linux-btrfs@vger.kernel.org
Subject: Re: "parent transid verify failed" and mount usebackuproot does not seem to work
Date: Wed, 1 Jul 2020 18:48:41 +0800
Message-ID: <39558ad7-dfb3-05f7-1583-181f76f2a93d@gmx.com>
In-Reply-To: <2f22bd0a-aa48-d0f1-04d0-cb130897249d@gmail.com>

On 2020/7/1 6:16 PM, Illia Bobyr wrote:
> On 6/30/2020 6:36 PM, Qu Wenruo wrote:
>> On 2020/7/1 3:41 AM, Illia Bobyr wrote:
>>> Hi,
>>>
>>> I have a btrfs with bcache setup that failed during a boot yesterday.
>>> There is one SSD with bcache that is used as a cache for 3 btrfs HDDs.
>>>
>>> Reading through a number of discussions, I've decided to ask for advice here.
>>> Should I be running "btrfs check --recover"?
>>>
>>> The last message in the dmesg log is this one:
>>>
>>> Btrfs loaded, crc32c=crc32c-intel
>>> BTRFS: device label root devid 3 transid 138434 /dev/bcache2 scanned by btrfs (341)
>>> BTRFS: device label root devid 2 transid 138434 /dev/bcache1 scanned by btrfs (341)
>>> BTRFS: device label root devid 1 transid 138434 /dev/bcache0 scanned by btrfs (341)
>>> BTRFS info (device bcache0): disk space caching is enabled
>>> BTRFS info (device bcache0): has skinny extents
>>> BTRFS error (device bcache0): parent transid verify failed on 16984159518720 wanted 138414 found 138207
>>> BTRFS error (device bcache0): parent transid verify failed on 16984159518720 wanted 138414 found 138207
>>> BTRFS error (device bcache0): open_ctree failed
>> Looks like some tree blocks were not written back correctly.
>>
>> Considering we don't have any known writeback-related bugs in 5.6, I
>> guess bcache may be involved again?
> 
> A few more details: the system started to misbehave.
> The interactive session reported that the main file system had become read-only.

Do you have any dmesg output from that RO event?
That would be the most valuable info to help us locate the bug and
fix it.

I guess something went wrong before that and somehow corrupted the
extent tree, breaking the metadata COW that keeps the fs alive and
screwing up everything else.

> And then the SSH disconnected and did not reconnect any more.
> It did not seem to reboot correctly after I pressed the reboot
> button, so I did a hard reboot.
> And now it could not mount the root partition any more.
>>> Trying to mount it in the recovery mode does not seem to work:
>>>
>>> [...]
>>>
>>> I have tried booting using a live ISO with 5.8.0 kernel and btrfs v5.6.1
>>> from http://defender.exton.net/.
>>> After booting tried mounting the bcache using the same command as above.
>>> The only message in the console was "Killed".
>>> /dev/kmsg on the other hand lists messages very similar to the ones I've
>>> seen in the initramfs environment: https://pastebin.com/Vhy072Mx
>> It looks like there is a chance to recover, as there is a backup root
>> with a newer generation.
>>
>> However, tree-checker is rejecting the newer-generation one.
>>
>> The kernel panic is caused by a corner case in the error handling of
>> backup root cleanup.
>> We need to fix that anyway.
>>
>> In this case, I guess "btrfs ins dump-super -fFa" output would help to
>> show if it's possible to recover.
> 
> Here is the output: https://pastebin.com/raw/DtJd813y

OK, the backup root is fine.

So this means metadata COW is broken, which caused the transid mismatch.
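
To make the mismatch concrete: each parent node records the generation
(transid) it expects its child block to carry, and "wanted X found Y"
with Y < X means the block actually on disk is older than the one the
parent points to. A rough Python sketch for pulling those numbers out of
a log follows. The helper name is made up for illustration (it is not
part of btrfs-progs), and it assumes each message sits on one unwrapped
line:

```python
import re

# Message format as quoted in this thread (kernel log and "btrfs check").
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_mismatches(log_text):
    """Return sorted, de-duplicated (bytenr, wanted, found, gap) tuples."""
    seen = set()
    for match in TRANSID_RE.finditer(log_text):
        bytenr, wanted, found = (int(g) for g in match.groups())
        # gap = how many generations the on-disk block lags behind
        seen.add((bytenr, wanted, found, wanted - found))
    return sorted(seen)


if __name__ == "__main__":
    sample = (
        "BTRFS error (device bcache0): parent transid verify failed on "
        "16984159518720 wanted 138414 found 138207\n"
    )
    for bytenr, wanted, found, gap in transid_mismatches(sample):
        print(f"block {bytenr}: wanted {wanted}, found {found}, gap {gap}")
```

A large gap means the tree block on disk is a very old copy, not just one
transaction behind.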

> 
>> Anyway, something looks strange.
>>
>> The backup roots having a newer generation while the super block is
>> still old doesn't look correct at all.
> 
> Just in case, here is the output of "btrfs check", as suggested by "A L
> <mail@lechevalier.se>".  It does not seem to contain any new information.
> 
> parent transid verify failed on 16984014372864 wanted 138350 found 131117
> parent transid verify failed on 16984014405632 wanted 138350 found 131127
> parent transid verify failed on 16984013406208 wanted 138350 found 131112
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> parent transid verify failed on 16984075436032 wanted 138384 found 131136
> Ignoring transid failure
> ERROR: child eb corrupted: parent bytenr=16984175853568 item=8 parent level=2 child level=0
> ERROR: failed to read block groups: Input/output error

The extent tree is completely screwed up, no wonder the transid errors happen.

I don't believe it's reasonably possible to restore the fs to RW status.
The only remaining option is "btrfs restore".
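
Restore only reads from the unmounted device and copies whatever files it
can still reach into another location, so it is worth doing a dry run
first. A minimal sketch of how that could be scripted, assuming
btrfs-progs is installed and the script runs as root. The target
directory /mnt/recovery and the helper names are placeholders of mine;
the options used are -D (dry run), -i (ignore errors) and -v (verbose):

```python
import subprocess


def restore_cmd(device, target, dry_run=True):
    """Build a `btrfs restore` command line as an argv list.

    -i keeps going when errors are hit, -v prints each file,
    -D only lists what would be restored without writing anything.
    """
    cmd = ["btrfs", "restore", "-i", "-v"]
    if dry_run:
        cmd.append("-D")
    return cmd + [device, target]


def run_restore(device, target, dry_run=True):
    """Run the command and return its exit code (needs root)."""
    return subprocess.run(restore_cmd(device, target, dry_run)).returncode


if __name__ == "__main__":
    # Print the dry-run command instead of executing it.
    print(" ".join(restore_cmd("/dev/bcache0", "/mnt/recovery")))
```

Once the dry run lists the files you expect, run it again with
dry_run=False against a target on a different, healthy disk.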

> ERROR: cannot open file system
> Opening filesystem to check...
> 
> While running the commands I accidentally ran the following one:
> 
>     btrfs inspect-internal dump-super -fFa >/dev/bcache0 2>&1
> 
> Effectively overwriting the first 10kb of the partition :(

That's not a problem at all.
Btrfs reserves the first 1M of the device, i.e. [0, 1M), so as long as
you don't screw up the super block at [64K, 68K) you're completely fine.
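
If you want to double check, the arithmetic is simple enough to sketch
in Python. The function names below are my own and purely illustrative.
The magic check assumes the usual on-disk layout, where the magic string
"_BHRfS_M" sits at offset 0x40 inside the super block, and it opens the
device read-only:

```python
SUPER_OFFSET = 64 * 1024   # primary super block starts at 64K
SUPER_SIZE = 4 * 1024      # and covers [64K, 68K)
MAGIC_OFFSET = 0x40        # offset of the magic field inside the super block
BTRFS_MAGIC = b"_BHRfS_M"


def touches_super(write_start, write_len):
    """True if [write_start, write_start + write_len) overlaps [64K, 68K)."""
    write_end = write_start + write_len
    return (
        write_len > 0
        and write_start < SUPER_OFFSET + SUPER_SIZE
        and write_end > SUPER_OFFSET
    )


def has_btrfs_magic(path):
    """Read-only check that the primary super block still has its magic."""
    with open(path, "rb") as dev:
        dev.seek(SUPER_OFFSET + MAGIC_OFFSET)
        return dev.read(len(BTRFS_MAGIC)) == BTRFS_MAGIC


if __name__ == "__main__":
    # An accidental write over the first 10K ends well before 64K.
    print(touches_super(0, 10 * 1024))  # prints False
```

has_btrfs_magic("/dev/bcache0") would need root; a True result only shows
the magic survived, not that the rest of the super block is valid.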

Thanks,
Qu
> 
> Seems like the superblock starts at 64kb.  So, I hope, this would not
> cause any more damage.
> 
> P.S. Thanks a lot for your reply Qu Wenruo!
> 
> Thank you,
> Illia
> 



Thread overview: 9+ messages
2020-06-30 19:41 Illia Bobyr
2020-07-01  1:36 ` Qu Wenruo
2020-07-01 10:16   ` Illia Bobyr
2020-07-01 10:48     ` Qu Wenruo [this message]
2020-07-01 21:36       ` Illia Bobyr
2020-07-01 23:50         ` Qu Wenruo
  -- strict thread matches above, loose matches on Subject: below --
2020-06-30 19:26 Illia Bobyr
2020-06-30 19:55 ` Lukas Straub
2020-06-30  4:24 Illia Bobyr
