From: Tim Walberg <twalberg@comcast.net>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Tim Walberg <twalberg@comcast.net>, linux-btrfs@vger.kernel.org
Subject: Re: recovering from "parent transid verify failed"
Date: Thu, 15 Aug 2019 10:01:08 -0500
Message-ID: <20190815150108.GF2731@comcast.net>
In-Reply-To: <1ce8ace9-b86a-19fb-0b4c-f6315c8e73b2@gmx.com>
Thanks for all the help!
If I get a chance later today, I may try the patch set, but in
the interest of getting things back online quicker, I may just
have to recreate and restore the recovered data. The snapshots
are no great loss - they're just one level of daily backups.
tw
On 08/15/2019 22:45 +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/8/15 10:21 PM, Tim Walberg wrote:
>> > 'dump-super -Ffa' from all three devices attached.
>> >
>> > 'btrfs restore' did appear to recover most of the main data, minus
>> > snapshots, which would have greatly increased the required time and
>> > capacity, since I was recovering to XFS.
>>
>> That's why I recommend that experimental patchset: it will make the fs
>> mountable (read-only, though), with all btrfs snapshots available.
>>
>> >
>> > 'btrfs rescue chunk-recover' ran, but failed to fix anything.
>> > 'btrfs rescue super-recover' says all supers are fine.
>>
>> Those won't help in your case.
>>
>> >
>> > Initial corruption was due to a hard hang, which didn't leave enough
>> > crumbs to determine the source - might have been btrfs, might have
>> > been nvidia, might have been something completely different.
>>
>> Anyway, the corruption is a little strange.
>>
>> First of all, even a hard hang/power loss shouldn't cause btrfs to
>> overwrite its tree blocks, so even if a hard hang/power loss happens, btrfs
>> should not be corrupted.
>>
>> But that's definitely not the case here. (We have had quite a few such
>> reports, but haven't pinned down the cause yet.)
>>
>> Secondly, the generation of your fs is strange.
>> The latest generation of your tree root is 49750, which matches your
>> corrupted tree block, but your extent tree is definitely older.
>>
>> So it looks like your superblocks (all nine!) reached the disk before some
>> tree blocks did.
>>
>> Finally, the superblock doesn't record the previous transaction correctly.
>> It doesn't have transaction 49749 in its backup roots.
>>
>> Not 100% sure, but this looks somewhat like the problem fixed by this patch:
>> Btrfs: fix race leading to fs corruption after transaction abortion
>>
>> It should have been backported to all stable releases recently.
>>
>> Thanks,
>> Qu
>>
>> >
>> >
>> > On 08/15/2019 22:07 +0800, Qu Wenruo wrote:
>> >>>
>> >>>
>> >>> On 2019/8/15 9:52 PM, Tim Walberg wrote:
>> >>> > Had to wait for 'btrfs recover' to finish before I could proceed further.
>> >>> >
>> >>> > Kernel is 4.19.45, tools are 4.19.1
>> >>> >
>> >>> > File system is a 3-disk RAID10 with WD3003FZEX (WD Black 3TB)
>> >>> >
>> >>> > Output from attempting to mount:
>> >>> >
>> >>> > # mount -o ro,usebackuproot /dev/sdc1 /mnt
>> >>> > mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>> >>> > missing codepage or helper program, or other error
>> >>> >
>> >>> > In some cases useful info is found in syslog - try
>> >>> > dmesg | tail or so.
>> >>> >
>> >>> > Kernel messages from the mount attempt:
>> >>> >
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): trying to use backup root at mount time
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): disk space caching is enabled
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): has skinny extents
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): failed to read block groups: -5
>> >>>
>> >>> Extent tree corruption.
>> >>>
>> >>> So if that's the only corruption, you have a very high chance of
>> >>> recovering most of your data.
>> >>>
>> >>> Btrfs rescue can work, or you can try the experimental patches, which
>> >>> provide a rescue=skip_bg mount option that lets you mount the fs RO and
>> >>> retrieve your data (the latter is way faster than user-space rescue):
>> >>> https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>> >>>
>> >>> Also, your dump-super output doesn't provide much info.
>> >>>
>> >>> You should use the -Ffa option for more info.
>> >>> You could also run that on all 3 devices, to find out which one
>> >>> has a lower generation.
>> >>>
>> >>> Also, please provide the history of the corruption.
>> >>> A one-generation corruption is a little rare. Was a sudden power loss
>> >>> involved in this case?
>> >>>
>> >>> Thanks,
>> >>> Qu
>> >>>
>> >>> > [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): open_ctree failed
>> >>> >
>> >>> > Output from 'btrfs check -p /dev/sdc1':
>> >>> >
>> >>> > # btrfs check -p /dev/sdc1
>> >>> > Opening filesystem to check...
>> >>> > parent transid verify failed on 229846466560 wanted 49749 found 49750
>> >>> > Ignoring transid failure
>> >>> > ERROR: child eb corrupted: parent bytenr=229845336064 item=0 parent level=1 child level=2
>> >>> > ERROR: cannot open file system
>> >>> >
>> >>> >
>> >>> >
>> >>> > On 08/15/2019 10:35 +0800, Qu Wenruo wrote:
>> >>> >>>
>> >>> >>>
>> >>> >>> On 2019/8/15 2:32 AM, Tim Walberg wrote:
>> >>> >>> > Most of the recommendations I've found online deal with the case where
>> >>> >>> > "wanted" is greater than "found", which, if I understand correctly,
>> >>> >>> > means that one or more transactions were interrupted/lost before being
>> >>> >>> > fully committed.
>> >>> >>>
>> >>> >>> No matter what the case is, a properly committed transaction shouldn't
>> >>> >>> have any tree block overwritten.
>> >>> >>>
>> >>> >>> That means either the FLUSH/FUA handling of the hardware/lower block
>> >>> >>> layer is screwed up, or the COW of tree blocks is already screwed up.
>> >>> >>>
>> >>> >>> >
>> >>> >>> > Are the recommendations for recovery the same if the system is reporting a
>> >>> >>> > "wanted" that is less than "found"?
>> >>> >>> >
>> >>> >>> Salvage is no different from any other transid mismatch, whether
>> >>> >>> "wanted" is larger or smaller than "found".
>> >>> >>>
>> >>> >>> It depends on the tree block.
>> >>> >>>
>> >>> >>> Please provide the full dmesg output and btrfs check output for further advice.
>> >>> >>>
>> >>> >>> Thanks,
>> >>> >>> Qu
>> >>> >>>
>> >>> >
>> >>> >
>> >>> >
>> >>> >
>> >>>
>> >
>> >
>> >
>> > End of included message
>> >
>> >
>> >
>>
End of included message
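For anyone triaging a similar log, the following is a minimal Python sketch. It assumes the message format shown in the kernel and `btrfs check` output quoted above. The sketch extracts each "parent transid verify failed" line, drops the repeated copies, and labels whether the block on disk carries a newer or older generation than its parent expects. `TransidFailure` and `parse_transid_failures` are hypothetical helpers, not part of btrfs-progs. As Qu notes, salvage works the same way in either direction, so the label is only descriptive.

```python
import re
from dataclasses import dataclass

# Assumed message format, taken from the logs quoted in this thread, e.g.
# "parent transid verify failed on 229846466560 wanted 49749 found 49750"
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


@dataclass(frozen=True)
class TransidFailure:
    bytenr: int  # logical address of the tree block that failed the check
    wanted: int  # generation the parent node expects for this block
    found: int   # generation stored in the block's own header on disk

    def direction(self) -> str:
        """Describe how the on-disk generation compares to the expected one."""
        if self.found > self.wanted:
            return "newer"  # block at this address was written by a later transaction
        if self.found < self.wanted:
            return "older"  # the expected newer copy never reached the disk
        return "match"


def parse_transid_failures(lines):
    """Return unique transid failures in first-seen order (logs often repeat them)."""
    unique = {}
    for line in lines:
        m = TRANSID_RE.search(line)
        if m:
            bytenr, wanted, found = map(int, m.groups())
            key = (bytenr, wanted, found)
            unique.setdefault(key, TransidFailure(bytenr, wanted, found))
    return list(unique.values())


if __name__ == "__main__":
    log = [
        "BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750",
        "BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750",
        "BTRFS error (device sdc1): failed to read block groups: -5",
    ]
    for f in parse_transid_failures(log):
        print(f.bytenr, f.wanted, f.found, f.direction())
```

Applied to the mount attempt above, the sketch reports one unique failure at 229846466560, with the on-disk block one generation newer than expected.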
--
+----------------------+
| Tim Walberg |
| 830 Carriage Dr. |
| Algonquin, IL 60102 |
| twalberg@comcast.net |
+----------------------+
Thread overview: 8+ messages
2019-08-14 18:32 recovering from "parent transid verify failed" Tim Walberg
2019-08-15 2:35 ` Qu Wenruo
2019-08-15 13:52 ` Tim Walberg
2019-08-15 14:07 ` Qu Wenruo
2019-08-15 14:21 ` Tim Walberg
2019-08-15 14:45 ` Qu Wenruo
2019-08-15 15:01 ` Tim Walberg [this message]
2019-08-15 13:55 ` Tim Walberg