linux-btrfs.vger.kernel.org archive mirror
From: Tim Walberg <twalberg@comcast.net>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: Tim Walberg <twalberg@comcast.net>, linux-btrfs@vger.kernel.org
Subject: Re: recovering from "parent transid verify failed"
Date: Thu, 15 Aug 2019 10:01:08 -0500	[thread overview]
Message-ID: <20190815150108.GF2731@comcast.net> (raw)
In-Reply-To: <1ce8ace9-b86a-19fb-0b4c-f6315c8e73b2@gmx.com>

Thanks for all the help!

If I get a chance later today, I may try the patch set, but in
the interest of getting things back online quicker, I may just
have to recreate and restore the recovered data. The snapshots
are no great loss - they're just one level of daily backups.

			tw



On 08/15/2019 22:45 +0800, Qu Wenruo wrote:
>>	
>>	
>>	On 2019/8/15 10:21 PM, Tim Walberg wrote:
>>	> 'dump-super -Ffa' from all three devices attached.
>>	> 
>>	> 'btrfs restore' did appear to recover most of the main data, minus
>>	> snapshots, which would have greatly increased the required time and
>>	> capacity, since I was recovering to XFS.
>>	
>>	That's why I recommend that experimental patchset, it will make the fs
>>	mountable (RO though), with all btrfs snapshots available.
>>	
>>	> 
>>	> 'btrfs rescue chunk-recover' ran, but failed to fix anything.
>>	> 'btrfs rescue super-recover' says all supers are fine.
>>	
>>	Those are useless for your case.
>>	
>>	> 
>>	> Initial corruption was due to a hard hang, which didn't leave enough
>>	> crumbs to determine the source - might have been btrfs, might have
>>	> been nvidia, might have been something completely different.
>>	
>>	Anyway, the corruption is a little strange.
>>	
>>	First of all, even a hard hang or power loss shouldn't cause btrfs to
>>	overwrite its tree blocks, so even when a hard hang or power loss
>>	happens, btrfs shouldn't be corrupted.
>>	
>>	But that's definitely not the case here. (We have quite a few such
>>	reports, but haven't pinned down the cause yet.)
>>	
>>	Secondly, the generation of your fs is strange.
>>	The latest generation of your tree root is 49750, which matches your
>>	corrupted tree block, but your extent tree is definitely older.
>>	
>>	So it looks like your super blocks (all nine!) reached disk before some
>>	tree blocks reached the disk.
>>	
>>	Finally, the superblock doesn't record the previous transaction correctly.
>>	It doesn't have transaction 49749 in its backup roots.
>>	
>>	Not 100% sure, but looks somewhat like the problem fixed by this patch:
>>	Btrfs: fix race leading to fs corruption after transaction abortion
>>	
>>	It should have been backported to all stable releases recently.
>>	
>>	Thanks,
>>	Qu
>>	
>>	> 
>>	> 
>>	> On 08/15/2019 22:07 +0800, Qu Wenruo wrote:
>>	>>> 	
>>	>>> 	
>>	>>> 	On 2019/8/15 9:52 PM, Tim Walberg wrote:
>>	>>> 	> Had to wait for 'btrfs restore' to finish before I could proceed further.
>>	>>> 	> 
>>	>>> 	> Kernel is 4.19.45, tools are 4.19.1
>>	>>> 	> 
>>	>>> 	> File system is a 3-disk RAID10 with WD3003FZEX (WD Black 3TB)
>>	>>> 	> 
>>	>>> 	> Output from attempting to mount:
>>	>>> 	> 
>>	>>> 	> # mount -o ro,usebackuproot /dev/sdc1 /mnt
>>	>>> 	> mount: wrong fs type, bad option, bad superblock on /dev/sdc1,
>>	>>> 	>        missing codepage or helper program, or other error
>>	>>> 	> 
>>	>>> 	>        In some cases useful info is found in syslog - try
>>	>>> 	>        dmesg | tail or so.
>>	>>> 	> 
>>	>>> 	> Kernel messages from the mount attempt:
>>	>>> 	> 
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): trying to use backup root at mount time
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): disk space caching is enabled
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS info (device sdc1): has skinny extents
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): parent transid verify failed on 229846466560 wanted 49749 found 49750
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): failed to read block groups: -5
>>	>>> 	
>>	>>> 	Extent tree corruption.
>>	>>> 	
>>	>>> 	So if that's the only corruption, you have a very high chance to recover
>>	>>> 	most of your data.
>>	>>> 	
>>	>>> 	Btrfs rescue can work, or you can try the experimental patches which
>>	>>> 	provide the rescue=skip_bg mount option, allowing you to mount the fs RO
>>	>>> 	and retrieve your data (the latter is way faster than user-space rescue)
>>	>>> 	https://patchwork.kernel.org/project/linux-btrfs/list/?series=130637
>>	>>> 	
>>	>>> 	Also, your dump-super output doesn't provide much info.
>>	>>> 	
>>	>>> 	You'll want to use the -Ffa options for more info.
>>	>>> 	You could also try that on all 3 devices, to find out which one
>>	>>> 	has the lower generation.
>>	>>> 	
>>	>>> 	Also, please provide the history of the corruption.
>>	>>> 	A one-generation corruption is a little rare. Is sudden power loss
>>	>>> 	involved in this case?
>>	>>> 	
>>	>>> 	Thanks,
>>	>>> 	Qu
>>	>>> 	
>>	>>> 	> [Thu Aug 15 08:47:42 2019] BTRFS error (device sdc1): open_ctree failed
>>	>>> 	> 
>>	>>> 	> Output from 'btrfs check -p /dev/sdc1':
>>	>>> 	> 
>>	>>> 	> # btrfs check -p /dev/sdc1
>>	>>> 	> Opening filesystem to check...
>>	>>> 	> parent transid verify failed on 229846466560 wanted 49749 found 49750
>>	>>> 	> Ignoring transid failure
>>	>>> 	> ERROR: child eb corrupted: parent bytenr=229845336064 item=0 parent level=1 child level=2
>>	>>> 	> ERROR: cannot open file system
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> On 08/15/2019 10:35 +0800, Qu Wenruo wrote:
>>	>>> 	>>> 	
>>	>>> 	>>> 	
>>	>>> 	>>> 	On 2019/8/15 2:32 AM, Tim Walberg wrote:
>>	>>> 	>>> 	> Most of the recommendations I've found online deal with when "wanted" is
>>	>>> 	>>> 	> greater than "found", which, if I understand correctly means that one or
>>	>>> 	>>> 	> more transactions were interrupted/lost before fully committed.
>>	>>> 	>>> 	
>>	>>> 	>>> 	No matter what the case is, a proper transaction shouldn't have any tree
>>	>>> 	>>> 	block overwritten.
>>	>>> 	>>> 	
>>	>>> 	>>> 	That means, either the FLUSH/FUA of the hardware/lower block layer is
>>	>>> 	>>> 	screwed up, or the COW of tree block is already screwed up.
>>	>>> 	>>> 	
>>	>>> 	>>> 	> 
>>	>>> 	>>> 	> Are the recommendations for recovery the same if the system is reporting a
>>	>>> 	>>> 	> "wanted" that is less than "found"?
>>	>>> 	>>> 	> 
>>	>>> 	>>> 	The salvage is no different from any other transid mismatch, no
>>	>>> 	>>> 	matter whether "wanted" is larger or smaller than "found".
>>	>>> 	>>> 	
>>	>>> 	>>> 	It depends on the tree block.
>>	>>> 	>>> 	
>>	>>> 	>>> 	Please provide full dmesg output and btrfs check for further advice.
>>	>>> 	>>> 	
>>	>>> 	>>> 	Thanks,
>>	>>> 	>>> 	Qu
>>	>>> 	>>> 	
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	> 
>>	>>> 	
>>	> 
>>	> 
>>	> 
>>	> End of included message
>>	> 
>>	> 
>>	> 
>>	



End of included message
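
For anyone following the quoted thread, two of the checks discussed there
can be sketched in Python: pulling the block address and the wanted/found
generations out of a "parent transid verify failed" line, and comparing the
superblock generation reported by dump-super across devices to see which one
is lowest. This is only a rough sketch under stated assumptions, not a btrfs
tool: the function names are made up for illustration, and the dump-super
parsing assumes the superblock generation appears on a line that starts with
"generation" followed by whitespace and a number. With -a, several superblock
copies are printed per device; the sketch only looks at the first one.

```python
import re
from typing import Dict, Optional, Tuple

# Matches the message quoted in this thread, e.g.
# "parent transid verify failed on 229846466560 wanted 49749 found 49750"
TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)

# Assumption: dump-super prints the superblock generation on its own line as
# "generation<whitespace><number>". Anchoring at line start skips fields such
# as "chunk_root_generation" or "cache_generation".
GENERATION_RE = re.compile(r"^generation\s+(\d+)\s*$", re.MULTILINE)


def parse_transid_error(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (bytenr, wanted, found) from a log line, or None if no match."""
    m = TRANSID_RE.search(line)
    if m is None:
        return None
    bytenr, wanted, found = (int(g) for g in m.groups())
    return bytenr, wanted, found


def superblock_generation(dump_text: str) -> Optional[int]:
    """Return the first superblock generation found in dump-super text."""
    m = GENERATION_RE.search(dump_text)
    return int(m.group(1)) if m else None


def lowest_generation_device(dumps: Dict[str, str]) -> Optional[str]:
    """Given {device: dump-super text}, return the device whose first
    superblock copy has the lowest generation, or None if nothing parsed."""
    known: Dict[str, int] = {}
    for dev, text in dumps.items():
        gen = superblock_generation(text)
        if gen is not None:
            known[dev] = gen
    if not known:
        return None
    return min(known, key=lambda dev: known[dev])
```

Feeding parse_transid_error() the kernel line from the mount attempt yields
the block address 229846466560 with wanted 49749 and found 49750, i.e. the
block on disk is one generation newer than its parent expects. The dump text
for lowest_generation_device() would have to be captured separately from each
of the three devices.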



-- 
+----------------------+
| Tim Walberg          |
| 830 Carriage Dr.     |
| Algonquin, IL 60102  |
| twalberg@comcast.net |
+----------------------+
