From: Patrick Dijkgraaf <bolderbast@duckstad.net>
To: Qu Wenruo <quwenruo.btrfs@gmx.com>, linux-btrfs@vger.kernel.org
Subject: Re: Need help with potential ~45TB dataloss
Date: Sun, 02 Dec 2018 10:03:23 +0100
Message-ID: <c723ab8e8e99e54c6f93dc70f897725525f0db29.camel@duckstad.net>
In-Reply-To: <6ce9cd01-960f-af3d-0273-0b9abfa1d4f8@gmx.com>
Hi Qu,
Thanks for helping me!
Please see the responses inline.
Any suggestions based on this?
Thanks!
On Sat, 2018-12-01 at 07:57 +0800, Qu Wenruo wrote:
> On 2018/11/30 9:53 PM, Patrick Dijkgraaf wrote:
> > Hi all,
> >
> > I have been a happy BTRFS user for quite some time. But now I'm
> > facing a potential ~45TB data loss... :-(
> > I hope someone can help!
> >
> > I have Server A and Server B, both with a 20-device BTRFS RAID6
> > filesystem. Because of the known RAID5/6 risks, Server B was a
> > backup of Server A.
> > After applying updates to Server B and rebooting, the FS would not
> > mount anymore. Because it was "just" a backup, I decided to recreate
> > the FS and perform a new backup. Later, I discovered that the FS was
> > not broken, but that I had hit this issue:
> > https://patchwork.kernel.org/patch/10694997/
> >
>
> Sorry for the inconvenience.
>
> I didn't realize the max_chunk_size limit isn't reliable at that
> point.
No problem, I should not have jumped straight to recreating the
backup volume.
> > Anyway, the FS was already recreated, so I needed to do a new
> > backup. During the backup (using rsync -vah), Server A (the source)
> > encountered an I/O error and my rsync failed. In an attempt to
> > "quick fix" the issue, I rebooted Server A, after which the FS would
> > not mount anymore.
>
> Do you have any dmesg output from that I/O error?
Yes, there was, but I neglected to capture it. The system has since
been rebooted and I can no longer retrieve it. :-(
> And how was the reboot done? Forced power off or the normal reboot
> command?
The system was rebooted using a normal reboot command.
> > I documented what I have tried below. I have not yet tried
> > anything except what is shown, because I am afraid of causing more
> > harm to the FS.
>
> Pretty clever; not running btrfs check --repair was a good move.
>
> > I hope somebody here can give me advice on how to (hopefully)
> > retrieve my data...
> >
> > Thanks in advance!
> >
> > ==========================================
> >
> > [root@cornelis ~]# btrfs fi show
> > Label: 'cornelis-btrfs' uuid: ac643516-670e-40f3-aa4c-f329fc3795fd
> > Total devices 1 FS bytes used 463.92GiB
> > devid 1 size 800.00GiB used 493.02GiB path
> > /dev/mapper/cornelis-cornelis--btrfs
> >
> > Label: 'data' uuid: 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
> > Total devices 20 FS bytes used 44.85TiB
> > devid 1 size 3.64TiB used 3.64TiB path /dev/sdn2
> > devid 2 size 3.64TiB used 3.64TiB path /dev/sdp2
> > devid 3 size 3.64TiB used 3.64TiB path /dev/sdu2
> > devid 4 size 3.64TiB used 3.64TiB path /dev/sdx2
> > devid 5 size 3.64TiB used 3.64TiB path /dev/sdh2
> > devid 6 size 3.64TiB used 3.64TiB path /dev/sdg2
> > devid 7 size 3.64TiB used 3.64TiB path /dev/sdm2
> > devid 8 size 3.64TiB used 3.64TiB path /dev/sdw2
> > devid 9 size 3.64TiB used 3.64TiB path /dev/sdj2
> > devid 10 size 3.64TiB used 3.64TiB path /dev/sdt2
> > devid 11 size 3.64TiB used 3.64TiB path /dev/sdk2
> > devid 12 size 3.64TiB used 3.64TiB path /dev/sdq2
> > devid 13 size 3.64TiB used 3.64TiB path /dev/sds2
> > devid 14 size 3.64TiB used 3.64TiB path /dev/sdf2
> > devid 15 size 7.28TiB used 588.80GiB path /dev/sdr2
> > devid 16 size 7.28TiB used 588.80GiB path /dev/sdo2
> > devid 17 size 7.28TiB used 588.80GiB path /dev/sdv2
> > devid 18 size 7.28TiB used 588.80GiB path /dev/sdi2
> > devid 19 size 7.28TiB used 588.80GiB path /dev/sdl2
> > devid 20 size 7.28TiB used 588.80GiB path /dev/sde2
> >
> > [root@cornelis ~]# mount /dev/sdn2 /mnt/data
> > mount: /mnt/data: wrong fs type, bad option, bad superblock on
> > /dev/sdn2, missing codepage or helper program, or other error.
>
> What is the dmesg of the mount failure?
[Sun Dec 2 09:41:08 2018] BTRFS info (device sdn2): disk space caching is enabled
[Sun Dec 2 09:41:08 2018] BTRFS info (device sdn2): has skinny extents
[Sun Dec 2 09:41:08 2018] BTRFS error (device sdn2): parent transid verify failed on 46451963543552 wanted 114401 found 114173
[Sun Dec 2 09:41:08 2018] BTRFS critical (device sdn2): corrupt leaf: root=2 block=46451963543552 slot=0, unexpected item end, have 1387359977 expect 16283
[Sun Dec 2 09:41:08 2018] BTRFS warning (device sdn2): failed to read tree root
[Sun Dec 2 09:41:08 2018] BTRFS error (device sdn2): open_ctree failed
> And have you tried -o ro,degraded ?
I tried it just now; it gives the exact same error.
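
For reference, the generation gap in the transid error above can be
computed mechanically. This is only a minimal sketch: the two sample
lines are copied from the dmesg output above (with the mail wrapping
removed), and the function name transid_mismatches is made up for
illustration.

```python
import re

# Two lines copied from the dmesg output above (timestamps and mail
# line-wrapping removed).
LOG = """\
BTRFS error (device sdn2): parent transid verify failed on 46451963543552 wanted 114401 found 114173
BTRFS critical (device sdn2): corrupt leaf: root=2 block=46451963543552 slot=0, unexpected item end, have 1387359977 expect 16283
"""

TRANSID_RE = re.compile(
    r"parent transid verify failed on (\d+) wanted (\d+) found (\d+)"
)


def transid_mismatches(text):
    """Return (bytenr, wanted, found, gap) for each transid failure in text."""
    results = []
    for match in TRANSID_RE.finditer(text):
        bytenr, wanted, found = (int(g) for g in match.groups())
        results.append((bytenr, wanted, found, wanted - found))
    return results


for bytenr, wanted, found, gap in transid_mismatches(LOG):
    print(f"block {bytenr}: wanted gen {wanted}, found gen {found} "
          f"({gap} generations behind)")
```

So the block the superblock points at carries a generation 228 behind
the one the superblock expects.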
> > [root@cornelis ~]# btrfs check /dev/sdn2
> > Opening filesystem to check...
> > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> > Couldn't read tree root
>
> Would you please also paste the output of "btrfs ins dump-super
> /dev/sdn2" ?
[root@cornelis ~]# btrfs ins dump-super /dev/sdn2
superblock: bytenr=65536, device=/dev/sdn2
---------------------------------------------------------
csum_type 0 (crc32c)
csum_size 4
csum 0x51725c39 [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5
label data
generation 114401
root 46451963543552
sys_array_size 513
chunk_root_generation 112769
root_level 1
chunk_root 22085632
chunk_root_level 1
log_root 46451935461376
log_root_transid 0
log_root_level 0
total_bytes 104020314161152
bytes_used 49308554543104
sectorsize 4096
nodesize 16384
leafsize (deprecated) 16384
stripesize 4096
root_dir 6
num_devices 20
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x1e1
( MIXED_BACKREF |
BIG_METADATA |
EXTENDED_IREF |
RAID56 |
SKINNY_METADATA )
cache_generation 114401
uuid_tree_generation 114401
dev_item.uuid c6b44903-e849-4403-98c4-f3ba4d0b3fc3
dev_item.fsid 4c66fa8b-8fc6-4bba-9d83-02a2a1d69ad5 [match]
dev_item.type 0
dev_item.total_bytes 4000783007744
dev_item.bytes_used 4000781959168
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0
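
As a sanity check on the superblock dump above, the sketch below decodes
incompat_flags and converts the byte counts to TiB so they can be
compared with the "btrfs fi show" figures. The bit positions are the
incompat feature bits as I understand them from the kernel's btrfs
headers; the helper names decode_incompat and to_tib are made up for
illustration.

```python
# Incompat feature bit positions (assumed to match the kernel's btrfs
# headers; bits not listed here are reported as UNKNOWN).
INCOMPAT_BITS = {
    0: "MIXED_BACKREF",
    1: "DEFAULT_SUBVOL",
    2: "MIXED_GROUPS",
    3: "COMPRESS_LZO",
    4: "COMPRESS_ZSTD",
    5: "BIG_METADATA",
    6: "EXTENDED_IREF",
    7: "RAID56",
    8: "SKINNY_METADATA",
    9: "NO_HOLES",
}


def decode_incompat(flags):
    """Return the names of the feature bits set in flags, lowest bit first."""
    names = []
    remaining = flags
    for bit, name in sorted(INCOMPAT_BITS.items()):
        if flags & (1 << bit):
            names.append(name)
            remaining &= ~(1 << bit)
    if remaining:
        names.append(f"UNKNOWN({remaining:#x})")
    return names


def to_tib(num_bytes):
    """Convert a byte count to TiB (2**40 bytes)."""
    return num_bytes / 2**40


# Values copied from the dump-super output above.
print(" | ".join(decode_incompat(0x1E1)))
print(f"bytes_used           = {to_tib(49308554543104):.2f} TiB")
print(f"total_bytes          = {to_tib(104020314161152):.2f} TiB")
print(f"dev_item.total_bytes = {to_tib(4000783007744):.2f} TiB")
```

The decoded flag names match the list printed by dump-super, and
bytes_used comes out at the same 44.85TiB that "btrfs fi show" reports.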
> It looks like your tree root (or at least some tree root nodes/leaves)
> got corrupted.
>
> > ERROR: cannot open file system
>
> And since it's your tree root that is corrupted, you could also try
> "btrfs-find-root <device>" to try to get a good old copy of your tree
> root.
The output is rather long. I pasted it here:
https://pastebin.com/FkyBLgj9
I'm unsure what to look for in this output.
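
One way to sift through output like this is to list the candidate
blocks sorted by generation. The sketch below assumes that
btrfs-find-root reports candidates on lines of the form
"Well block <bytenr>(gen: <N> level: <L>) seems good, ..."; the sample
lines are hypothetical placeholders and are NOT taken from the pastebin
above, and candidate_roots is a made-up helper name.

```python
import re

# Assumed candidate-line format; adjust the pattern if the real output
# looks different.
CANDIDATE_RE = re.compile(
    r"Well block (\d+)\(gen: (\d+) level: (\d+)\) seems good"
)

# Hypothetical sample, for illustration only (bytenrs are placeholders).
SAMPLE = """\
Superblock thinks the generation is 114401
Superblock thinks the level is 1
Well block 1000(gen: 114399 level: 1) seems good, but generation/level doesn't match, want gen: 114401 level: 1
Well block 2000(gen: 114400 level: 1) seems good, but generation/level doesn't match, want gen: 114401 level: 1
Well block 3000(gen: 114380 level: 0) seems good, but generation/level doesn't match, want gen: 114401 level: 1
"""


def candidate_roots(text):
    """Return (bytenr, generation, level) tuples, newest generation first."""
    found = [
        tuple(int(g) for g in match.groups())
        for match in CANDIDATE_RE.finditer(text)
    ]
    return sorted(found, key=lambda cand: cand[1], reverse=True)


for bytenr, gen, level in candidate_roots(SAMPLE):
    print(f"gen {gen} level {level} at bytenr {bytenr}")
```

Whether any of the listed blocks is actually usable as a tree root is a
separate question that this listing does not answer.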
> But I suspect the corruption happened before you noticed, thus the old
> tree root may not help much.
>
> Also, the output of "btrfs ins dump-tree -t root <device>" will help.
Here it is:
[root@cornelis ~]# btrfs ins dump-tree -t root /dev/sdn2
btrfs-progs v4.19
parent transid verify failed on 46451963543552 wanted 114401 found 114173
parent transid verify failed on 46451963543552 wanted 114401 found 114173
checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
Couldn't read tree root
ERROR: unable to open /dev/sdn2
> Thanks,
> Qu
No, thank YOU! :-)
> > [root@cornelis ~]# btrfs restore /dev/sdn2 /mnt/data/
> > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > parent transid verify failed on 46451963543552 wanted 114401 found 114173
> > checksum verify failed on 46451963543552 found A8F2A769 wanted 4C111ADF
> > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > checksum verify failed on 46451963543552 found 32153BE8 wanted 8B07ABE4
> > bad tree block 46451963543552, bytenr mismatch, want=46451963543552, have=75208089814272
> > Couldn't read tree root
> > Could not open root, trying backup super
> > warning, device 14 is missing
> > warning, device 13 is missing
> > warning, device 12 is missing
> > warning, device 11 is missing
> > warning, device 10 is missing
> > warning, device 9 is missing
> > warning, device 8 is missing
> > warning, device 7 is missing
> > warning, device 6 is missing
> > warning, device 5 is missing
> > warning, device 4 is missing
> > warning, device 3 is missing
> > warning, device 2 is missing
> > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
> > ERROR: cannot read chunk root
> > Could not open root, trying backup super
> > warning, device 14 is missing
> > warning, device 13 is missing
> > warning, device 12 is missing
> > warning, device 11 is missing
> > warning, device 10 is missing
> > warning, device 9 is missing
> > warning, device 8 is missing
> > warning, device 7 is missing
> > warning, device 6 is missing
> > warning, device 5 is missing
> > warning, device 4 is missing
> > warning, device 3 is missing
> > warning, device 2 is missing
> > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > checksum verify failed on 22085632 found 5630EA32 wanted 1AA6FFF0
> > bad tree block 22085632, bytenr mismatch, want=22085632, have=1147797504
> > ERROR: cannot read chunk root
> > Could not open root, trying backup super
> >
> > [root@cornelis ~]# uname -r
> > 4.18.16-arch1-1-ARCH
> >
> > [root@cornelis ~]# btrfs --version
> > btrfs-progs v4.19
> >
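
To summarize the "device N is missing" warnings from the restore
attempt quoted above, here is a small sketch that collects the reported
devids and lists the ones that were not reported. The warning text is
regenerated to match the 13 lines in the log (devices 14 down to 2); the
total of 20 devices comes from the "btrfs fi show" output, and
missing_devids is a made-up helper name.

```python
import re

# Rebuilds the 13 warning lines shown in the restore output above
# (devices 14 down to 2), rather than pasting them again.
RESTORE_LOG = "\n".join(
    f"warning, device {devid} is missing" for devid in range(14, 1, -1)
)


def missing_devids(text):
    """Return the sorted, de-duplicated devids reported as missing."""
    return sorted(
        {int(n) for n in re.findall(r"warning, device (\d+) is missing", text)}
    )


missing = missing_devids(RESTORE_LOG)
# 20 devices in total, per the "btrfs fi show" output.
not_reported = [devid for devid in range(1, 21) if devid not in missing]

print("reported missing:", missing)
print("not reported    :", not_reported)
```

The devids that are not reported as missing are 1 and 15 through 20; I
do not know whether that pattern is significant.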
--
Groet / Cheers,
Patrick Dijkgraaf
Thread overview: 14+ messages
2018-11-30 13:53 Need help with potential ~45TB dataloss Patrick Dijkgraaf
2018-11-30 23:57 ` Qu Wenruo
2018-12-02 9:03 ` Patrick Dijkgraaf [this message]
2018-12-02 20:14 ` Patrick Dijkgraaf
2018-12-02 20:30 ` Andrei Borzenkov
2018-12-03 5:58 ` Qu Wenruo
2018-12-04 3:16 ` Chris Murphy
2018-12-04 10:09 ` Patrick Dijkgraaf
2018-12-04 19:38 ` Chris Murphy
2018-12-09 9:28 ` Patrick Dijkgraaf
2018-12-03 0:35 ` Qu Wenruo
2018-12-03 0:45 ` Qu Wenruo
2018-12-04 9:58 ` Patrick Dijkgraaf
2018-12-09 9:32 ` Patrick Dijkgraaf