* Recover from Extent Tree Corruption (maybe due to hardware failure)
@ 2020-09-28 13:17 Marc Wittke
2020-09-28 15:32 ` Chris Murphy
2020-09-29 10:39 ` Marc Wittke
0 siblings, 2 replies; 5+ messages in thread
From: Marc Wittke @ 2020-09-28 13:17 UTC (permalink / raw)
To: linux-btrfs
Hi mailing-list,
yesterday I had a catastrophic file system corruption on my notebook.
The machine was running overnight doing basically nothing, but when I had a look in the morning, the file system was mounted read-only. Not thinking much about it, I decided to reboot the machine, but
it did not come up. Cryptsetup was able to open the volume, but the btrfs rootfs could not be mounted. I ended up in the rescue system.
In the meantime I dd-ed the bad partition to a USB disk, and finally reinstalled the system after various rescue attempts. However, I am missing some not-yet-pushed development work :(
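The imaging step mentioned above can be sketched as follows (a hedged example; /dev/dm-1 and the USB target device are from this thread, and the runnable demonstration at the end uses scratch files instead of real devices):

```shell
# Imaging a failing partition before further rescue attempts. conv=noerror,sync
# keeps dd going past read errors and pads unreadable blocks with zeros, so the
# copy stays offset-aligned with the source. Device names are placeholders.
# dd if=/dev/dm-1 of=/dev/sdc1 bs=4M status=progress conv=noerror,sync

# Runnable demonstration of the same invocation on scratch files:
dd if=/dev/zero of=/tmp/src.img bs=1M count=4 2>/dev/null
dd if=/tmp/src.img of=/tmp/dst.img bs=1M conv=noerror,sync 2>/dev/null
cmp -s /tmp/src.img /tmp/dst.img && echo "copy is byte-identical"
```

Working from such a copy (rather than the original device) is what makes repeated, potentially destructive repair attempts safe.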
I can't provide all details of the failing system, since it no longer exists. It was a fully patched Fedora 32, so I think the following info from my current system is valid for the previous one
as well. Please note that this output comes from the USB drive (/dev/sdc1) that was originally /dev/dm-1. The error messages are identical.
Disk type: intel 600p 2000GB nvme
kernel 5.8.11-200.fc32.x86_64 (unsure, might have been 5.9 already)
btrfs-progs v5.7
# btrfs fi show
Label: none uuid: 131112e7-6e32-474c-813a-9c1ce4292c18
Total devices 1 FS bytes used 535.72GiB
devid 1 size 1.83TiB used 538.02GiB path /dev/sdc1
# mount /dev/sdc1 /mnt
mount: /mnt: can't read superblock on /dev/sdc1.
# sudo btrfs rescue super-recover -v /dev/sdc1
All Devices:
Device: id = 1, name = /dev/sdc1
Before Recovering:
[All good supers]:
device name = /dev/sdc1
superblock bytenr = 65536
device name = /dev/sdc1
superblock bytenr = 67108864
device name = /dev/sdc1
superblock bytenr = 274877906944
[All bad supers]:
All supers are valid, no need to recover
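The three "good supers" reported above sit at btrfs's fixed superblock mirror offsets (64KiB, 64MiB, 256GiB), which the printed bytenrs confirm:

```shell
# btrfs keeps superblock copies at fixed byte offsets on each device; these
# match the bytenrs printed by 'btrfs rescue super-recover' above.
echo $((64 * 1024))                  # primary:       65536
echo $((64 * 1024 * 1024))           # first mirror:  67108864
echo $((256 * 1024 * 1024 * 1024))   # second mirror: 274877906944
```

All three copies being valid means the damage here is in the trees the superblock points at, not in the superblock itself.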
# sudo btrfs restore -oi /dev/sdc1 /home/marc/rescued/
checksum verify failed on 385831911424 found 000000C0 wanted 0000001C
checksum verify failed on 385831911424 found 000000C0 wanted 0000001C
bad tree block 385831911424, bytenr mismatch, want=385831911424, have=1900825539188143805
checksum verify failed on 385831911424 found 000000C0 wanted 0000001C
checksum verify failed on 385831911424 found 000000C0 wanted 0000001C
bad tree block 385831911424, bytenr mismatch, want=385831911424, have=1900825539188143805
Error searching -5
Error searching /home/marc/rescued/etc/anaconda
checksum verify failed on 385831911424 found 000000C0 wanted 0000001C
checksum verify failed on 385831911424 found 000000C0 wanted 0000001C
bad tree block 385831911424, bytenr mismatch, want=385831911424, have=1900825539188143805
Error searching -5
... and so on ...
bad tree block 385831682048, bytenr mismatch, want=385831682048, have=18271273693833811190
Error searching -5
Error searching /home/marc/rescued/var
# btrfs check /dev/sdc1
Opening filesystem to check...
Checking filesystem on /dev/sdc1
UUID: 131112e7-6e32-474c-813a-9c1ce4292c18
[1/7] checking root items
checksum verify failed on 385811005440 found 000000D7 wanted 0000007E
checksum verify failed on 385811005440 found 000000D7 wanted 0000007E
bad tree block 385811005440, bytenr mismatch, want=385811005440, have=12032019063440798054
ERROR: failed to repair root items: Input/output error
[2/7] checking extents
checksum verify failed on 385829978112 found 000000EE wanted FFFFFFCD
checksum verify failed on 385829978112 found 000000EE wanted FFFFFFCD
bad tree block 385829978112, bytenr mismatch, want=385829978112, have=389172930910726983
checksum verify failed on 385829978112 found 000000EE wanted FFFFFFCD
checksum verify failed on 385829978112 found 000000EE wanted FFFFFFCD
bad tree block 385829978112, bytenr mismatch, want=385829978112, have=389172930910726983
checksum verify failed on 385829978112 found 000000EE wanted FFFFFFCD
...
backpointer mismatch on [568129818624 4096]
ref mismatch on [568129822720 4096] extent item 0, found 1
data backref 568129822720 root 5 owner 3387874 offset 32802598912 num_refs 0 not found in extent tree
incorrect local backref count on 568129822720 root 5 owner 3387874 offset 32802598912 found 1 wanted 0 back 0x55da67434c10
...
root 5 inode 4269638 errors 2001, no inode item, link count wrong
unresolved ref dir 267 index 0 namelen 3 name kde filetype 2 errors 6, no dir index, no inode ref
root 5 inode 4288305 errors 2001, no inode item, link count wrong
unresolved ref dir 267 index 0 namelen 6 name kde4rc filetype 1 errors 6, no dir index, no inode ref
...
ERROR: errors found in fs roots
found 575218155520 bytes used, error(s) found
total csum bytes: 547173716
total tree bytes: 3970580480
total fs tree bytes: 3200319488
total extent tree bytes: 159367168
btree space waste bytes: 603945834
file data blocks allocated: 3481346736128
referenced 559327043584
# dmesg (relevant portion)
Sep 28 09:46:10 localhost.localdomain kernel: BTRFS info (device sdc1): disk space caching is enabled
Sep 28 09:46:11 localhost.localdomain kernel: BTRFS info (device sdc1): has skinny extents
Sep 28 09:46:15 localhost.localdomain kernel: btree_readpage_end_io_hook: 1 callbacks suppressed
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): bad tree block start, want 385831223296 have 17007041628713579106
Sep 28 09:46:15 localhost.localdomain kernel: BTRFS error (device sdc1): could not do orphan cleanup -5
Sep 28 09:46:16 localhost.localdomain kernel: BTRFS: error (device sdc1) in __btrfs_free_extent:3069: errno=-5 IO failure
Sep 28 09:46:16 localhost.localdomain kernel: BTRFS: error (device sdc1) in btrfs_run_delayed_refs:2173: errno=-5 IO failure
Sep 28 09:46:16 localhost.localdomain kernel: BTRFS error (device sdc1): commit super ret -5
Sep 28 09:46:17 localhost.localdomain kernel: BTRFS error (device sdc1): open_ctree failed
I somehow managed to mount the filesystem read-only with a plethora of options that I copied from somewhere, but none of the directories was readable. Just a lot of question marks.
There is a chance that the drive failed physically (although it is only 18 months old). After reinstallation the system did not boot again; since I ran out of time, I stuck in the old 256GB SATA SSD and
am using it for now. I don't have access to a machine that supports NVMe drives to run an extensive test, but could run one overnight later.
Any suggestions?
Thanks,
Marc
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: Recover from Extent Tree Corruption (maybe due to hardware failure)
2020-09-28 13:17 Recover from Extent Tree Corruption (maybe due to hardware failure) Marc Wittke
@ 2020-09-28 15:32 ` Chris Murphy
2020-09-28 17:09 ` Marc Wittke
2020-09-29 10:39 ` Marc Wittke
1 sibling, 1 reply; 5+ messages in thread
From: Chris Murphy @ 2020-09-28 15:32 UTC (permalink / raw)
To: Marc Wittke; +Cc: Btrfs BTRFS
On Mon, Sep 28, 2020 at 7:18 AM Marc Wittke <marc@wittke-web.de> wrote:
> # mount /dev/sdc1 /mnt
> mount: /mnt: can't read superblock on /dev/sdc1.
What about 'mount -o ro,usebackuproot' ?
Also include dmesg if it fails, and include:
'btrfs insp dump-s -f /dev/'
> # sudo btrfs restore -oi /dev/sdc1 /home/marc/rescued/
From the backup roots in the super, and also using 'btrfs-find-root'
it might be possible to find another root tree to use. This was NVMe.
What were the mount options being used?
Another possibility is to recover by isolating a specific snapshot.
Are there any snapshots on this file system?
'btrfs restore --list-roots'
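The iteration suggested above can be sketched as a loop (a hedged sketch: the bytenrs below are made-up placeholders, not values from this filesystem, and the actual restore command stays commented out because it needs the real device):

```shell
# Try each candidate tree-root bytenr (e.g. from 'btrfs-find-root' or from the
# backup_roots in 'btrfs insp dump-s -f') with 'btrfs restore -t' until one
# yields files. Bytenrs here are placeholders only.
for bytenr in 1048576 2097152 4194304; do
    echo "trying tree root at bytenr $bytenr"
    # btrfs restore -t "$bytenr" -oiv /dev/sdX1 /mnt/rescued/ && break
done
```

Each candidate either fails quickly with checksum errors or starts restoring files, which is why doing this interactively is more practical than scripting it blind.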
Might be easier to do this on #btrfs, irc.freenode.net because it's
kinda iterative.
--
Chris Murphy
* Re: Recover from Extent Tree Corruption (maybe due to hardware failure)
2020-09-28 15:32 ` Chris Murphy
@ 2020-09-28 17:09 ` Marc Wittke
0 siblings, 0 replies; 5+ messages in thread
From: Marc Wittke @ 2020-09-28 17:09 UTC (permalink / raw)
To: Chris Murphy; +Cc: Btrfs BTRFS
On Mon, 2020-09-28 at 09:32 -0600, Chris Murphy wrote:
> What about 'mount -o ro,usebackuproot' ?
# mount -o ro,usebackuproot /dev/sdb1 /mnt
[ 3198.709815] BTRFS info (device sdb1): trying to use backup root at mount time
[ 3198.709819] BTRFS info (device sdb1): disk space caching is enabled
[ 3198.709821] BTRFS info (device sdb1): has skinny extents
# ls -l /mnt
[ 3210.859894] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.876871] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.877452] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.877776] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.878125] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.878434] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.878733] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.879036] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.879289] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
[ 3210.879574] BTRFS error (device sdb1): bad tree block start, want 385831682048 have 18271273693833811190
ls: cannot access '/mnt/home': Input/output error
ls: cannot access '/mnt/lost+found': Input/output error
ls: cannot access '/mnt/media': Input/output error
ls: cannot access '/mnt/mnt': Input/output error
ls: cannot access '/mnt/opt': Input/output error
ls: cannot access '/mnt/srv': Input/output error
ls: cannot access '/mnt/tmp': Input/output error
ls: cannot access '/mnt/usr': Input/output error
ls: cannot access '/mnt/var': Input/output error
total 16
lrwxrwxrwx. 1 root root 7 Jan 28 2020 bin -> usr/bin
drwxr-xr-x. 1 root root 0 Sep 9 22:19 boot
drwxr-xr-x. 1 root root 0 Sep 9 22:19 dev
drwxr-xr-x. 1 root root 5402 Sep 24 19:23 etc
d?????????? ? ? ? ? ? home
lrwxrwxrwx. 1 root root 7 Jan 28 2020 lib -> usr/lib
lrwxrwxrwx. 1 root root 9 Jan 28 2020 lib64 -> usr/lib64
d?????????? ? ? ? ? ? lost+found
d?????????? ? ? ? ? ? media
d?????????? ? ? ? ? ? mnt
d?????????? ? ? ? ? ? opt
drwxr-xr-x. 1 root root 0 Sep 9 22:19 proc
dr-xr-x---. 1 root root 330 Sep 20 19:16 root
drwxr-xr-x. 1 root root 0 Sep 9 22:19 run
lrwxrwxrwx. 1 root root 8 Jan 28 2020 sbin -> usr/sbin
d?????????? ? ? ? ? ? srv
drwxr-xr-x. 1 root root 0 Sep 9 22:19 sys
d?????????? ? ? ? ? ? tmp
d?????????? ? ? ? ? ? usr
d?????????? ? ? ? ? ? var
> 'btrfs insp dump-s -f /dev/'
superblock: bytenr=65536, device=/dev/sdb1
---------------------------------------------------------
csum_type 0 (crc32c)
csum_size 4
csum 0xd7f04658 [match]
bytenr 65536
flags 0x1
( WRITTEN )
magic _BHRfS_M [match]
fsid 131112e7-6e32-474c-813a-9c1ce4292c18
metadata_uuid 131112e7-6e32-474c-813a-9c1ce4292c18
label
generation 310285
root 1104625664
sys_array_size 97
chunk_root_generation 307101
root_level 1
chunk_root 1097728
chunk_root_level 1
log_root 0
log_root_transid 0
log_root_level 0
total_bytes 2012737437696
bytes_used 575223103488
sectorsize 4096
nodesize 16384
leafsize (deprecated) 16384
stripesize 4096
root_dir 6
num_devices 1
compat_flags 0x0
compat_ro_flags 0x0
incompat_flags 0x161
( MIXED_BACKREF |
BIG_METADATA |
EXTENDED_IREF |
SKINNY_METADATA )
cache_generation 310285
uuid_tree_generation 310285
dev_item.uuid 0ca7ed2a-46e9-4234-a65b-5c25a657b8e7
dev_item.fsid 131112e7-6e32-474c-813a-9c1ce4292c18 [match]
dev_item.type 0
dev_item.total_bytes 2012737437696
dev_item.bytes_used 577694072832
dev_item.io_align 4096
dev_item.io_width 4096
dev_item.sector_size 4096
dev_item.devid 1
dev_item.dev_group 0
dev_item.seek_speed 0
dev_item.bandwidth 0
dev_item.generation 0
sys_chunk_array[2048]:
item 0 key (FIRST_CHUNK_TREE CHUNK_ITEM 1048576)
length 4194304 owner 2 stripe_len 65536 type SYSTEM
io_align 4096 io_width 4096 sector_size 4096
num_stripes 1 sub_stripes 0
stripe 0 devid 1 offset 1048576
dev_uuid 0ca7ed2a-46e9-4234-a65b-5c25a657b8e7
backup_roots[4]:
backup 0:
backup_tree_root: 1104625664 gen: 310285 level: 1
backup_chunk_root: 1097728 gen: 307101 level: 1
backup_extent_root: 1104642048 gen: 310285 level: 2
backup_fs_root: 1103724544 gen: 310283 level: 3
backup_dev_root: 1105625088 gen: 310285 level: 1
backup_csum_root: 1106067456 gen: 310285 level: 2
backup_total_bytes: 2012737437696
backup_bytes_used: 575223103488
backup_num_devices: 1
backup 1:
backup_tree_root: 385875853312 gen: 310282 level: 1
backup_chunk_root: 1097728 gen: 307101 level: 1
backup_extent_root: 385857273856 gen: 310282 level: 2
backup_fs_root: 385788215296 gen: 310282 level: 3
backup_dev_root: 385520975872 gen: 307101 level: 1
backup_csum_root: 385797603328 gen: 310282 level: 2
backup_total_bytes: 2012737437696
backup_bytes_used: 575223103488
backup_num_devices: 1
backup 2:
backup_tree_root: 1105707008 gen: 310283 level: 1
backup_chunk_root: 1097728 gen: 307101 level: 1
backup_extent_root: 1104625664 gen: 310283 level: 2
backup_fs_root: 0 gen: 0 level: 0
backup_dev_root: 1107591168 gen: 310283 level: 1
backup_csum_root: 1106591744 gen: 310283 level: 2
backup_total_bytes: 2012737437696
backup_bytes_used: 575223103488
backup_num_devices: 1
backup 3:
backup_tree_root: 1107755008 gen: 310284 level: 1
backup_chunk_root: 1097728 gen: 307101 level: 1
backup_extent_root: 1107771392 gen: 310284 level: 2
backup_fs_root: 0 gen: 0 level: 0
backup_dev_root: 1107591168 gen: 310283 level: 1
backup_csum_root: 1108672512 gen: 310284 level: 2
backup_total_bytes: 2012737437696
backup_bytes_used: 575223103488
backup_num_devices: 1
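One detail worth noting in this dump: the tree blocks failing checksum verification earlier in the thread (bytenrs around 385.8G) fall inside the byte range spanned by backup 1's roots, which would explain why usebackuproot still runs into bad blocks. A quick arithmetic check, using values copied from the outputs in this thread:

```shell
# A bad tree block reported by the kernel during the usebackuproot attempt,
# and the span covered by backup 1's roots in the dump-super output above.
bad_block=385831682048
backup1_lo=385520975872   # backup_dev_root of backup 1
backup1_hi=385875853312   # backup_tree_root of backup 1
if [ "$bad_block" -ge "$backup1_lo" ] && [ "$bad_block" -le "$backup1_hi" ]; then
    echo "bad block lies inside the backup-1 root region"
fi
```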
>
> > # sudo btrfs restore -oi /dev/sdc1 /home/marc/rescued/
>
> From the backup roots in the super, and also using 'btrfs-find-root'
> it might be possible to find another root tree to use.
# btrfs-find-root /dev/sdb1
Superblock thinks the generation is 310285
Superblock thinks the level is 1
Found tree root at 1104625664 gen 310285 level 1
> This was NVMe. What were the mount options being used?
Good question, didn't fiddle around a lot with the basic setup of fedora. This is the fstab as it is recoverable from the partition
UUID=131112e7-6e32-474c-813a-9c1ce4292c18 / btrfs defaults,x-systemd.device-timeout=0 0 0
UUID=106fdbec-741f-4eca-9fba-3043cb80ad87 /boot ext4 defaults 1 2
UUID=F291-4D24 /boot/efi vfat umask=0077,shortname=winnt 0 2
/dev/mapper/lvmgroup--0-00 none swap defaults,x-systemd.device-timeout=0 0 0
> Another possibility is to recover by isolating a specific snapshot. Are there any snapshots on this file system?
# btrfs restore --list-roots /dev/sdb1
tree key (EXTENT_TREE ROOT_ITEM 0) 1104642048 level 2
tree key (DEV_TREE ROOT_ITEM 0) 1105625088 level 1
tree key (FS_TREE ROOT_ITEM 0) 1103724544 level 3
tree key (CSUM_TREE ROOT_ITEM 0) 1106067456 level 2
tree key (UUID_TREE ROOT_ITEM 0) 261371854848 level 0
tree key (624 ROOT_ITEM 0) 1131315200 level 0
tree key (DATA_RELOC_TREE ROOT_ITEM 0) 5390336 level 0
>
> Might be easier to do this on #btrfs, irc.freenode.net because it's
> kinda iterative.
I joined, but "cannot send to nick/channel". Investigating... it's been about 15 years since I last used IRC, back in university days.
* Re: Recover from Extent Tree Corruption (maybe due to hardware failure)
2020-09-28 13:17 Recover from Extent Tree Corruption (maybe due to hardware failure) Marc Wittke
2020-09-28 15:32 ` Chris Murphy
@ 2020-09-29 10:39 ` Marc Wittke
2020-09-29 10:42 ` Christoph Hellwig
1 sibling, 1 reply; 5+ messages in thread
From: Marc Wittke @ 2020-09-29 10:39 UTC (permalink / raw)
To: linux-btrfs
On Mon, 2020-09-28 at 10:17 -0300, Marc Wittke wrote:
>
> Disk type: intel 600p 2000GB nvme
Update: the disk seems to be fine. badblocks ran two and a half passes overnight without finding any errors.
* Re: Recover from Extent Tree Corruption (maybe due to hardware failure)
2020-09-29 10:39 ` Marc Wittke
@ 2020-09-29 10:42 ` Christoph Hellwig
0 siblings, 0 replies; 5+ messages in thread
From: Christoph Hellwig @ 2020-09-29 10:42 UTC (permalink / raw)
To: Marc Wittke; +Cc: linux-btrfs
On Tue, Sep 29, 2020 at 07:39:27AM -0300, Marc Wittke wrote:
> On Mon, 2020-09-28 at 10:17 -0300, Marc Wittke wrote:
> >
> > Disk type: intel 600p 2000GB nvme
>
> Update: the disk seems to be fine. badblocks ran two and a half passes overnight without finding any errors.
FYI, the Intel 600p is the most buggy common NVMe controller.
Older firmware versions are known to corrupt data when the OS writes multiple 512-byte buffers inside a 4KiB boundary, something that happens frequently with XFS.
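The write pattern Christoph describes can be illustrated like this (against a scratch file rather than a block device; this only demonstrates the I/O shape and is not a reproducer for the firmware bug):

```shell
# Eight separate 512-byte writes all landing inside a single 4KiB region --
# the pattern said to trigger corruption on old 600p firmware.
rm -f /tmp/4k.img
for i in 0 1 2 3 4 5 6 7; do
    dd if=/dev/zero of=/tmp/4k.img bs=512 seek="$i" count=1 conv=notrunc 2>/dev/null
done
stat -c %s /tmp/4k.img   # prints 4096: all eight writes fit in one 4KiB block
```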