* BTRFS critical: corrupt leaf, slot offset bad; then read-only
@ 2017-02-21 14:12 Lukas Tribus
  2017-02-22  7:44 ` Lukas Tribus
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Tribus @ 2017-02-21 14:12 UTC (permalink / raw)
  To: linux-btrfs


Hi list!


I have a btrfs pool consisting of 5x 2.72 TiB LUKS (dm-crypt) partitions
in RAID1, mounted on Linux 4.4 with btrfs-progs 4.4. I never had any
crashes or power loss here, but recently, about every 60 - 120 minutes
(while in use), btrfs detects corruption, aborts the transaction and
drops to read-only mode.
btrfs still mounts normally without any special options (it does take 
about 60 seconds, which I guess is normal for this kind of size). All 
LUKS partitions have at least 400GiB of free space.

I don't see any HW problems here, and I doubt the corruption is coming
from the LUKS partition. I did test the RAM, and it seems fine in
multiple memtest86+ and memtest86 runs.


Are there any known bugs in 4.4? Any suggestions would be greatly 
appreciated!


I have to admit I did not regularly scrub.


Thanks,
Lukas


---
~# uname -a
Linux srv1-dom0 4.4.0-63-generic #84-Ubuntu SMP Wed Feb 1 17:20:32 UTC 
2017 x86_64 x86_64 x86_64 GNU/Linux
~# btrfs --version
btrfs-progs v4.4
~# btrfs fi show
Label: 'dom0-os'  uuid: e475636c-21e0-4563-87d6-91f03c519a62
         Total devices 5 FS bytes used 3.52GiB
         devid    1 size 10.00GiB used 3.53GiB path /dev/sda2
         devid    2 size 10.00GiB used 4.25GiB path /dev/sdb2
         devid    3 size 10.00GiB used 3.28GiB path /dev/sdc2
         devid    4 size 10.00GiB used 4.00GiB path /dev/sdd2
         devid    5 size 10.00GiB used 4.00GiB path /dev/sde2

Label: 'storage_pool'  uuid: f50f980e-7640-49c7-bf8d-20d55cfe6005
         Total devices 5 FS bytes used 5.77TiB
         devid    1 size 2.72TiB used 2.31TiB path /dev/mapper/sda3_crypt
         devid    2 size 2.72TiB used 2.31TiB path /dev/mapper/sdb3_crypt
         devid    3 size 2.72TiB used 2.31TiB path /dev/mapper/sdc3_crypt
         devid    4 size 2.72TiB used 2.31TiB path /dev/mapper/sdd3_crypt
         devid    5 size 2.72TiB used 2.31TiB path /dev/mapper/sde3_crypt
~# btrfs fi df /storage/users/
Data, RAID1: total=5.77TiB, used=5.76TiB
System, RAID1: total=32.00MiB, used=832.00KiB
Metadata, RAID1: total=8.00GiB, used=6.96GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
~#
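
As a rough cross-check of the numbers above, here is a minimal sketch
(the values are copied by hand from the `btrfs fi show` and `btrfs fi df`
output for 'storage_pool'; the function names are made up for
illustration). With RAID1 every chunk is stored twice, so the raw space
allocated across all devices should be about twice the logical totals.

```python
# Figures copied from the 'storage_pool' output above (TiB unless noted).
DEVICE_SIZE_TIB = 2.72
DEVICE_USED_TIB = 2.31
NUM_DEVICES = 5

# Chunk totals from `btrfs fi df` (RAID1 profiles only), converted to TiB.
DATA_TOTAL_TIB = 5.77
METADATA_TOTAL_TIB = 8.00 / 1024            # 8.00GiB
SYSTEM_TOTAL_TIB = 32.00 / (1024 * 1024)    # 32.00MiB


def raw_allocated_tib():
    """Raw space allocated on all devices, summed from `btrfs fi show`."""
    return DEVICE_USED_TIB * NUM_DEVICES


def expected_raw_tib():
    """RAID1 keeps two copies of every chunk, so raw = 2 x logical."""
    return 2 * (DATA_TOTAL_TIB + METADATA_TOTAL_TIB + SYSTEM_TOTAL_TIB)


def unallocated_per_device_gib():
    """Space per device that is not yet assigned to any chunk."""
    return (DEVICE_SIZE_TIB - DEVICE_USED_TIB) * 1024


print(f"raw allocated:   {raw_allocated_tib():.2f} TiB")
print(f"expected (2x):   {expected_raw_tib():.2f} TiB")
print(f"unallocated/dev: {unallocated_per_device_gib():.0f} GiB")
```

The two raw figures agree to within the rounding of the tool output, and
the unallocated space per device is consistent with the "at least 400GiB
of free space" statement.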

~#

partial dmesg:
[ 1509.033492] BTRFS: device label storage_pool devid 1 transid 238135 
/dev/dm-5
[ 1510.498804] BTRFS: device label storage_pool devid 2 transid 238135 
/dev/dm-6
[ 1511.980968] BTRFS: device label storage_pool devid 3 transid 238135 
/dev/dm-7
[ 1513.461799] BTRFS: device label storage_pool devid 4 transid 238135 
/dev/dm-8
[ 1514.838757] BTRFS: device label storage_pool devid 5 transid 238135 
/dev/dm-9
[ 1517.726471] BTRFS info (device dm-9): btrfs: use no compression
[ 1517.726477] BTRFS info (device dm-9): disk space caching is enabled
[ 1517.726479] BTRFS: has skinny extents
[ 1569.598633] BTRFS: checking UUID tree
[ 3540.825747] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[ 3540.836168] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[ 3540.846413] ------------[ cut here ]------------
[ 3540.846432] WARNING: CPU: 2 PID: 2757 at 
/build/linux-mPTI9s/linux-4.4.0/fs/btrfs/extent-tree.c:2930 
btrfs_run_delayed_refs+0x26b/0x2a0 [btrfs]()
[ 3540.846433] BTRFS: Transaction aborted (error -5)
[ 3540.846434] Modules linked in: algif_skcipher af_alg xen_gntdev 
xen_evtchn xenfs xen_privcmd drbg ansi_cprng dm_crypt nls_iso8859_1 
bridge stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp 
crct10dif_pclmul crc32_pclmul ghash_clmulni_intel serio_raw joydev 
input_leds nuvoton_cir 8250_fintek ie31200_edac mac_hid rc_core lpc_ich 
edac_core shpchp mei_me mei ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad 
ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi 
autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear 
hid_generic usbhid hid mxm_wmi i915 i2c_algo_bit drm_kms_helper 
aesni_intel aes_x86_64 glue_helper syscopyarea sysfillrect firewire_ohci 
sysimgblt firewire_core fb_sys_fops lrw psmouse
[ 3540.846466]  tg3 gf128mul ablk_helper cryptd crc_itu_t ptp ahci drm 
pps_core libahci fjes wmi video
[ 3540.846473] CPU: 2 PID: 2757 Comm: btrfs-transacti Not tainted 
4.4.0-63-generic #84-Ubuntu
[ 3540.846475] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[ 3540.846476]  0000000000000200 0000000002709bc3 ffff88007615fc90 
ffffffff813f8083
[ 3540.846478]  ffff88007615fcd8 ffffffffc048d498 ffff88007615fcc8 
ffffffff810812d2
[ 3540.846479]  ffff8802adf562f8 ffff8802a9c71800 ffff8800056caef0 
ffffffffffffffff
[ 3540.846481] Call Trace:
[ 3540.846486]  [<ffffffff813f8083>] dump_stack+0x63/0x90
[ 3540.846489]  [<ffffffff810812d2>] warn_slowpath_common+0x82/0xc0
[ 3540.846491]  [<ffffffff8108136c>] warn_slowpath_fmt+0x5c/0x80
[ 3540.846500]  [<ffffffffc03f16cd>] ? 
__btrfs_run_delayed_refs+0xcdd/0x1220 [btrfs]
[ 3540.846509]  [<ffffffffc03f4cab>] btrfs_run_delayed_refs+0x26b/0x2a0 
[btrfs]
[ 3540.846520]  [<ffffffffc04837b2>] commit_cowonly_roots+0x22b/0x2c2 
[btrfs]
[ 3540.846530]  [<ffffffffc040a1b6>] 
btrfs_commit_transaction+0x576/0xa90 [btrfs]
[ 3540.846533]  [<ffffffff810c41e0>] ? wake_atomic_t_function+0x60/0x60
[ 3540.846542]  [<ffffffffc04052e9>] transaction_kthread+0x229/0x240 [btrfs]
[ 3540.846558]  [<ffffffffc04050c0>] ? 
btrfs_cleanup_transaction+0x570/0x570 [btrfs]
[ 3540.846560]  [<ffffffff810a0ba8>] kthread+0xd8/0xf0
[ 3540.846562]  [<ffffffff810a0ad0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 3540.846564]  [<ffffffff8183c98f>] ret_from_fork+0x3f/0x70
[ 3540.846566]  [<ffffffff810a0ad0>] ? kthread_create_on_node+0x1e0/0x1e0
[ 3540.846567] ---[ end trace 70830ce6f0e320dd ]---
[ 3540.846587] BTRFS: error (device dm-9) in 
btrfs_run_delayed_refs:2930: errno=-5 IO failure
[ 3540.855086] BTRFS info (device dm-9): forced readonly
[ 3540.855088] BTRFS warning (device dm-9): Skipping commit of aborted 
transaction.
[ 3540.855090] BTRFS: error (device dm-9) in cleanup_transaction:1746: 
errno=-5 IO failure



full dmesg attached and online at:
http://pastebin.com/raw/K8FNNEnS



[-- Attachment #2: dmesg-4.4.0-59-generic-btrfs-trace2.gz --]
[-- Type: application/gzip, Size: 21771 bytes --]

[-- Attachment #3: dmesg-4.4.0-47-generic-btrfs-trace3.gz --]
[-- Type: application/gzip, Size: 24291 bytes --]

[-- Attachment #4: dmesg-4.4.0-63-generic-btrfs-trace4.gz --]
[-- Type: application/gzip, Size: 20436 bytes --]


* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-02-21 14:12 BTRFS critical: corrupt leaf, slot offset bad; then read-only Lukas Tribus
@ 2017-02-22  7:44 ` Lukas Tribus
  2017-02-22 19:16   ` Lukas Tribus
  2017-02-22 19:40   ` Hans van Kranenburg
  0 siblings, 2 replies; 9+ messages in thread
From: Lukas Tribus @ 2017-02-22  7:44 UTC (permalink / raw)
  To: linux-btrfs

Upgrading to 4.8, the FS no longer causes a kernel calltrace and does 
not go read-only. It only shows the "corrupt leaf, slot offset bad" message.

A scrub completed without errors on 3 devices, while it was aborted on 2 
devices. Not sure why it was aborted, since there is no error message in 
dmesg?


Any suggestions why the scrub was aborted?



# uname -a
Linux srv1-dom0 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5 
09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
# btrfs scrub status /storage/users/
scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
         scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 
06:35:42
         total bytes scrubbed: 10.60TiB with 0 errors
/# btrfs scrub status /storage/users/ -d
scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
scrub device /dev/dm-5 (id 1) history
         scrub started at Wed Feb 22 00:07:33 2017 and finished after 
06:35:36
         total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-6 (id 2) history
         scrub started at Wed Feb 22 00:07:33 2017 and finished after 
06:35:30
         total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-7 (id 3) history
         scrub started at Wed Feb 22 00:07:33 2017 and finished after 
06:35:42
         total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-8 (id 4) history
         scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 
05:01:37
         total bytes scrubbed: 1.85TiB with 0 errors
scrub device /dev/mapper/sde3_crypt (id 5) history
         scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 
05:01:37
         total bytes scrubbed: 1.85TiB with 0 errors
#dmesg | grep BTRFS
[  929.737119] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[19772.594129] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[19777.127704] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[19777.552191] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
#
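
To make the per-device result easier to read, the `btrfs scrub status -d`
output above can be summarised with a small script. This is only a
sketch: the sample text is copied from the output above with the mail
line wrapping undone, and the regular expressions are assumptions that
match this particular output, not a documented format.

```python
import re

# Copied from the `btrfs scrub status -d` output above (wrapping undone).
SCRUB_STATUS = """\
scrub device /dev/dm-5 (id 1) history
        scrub started at Wed Feb 22 00:07:33 2017 and finished after 06:35:36
        total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-6 (id 2) history
        scrub started at Wed Feb 22 00:07:33 2017 and finished after 06:35:30
        total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-7 (id 3) history
        scrub started at Wed Feb 22 00:07:33 2017 and finished after 06:35:42
        total bytes scrubbed: 2.30TiB with 0 errors
scrub device /dev/dm-8 (id 4) history
        scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 05:01:37
        total bytes scrubbed: 1.85TiB with 0 errors
scrub device /dev/mapper/sde3_crypt (id 5) history
        scrub started at Wed Feb 22 00:07:33 2017 and was aborted after 05:01:37
        total bytes scrubbed: 1.85TiB with 0 errors
"""

DEVICE_RE = re.compile(r"^scrub device (\S+) \(id (\d+)\)")
STATE_RE = re.compile(r"and (finished|was aborted) after (\d\d:\d\d:\d\d)")
BYTES_RE = re.compile(r"total bytes scrubbed: ([\d.]+)TiB with (\d+) errors")


def parse_scrub_status(text):
    """Return one dict per device with its state, duration and scrubbed TiB."""
    devices = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = DEVICE_RE.match(line)
        if match:
            devices.append({"path": match.group(1), "id": int(match.group(2))})
            continue
        if not devices:
            continue  # ignore anything before the first device header
        match = STATE_RE.search(line)
        if match:
            devices[-1]["state"] = match.group(1)
            devices[-1]["duration"] = match.group(2)
        match = BYTES_RE.search(line)
        if match:
            devices[-1]["tib"] = float(match.group(1))
            devices[-1]["errors"] = int(match.group(2))
    return devices


devices = parse_scrub_status(SCRUB_STATUS)
aborted = [d["id"] for d in devices if d.get("state") == "was aborted"]
total_tib = sum(d["tib"] for d in devices)
print("aborted device ids:", aborted)
print(f"sum of per-device totals: {total_tib:.2f} TiB")
```

Devices 4 and 5 both stopped at the same elapsed time (05:01:37), and
the per-device totals add up to the 10.60TiB shown in the summary.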



* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-02-22  7:44 ` Lukas Tribus
@ 2017-02-22 19:16   ` Lukas Tribus
  2017-02-22 19:40   ` Hans van Kranenburg
  1 sibling, 0 replies; 9+ messages in thread
From: Lukas Tribus @ 2017-02-22 19:16 UTC (permalink / raw)
  To: linux-btrfs

I did a "btrfs check" (--readonly):

Summary:
589x filetype 1 errors 4, no inode ref (--> Files)
597x filetype 2 errors 4, no inode ref (--> Directories)
1183x root xxx inode YYYYYY errors 2001, no inode item, link count wrong
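
The `errors` values in the summary look like hexadecimal bitmasks with
one bit per condition. That is an assumption based on how the codes and
the descriptions line up in the output, not something confirmed in this
thread. A minimal sketch that splits the codes into their set bits:

```python
def set_bits(value):
    """Return the positions of the bits set in a non-negative integer."""
    return [i for i in range(value.bit_length()) if (value >> i) & 1]


# (code as printed by btrfs check, descriptions printed next to it)
REPORTED = [
    ("2001", ["no inode item", "link count wrong"]),
    ("500", ["file extent discount", "nbytes wrong"]),
    ("4", ["no inode ref"]),
]

for code, descriptions in REPORTED:
    bits = set_bits(int(code, 16))
    # Assumption: the descriptions are printed in ascending bit order.
    pairs = ", ".join(f"bit {b}: {d}" for b, d in zip(bits, descriptions))
    print(f"errors {code} -> {pairs}")
```

In each case the number of set bits equals the number of descriptions
printed, which fits the bitmask reading.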

I looked at a handful of the reported files that are verifiable via
public MD5/SHA1 checksums, and they are not corrupted; the checksums are
correct.

Any hints or suggestions would be much appreciated, please see below for 
the btrfs check output (repeating lines omitted and some filenames 
redacted):

Checking filesystem on /dev/dm-9
UUID: f50f980e-7640-49c7-bf8d-20d55cfe6005
checking extents [.]
[...]
incorrect offsets 14927 14415
bad block 5242107641856

Errors found in extent allocation tree or chunk allocation
checking free space cache [.]
[...]
checking fs roots [.]
[...]
incorrect offsets 14927 14415
incorrect offsets 14927 14415
root 261 inode 127094 errors 500, file extent discount, nbytes wrong
Found file extent holes:
     start: 0, len: 499712
     unresolved ref dir 127093 index 2 namelen 24 name ABC DE Fghij 
Klmnopr.tuv filetype 1 errors 4, no inode ref
root 261 inode 127095 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 13 namelen 17 name 
Whateverdir123456 filetype 2 errors 4, no inode ref
root 261 inode 127097 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 14 namelen 12 name 
WhateverDirectory2 filetype 2 errors 4, no inode ref
root 261 inode 127099 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 15 namelen 11 name AnyDir filetype 
2 errors 4, no inode ref
root 261 inode 127105 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 16 namelen 10 name AnotherDir 
filetype 2 errors 4, no inode ref
root 261 inode 127107 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 17 namelen 11 name Folder11 
filetype 2 errors 4, no inode ref
root 261 inode 127112 errors 2001, no inode item, link count wrong
     unresolved ref dir 126959 index 51 namelen 11 name Folder120 
filetype 2 errors 4, no inode ref
root 261 inode 127114 errors 2001, no inode item, link count wrong
     unresolved ref dir 126146 index 40 namelen 13 name GVC-dir filetype 
2 errors 4, no inode ref
root 261 inode 127396 errors 2001, no inode item, link count wrong
     unresolved ref dir 126146 index 41 namelen 4 name G3-dir filetype 2 
errors 4, no inode ref
root 261 inode 127527 errors 2001, no inode item, link count wrong
     unresolved ref dir 126146 index 42 namelen 11 name Hello Dir 2 
filetype 2 errors 4, no inode ref
root 261 inode 127535 errors 2001, no inode item, link count wrong
     unresolved ref dir 126146 index 43 namelen 4 name Hellodir filetype 
2 errors 4, no inode ref
root 261 inode 127573 errors 2001, no inode item, link count wrong
     unresolved ref dir 126146 index 44 namelen 6 name Hello 2 filetype 
2 errors 4, no inode ref
root 261 inode 127620 errors 2001, no inode item, link count wrong
[...]
root 261 inode 177273 errors 2001, no inode item, link count wrong
     unresolved ref dir 23439 index 23 namelen 24 name Firefox Setup 
51.0.1.exe filetype 1 errors 4, no inode ref
root 261 inode 177275 errors 2001, no inode item, link count wrong
     unresolved ref dir 23439 index 26 namelen 27 name Firefox Setup 
45.7.0esr.exe filetype 1 errors 4, no inode ref
root 261 inode 180457 errors 2001, no inode item, link count wrong
[...]
checking fs roots [o]
incorrect offsets 14927 14415
checking fs roots [.]
[...]
checking fs roots [o]
The following tree block(s) is corrupted in tree 263:
     tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
checking fs roots [o]
incorrect offsets 14927 14415
checking fs roots [O]
The following tree block(s) is corrupted in tree 6685:
     tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
checking fs roots [o]
checking fs roots [.]
incorrect offsets 14927 14415
The following tree block(s) is corrupted in tree 6879:
     tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
checking fs roots [o]
incorrect offsets 14927 14415
incorrect offsets 14927 14415
root 6893 inode 127094 errors 500, file extent discount, nbytes wrong
Found file extent holes:
     start: 0, len: 499712
     unresolved ref dir 127093 index 2 namelen 24 name ABC DE Fghij 
Klmnopr.tuv filetype 1 errors 4, no inode ref
root 6893 inode 127095 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 13 namelen 17 name 
Whateverdir123456 filetype 2 errors 4, no inode ref
root 6893 inode 127097 errors 2001, no inode item, link count wrong
[...]
root 6893 inode 177273 errors 2001, no inode item, link count wrong
     unresolved ref dir 23439 index 23 namelen 24 name Firefox Setup 
51.0.1.exe filetype 1 errors 4, no inode ref
root 6893 inode 177275 errors 2001, no inode item, link count wrong
     unresolved ref dir 23439 index 26 namelen 27 name Firefox Setup 
45.7.0esr.exe filetype 1 errors 4, no inode ref
incorrect offsets 14927 14415
incorrect offsets 14927 14415
root 6896 inode 127094 errors 500, file extent discount, nbytes wrong
Found file extent holes:
     start: 0, len: 499712
     unresolved ref dir 127093 index 2 namelen 24 name ABC DE Fghij 
Klmnopr.tuv filetype 1 errors 4, no inode ref
root 6896 inode 127095 errors 2001, no inode item, link count wrong
     unresolved ref dir 127080 index 13 namelen 17 name 
Whateverdir123456 filetype 2 errors 4, no inode ref
root 6896 inode 127097 errors 2001, no inode item, link count wrong
[...]
root 6896 inode 177273 errors 2001, no inode item, link count wrong
     unresolved ref dir 23439 index 23 namelen 24 name Firefox Setup 
51.0.1.exe filetype 1 errors 4, no inode ref
root 6896 inode 177275 errors 2001, no inode item, link count wrong
     unresolved ref dir 23439 index 26 namelen 27 name Firefox Setup 
45.7.0esr.exe filetype 1 errors 4, no inode ref
The following tree block(s) is corrupted in tree 6893:
     tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6896:
     tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)

found 4042079987465 bytes used err is 1
total csum bytes: 0
total tree bytes: 179142656
total fs tree bytes: 0
total extent tree bytes: 176128000
btree space waste bytes: 48203815
file data blocks allocated: 1545338880
  referenced 1545338880



* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-02-22  7:44 ` Lukas Tribus
  2017-02-22 19:16   ` Lukas Tribus
@ 2017-02-22 19:40   ` Hans van Kranenburg
  2017-02-23 23:47     ` Lukas Tribus
  1 sibling, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-02-22 19:40 UTC (permalink / raw)
  To: Lukas Tribus, linux-btrfs

On 02/22/2017 08:44 AM, Lukas Tribus wrote:
> Upgrading to 4.8, the FS no longer causes a kernel calltrace and does
> not go read-only. It only shows the "corrupt leaf, slot offset bad"
> message.
> 
> A scrub completed without errors on 3 devices, while it was aborted on 2
> devices. Not sure why it was aborted, since there is no error message in
> dmesg?
> 
> Any suggestions why the scrub was aborted?

Maybe because of the "corrupt leaf" error.

> # uname -a
> Linux srv1-dom0 4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5
> 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> # btrfs scrub status /storage/users/
> scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
>         scrub started at Wed Feb 22 00:07:33 2017 and was aborted after
> 06:35:42
>         total bytes scrubbed: 10.60TiB with 0 errors
> /# btrfs scrub status /storage/users/ -d
> scrub status for f50f980e-7640-49c7-bf8d-20d55cfe6005
> scrub device /dev/dm-5 (id 1) history
>         scrub started at Wed Feb 22 00:07:33 2017 and finished after
> 06:35:36
>         total bytes scrubbed: 2.30TiB with 0 errors
> scrub device /dev/dm-6 (id 2) history
>         scrub started at Wed Feb 22 00:07:33 2017 and finished after
> 06:35:30
>         total bytes scrubbed: 2.30TiB with 0 errors
> scrub device /dev/dm-7 (id 3) history
>         scrub started at Wed Feb 22 00:07:33 2017 and finished after
> 06:35:42
>         total bytes scrubbed: 2.30TiB with 0 errors
> scrub device /dev/dm-8 (id 4) history
>         scrub started at Wed Feb 22 00:07:33 2017 and was aborted after
> 05:01:37
>         total bytes scrubbed: 1.85TiB with 0 errors
> scrub device /dev/mapper/sde3_crypt (id 5) history
>         scrub started at Wed Feb 22 00:07:33 2017 and was aborted after
> 05:01:37
>         total bytes scrubbed: 1.85TiB with 0 errors
> #dmesg | grep BTRFS
> [  929.737119] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39
> [19772.594129] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39
> [19777.127704] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39
> [19777.552191] BTRFS critical (device dm-9): corrupt leaf, slot offset
> bad: block=5242107641856,root=1, slot=39

Ok, this is not a csum failure, so it's probably not the disk returning
data different from what was written to it, or a disk controller
corrupting the data while writing.

And, it's a metadata page, in which part of the entries do not make
sense any more to btrfs. Specifically, it's in root 1, which is the tree
which contains information about all other subtrees containing metadata,
so it's quite an important one.

So, the corruption which is now present in there likely happened in
memory before writing it out. This is also a scenario in which DUP or
RAIDx on disk doesn't help you, because in memory it's stored just once.

If this is a bitflip-like thing in memory, it would probably be possible
to spot it and manually correct it (using a patched btrfsck with a
bitflip patch, or manually by hexediting++).
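
One quick way to test the bitflip idea is to XOR the two numbers from the
"incorrect offsets 14927 14415" line in the btrfs check output posted
earlier in the thread. The sketch below assumes those two numbers are
the offset the tool expected and the offset it found; the function name
is made up for illustration.

```python
def single_bit_difference(a, b):
    """Return the bit index if a and b differ in exactly one bit, else None."""
    diff = a ^ b
    if diff != 0 and (diff & (diff - 1)) == 0:
        return diff.bit_length() - 1
    return None


# Values from the "incorrect offsets 14927 14415" line.
first, second = 14927, 14415
bit = single_bit_difference(first, second)
print(f"{first:#06x} ^ {second:#06x} = {first ^ second:#06x}, bit index: {bit}")
```

The two values differ by exactly 512, i.e. only in bit 9. That would be
consistent with a single flipped bit, although it does not prove one.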

Another option is memory corruption or a bug somewhere else in the
kernel, which led to a pointer's memory address being changed, causing
a write to memory to end up in the middle of some btrfs metadata
waiting to be checksummed and written to disk.

Question here is... is it easier for you to nuke the filesystem and
restore the files from somewhere else, or do you want to figure out
manually if it's recoverable, and spend some time with dd, hexedit,
reading struct definitions in btrfs kernel C code etc...

If the regular --repair can't fix it (and it can't do magic if you shoot
a hole in it with a shotgun), then there's no automated other tool that
can do it now.

Since it's block 5242107641856 all the time, it might be worthwhile to
have a look at it. Either it's that block, or there's a bigger mess
hidden behind it.
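
To confirm that every report really points at the same place, the
"corrupt leaf" lines can be grouped by block, root and slot. A minimal
sketch, assuming the message format seen in the dmesg excerpts in this
thread (the sample lines are copied from them with the wrapping undone):

```python
import re
from collections import Counter

# Lines copied from the `dmesg | grep BTRFS` output in this thread.
DMESG = """\
[  929.737119] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[19772.594129] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[19777.127704] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
[19777.552191] BTRFS critical (device dm-9): corrupt leaf, slot offset bad: block=5242107641856,root=1, slot=39
"""

CORRUPT_RE = re.compile(
    r"BTRFS critical \(device (?P<dev>[^)]+)\): corrupt leaf, slot offset bad:\s*"
    r"block=(?P<block>\d+),\s*root=(?P<root>\d+),\s*slot=(?P<slot>\d+)"
)


def corrupt_leaf_reports(text):
    """Count (block, root, slot) tuples across all 'corrupt leaf' messages."""
    counts = Counter()
    for match in CORRUPT_RE.finditer(text):
        key = (int(match["block"]), int(match["root"]), int(match["slot"]))
        counts[key] += 1
    return counts


reports = corrupt_leaf_reports(DMESG)
for (block, root, slot), count in reports.items():
    print(f"block={block} root={root} slot={slot}: {count} message(s)")
```

If more than one key shows up when this is run against a full dmesg,
that would point towards the "bigger mess" case rather than a single
damaged block.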

-- 
Hans van Kranenburg


* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-02-22 19:40   ` Hans van Kranenburg
@ 2017-02-23 23:47     ` Lukas Tribus
  2017-02-24  0:26       ` Hans van Kranenburg
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Tribus @ 2017-02-23 23:47 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs

Hello Hans,


On 22.02.2017 at 20:40, Hans van Kranenburg wrote:
>
> Question here is... is it easier for you to nuke the filesystem and
> restore the files from somewhere else, or do you want to figure out
> manually if it's recoverable, and spend some time with dd, hexedit,
> reading struct definitions in btrfs kernel C code etc...
>
> If the regular --repair can't fix it (and it can't do magic if you shoot
> a hole in it with a shotgun), then there's no automated other tool that
> can do it now.
>
> Since it's block 5242107641856 all the time, it might be worthwhile to
> have a look at it. Either it's that block, or there's a bigger mess
> hidden behind it.
>

Thanks for all the input here and on IRC. I now have a good
understanding of what can and what cannot be done realistically.

The files are still fully readable and I'm going to back up as much
data as I can over the next few days.

Once that is done, I would like to go over the "btrfs recovery" thread
and see if it can be applied to my case as well. I will certainly need
your help when that time comes...


Thanks for all your help,

Lukas



* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-02-23 23:47     ` Lukas Tribus
@ 2017-02-24  0:26       ` Hans van Kranenburg
  2017-03-05 22:50         ` Lukas Tribus
  0 siblings, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-02-24  0:26 UTC (permalink / raw)
  To: Lukas Tribus, linux-btrfs

On 02/24/2017 12:47 AM, Lukas Tribus wrote:
> Hello Hans,
> 
> 
> On 22.02.2017 at 20:40, Hans van Kranenburg wrote:
>>
>> Question here is... is it easier for you to nuke the filesystem and
>> restore the files from somewhere else, or do you want to figure out
>> manually if it's recoverable, and spend some time with dd, hexedit,
>> reading struct definitions in btrfs kernel C code etc...
>>
>> If the regular --repair can't fix it (and it can't do magic if you shoot
>> a hole in it with a shotgun), then there's no automated other tool that
>> can do it now.
>>
>> Since it's block 5242107641856 all the time, it might be worthwhile to
>> have a look at it. Either it's that block, or there's a bigger mess
>> hidden behind it.
>>
> 
> Thanks for all the inputs here and on IRC. I now have a good
> understanding of what can
> and what cannot be done realistically.
> 
> The files are still fully readable and I'm going to backup as much data
> as I can over the
> next few days.
> 
> Once that is done, I would like to go over the "btrfs recovery" thread
> and see if it can
> be applied for my case as well. I will certainly need your help when
> that time comes...

We can take a stab at it.

-- 
Hans van Kranenburg


* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-02-24  0:26       ` Hans van Kranenburg
@ 2017-03-05 22:50         ` Lukas Tribus
  2017-03-07 14:12           ` Hans van Kranenburg
  0 siblings, 1 reply; 9+ messages in thread
From: Lukas Tribus @ 2017-03-05 22:50 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs

Hello Hans,


On 24.02.2017 at 01:26, Hans van Kranenburg wrote:
>
>> Once that is done, I would like to go over the "btrfs recovery" thread
>> and see if it can
>> be applied for my case as well. I will certainly need your help when
>> that time comes...
> We can take a stab at it.
>

I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs 
inspect-internal dump-tree.
But I cannot find anything about 5242107641856 in the dump-tree output.

What does that mean?




Thanks,
Lukas



* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-03-05 22:50         ` Lukas Tribus
@ 2017-03-07 14:12           ` Hans van Kranenburg
  2017-03-07 19:46             ` Lukas Tribus
  0 siblings, 1 reply; 9+ messages in thread
From: Hans van Kranenburg @ 2017-03-07 14:12 UTC (permalink / raw)
  To: Lukas Tribus, linux-btrfs

On 03/05/2017 11:50 PM, Lukas Tribus wrote:
> 
> On 24.02.2017 at 01:26, Hans van Kranenburg wrote:
>>
>>> Once that is done, I would like to go over the "btrfs recovery" thread
>>> and see if it can
>>> be applied for my case as well. I will certainly need your help when
>>> that time comes...
>> We can take a stab at it.
> 
> I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs
> inspect-internal dump-tree.
> But I cannot find anything about 5242107641856 in the dump-tree output.
> 
> What does that mean?

I have no idea. It probably means it's gone. Did you use the filesystem
read/write? Are the symptoms also gone?

-- 
Hans van Kranenburg


* Re: BTRFS critical: corrupt leaf, slot offset bad; then read-only
  2017-03-07 14:12           ` Hans van Kranenburg
@ 2017-03-07 19:46             ` Lukas Tribus
  0 siblings, 0 replies; 9+ messages in thread
From: Lukas Tribus @ 2017-03-07 19:46 UTC (permalink / raw)
  To: Hans van Kranenburg, linux-btrfs


On 07.03.2017 at 15:12, Hans van Kranenburg wrote:
> On 03/05/2017 11:50 PM, Lukas Tribus wrote:
>>
>> I upgraded btrfs-tools to 4.8.1 as 4.4 didn't have btrfs
>> inspect-internal dump-tree.
>> But I cannot find anything about 5242107641856 in the dump-tree output.
>>
>> What does that mean?
> I have no idea. It probably means it's gone. Did you use the filesystem
> read/write? Are the symptoms also gone?
>

Well, I read basically everything and copied it to other drives. Nothing
appears corrupted from what I can tell. I didn't write to the pool
consciously, although I did not mount it readonly either, now that I'm
thinking about it ...


btrfs check --readonly reports block corruption (and a number of "no 
inode ref" in files/folders):

Checking filesystem on /dev/mapper/sda3_crypt
UUID: f50f980e-7640-49c7-bf8d-20d55cfe6005
The following tree block(s) is corrupted in tree 261:
         tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 263:
         tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6685:
         tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6879:
         tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6893:
         tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6896:
         tree block bytenr: 5242107641856, level: 0, node key: 
(5241902333952, 169, 0)
found 4080263675904 bytes used err is 1
total csum bytes: 0
total tree bytes: 181780480
total fs tree bytes: 0
total extent tree bytes: 178765824
btree space waste bytes: 49102341
file data blocks allocated: 1545338880
  referenced 1545338880


Not sure how btrfs check finds a corrupted block that doesn't appear in 
the dump-tree output.
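
For reference, the check output above can be reduced to "which trees
report which block". This is a small sketch; the sample is copied from
the output above with the wrapping undone, and the regular expressions
are assumptions that fit this output only.

```python
import re
from collections import defaultdict

# Copied from the `btrfs check --readonly` output above (wrapping undone).
CHECK_OUTPUT = """\
The following tree block(s) is corrupted in tree 261:
         tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 263:
         tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6685:
         tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6879:
         tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6893:
         tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)
The following tree block(s) is corrupted in tree 6896:
         tree block bytenr: 5242107641856, level: 0, node key: (5241902333952, 169, 0)
"""

TREE_RE = re.compile(r"is corrupted in tree (\d+):")
BLOCK_RE = re.compile(r"tree block bytenr: (\d+), level: (\d+)")


def trees_by_block(text):
    """Map each reported block bytenr to the list of trees that report it."""
    result = defaultdict(list)
    current_tree = None
    for line in text.splitlines():
        match = TREE_RE.search(line)
        if match:
            current_tree = int(match.group(1))
            continue
        match = BLOCK_RE.search(line)
        if match and current_tree is not None:
            result[int(match.group(1))].append(current_tree)
    return dict(result)


blocks = trees_by_block(CHECK_OUTPUT)
for bytenr, trees in blocks.items():
    print(f"block {bytenr}: reported by trees {trees}")
```

All six trees report the same bytenr, which would suggest one damaged
leaf seen from six trees rather than six separate corruptions.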


And I had an additional stack trace on the new btrfs pool I was copying 
the data to:

[873067.780479] BTRFS error (device sdf3): bdev /dev/sdf3 errs: wr 0, rd 
1, flush 0, corrupt 0, gen 0
[873067.790639] BTRFS error (device sdf3): bdev /dev/sdf3 errs: wr 0, rd 
2, flush 0, corrupt 0, gen 0
[873067.800708] ------------[ cut here ]------------
[873067.800727] WARNING: CPU: 3 PID: 12942 at 
/build/linux-hwe-6_oOe5/linux-hwe-4.8.0/fs/btrfs/extent-tree.c:6954 
__btrfs_free_extent.isra.71+0x2cb/0xcc0 [btrfs]
[873067.800730] BTRFS: Transaction aborted (error -5)
[873067.800731] Modules linked in: ufs qnx4 hfsplus hfs minix ntfs msdos 
jfs xfs algif_skcipher af_alg xen_gntdev xen_evtchn xenfs xen_privcmd 
dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp nls_iso8859_1 
coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel bridge stp 
llc intel_rapl_perf serio_raw lpc_ich joydev shpchp nuvoton_cir 
input_leds mei_me mei rc_core mac_hid ie31200_edac edac_core ib_iser 
rdma_cm iw_cm ib_cm ib_core configfs iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi autofs4 uas usb_storage btrfs raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid mxm_wmi 
aesni_intel aes_x86_64 i915 glue_helper lrw i2c_algo_bit ablk_helper tg3 
cryptd drm_kms_helper syscopyarea sysfillrect
[873067.800782]  firewire_ohci ptp sysimgblt psmouse firewire_core 
fb_sys_fops crc_itu_t pps_core ahci drm libahci wmi fjes video
[873067.800791] CPU: 3 PID: 12942 Comm: screen Tainted: G W       
4.8.0-39-generic #42~16.04.1-Ubuntu
[873067.800791] Hardware name: To Be Filled By O.E.M. To Be Filled By 
O.E.M./Z77 Extreme6, BIOS P2.80 07/01/2013
[873067.800793]  0000000000000200 00000000f56bf709 ffff880259f1f908 
ffffffff8142e043
[873067.800795]  ffff880259f1f958 0000000000000000 ffff880259f1f948 
ffffffff8108313b
[873067.800797]  00001b2a59f1faa0 00000000fffffffb 000001cda76bc000 
ffff8802a9fe0d20
[873067.800798] Call Trace:
[873067.800803]  [<ffffffff8142e043>] dump_stack+0x63/0x90
[873067.800805]  [<ffffffff8108313b>] __warn+0xcb/0xf0
[873067.800807]  [<ffffffff810831bf>] warn_slowpath_fmt+0x5f/0x80
[873067.800821]  [<ffffffffc03e22ab>] 
__btrfs_free_extent.isra.71+0x2cb/0xcc0 [btrfs]
[873067.800836]  [<ffffffffc04542af>] ? 
btrfs_merge_delayed_refs+0x8f/0x6a0 [btrfs]
[873067.800846]  [<ffffffffc03e7070>] 
__btrfs_run_delayed_refs+0xb10/0x12c0 [btrfs]
[873067.800857]  [<ffffffff811ad938>] ? set_page_dirty+0x58/0xb0
[873067.800869]  [<ffffffffc0428198>] ? 
set_extent_buffer_dirty+0x78/0xd0 [btrfs]
[873067.800879]  [<ffffffffc03ea8de>] btrfs_run_delayed_refs+0x8e/0x2b0 
[btrfs]
[873067.800890]  [<ffffffffc03fefee>] commit_cowonly_roots+0xae/0x300 
[btrfs]
[873067.800901]  [<ffffffffc0470fa4>] ? 
btrfs_qgroup_account_extents+0x84/0x180 [btrfs]
[873067.800911]  [<ffffffffc0401c33>] 
btrfs_commit_transaction+0x573/0xb00 [btrfs]
[873067.800920]  [<ffffffffc040225e>] ? start_transaction+0x9e/0x4c0 [btrfs]
[873067.800930]  [<ffffffffc03fa38f>] btrfs_commit_super+0x8f/0xa0 [btrfs]
[873067.800939]  [<ffffffffc03fc577>] close_ctree+0x2b7/0x360 [btrfs]
[873067.800947]  [<ffffffffc03cbf29>] btrfs_put_super+0x19/0x20 [btrfs]
[873067.800949]  [<ffffffff8123553f>] generic_shutdown_super+0x6f/0x100
[873067.800950]  [<ffffffff81235852>] kill_anon_super+0x12/0x20
[873067.800966]  [<ffffffffc03ccde8>] btrfs_kill_super+0x18/0x110 [btrfs]
[873067.800968]  [<ffffffff81235a23>] deactivate_locked_super+0x43/0x70
[873067.800969]  [<ffffffff81235efc>] deactivate_super+0x5c/0x60
[873067.800971]  [<ffffffff8125544f>] cleanup_mnt+0x3f/0x90
[873067.800972]  [<ffffffff812554e2>] __cleanup_mnt+0x12/0x20
[873067.800974]  [<ffffffff810a22fe>] task_work_run+0x7e/0xa0
[873067.800975]  [<ffffffff81087001>] do_exit+0x2d1/0xb50
[873067.800977]  [<ffffffff8106b4f5>] ? __do_page_fault+0x265/0x4e0
[873067.800978]  [<ffffffff81087903>] do_group_exit+0x43/0xb0
[873067.800979]  [<ffffffff81087984>] SyS_exit_group+0x14/0x20
[873067.800980]  [<ffffffff8189b7f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
[873067.800981] ---[ end trace b0a630aaaf9a5946 ]---
[873067.800983] BTRFS: error (device sdf3) in __btrfs_free_extent:6954: 
errno=-5 IO failure
[873067.810114] BTRFS info (device sdf3): forced readonly
[873067.810116] BTRFS: error (device sdf3) in 
btrfs_run_delayed_refs:2960: errno=-5 IO failure
[873067.819481] BTRFS warning (device sdf3): Skipping commit of aborted 
transaction.
[873067.819482] BTRFS: error (device sdf3) in cleanup_transaction:1854: 
errno=-5 IO failure
[873067.828668] BTRFS error (device sdf3): commit super ret -5
[873067.835748] BTRFS error (device sdf3): cleaner transaction attach 
returned -30
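
The negative numbers in these messages are kernel error codes. A short
sketch for translating them, assuming Linux errno numbering (the
`describe` helper is made up for illustration):

```python
import errno
import os


def describe(code):
    """Translate a negative kernel return code into its errno name and text."""
    number = abs(code)
    return errno.errorcode.get(number, "unknown"), os.strerror(number)


# Codes taken from the log above: "error -5" and "returned -30".
for code in (-5, -30):
    name, text = describe(code)
    print(f"{code}: {name} ({text})")
```

On Linux, -5 maps to EIO, matching the "IO failure" wording in the log,
and -30 maps to EROFS, which fits the filesystem having been forced
readonly just before.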



I guess it's time to rebuild this FS from scratch, unless you have a
better idea?




Thanks, much appreciated,
Lukas


