* errors found in extent allocation tree or chunk allocation
From: Frankie Fisher @ 2023-01-10 12:49 UTC (permalink / raw)
  To: linux-btrfs

Hi,

I upgraded a box's kernel from 5.4 to 5.15 then restarted it. The box had been up for 2 months before the restart and after the restart the btrfs filesystem wouldn't mount. I suppose there are two possibilities - the issue occurred during the 2 months of uptime or as a consequence of starting up with the newer kernel.

uname -a:

Linux basie 5.4.0-136-generic #153-Ubuntu SMP Thu Nov 24 15:56:58 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Linux basie 5.15.0-57-generic #63~20.04.1-Ubuntu SMP Wed Nov 30 13:40:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux 

My first step was to reboot into the older 5.4 kernel, but the problem recurred. dmesg output (filtered for btrfs/BTRFS) is similar with both kernels:



[    4.607811] Btrfs loaded, crc32c=crc32c-intel, zoned=yes, fsverity=yes
[   22.257868] BTRFS: device fsid 0f4a1bba-fbd1-4007-88f8-5c288a8eb161 devid 11 transid 4797718 /dev/sdh2 scanned by btrfs (561)
[   22.257977] BTRFS: device fsid 0f4a1bba-fbd1-4007-88f8-5c288a8eb161 devid 8 transid 4797718 /dev/sdg2 scanned by btrfs (561)
[   22.258313] BTRFS: device fsid 0f4a1bba-fbd1-4007-88f8-5c288a8eb161 devid 10 transid 4797718 /dev/sdf2 scanned by btrfs (561)
[   22.258420] BTRFS: device fsid 0f4a1bba-fbd1-4007-88f8-5c288a8eb161 devid 7 transid 4797718 /dev/sde2 scanned by btrfs (561)
[   22.258531] BTRFS: device fsid 0f4a1bba-fbd1-4007-88f8-5c288a8eb161 devid 9 transid 4797718 /dev/sdd2 scanned by btrfs (561)
[   29.581350] BTRFS info (device sde2): disk space caching is enabled
[   31.414167] BTRFS info (device sde2): bdev /dev/sde2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[   33.735212] BTRFS critical (device sde2): corrupt leaf: block=30874802077696 slot=176 extent bytenr=21866556121088 len=4096 previous extent [21866556112896 168 4503599627378688] overlaps current extent [21866556121088 168 4096]
[   33.735234] BTRFS error (device sde2): block=30874802077696 read time tree block corruption detected
[   33.751471] BTRFS critical (device sde2): corrupt leaf: block=30874802077696 slot=176 extent bytenr=21866556121088 len=4096 previous extent [21866556112896 168 4503599627378688] overlaps current extent [21866556121088 168 4096]
[   33.751484] BTRFS error (device sde2): block=30874802077696 read time tree block corruption detected
[   33.751517] BTRFS error (device sde2): failed to read block groups: -5
[   33.757126] BTRFS error (device sde2): open_ctree failed

I ran btrfs check with btrfs-progs v5.4.1

Checking filesystem on /dev/sde2
UUID: 0f4a1bba-fbd1-4007-88f8-5c288a8eb161
[1/7] checking root items

[2/7] checking extents
ref mismatch on [21866556112896 4503599627378688] extent item 0, found 1
backref bytes do not match extent backref, bytenr=21866556112896, ref bytes=4503599627378688, backref bytes=8192
backpointer mismatch on [21866556112896 4503599627378688]
extent item 22704514924544 has multiple extent items
ref mismatch on [28106103517184 8192] extent item 4503599627370497, found 1
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
there is no free space entry for 4525466183491584-21866556121088
cache appears valid but isn't 21865483468800
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 4511516231163904 bytes used, error(s) found
total csum bytes: 7142228136
total tree bytes: 11304239104
total fs tree bytes: 3378511872
total extent tree bytes: 386826240
btree space waste bytes: 930753844
file data blocks allocated: 28547414216704
 referenced 7990763888640


I also installed btrfs-progs v6.1.2 and the output was similar, other than section [3/7]:

[3/7] checking free space cache
There are still entries left in the space cache
cache appears valid but isn't 21866557210624
There are still entries left in the space cache
cache appears valid but isn't 21867630952448
.... (similar lines removed)



Any suggestions to recover this filesystem are gratefully received!

Cheers,

Frankie Fisher


* Re: errors found in extent allocation tree or chunk allocation
From: Frankie Fisher @ 2023-01-12 22:59 UTC (permalink / raw)
  To: linux-btrfs

On Tue, 10 Jan 2023, at 12:49 PM, Frankie Fisher wrote:

> [   33.735212] BTRFS critical (device sde2): corrupt leaf: block=30874802077696 slot=176 extent bytenr=21866556121088 len=4096 previous extent [21866556112896 168 4503599627378688] overlaps current extent [21866556121088 168 4096]


> [2/7] checking extents
> ref mismatch on [21866556112896 4503599627378688] extent item 0, found 1
> backref bytes do not match extent backref, bytenr=21866556112896, ref bytes=4503599627378688, backref bytes=8192
> backpointer mismatch on [21866556112896 4503599627378688]
> extent item 22704514924544 has multiple extent items
> ref mismatch on [28106103517184 8192] extent item 4503599627370497, found 1

Based on the dmesg and btrfs check excerpts above, my research has led me to conclude that the likely cause of the corruption was a bit flip in the recorded length of an extent. This triggers the "previous extent overlaps current extent" kernel message, as the previous extent's length is recorded as exactly 4PiB + 8192B (2^52 + 8192 bytes). The gap between the two extents in the corrupt leaf kernel message is 8192B, and btrfs check lists the backref bytes as 8192B. So all of this points to a bit flip in memory before this part of the tree was written to disc.
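
The arithmetic behind this conclusion can be checked mechanically. The following is a minimal sketch in Python; the constants are copied from the dmesg and btrfs check output quoted above, while the helper name flipped_bits is purely illustrative and not part of any btrfs tool.

```python
# Values copied from the dmesg and btrfs check output quoted above.
PREV_START = 21866556112896          # bytenr of the previous extent
PREV_LEN_ON_DISK = 4503599627378688  # length recorded in the corrupt key
CUR_START = 21866556121088           # bytenr of the current extent
BACKREF_BYTES = 8192                 # "backref bytes" reported by btrfs check


def flipped_bits(observed, expected):
    """Return the bit positions in which two integers differ (hypothetical helper)."""
    diff = observed ^ expected
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Gap between the two extent start addresses.
gap = CUR_START - PREV_START
print(gap)

# If the true length is 8192, the recorded length differs in exactly one bit.
print(flipped_bits(PREV_LEN_ON_DISK, BACKREF_BYTES))

# That single bit is worth 2^52 bytes, i.e. 4 PiB.
print(PREV_LEN_ON_DISK - BACKREF_BYTES == 4 * 2**50)

# End of the corrupt extent; this matches the start of the range in the
# "no free space entry" line from btrfs-progs v5.4.1.
print(PREV_START + PREV_LEN_ON_DISK)

# The second "ref mismatch" line (extent item 4503599627370497, found 1)
# differs from the found count in the same bit position.
print(flipped_bits(4503599627370497, 1))
```

A single differing bit is consistent with the bit-flip hypothesis, although it does not by itself show where or when the flip happened.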

The output of dump-tree puts the above in context:

        item 174 key (21866556104704 EXTENT_ITEM 8192) itemoff 7024 itemsize 53
                refs 1 gen 2228553 flags DATA
                extent data backref root 258 objectid 3633423 offset 0 count 1
        item 175 key (21866556112896 EXTENT_ITEM 4503599627378688) itemoff 6971 itemsize 53
                refs 1 gen 2228553 flags DATA
                extent data backref root 258 objectid 3633429 offset 0 count 1
        item 176 key (21866556121088 EXTENT_ITEM 4096) itemoff 6918 itemsize 53
                refs 1 gen 2228553 flags DATA
                extent data backref root 258 objectid 3633434 offset 0 count 1
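
The overlap is also visible directly from the key lines of the dump-tree output. As a rough sketch, assuming the key lines have the textual form shown above and that items within a leaf are sorted by start address, a few lines of Python can flag any extent whose end runs past the start of its successor. The function name find_overlaps and the regular expression are illustrative only.

```python
import re

# Key lines copied from the dump-tree output above.
DUMP = """\
item 174 key (21866556104704 EXTENT_ITEM 8192) itemoff 7024 itemsize 53
item 175 key (21866556112896 EXTENT_ITEM 4503599627378688) itemoff 6971 itemsize 53
item 176 key (21866556121088 EXTENT_ITEM 4096) itemoff 6918 itemsize 53
"""

# Captures slot number, start bytenr and length from an EXTENT_ITEM key line.
KEY_RE = re.compile(r"item (\d+) key \((\d+) EXTENT_ITEM (\d+)\)")


def find_overlaps(text):
    """Return (slot, next_slot, overlap_bytes) for each overlapping neighbour pair."""
    extents = [tuple(int(g) for g in m.groups()) for m in KEY_RE.finditer(text)]
    overlaps = []
    for (slot_a, start_a, len_a), (slot_b, start_b, _) in zip(extents, extents[1:]):
        end_a = start_a + len_a
        if end_a > start_b:
            overlaps.append((slot_a, slot_b, end_a - start_b))
    return overlaps


print(find_overlaps(DUMP))
```

With the three items above, only the pair 175/176 is reported, and the overlap is exactly 2^52 bytes; items 174 and 175 are adjacent without overlapping.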


I have run memtest86+ for some time which has demonstrated that if the RAM is faulty, it's a rare fault, so I feel hopeful that most/all of the rest of the data on the filesystem is intact.

In theory then, I can fix the filesystem by unflipping this bit (easy), and then updating the checksum in the csum tree (slightly more complicated but doable). My plan is to cobble together a programme based on some of the code in btrfs-progs to make the change on disc. Running "btrfs check --repair" seems an uncertain option to me, as I don't know exactly what changes it might make to the disc, whereas I have a good idea of the changes I want to make to the btrfs structure.
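
The flip-and-rechecksum step could look roughly like the sketch below. It works on an in-memory copy of a block only and rests on several assumptions that should be verified against the btrfs-progs sources before touching a disc: that the checksum to update is the one stored at the start of the edited tree block itself, that it is a CRC-32C held little-endian in the first 4 bytes of a 32-byte checksum field, and that it covers everything after that field. The names CSUM_FIELD_SIZE, crc32c and flip_bit_and_rechecksum are hypothetical and not taken from btrfs-progs.

```python
CSUM_FIELD_SIZE = 32  # assumed size of the checksum field at the start of a block


def _make_table():
    """Build the byte-wise lookup table for CRC-32C (reflected polynomial 0x82F63B78)."""
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_table()


def crc32c(data):
    """Standard CRC-32C (Castagnoli): initial value and final XOR are both 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def flip_bit_and_rechecksum(block, byte_offset, bit):
    """Flip one bit in a copy of the block and refresh the assumed header checksum."""
    if byte_offset < CSUM_FIELD_SIZE:
        raise ValueError("offset lies inside the checksum field")
    buf = bytearray(block)
    buf[byte_offset] ^= 1 << bit
    # Assumption: the checksum covers every byte after the checksum field.
    csum = crc32c(bytes(buf[CSUM_FIELD_SIZE:]))
    buf[0:4] = csum.to_bytes(4, "little")
    return bytes(buf)
```

For a 64-bit little-endian field, bit 52 is bit 4 of the byte at field offset 6, so the call would take the form flip_bit_and_rechecksum(block, field_offset + 6, 4), where field_offset is the position of the key's offset field within the leaf and has to be located separately. If the metadata profile keeps more than one copy of the block, each copy would presumably need the same change.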

My questions are:

* does this approach sound workable?
* are there any pitfalls that I might naively run into?
* are there any tools or libraries that will do some/all of this fix already? Or is there a simpler approach?
* are there any other things I should check in the filesystem structure before I plough on with my attempted repair?

Regards,

Frankie
