All of lore.kernel.org
* BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
@ 2021-09-05  7:34 ahipp0
  2021-09-06  1:08 ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: ahipp0 @ 2021-09-05  7:34 UTC (permalink / raw)
  To: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 19873 bytes --]

Hi!

I started having various fun BTRFS warnings/errors/critical messages all of a sudden
after downloading and extracting linux-5.14.1.tar.xz on a fairly new (~1TiB read/written) Samsung SSD 970 EVO Plus 500GB.

The laptop was resumed from suspend-to-disk ~30 minutes prior to that, I think.

Hardware:
TUXEDO Pulse 14 - Gen1
CPU: AMD Ryzen 7 4800H 

RAM: 32GiB
Disk: Samsung SSD 970 EVO Plus 500GB
Distro: Kubuntu 20.04.2
Kernel: 5.11.x

BTRFS is used on top of a regular GPT partition.
(luckily, not the rootfs, otherwise I wouldn't be able to write this email that easily)
No LVM, no dm-crypt.

Please, see more information below.
Full kernel log for the past day is attached.

In the past, I also noticed odd things like ldconfig hanging or not picking up updated libraries after suspend-to-disk.
Simply rebooting helped in such cases.
The swap partition is on the same disk. (as a separate partition, not a file)

I also started using a new power profile recently, which disables half of the CPU cores when on battery power.
(but hibernation also offlines all non-boot CPUs while preparing for suspend-to-disk)

What could have caused the filesystem corruption?
Is there a way to repair the filesystem?
How safe is it to continue using this particular filesystem after/if it's repaired on this drive?
How safe is it to keep using BTRFS on this drive going forward (even after creating a new filesystem)?

I've backed up important files,
so I'll be glad to try various suggestions.
Also, I'll keep using ext4 on this drive for now and will keep an eye on it.

I think I was able to resolve the "corrupt leaf" issue by deleting affected files
(the Linux kernel sources I was unpacking while I hit the issue),
because "btrfs ins logical-resolve" can't find the file anymore:
$ btrfs ins logical-resolve 1376043008 /mnt/hippo/
ERROR: logical ino ioctl: No such file or directory

However, checksum and "btrfs check" errors make me seriously worried.

This is the earliest BTRFS warning I see in the logs:

Sep  4 14:04:51 hippo-tuxedo kernel: [   19.338196] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
Sep  4 14:04:51 hippo-tuxedo kernel: [   19.338202] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now


Here's the first "corrupt leaf" error:

Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151911] BTRFS critical (device nvme0n1p4): corrupt leaf: root=2 block=1376043008 slot=7 bg_start=2169503744 bg_len=1073741824, invalid block group used, have 1073790976 expect [0, 1073741824)
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151925] BTRFS info (device nvme0n1p4): leaf 1376043008 gen 24254 total ptrs 121 free space 6994 owner 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151929]     item 0 key (2169339904 169 0) itemoff 16250 itemsize 33
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151932]             extent refs 1 gen 20692 flags 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151933]             ref#0: tree block backref root 7
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151936]     item 1 key (2169356288 169 0) itemoff 16217 itemsize 33
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151938]             extent refs 1 gen 20692 flags 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151939]             ref#0: tree block backref root 7
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151940]     item 2 key (2169372672 169 0) itemoff 16184 itemsize 33
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151942]             extent refs 1 gen 20692 flags 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151943]             ref#0: tree block backref root 7
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151945]     item 3 key (2169405440 169 0) itemoff 16151 itemsize 33
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151946]             extent refs 1 gen 20692 flags 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151947]             ref#0: tree block backref root 7
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151949]     item 4 key (2169421824 169 0) itemoff 16118 itemsize 33
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151950]             extent refs 1 gen 20692 flags 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151951]             ref#0: tree block backref root 7
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151953]     item 5 key (2169470976 169 0) itemoff 16085 itemsize 33
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151954]             extent refs 1 gen 24164 flags 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151955]             ref#0: tree block backref root 2
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151957]     item 6 key (2169503744 168 16429056) itemoff 16032 itemsize 53
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151959]             extent refs 1 gen 47 flags 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151960]             ref#0: extent data backref root 257 objectid 20379 offset 0 count 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151962]     item 7 key (2169503744 192 1073741824) itemoff 16008 itemsize 24
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151964]             block group used 1073790976 chunk_objectid 256 flags 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151966]     item 8 key (2185932800 168 241664) itemoff 15955 itemsize 53
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151968]             extent refs 1 gen 47 flags 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151969]             ref#0: extent data backref root 257 objectid 20417 offset 0 count 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151971]     item 9 key (2186174464 168 299008) itemoff 15902 itemsize 53
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151973]             extent refs 1 gen 47 flags 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151974]             ref#0: extent data backref root 257 objectid 20418 offset 0 count 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151976]     item 10 key (2186473472 168 135168) itemoff 15849 itemsize 53
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151977]             extent refs 1 gen 47 flags 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151978]             ref#0: extent data backref root 257 objectid 20419 offset 0 count 1
...
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152480]     item 120 key (2195210240 168 4096) itemoff 10019 itemsize 53
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152481]             extent refs 1 gen 47 flags 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152482]             ref#0: extent data backref root 257 objectid 20558 offset 0 count 1
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152484] BTRFS error (device nvme0n1p4): block=1376043008 write time tree block corruption detected
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152661] BTRFS: error (device nvme0n1p4) in btrfs_commit_transaction:2339: errno=-5 IO failure (Error while writing out transaction)
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152663] BTRFS info (device nvme0n1p4): forced readonly
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152664] BTRFS warning (device nvme0n1p4): Skipping commit of aborted transaction.
Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152665] BTRFS: error (device nvme0n1p4) in cleanup_transaction:1939: errno=-5 IO failure
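
For what it's worth, the rejected entry appears to be item 7 above: the block group item records used=1073790976 for a 1073741824-byte (1 GiB) block group, i.e. 48 KiB more than the group can possibly hold. A toy sketch of the range check the kernel message describes (my own illustration, not the actual kernel code):

```python
# Values copied from item 7 of the "corrupt leaf" report above.
BG_LEN = 1073741824    # block group length: 1 GiB
BG_USED = 1073790976   # "block group used" recorded in the leaf

def used_in_range(used: int, bg_len: int) -> bool:
    # Mirrors the message "have ... expect [0, bg_len)": the recorded
    # usage must be non-negative and strictly below the group length.
    return 0 <= used < bg_len

print(used_in_range(BG_USED, BG_LEN))            # False -> leaf rejected
print((BG_USED - BG_LEN) // 1024, "KiB over")    # 48 KiB out of range
```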


I hit these csum 0x00000000 errors while trying to back up the files to an ext4 partition on the same disk:

Sep  5 00:12:26 hippo-tuxedo kernel: [  891.475516] BTRFS info (device nvme0n1p4): disk space caching is enabled
Sep  5 00:12:26 hippo-tuxedo kernel: [  891.475523] BTRFS info (device nvme0n1p4): has skinny extents
Sep  5 00:12:26 hippo-tuxedo kernel: [  891.494832] BTRFS info (device nvme0n1p4): enabling ssd optimizations
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.627577] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.627805] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.627814] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.628316] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112013824, 3112017920)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.628931] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112058880, 3112062976)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.628943] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112083456, 3112087552)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.629210] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5894144 csum 0x45d7e010 expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.629214] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.629238] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5963776 csum 0x95b8b716 expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.630311] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.648130] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.648226] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.648234] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.649275] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.649353] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.649357] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.650397] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.650475] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.650478] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.678142] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111124992, 3111129088)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.678149] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111276544, 3111280640)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.678151] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111346176, 3111350272)
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.680593] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.680604] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.686438] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.686449] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.687671] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.687683] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.688871] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.688876] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
Sep  5 00:17:05 hippo-tuxedo kernel: [ 1170.527686] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
Sep  5 00:17:05 hippo-tuxedo kernel: [ 1170.527695] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
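
If I read these messages correctly (my interpretation, not authoritative): a "csum hole" means the csum tree simply has no checksum stored for those byte ranges, so the expected value comes back as 0x00000000, and any checksum computed over real data can't match it. Illustrated below with zlib's plain crc32 as a stand-in (btrfs actually defaults to crc32c):

```python
import zlib

def verify(data: bytes, stored_csum: int = 0x00000000) -> bool:
    # A missing csum-tree entry ("csum hole") behaves like an expected
    # checksum of 0x00000000, the default here.
    computed = zlib.crc32(data) & 0xFFFFFFFF  # stand-in for crc32c
    return computed == stored_csum

print(verify(b"abc"))               # False: real data vs. missing csum
print(verify(b"abc", 0x352441C2))   # True: matching stored checksum
```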


$ uname -a
Linux hippo-tuxedo 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

$ btrfs --version
btrfs-progs v5.4.1

$ btrfs fi show
Label: 'HIPPO'  uuid: 2b69016b-e03b-478a-84cd-f794eddfebd5
Total devices 1 FS bytes used 66.32GiB
devid    1 size 256.00GiB used 95.02GiB path /dev/nvme0n1p4

$ btrfs fi df /mnt/hippo/
Data, single: total=94.01GiB, used=66.12GiB
System, single: total=4.00MiB, used=16.00KiB
Metadata, single: total=1.01GiB, used=203.09MiB
GlobalReserve, single: total=94.59MiB, used=0.00B

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=20.04
DISTRIB_CODENAME=focal
DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"

Mount options:
relatime,ssd,space_cache,subvolid=5,subvol=andrey

$ btrfs check --readonly /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
extent item 3109511168 has multiple extent items
ref mismatch on [3109511168 2105344] extent item 1, found 5
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
backpointer mismatch on [3109511168 2105344]
extent item 3111616512 has multiple extent items
ref mismatch on [3111616512 638976] extent item 25, found 26
backref disk bytenr does not match extent record, bytenr=3111616512, ref bytenr=3112091648
backref bytes do not match extent backref, bytenr=3111616512, ref bytes=638976, backref bytes=8192
backpointer mismatch on [3111616512 638976]
extent item 3121950720 has multiple extent items
ref mismatch on [3121950720 2220032] extent item 1, found 4
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
backpointer mismatch on [3121950720 2220032]
extent item 3124252672 has multiple extent items
ref mismatch on [3124252672 208896] extent item 12, found 13
backref disk bytenr does not match extent record, bytenr=3124252672, ref bytenr=3124428800
backref bytes do not match extent backref, bytenr=3124252672, ref bytes=208896, backref bytes=12288
backpointer mismatch on [3124252672 208896]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
failed to load free space cache for block group 2169503744
[4/7] checking fs roots
root 257 inode 31924 errors 1000, some csum missing
ERROR: errors found in fs roots
found 71205822464 bytes used, error(s) found
total csum bytes: 69299516
total tree bytes: 212975616
total fs tree bytes: 113672192
total extent tree bytes: 14909440
btree space waste bytes: 42172819
file data blocks allocated: 86083526656
 referenced 70815563776
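
Just sanity-checking the free-space numbers in that output (my own arithmetic, nothing more): the cache records 94208 bytes (23 4-KiB blocks) more free space than the block group item allows, and per its message the check rejects the cache exactly when it claims more free space than the item:

```python
CACHE_FREE = 10440704   # free space recorded in the cache
ITEM_FREE  = 10346496   # free space per the block group item

def cache_acceptable(cache_free: int, item_free: int) -> bool:
    # Per the check's message, a cache claiming MORE free space than
    # the block group item is rejected as potentially corrupting.
    return cache_free <= item_free

excess = CACHE_FREE - ITEM_FREE
print(cache_acceptable(CACHE_FREE, ITEM_FREE))   # False -> cache rejected
print(excess, "bytes =", excess // 4096, "blocks of 4 KiB")
```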

$ smartctl --all /dev/nvme0
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-27-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 500GB
Serial Number:                      S4EVNX0NB29088Y
Firmware Version:                   2B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 500,107,862,016 [500 GB]
Unallocated NVM Capacity:           0
Controller ID:                      4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
Namespace 1 Utilization:            133,526,691,840 [133 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5b01b07633
Local Time is:                      Sun Sep  5 03:08:29 2021 EDT
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     85 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
0 +     7.80W       -        -    0  0  0  0        0       0
1 +     6.00W       -        -    1  1  1  1        0       0
2 +     3.40W       -        -    2  2  2  2        0       0
3 -   0.0700W       -        -    3  3  3  3      210    1200
4 -   0.0100W       -        -    4  4  4  4     2000    8000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        33 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1,263,056 [646 GB]
Data Units Written:                 1,381,709 [707 GB]
Host Read Commands:                 27,814,722
Host Write Commands:                29,580,959
Controller Busy Time:               37
Power Cycles:                       132
Power On Hours:                     47
Unsafe Shutdowns:                   13
Media and Data Integrity Errors:    0
Error Information Log Entries:      35
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               33 Celsius
Temperature Sensor 2:               30 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
No Errors Logged


Thank you,
Andrey

[-- Attachment #1.2: btrfs-errors-kern.log.xz --]
[-- Type: application/x-xz, Size: 36148 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]


* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-05  7:34 BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus ahipp0
@ 2021-09-06  1:08 ` Qu Wenruo
  2021-09-06  2:35   ` ahipp0
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2021-09-06  1:08 UTC (permalink / raw)
  To: ahipp0, linux-btrfs



On 2021/9/5 at 3:34 PM, ahipp0 wrote:
> Hi!
>
> I started having various fun BTRFS warnings/errors/critical messages all of a sudden
> after downloading and extracting linux-5.14.1.tar.xz on a fairly new (~1TiB read/written) Samsung SSD 970 EVO Plus 500GB.
>
> The laptop was resumed from suspend-to-disk ~30 minutes prior to that, I think.
>
> Hardware:
> TUXEDO Pulse 14 - Gen1
> CPU: AMD Ryzen 7 4800H
>
> RAM: 32GiB
> Disk: Samsung SSD 970 EVO Plus 500GB
> Distro: Kubuntu 20.04.2
> Kernel: 5.11.x

I'm pretty sure you have used the btrfs partition for a while, as such a
corrupted tree block would be rejected by the kernel starting from v5.11.

Thus such a corrupted tree block should not be written to disk, and the
problem must have been there for a while before you upgraded to the v5.11 kernel.

>
> BTRFS is used on top of a regular GPT partition.
> (luckily, not the rootfs, otherwise I wouldn't be able to write this email that easily)
> No LVM, no dm-crypt.
>
> Please, see more information below.
> Full kernel log for the past day is attached.
>
> In the past, I also noticed odd things like ldconfig hanging or not picking up updated libraries after suspend-to-disk.
> Simply rebooting helped in such cases.
> The swap partition is on the same disk. (as a separate partition, not a file)
>
> I also started using a new power profile recently, which disables half of the CPU cores when on battery power.
> (but hibernation also offlines all non-boot CPUs while preparing for suspend-to-disk)
>
> What could have caused the filesystem corruption?

From the dmesg, at least one metadata block group is corrupted, which
may explain why one tree block can't be read: it may point to some
invalid location and thus read back as all zeros.

I believe the problem existed well before v5.11.x, as since v5.11 btrfs
has the ability to detect tons of new problems and reject such incorrect
metadata before it reaches disk.

Thus it should be a problem/bug caused by an old kernel.

Furthermore, I didn't see any obvious bitflip, thus bad memory is less
likely.

> Is there a way to repair the filesystem?

As expected, from the btrfs-check output, extent tree is corrupted.

But thankfully, the data should be all safe.

So the first thing is to backup all your important data.

Then try "btrfs check --mode=lowmem" to get a more human-readable error
report, and we can start from that to determine whether it can be repaired.

But so far, I'm a little optimistic about a working repair.

> How safe is it to continue using this particular filesystem after/if it's repaired on this drive?

It's safe to mount it read-only.

It's not going to work well to mount it read-write, as btrfs will abort
the transaction, just as you already saw in the dmesg.


Thankfully, with more and more sanity checks introduced in recent kernels,
btrfs can handle such a corrupted fs without crashing the whole kernel.

So at least you can try to grab the data without crashing the kernel.

> How safe is it to keep using BTRFS on this drive going forward (even after creating a new filesystem)?

As long as you're using a v5.11 or newer kernel, btrfs is very strict
about any data it writes back to disk, and it can even detect quite a few
memory bitflips.

So unless there is proof that the SSD is bad, you're pretty safe to
continue using the disk.

And I don't see anything special related to the SSD, so you're pretty
safe to go.

>
> I've backed up important files,
> so I'll be glad to try various suggestions.
> Also, I'll keep using ext4 on this drive for now and will keep an eye on it.
>
> I think I was able to resolve the "corrupt leaf" issue by deleting affected files

Nope, there is no way to solve it using just the btrfs kernel module.

Btrfs refuses to read such a corrupted tree block at all, so there is no
way to modify it.

Unless you're using a much older kernel, but then you lose all the new
sanity checks added in v5.11, so that's not recommended.

Thanks,
Qu

> (the Linux kernel sources I was unpacking while I hit the issue),
> because "btrfs ins logical-resolve" can't find the file anymore:
> $ btrfs ins logical-resolve 1376043008 /mnt/hippo/
> ERROR: logical ino ioctl: No such file or directory
>
> However, checksum and "btrfs check" errors make me seriously worried.
>
> This is the earliest BTRFS warning I see in the logs:
>
> Sep  4 14:04:51 hippo-tuxedo kernel: [   19.338196] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
> Sep  4 14:04:51 hippo-tuxedo kernel: [   19.338202] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
>
>
> Here's the first "corrupt leaf" error:
>
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151911] BTRFS critical (device nvme0n1p4): corrupt leaf: root=2 block=1376043008 slot=7 bg_start=2169503744 bg_len=1073741824, invalid block group used, have 1073790976 expect [0, 1073741824)
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151925] BTRFS info (device nvme0n1p4): leaf 1376043008 gen 24254 total ptrs 121 free space 6994 owner 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151929]     item 0 key (2169339904 169 0) itemoff 16250 itemsize 33
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151932]             extent refs 1 gen 20692 flags 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151933]             ref#0: tree block backref root 7
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151936]     item 1 key (2169356288 169 0) itemoff 16217 itemsize 33
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151938]             extent refs 1 gen 20692 flags 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151939]             ref#0: tree block backref root 7
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151940]     item 2 key (2169372672 169 0) itemoff 16184 itemsize 33
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151942]             extent refs 1 gen 20692 flags 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151943]             ref#0: tree block backref root 7
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151945]     item 3 key (2169405440 169 0) itemoff 16151 itemsize 33
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151946]             extent refs 1 gen 20692 flags 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151947]             ref#0: tree block backref root 7
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151949]     item 4 key (2169421824 169 0) itemoff 16118 itemsize 33
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151950]             extent refs 1 gen 20692 flags 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151951]             ref#0: tree block backref root 7
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151953]     item 5 key (2169470976 169 0) itemoff 16085 itemsize 33
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151954]             extent refs 1 gen 24164 flags 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151955]             ref#0: tree block backref root 2
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151957]     item 6 key (2169503744 168 16429056) itemoff 16032 itemsize 53
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151959]             extent refs 1 gen 47 flags 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151960]             ref#0: extent data backref root 257 objectid 20379 offset 0 count 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151962]     item 7 key (2169503744 192 1073741824) itemoff 16008 itemsize 24
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151964]             block group used 1073790976 chunk_objectid 256 flags 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151966]     item 8 key (2185932800 168 241664) itemoff 15955 itemsize 53
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151968]             extent refs 1 gen 47 flags 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151969]             ref#0: extent data backref root 257 objectid 20417 offset 0 count 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151971]     item 9 key (2186174464 168 299008) itemoff 15902 itemsize 53
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151973]             extent refs 1 gen 47 flags 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151974]             ref#0: extent data backref root 257 objectid 20418 offset 0 count 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151976]     item 10 key (2186473472 168 135168) itemoff 15849 itemsize 53
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151977]             extent refs 1 gen 47 flags 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.151978]             ref#0: extent data backref root 257 objectid 20419 offset 0 count 1
> ...
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152480]     item 120 key (2195210240 168 4096) itemoff 10019 itemsize 53
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152481]             extent refs 1 gen 47 flags 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152482]             ref#0: extent data backref root 257 objectid 20558 offset 0 count 1
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152484] BTRFS error (device nvme0n1p4): block=1376043008 write time tree block corruption detected
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152661] BTRFS: error (device nvme0n1p4) in btrfs_commit_transaction:2339: errno=-5 IO failure (Error while writing out transaction)
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152663] BTRFS info (device nvme0n1p4): forced readonly
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152664] BTRFS warning (device nvme0n1p4): Skipping commit of aborted transaction.
> Sep  4 23:44:25 hippo-tuxedo kernel: [ 9855.152665] BTRFS: error (device nvme0n1p4) in cleanup_transaction:1939: errno=-5 IO failure
>
>
> I hit these csum 0x00000000 errors while trying to back up the files to an ext4 partition on the same disk:
>
> Sep  5 00:12:26 hippo-tuxedo kernel: [  891.475516] BTRFS info (device nvme0n1p4): disk space caching is enabled
> Sep  5 00:12:26 hippo-tuxedo kernel: [  891.475523] BTRFS info (device nvme0n1p4): has skinny extents
> Sep  5 00:12:26 hippo-tuxedo kernel: [  891.494832] BTRFS info (device nvme0n1p4): enabling ssd optimizations
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.627577] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.627805] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.627814] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.628316] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112013824, 3112017920)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.628931] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112058880, 3112062976)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.628943] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112083456, 3112087552)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.629210] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5894144 csum 0x45d7e010 expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.629214] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.629238] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5963776 csum 0x95b8b716 expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.630311] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.648130] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.648226] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.648234] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.649275] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.649353] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.649357] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.650397] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.650475] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.650478] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.678142] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111124992, 3111129088)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.678149] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111276544, 3111280640)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.678151] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111346176, 3111350272)
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.680593] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.680604] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.686438] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.686449] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.687671] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.687683] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.688871] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> Sep  5 00:16:42 hippo-tuxedo kernel: [ 1147.688876] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
> Sep  5 00:17:05 hippo-tuxedo kernel: [ 1170.527686] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
> Sep  5 00:17:05 hippo-tuxedo kernel: [ 1170.527695] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
>
>
> $ uname -a
> Linux hippo-tuxedo 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>
> $ btrfs --version
> btrfs-progs v5.4.1
>
> $ btrfs fi show
> Label: 'HIPPO'  uuid: 2b69016b-e03b-478a-84cd-f794eddfebd5
> Total devices 1 FS bytes used 66.32GiB
> devid    1 size 256.00GiB used 95.02GiB path /dev/nvme0n1p4
>
> $ btrfs fi df /mnt/hippo/
> Data, single: total=94.01GiB, used=66.12GiB
> System, single: total=4.00MiB, used=16.00KiB
> Metadata, single: total=1.01GiB, used=203.09MiB
> GlobalReserve, single: total=94.59MiB, used=0.00B
>
> $ cat /etc/lsb-release
> DISTRIB_ID=Ubuntu
> DISTRIB_RELEASE=20.04
> DISTRIB_CODENAME=focal
> DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
>
> Mount options:
> relatime,ssd,space_cache,subvolid=5,subvol=andrey
>
> $ btrfs check --readonly /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> extent item 3109511168 has multiple extent items
> ref mismatch on [3109511168 2105344] extent item 1, found 5
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> backpointer mismatch on [3109511168 2105344]
> extent item 3111616512 has multiple extent items
> ref mismatch on [3111616512 638976] extent item 25, found 26
> backref disk bytenr does not match extent record, bytenr=3111616512, ref bytenr=3112091648
> backref bytes do not match extent backref, bytenr=3111616512, ref bytes=638976, backref bytes=8192
> backpointer mismatch on [3111616512 638976]
> extent item 3121950720 has multiple extent items
> ref mismatch on [3121950720 2220032] extent item 1, found 4
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> backpointer mismatch on [3121950720 2220032]
> extent item 3124252672 has multiple extent items
> ref mismatch on [3124252672 208896] extent item 12, found 13
> backref disk bytenr does not match extent record, bytenr=3124252672, ref bytenr=3124428800
> backref bytes do not match extent backref, bytenr=3124252672, ref bytes=208896, backref bytes=12288
> backpointer mismatch on [3124252672 208896]
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache
> block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
> ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
> failed to load free space cache for block group 2169503744
> [4/7] checking fs roots
> root 257 inode 31924 errors 1000, some csum missing
> ERROR: errors found in fs roots
> found 71205822464 bytes used, error(s) found
> total csum bytes: 69299516
> total tree bytes: 212975616
> total fs tree bytes: 113672192
> total extent tree bytes: 14909440
> btree space waste bytes: 42172819
> file data blocks allocated: 86083526656
>   referenced 70815563776
>
> $ smartctl --all /dev/nvme0
> smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-27-generic] (local build)
> Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
>
> === START OF INFORMATION SECTION ===
> Model Number:                       Samsung SSD 970 EVO Plus 500GB
> Serial Number:                      S4EVNX0NB29088Y
> Firmware Version:                   2B2QEXM7
> PCI Vendor/Subsystem ID:            0x144d
> IEEE OUI Identifier:                0x002538
> Total NVM Capacity:                 500,107,862,016 [500 GB]
> Unallocated NVM Capacity:           0
> Controller ID:                      4
> Number of Namespaces:               1
> Namespace 1 Size/Capacity:          500,107,862,016 [500 GB]
> Namespace 1 Utilization:            133,526,691,840 [133 GB]
> Namespace 1 Formatted LBA Size:     512
> Namespace 1 IEEE EUI-64:            002538 5b01b07633
> Local Time is:                      Sun Sep  5 03:08:29 2021 EDT
> Firmware Updates (0x16):            3 Slots, no Reset required
> Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
> Optional NVM Commands (0x005f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
> Maximum Data Transfer Size:         512 Pages
> Warning  Comp. Temp. Threshold:     85 Celsius
> Critical Comp. Temp. Threshold:     85 Celsius
>
> Supported Power States
> St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
> 0 +     7.80W       -        -    0  0  0  0        0       0
> 1 +     6.00W       -        -    1  1  1  1        0       0
> 2 +     3.40W       -        -    2  2  2  2        0       0
> 3 -   0.0700W       -        -    3  3  3  3      210    1200
> 4 -   0.0100W       -        -    4  4  4  4     2000    8000
>
> Supported LBA Sizes (NSID 0x1)
> Id Fmt  Data  Metadt  Rel_Perf
> 0 +     512       0         0
>
> === START OF SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
>
> SMART/Health Information (NVMe Log 0x02)
> Critical Warning:                   0x00
> Temperature:                        33 Celsius
> Available Spare:                    100%
> Available Spare Threshold:          10%
> Percentage Used:                    0%
> Data Units Read:                    1,263,056 [646 GB]
> Data Units Written:                 1,381,709 [707 GB]
> Host Read Commands:                 27,814,722
> Host Write Commands:                29,580,959
> Controller Busy Time:               37
> Power Cycles:                       132
> Power On Hours:                     47
> Unsafe Shutdowns:                   13
> Media and Data Integrity Errors:    0
> Error Information Log Entries:      35
> Warning  Comp. Temperature Time:    0
> Critical Comp. Temperature Time:    0
> Temperature Sensor 1:               33 Celsius
> Temperature Sensor 2:               30 Celsius
>
> Error Information (NVMe Log 0x01, max 64 entries)
> No Errors Logged
>
>
> Thank you,
> Andrey
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  1:08 ` Qu Wenruo
@ 2021-09-06  2:35   ` ahipp0
  2021-09-06  2:47     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: ahipp0 @ 2021-09-06  2:35 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 26686 bytes --]

Qu,

Thank you so much for taking a look!

Please, see my comments inline below.

On Sunday, September 5th, 2021 at 9:08 PM, Qu wrote:
> On 2021/9/5 3:34 PM, ahipp0 wrote:
> 

> > Hi!
> > 

> > I started having various fun BTRFS warnings/errors/critical messages all of a sudden
> > after downloading and extracting linux-5.14.1.tar.xz on a fairly new (~1TiB read/written) Samsung SSD 970 EVO Plus 500GB.
> > The laptop was resumed from suspend-to-disk ~30 minutes prior to that, I think.
> > 

> > Hardware:
> > TUXEDO Pulse 14 - Gen1
> > CPU: AMD Ryzen 7 4800H
> > RAM: 32GiB
> > Disk: Samsung SSD 970 EVO Plus 500GB
> > Distro: Kubuntu 20.04.2
> > Kernel: 5.11.x
> 

> I'm pretty sure you have used the btrfs partition for a while, as the
> corrupted tree block would be rejected by the kernel starting from v5.11.
> 

> Thus such a corrupted tree block should not have been written to disk; the
> problem must have existed for a while before you upgraded to the v5.11 kernel.
> 


Fair enough, it's been roughly 3 months.
Looking at the logs, it seems I created the filesystem with a 5.8 kernel.

6/06 -- install of 5.8.0-55.62~20.04.1  -- this is when the filesystem was created
6/26 -- upgrade to 5.8.0-59.66~20.04.1
7/22 -- upgrade to 5.8.0-63.71~20.04.1
8/07 -- upgrade to 5.11.0-25.27~20.04.1
8/19 -- upgrade to 5.11.0-27.29~20.04.1

I think a good chunk of the data was written while on the 5.8 kernel,
with maybe 30% on 5.11 (a wild guess).

<snip>

> > 

> > In the past, I also noticed odd things like ldconfig hanging or not picking up updated libraries after suspend-to-disk.
> > Simply rebooting helped in such cases.
> > The swap partition is on the same disk. (as a separate partition, not a file)
> > I also started using a new power profile recently, which disables half of the CPU cores when on battery power.
> > (but hibernation also offlines all non-boot CPUs while preparing for suspend-to-disk)
> > What could have caused the filesystem corruption?
> 

> From the dmesg, at least one metadata block group is corrupted, which may
> explain why one tree block can't be read: it may point to some invalid
> location and thus reads back all zeros.
>
> I believe the problem existed well before v5.11.x, as in v5.11 btrfs gained
> the ability to detect many new kinds of problems and reject such incorrect
> metadata before it reaches disk.

Ah, cool, that's good to know that 5.11 does a lot more sanity checking!
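
Incidentally, doing the arithmetic on the "corrupt leaf" message shows why the write-time check fired: the block group's "used" counter overshoots its 1 GiB length by exactly 48 KiB. (This is just my own back-of-the-envelope check, not anything the tools reported.)

```shell
# Values from the dmesg line:
#   "invalid block group used, have 1073790976 expect [0, 1073741824)"
echo $(( 1073790976 - 1073741824 ))          # bytes past the block group end: 49152
echo $(( (1073790976 - 1073741824) / 4096 )) # i.e. 12 blocks of 4 KiB
```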

> Thus it should be a problem/bug caused by an old kernel.
> 

> Furthermore, I didn't see any obvious bitflip, thus bad memory is less
> likely.

That's good!

> > Is there a way to repair the filesystem?
> 

> As expected, from the btrfs-check output, extent tree is corrupted.
> 

> But thankfully, the data should be all safe.
> 

> So the first thing is to backup all your important data.
> 

> Then try "btrfs check --mode=lowmem" to get a more human readable error
> report, and we can start from that to determine if it can be repaired.

Sure, please see below.

$ btrfs check --mode=lowmem /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
failed to load free space cache for block group 2169503744
[4/7] checking fs roots
ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
ERROR: errors found in fs roots
found 71205916672 bytes used, error(s) found
total csum bytes: 69299516
total tree bytes: 212975616
total fs tree bytes: 113672192
total extent tree bytes: 14909440
btree space waste bytes: 42172819
file data blocks allocated: 86083526656
referenced 70815563776 
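
For what it's worth, the free-space mismatch reported above works out to 94208 bytes, i.e. exactly 23 blocks of 4 KiB (my own arithmetic, in case it helps anyone):

```shell
# Values from the check output:
#   "free space cache has 10440704 block group has 10346496"
echo $(( 10440704 - 10346496 ))          # 94208 bytes
echo $(( (10440704 - 10346496) / 4096 )) # 23 blocks of 4 KiB
```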



> But so far, I'm a little optimistic about a working repair.

Makes me optimistic as well. :)

> > How safe is it to continue using this particular filesystem after/if it's repaired on this drive?
> 

> It's safe to mount it read-only.
> It's not going to work well to mount it read-write, as btrfs will abort the
> transaction just as you already saw in the dmesg.

Oh, even after repairing it?
Or is it yet to be seen if it can be repaired?

> Thankfully, with more and more sanity checks introduced in recent kernels,
> btrfs can handle such a corrupted fs without crashing the whole kernel.
> 

> So at least you can try to grab the data without crashing the kernel.

Yeah, that's definitely _very_ helpful, as I was able to back up all the important stuff seemingly with no problems.
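
In case it's useful to anyone hitting this later, here's roughly how I'd double-check such a rescue copy. The source path is the one from this thread, the destination path is hypothetical, and this is a sketch rather than exactly what I ran:

```shell
# Compare SHA-256 sums of every file in the read-only-mounted original
# (/mnt/hippo) against the rescued copy (hypothetical path /backup/hippo).
(cd /mnt/hippo && find . -type f -exec sha256sum {} +) > /tmp/hippo.sums
(cd /backup/hippo && sha256sum -c --quiet /tmp/hippo.sums) \
    && echo "backup verified"
```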


> > How safe is it to keep using BTRFS on this drive going forward (even after creating a new filesystem)?
> 

> As long as you're using a v5.11 or newer kernel, btrfs is very strict about
> any data it writes back to disk, thus it can even detect quite a few
> memory bitflips.
> 

> So unless there is proof that the SSD is bad, you're pretty safe to
> continue using the disk.
> 

> And I don't see anything special related to the SSD, thus you're pretty
> safe to go.

Good to hear!

I started suspecting something with TRIM/discard support in the SSD/driver after seeing these all-zero checksums,
but your explanation that it's simply due to the corrupt tree makes more sense.

I also saw another thread on the mailing list (from Martin)
about quite a similar (from my point of view) issue on a similar system (AMD-based, Samsung 980 SSD) with similar usage (suspend-to-disk),
so I'm trying to figure out whether it's a drive issue, a driver issue, a suspend-to-disk issue, or a combination of these.
So far, suspend-to-disk seems to be the main suspect (or it somehow triggers issues elsewhere),
as I saw strange behavior upon resume from hibernation, but never after a clean reboot.
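
One more data point on the 0x00000000 csums: every "csum hole" range in my log is exactly one 4096-byte block, which (as I understand it, so take this reading with a grain of salt) looks like a single missing checksum entry rather than a large trimmed region:

```shell
# First two csum-hole ranges from the dmesg, given as [start, end):
echo $(( 3111849984 - 3111845888 ))  # 4096
echo $(( 3112017920 - 3112013824 ))  # 4096
```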


> > I've backed up important files,
> > so I'll be glad to try various suggestions.
> > 

> > Also, I'll keep using ext4 on this drive for now and will keep an eye on it.
> > I think I was able to resolve the "corrupt leaf" issue by deleting affected files
> 

> Nope, there is no way to solve it just using the btrfs kernel module.
> 

> Btrfs refuses to read such a corrupted tree block at all, thus there is no
> way to modify it.
> 

> Unless you're using a much older kernel, but then you lose all the new
> sanity checks in v5.11, thus not recommended.

Hm, I don't remember for sure now,
but chances are that I deleted those files when booting from a LiveUSB,
which used kernel 5.10.61.
(I didn't know that 5.11 is so much stricter, otherwise I would have found another bootable ISO.)
This would explain how I could have deleted them.

But overall, I was mostly following instructions here:
https://lore.kernel.org/linux-btrfs/75c522e9-88ff-0b9d-1ede-b524388d42d1@gmx.com/



> Thanks,
> 

> Qu
> 

> > (the Linux kernel sources I was unpacking while I hit the issue),
> > 

> > because "btrfs ins logical-resolve" can't find the file anymore:
> > 

> > $ btrfs ins logical-resolve 1376043008 /mnt/hippo/
> > 

> > ERROR: logical ino ioctl: No such file or directory
> > 

> > However, checksum and "btrfs check" errors make me seriously worried.
> > 

> > This is the earliest BTRFS warning I see in the logs:
> > 

> > Sep 4 14:04:51 hippo-tuxedo kernel: [ 19.338196] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
> > 

> > Sep 4 14:04:51 hippo-tuxedo kernel: [ 19.338202] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
> > 

> > Here's the first "corrupt leaf" error:
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151911] BTRFS critical (device nvme0n1p4): corrupt leaf: root=2 block=1376043008 slot=7 bg_start=2169503744 bg_len=1073741824, invalid block group used, have 1073790976 expect [0, 1073741824)
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151925] BTRFS info (device nvme0n1p4): leaf 1376043008 gen 24254 total ptrs 121 free space 6994 owner 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151929] item 0 key (2169339904 169 0) itemoff 16250 itemsize 33
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151932] extent refs 1 gen 20692 flags 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151933] ref#0: tree block backref root 7
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151936] item 1 key (2169356288 169 0) itemoff 16217 itemsize 33
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151938] extent refs 1 gen 20692 flags 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151939] ref#0: tree block backref root 7
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151940] item 2 key (2169372672 169 0) itemoff 16184 itemsize 33
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151942] extent refs 1 gen 20692 flags 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151943] ref#0: tree block backref root 7
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151945] item 3 key (2169405440 169 0) itemoff 16151 itemsize 33
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151946] extent refs 1 gen 20692 flags 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151947] ref#0: tree block backref root 7
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151949] item 4 key (2169421824 169 0) itemoff 16118 itemsize 33
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151950] extent refs 1 gen 20692 flags 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151951] ref#0: tree block backref root 7
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151953] item 5 key (2169470976 169 0) itemoff 16085 itemsize 33
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151954] extent refs 1 gen 24164 flags 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151955] ref#0: tree block backref root 2
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151957] item 6 key (2169503744 168 16429056) itemoff 16032 itemsize 53
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151959] extent refs 1 gen 47 flags 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151960] ref#0: extent data backref root 257 objectid 20379 offset 0 count 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151962] item 7 key (2169503744 192 1073741824) itemoff 16008 itemsize 24
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151964] block group used 1073790976 chunk_objectid 256 flags 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151966] item 8 key (2185932800 168 241664) itemoff 15955 itemsize 53
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151968] extent refs 1 gen 47 flags 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151969] ref#0: extent data backref root 257 objectid 20417 offset 0 count 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151971] item 9 key (2186174464 168 299008) itemoff 15902 itemsize 53
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151973] extent refs 1 gen 47 flags 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151974] ref#0: extent data backref root 257 objectid 20418 offset 0 count 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151976] item 10 key (2186473472 168 135168) itemoff 15849 itemsize 53
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151977] extent refs 1 gen 47 flags 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151978] ref#0: extent data backref root 257 objectid 20419 offset 0 count 1
> > 

> > ...
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152480] item 120 key (2195210240 168 4096) itemoff 10019 itemsize 53
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152481] extent refs 1 gen 47 flags 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152482] ref#0: extent data backref root 257 objectid 20558 offset 0 count 1
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152484] BTRFS error (device nvme0n1p4): block=1376043008 write time tree block corruption detected
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152661] BTRFS: error (device nvme0n1p4) in btrfs_commit_transaction:2339: errno=-5 IO failure (Error while writing out transaction)
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152663] BTRFS info (device nvme0n1p4): forced readonly
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152664] BTRFS warning (device nvme0n1p4): Skipping commit of aborted transaction.
> > 

> > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152665] BTRFS: error (device nvme0n1p4) in cleanup_transaction:1939: errno=-5 IO failure
> > 

> > I hit these csum 0x00000000 errors while trying to backup the files to ext4 partition on the same disk:
> > 

> > Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.475516] BTRFS info (device nvme0n1p4): disk space caching is enabled
> > 

> > Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.475523] BTRFS info (device nvme0n1p4): has skinny extents
> > 

> > Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.494832] BTRFS info (device nvme0n1p4): enabling ssd optimizations
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627577] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627805] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627814] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628316] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112013824, 3112017920)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628931] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112058880, 3112062976)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628943] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112083456, 3112087552)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629210] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5894144 csum 0x45d7e010 expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629214] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629238] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5963776 csum 0x95b8b716 expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.630311] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648130] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648226] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648234] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649275] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649353] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649357] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650397] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650475] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650478] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678142] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111124992, 3111129088)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678149] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111276544, 3111280640)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678151] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111346176, 3111350272)
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.680593] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.680604] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.686438] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.686449] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.687671] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.687683] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.688871] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.688876] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
> > 

> > Sep 5 00:17:05 hippo-tuxedo kernel: [ 1170.527686] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
> > 

> > Sep 5 00:17:05 hippo-tuxedo kernel: [ 1170.527695] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
> > 

> > $ uname -a
> > 

> > Linux hippo-tuxedo 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> > 

> > $ btrfs --version
> > btrfs-progs v5.4.1
> >
> > $ btrfs fi show
> > Label: 'HIPPO' uuid: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > Total devices 1 FS bytes used 66.32GiB
> > devid 1 size 256.00GiB used 95.02GiB path /dev/nvme0n1p4
> >
> > $ btrfs fi df /mnt/hippo/
> > Data, single: total=94.01GiB, used=66.12GiB
> > System, single: total=4.00MiB, used=16.00KiB
> > Metadata, single: total=1.01GiB, used=203.09MiB
> > GlobalReserve, single: total=94.59MiB, used=0.00B
> >
> > $ cat /etc/lsb-release
> > DISTRIB_ID=Ubuntu
> > DISTRIB_RELEASE=20.04
> > DISTRIB_CODENAME=focal
> > DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
> >
> > Mount options:
> > relatime,ssd,space_cache,subvolid=5,subvol=andrey
> >
> > $ btrfs check --readonly /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> > extent item 3109511168 has multiple extent items
> > ref mismatch on [3109511168 2105344] extent item 1, found 5
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> > backpointer mismatch on [3109511168 2105344]
> > extent item 3111616512 has multiple extent items
> > ref mismatch on [3111616512 638976] extent item 25, found 26
> > backref disk bytenr does not match extent record, bytenr=3111616512, ref bytenr=3112091648
> > backref bytes do not match extent backref, bytenr=3111616512, ref bytes=638976, backref bytes=8192
> > backpointer mismatch on [3111616512 638976]
> > extent item 3121950720 has multiple extent items
> > ref mismatch on [3121950720 2220032] extent item 1, found 4
> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > backpointer mismatch on [3121950720 2220032]
> > extent item 3124252672 has multiple extent items
> > ref mismatch on [3124252672 208896] extent item 12, found 13
> > backref disk bytenr does not match extent record, bytenr=3124252672, ref bytenr=3124428800
> > backref bytes do not match extent backref, bytenr=3124252672, ref bytes=208896, backref bytes=12288
> > backpointer mismatch on [3124252672 208896]
> > ERROR: errors found in extent allocation tree or chunk allocation
> > [3/7] checking free space cache
> > block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
> > ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
> > failed to load free space cache for block group 2169503744
> > [4/7] checking fs roots
> > root 257 inode 31924 errors 1000, some csum missing
> > ERROR: errors found in fs roots
> > found 71205822464 bytes used, error(s) found
> > total csum bytes: 69299516
> > total tree bytes: 212975616
> > total fs tree bytes: 113672192
> > total extent tree bytes: 14909440
> > btree space waste bytes: 42172819
> > file data blocks allocated: 86083526656
> > referenced 70815563776
> >
> > $ smartctl --all /dev/nvme0
> > smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.11.0-27-generic] (local build)
> > Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
> >
> > === START OF INFORMATION SECTION ===
> > Model Number: Samsung SSD 970 EVO Plus 500GB
> > Serial Number: S4EVNX0NB29088Y
> > Firmware Version: 2B2QEXM7
> > PCI Vendor/Subsystem ID: 0x144d
> > IEEE OUI Identifier: 0x002538
> > Total NVM Capacity: 500,107,862,016 [500 GB]
> > Unallocated NVM Capacity: 0
> > Controller ID: 4
> > Number of Namespaces: 1
> > Namespace 1 Size/Capacity: 500,107,862,016 [500 GB]
> > Namespace 1 Utilization: 133,526,691,840 [133 GB]
> > Namespace 1 Formatted LBA Size: 512
> > Namespace 1 IEEE EUI-64: 002538 5b01b07633
> > Local Time is: Sun Sep 5 03:08:29 2021 EDT
> > Firmware Updates (0x16): 3 Slots, no Reset required
> > Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
> > Optional NVM Commands (0x005f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
> > Maximum Data Transfer Size: 512 Pages
> > Warning Comp. Temp. Threshold: 85 Celsius
> > Critical Comp. Temp. Threshold: 85 Celsius
> >
> > Supported Power States
> > St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
> > 0 + 7.80W - - 0 0 0 0 0 0
> > 1 + 6.00W - - 1 1 1 1 0 0
> > 2 + 3.40W - - 2 2 2 2 0 0
> > 3 - 0.0700W - - 3 3 3 3 210 1200
> > 4 - 0.0100W - - 4 4 4 4 2000 8000
> >
> > Supported LBA Sizes (NSID 0x1)
> > Id Fmt Data Metadt Rel_Perf
> > 0 + 512 0 0
> >
> > === START OF SMART DATA SECTION ===
> > SMART overall-health self-assessment test result: PASSED
> >
> > SMART/Health Information (NVMe Log 0x02)
> > Critical Warning: 0x00
> > Temperature: 33 Celsius
> > Available Spare: 100%
> > Available Spare Threshold: 10%
> > Percentage Used: 0%
> > Data Units Read: 1,263,056 [646 GB]
> > Data Units Written: 1,381,709 [707 GB]
> > Host Read Commands: 27,814,722
> > Host Write Commands: 29,580,959
> > Controller Busy Time: 37
> > Power Cycles: 132
> > Power On Hours: 47
> > Unsafe Shutdowns: 13
> > Media and Data Integrity Errors: 0
> > Error Information Log Entries: 35
> > Warning Comp. Temperature Time: 0
> > Critical Comp. Temperature Time: 0
> > Temperature Sensor 1: 33 Celsius
> > Temperature Sensor 2: 30 Celsius
> >
> > Error Information (NVMe Log 0x01, max 64 entries)
> > No Errors Logged

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  2:35   ` ahipp0
@ 2021-09-06  2:47     ` Qu Wenruo
  2021-09-06  3:05       ` ahipp0
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2021-09-06  2:47 UTC (permalink / raw)
  To: ahipp0; +Cc: linux-btrfs



On 2021/9/6 10:35 AM, ahipp0 wrote:
> Qu,
>
> Thank you so much for taking a look!
>
> Please, see my comments inline below.
>
> On Sunday, September 5th, 2021 at 9:08 PM, Qu wrote:
>> On 2021/9/5 3:34 PM, ahipp0 wrote:
>>> Hi!
>>>
>>> I started having various fun BTRFS warnings/errors/critical messages all of a sudden
>>> after downloading and extracting linux-5.14.1.tar.xz on a fairly new (~1TiB read/written) Samsung SSD 970 EVO Plus 500GB.
>>> The laptop was resumed from suspend-to-disk ~30 minutes prior to that, I think.
>>>
>>> Hardware:
>>> TUXEDO Pulse 14 - Gen1
>>> CPU: AMD Ryzen 7 4800H
>>> RAM: 32GiB
>>> Disk: Samsung SSD 970 EVO Plus 500GB
>>> Distro: Kubuntu 20.04.2
>>> Kernel: 5.11.x
>>
>> I'm pretty sure you have used the btrfs partition for a while, as such a
>> corrupted tree block would be rejected by the kernel starting from v5.11.
>>
>> Thus such a corrupted tree block should not have been written to disk by
>> v5.11; the problem must have been there for a while, before you upgraded
>> to the v5.11 kernel.
>>
> Fair enough, it's been like 3 months approximately.
> Looking at the logs, it seems I created the filesystem with 5.8 kernel.
>
> 6/06 -- install of 5.8.0-55.62~20.04.1  -- this is when the filesystem was created
> 6/26 -- upgrade to 5.8.0-59.66~20.04.1
> 7/22 -- upgrade to 5.8.0-63.71~20.04.1
> 8/07 -- upgrade to 5.11.0-25.27~20.04.1
> 8/19 -- upgrade to 5.11.0-27.29~20.04.1
>
> I think a good chunk of the data was written while on the 5.8 kernel,
> but probably 30% on 5.11 (a wild guess).
>
> <snip>
>
>>>
>>> In the past, I also noticed odd things like ldconfig hanging or not picking up updated libraries after suspend-to-disk.
>>> Simply rebooting helped in such cases.
>>> The swap partition is on the same disk. (as a separate partition, not a file)
>>> I also started using a new power profile recently, which disables half of the CPU cores when on battery power.
>>> (but hibernation also offlines all non-boot CPUs while preparing for suspend-to-disk)
>>> What could have caused the filesystem corruption?
>>
>>  From the dmesg, at least one block group for metadata is corrupted,
>> which may explain why one tree block can't be read: it may point to some
>> invalid location and thus reads back as all zeros.
>>
>> I believe the problem existed well before v5.11.x, as in v5.11 btrfs
>> gained the ability to detect tons of new problems and to reject such
>> incorrect metadata before it reaches disk.
>
> Ah, cool, that's good to know that 5.11 does a lot more sanity checking!
>
>> Thus it should be a problem/bug caused by an old kernel.
>>
>> Furthermore, I didn't see any obvious bitflip, so bad memory is less
>> likely.
>
> That's good!
>
>>> Is there a way to repair the filesystem?
>>
>> As expected from the btrfs-check output, the extent tree is corrupted.
>>
>> But thankfully, the data should all be safe.
>>
>> So the first thing is to back up all your important data.
>>
>> Then try "btrfs check --mode=lowmem" to get a more human-readable error
>> report, and we can start from that to determine if it can be repaired.
>
> Sure, please see below.
>
> $ btrfs check --mode=lowmem /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents

That's strange; this means lowmem and the original mode have different
ideas about what's going wrong.

Now we need to enhance lowmem mode to detect such a problem first.

> [3/7] checking free space cache
> block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
> ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers

Yeah, free space cache corruption is a big deal.

It's recommended to run "btrfs check --clear-space-cache v1" before the
next mount.

> failed to load free space cache for block group 2169503744
> [4/7] checking fs roots
> ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
> ERROR: errors found in fs roots
> found 71205916672 bytes used, error(s) found
> total csum bytes: 69299516
> total tree bytes: 212975616
> total fs tree bytes: 113672192
> total extent tree bytes: 14909440
> btree space waste bytes: 42172819
> file data blocks allocated: 86083526656
> referenced 70815563776
>
>
>
>> But so far, I'm a little optimistic about a working repair.
>
> Makes me optimistic as well. :)

You can --repair after clearing v1 cache and backing up your data.

Then run without --repair to see the result.

In the worst case, you can try --init-extent-tree, but I still want to
see the result of the regular --repair first.
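[Editor's note] Putting the suggestions above together, a minimal sketch of the recovery sequence (the device path is the one from this thread; --repair and especially --init-extent-tree can make a damaged filesystem worse, so this assumes the filesystem is unmounted and a verified backup already exists):

```shell
DEV=/dev/nvme0n1p4   # the affected partition from this thread

# 1. Drop the corrupted v1 free space cache; it is rebuilt on the next mount.
btrfs check --clear-space-cache v1 "$DEV"

# 2. Attempt the repair of the extent tree / missing csum items.
btrfs check --repair "$DEV"

# 3. Re-check read-only; a clean run suggests an RW mount should be safe again.
btrfs check --readonly "$DEV"

# Last resort only (rebuilds the entire extent tree from scratch):
# btrfs check --init-extent-tree "$DEV"
```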

>
>>> How safe is it to continue using this particular filesystem after/if it's repaired on this drive?
>>
>
It's safe to mount it read-only.
It's not going to work well to mount it read-write, as btrfs will abort
the transaction, just as you already saw in the dmesg.
>
> Oh, even after repairing it?

I meant before repair.

After repair, it depends on whether btrfs check reports any remaining
errors. If there are no more errors, an RW mount should be fine.

> Or is it yet to be seen if it can be repaired?
>
>> Thankfully, with more and more sanity checks introduced in recent
>> kernels, btrfs can handle such a corrupted fs without crashing the
>> whole kernel.
>>
>> So at least you can try to grab the data without crashing the kernel.
>
> Yeah, that's definitely _very_ helpful as I was able to backup all the important stuff seemingly with no problems.
>
>
>>> How safe is it to keep using BTRFS on this drive going forward (even after creating a new filesystem)?
>>
>
>> As long as you're using a v5.11 or newer kernel, btrfs is very strict
>> about any data it writes back to disk, and it can even detect quite a
>> few memory bitflips.
>>
>> So unless there is proof that the SSD is bad, you're pretty safe to
>> continue using the disk.
>>
>> And I don't see anything special related to the SSD, so you're pretty
>> safe to go.
>
> Good to hear!
>
> I started suspecting something with TRIM/discard support in the SSD/driver after seeing these all-zero checksums,
> but your explanation that it's just because of the corrupt tree makes more sense.
>
> I also saw another thread on the mailing list (from Martin)
> about quite a similar (from my point of view) issue on a similar system (AMD-based, Samsung 980 SSD) with similar usage (suspend-to-disk),
> so I'm trying to figure out whether it's a drive issue, a driver issue, a suspend-to-disk issue, or a combination of these.
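[Editor's note] Some context on the "expected csum 0x00000000" lines quoted earlier: btrfs stores a per-4KiB CRC-32C checksum for data (the default csum type on filesystems of this vintage), and the "expected" value is what is read from the csum tree. A minimal bitwise sketch of CRC-32C (real implementations use lookup tables or the SSE4.2 crc32 instruction) shows that even an all-zero block does not hash to zero, consistent with the explanation that the zero "expected" csums come from missing/unreadable csum items rather than from the data itself:

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), btrfs's default data checksum."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # 0x82F63B78 is the bit-reflected Castagnoli polynomial.
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value for the ASCII string "123456789".
assert crc32c(b"123456789") == 0xE3069283

# A 4KiB block of zeros checksums to a non-zero value, so a stored
# "expected csum 0x00000000" points at a missing csum item rather than
# at data that legitimately hashes to zero.
assert crc32c(b"\x00" * 4096) != 0
```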

I guess suspend-to-disk may be involved, but I can't say for sure.

Although I'm also using an AMD-based CPU (a 3700X though) with a Samsung
PM981 SSD, I never use suspend-to-disk/RAM, just power off...
So I can't say for sure.

Thanks,
Qu

> So far, suspend-to-disk seems to be the main suspect (or it somehow triggers issues elsewhere)
> as I saw strange behavior upon resume from hibernation, but never on a clean reboot.
>
>
>>> I've backed up important files,
>>> so I'll be glad to try various suggestions.
>>>
>
>>> Also, I'll keep using ext4 on this drive for now and will keep an eye on it.
>>> I think I was able to resolve the "corrupt leaf" issue by deleting affected files
>>
>
>> Nope, there is no way to solve it just using the btrfs kernel module.
>>
>> Btrfs refuses to read such a corrupted tree block at all, so there is
>> no way to modify it.
>>
>> You could with a much older kernel, but then you lose all the new
>> sanity checks in v5.11, so that's not recommended.
>
> Hm, I don't remember for sure now,
> but chances are that I deleted those files when booting from a LiveUSB,
> which used kernel 5.10.61.
> (I didn't know that 5.11 is so much more strict, otherwise I would have found another bootable ISO)
> This would explain how I could have deleted them.
>
> But overall, I was mostly following instructions here:
> https://lore.kernel.org/linux-btrfs/75c522e9-88ff-0b9d-1ede-b524388d42d1@gmx.com/
>
>
>
>> Thanks,
>>
>
>> Qu
>>
>
>>> (the Linux kernel sources I was unpacking while I hit the issue),
>>> because "btrfs ins logical-resolve" can't find the file anymore:
>>>
>>> $ btrfs ins logical-resolve 1376043008 /mnt/hippo/
>>> ERROR: logical ino ioctl: No such file or directory
>>>
>>> However, checksum and "btrfs check" errors make me seriously worried.
>>>
>>> This is the earliest BTRFS warning I see in the logs:
>>> Sep 4 14:04:51 hippo-tuxedo kernel: [ 19.338196] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
>>> Sep 4 14:04:51 hippo-tuxedo kernel: [ 19.338202] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
>>>
>>> Here's the first "corrupt leaf" error:
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151911] BTRFS critical (device nvme0n1p4): corrupt leaf: root=2 block=1376043008 slot=7 bg_start=2169503744 bg_len=1073741824, invalid block group used, have 1073790976 expect [0, 1073741824)
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151925] BTRFS info (device nvme0n1p4): leaf 1376043008 gen 24254 total ptrs 121 free space 6994 owner 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151929] item 0 key (2169339904 169 0) itemoff 16250 itemsize 33
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151932] extent refs 1 gen 20692 flags 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151933] ref#0: tree block backref root 7
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151936] item 1 key (2169356288 169 0) itemoff 16217 itemsize 33
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151938] extent refs 1 gen 20692 flags 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151939] ref#0: tree block backref root 7
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151940] item 2 key (2169372672 169 0) itemoff 16184 itemsize 33
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151942] extent refs 1 gen 20692 flags 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151943] ref#0: tree block backref root 7
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151945] item 3 key (2169405440 169 0) itemoff 16151 itemsize 33
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151946] extent refs 1 gen 20692 flags 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151947] ref#0: tree block backref root 7
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151949] item 4 key (2169421824 169 0) itemoff 16118 itemsize 33
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151950] extent refs 1 gen 20692 flags 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151951] ref#0: tree block backref root 7
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151953] item 5 key (2169470976 169 0) itemoff 16085 itemsize 33
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151954] extent refs 1 gen 24164 flags 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151955] ref#0: tree block backref root 2
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151957] item 6 key (2169503744 168 16429056) itemoff 16032 itemsize 53
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151959] extent refs 1 gen 47 flags 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151960] ref#0: extent data backref root 257 objectid 20379 offset 0 count 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151962] item 7 key (2169503744 192 1073741824) itemoff 16008 itemsize 24
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151964] block group used 1073790976 chunk_objectid 256 flags 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151966] item 8 key (2185932800 168 241664) itemoff 15955 itemsize 53
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151968] extent refs 1 gen 47 flags 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151969] ref#0: extent data backref root 257 objectid 20417 offset 0 count 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151971] item 9 key (2186174464 168 299008) itemoff 15902 itemsize 53
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151973] extent refs 1 gen 47 flags 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151974] ref#0: extent data backref root 257 objectid 20418 offset 0 count 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151976] item 10 key (2186473472 168 135168) itemoff 15849 itemsize 53
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151977] extent refs 1 gen 47 flags 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151978] ref#0: extent data backref root 257 objectid 20419 offset 0 count 1
>>> ...
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152480] item 120 key (2195210240 168 4096) itemoff 10019 itemsize 53
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152481] extent refs 1 gen 47 flags 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152482] ref#0: extent data backref root 257 objectid 20558 offset 0 count 1
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152484] BTRFS error (device nvme0n1p4): block=1376043008 write time tree block corruption detected
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152661] BTRFS: error (device nvme0n1p4) in btrfs_commit_transaction:2339: errno=-5 IO failure (Error while writing out transaction)
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152663] BTRFS info (device nvme0n1p4): forced readonly
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152664] BTRFS warning (device nvme0n1p4): Skipping commit of aborted transaction.
>>> Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152665] BTRFS: error (device nvme0n1p4) in cleanup_transaction:1939: errno=-5 IO failure
>>>
>>> I hit these csum 0x00000000 errors while trying to backup the files to ext4 partition on the same disk:
>>> Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.475516] BTRFS info (device nvme0n1p4): disk space caching is enabled
>>> Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.475523] BTRFS info (device nvme0n1p4): has skinny extents
>>> Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.494832] BTRFS info (device nvme0n1p4): enabling ssd optimizations
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627577] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627805] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627814] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628316] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112013824, 3112017920)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628931] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112058880, 3112062976)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628943] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112083456, 3112087552)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629210] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5894144 csum 0x45d7e010 expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629214] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629238] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5963776 csum 0x95b8b716 expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.630311] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648130] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648226] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648234] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649275] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649353] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649357] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650397] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650475] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650478] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678142] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111124992, 3111129088)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678149] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111276544, 3111280640)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678151] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111346176, 3111350272)
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.680593] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.680604] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.686438] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.686449] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.687671] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.687683] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.688871] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
>>> Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.688876] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
>>> Sep 5 00:17:05 hippo-tuxedo kernel: [ 1170.527686] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
>>> Sep 5 00:17:05 hippo-tuxedo kernel: [ 1170.527695] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
>>> <snip>


* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  2:47     ` Qu Wenruo
@ 2021-09-06  3:05       ` ahipp0
  2021-09-06  3:36         ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: ahipp0 @ 2021-09-06  3:05 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 40979 bytes --]

On Sunday, September 5th, 2021 at 10:47 PM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:

> On 2021/9/6 at 10:35 AM, ahipp0 wrote:
> 
> > Qu,
> > 
> > Thank you so much for taking a look!
> > Please see my comments inline below.
> > 
> > On Sunday, September 5th, 2021 at 9:08 PM, Qu wrote:
> > 
> > > On 2021/9/5 at 3:34 PM, ahipp0 wrote:
> > > 
> > > > Hi!
> > > > 
> > > > I started having various fun BTRFS warnings/errors/critical messages all of a sudden
> > > > after downloading and extracting linux-5.14.1.tar.xz on a fairly new (~1TiB read/written) Samsung SSD 970 EVO Plus 500GB.
> > > > The laptop was resumed from suspend-to-disk ~30 minutes prior to that, I think.
> > > > 
> > > > Hardware:
> > > > TUXEDO Pulse 14 - Gen1
> > > > CPU: AMD Ryzen 7 4800H
> > > > RAM: 32GiB
> > > > Disk: Samsung SSD 970 EVO Plus 500GB
> > > > Distro: Kubuntu 20.04.2
> > > > Kernel: 5.11.x
> > > 
> > > I'm pretty sure you have used the btrfs partition for a while, as the
> > > corrupted tree block would be rejected by kernels starting from v5.11.
> > > Thus such a corrupted tree block should not be written to disk, so the
> > > problem must have been there for a while before you upgraded to the v5.11 kernel.
> > 
> > Fair enough, it's been approximately 3 months.
> > Looking at the logs, it seems I created the filesystem with the 5.8 kernel.
> > 
> > 6/06 -- install of 5.8.0-55.62~20.04.1 -- this is when the filesystem was created
> > 6/26 -- upgrade to 5.8.0-59.66~20.04.1
> > 7/22 -- upgrade to 5.8.0-63.71~20.04.1
> > 8/07 -- upgrade to 5.11.0-25.27~20.04.1
> > 8/19 -- upgrade to 5.11.0-27.29~20.04.1
> > 
> > I think a good chunk of the data was written while on the 5.8 kernel,
> > but probably 30% on 5.11 (a wild guess).
> > 
> > <snip>
> > 
> > > > In the past, I also noticed odd things like ldconfig hanging or not picking up updated libraries after suspend-to-disk.
> > > > Simply rebooting helped in such cases.
> > > > The swap partition is on the same disk. (as a separate partition, not a file)
> > > > I also started using a new power profile recently, which disables half of the CPU cores when on battery power.
> > > > (but hibernation also offlines all non-boot CPUs while preparing for suspend-to-disk)
> > > > 
> > > > What could have caused the filesystem corruption?
> > > 
> > > From the dmesg, at least one block group for metadata is corrupted, which
> > > may explain why one tree block can't be read, as it may point to some
> > > invalid location and thus read back all zeros.
> > > 
> > > I believe the problem existed way before v5.11.x, as in v5.11 btrfs gained
> > > the ability to detect tons of new problems and reject such incorrect
> > > metadata before it reaches disk.
> > 
> > Ah, cool, that's good to know that 5.11 does a lot more sanity checking!
> > 
> > > Thus it should be a problem/bug caused by an old kernel.
> > > Furthermore, I didn't see any obvious bitflip, thus bad memory is less
> > > likely.
> > 
> > That's good!
> > 
> > > > Is there a way to repair the filesystem?
> > > 
> > > As expected from the btrfs-check output, the extent tree is corrupted.
> > > But thankfully, the data should be all safe.
> > > So the first thing is to back up all your important data.
> > > Then try "btrfs check --mode=lowmem" to get a more human-readable error
> > > report, and we can start from that to determine if it can be repaired.
> > 
> > Sure, please see below.
> > 
> > $ btrfs check --mode=lowmem /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> 
> That's strange; this means lowmem and original mode have different
> ideas about what's going wrong.
> Now we need to enhance lowmem mode to detect such a problem first.
> 
> > [3/7] checking free space cache
> > block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
> > ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
> 
> Yeah, free space cache corruption is a big thing.
> Recommended to do a btrfs check --clear-space-cache v1 first before the
> next mount.

$ btrfs check --clear-space-cache v1 /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
Free space cache cleared

> > failed to load free space cache for block group 2169503744
> > [4/7] checking fs roots
> > ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
> > ERROR: errors found in fs roots
> > found 71205916672 bytes used, error(s) found
> > total csum bytes: 69299516
> > total tree bytes: 212975616
> > total fs tree bytes: 113672192
> > total extent tree bytes: 14909440
> > btree space waste bytes: 42172819
> > file data blocks allocated: 86083526656
> > referenced 70815563776
> > 
> > > But so far, I'm a little optimistic about a working repair.
> > 
> > Makes me optimistic as well. :)
> 
> You can --repair after clearing the v1 cache and backing up your data.
> Then run without --repair to see the result.
> For the worst case, you can try --init-extent-tree, but I still want to
> see the result of regular --repair.

$ btrfs check --repair /dev/nvme0n1p4
enabling repair mode
WARNING:
Do not use --repair unless you are advised to do so by a developer
or an experienced user, and then only after having accepted that no
fsck can successfully repair all types of filesystem corruption. Eg.
some software or hardware bugs can fatally damage a volume.
The operation will start in 10 seconds.
Use Ctrl-C to stop it.
10 9 8 7 6 5 4 3 2 1
Starting repair.
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
Fixed 0 roots.
[2/7] checking extents
ref mismatch on [3111260160 8192] extent item 0, found 1
data backref 3111260160 root 257 owner 488963 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111260160 root 257 owner 488963 offset 0 found 1 wanted 0 back 0x55d56b1ea2f0
backpointer mismatch on [3111260160 8192]
adding new data backref on 3111260160 root 257 owner 488963 offset 0 found 1
Repaired extent references for 3111260160
ref mismatch on [3111411712 12288] extent item 0, found 1
data backref 3111411712 root 257 owner 488887 offset 4096 num_refs 0 not found in extent tree
incorrect local backref count on 3111411712 root 257 owner 488887 offset 4096 found 1 wanted 0 back 0x55d56c18ca50
backpointer mismatch on [3111411712 12288]
adding new data backref on 3111411712 root 257 owner 488887 offset 4096 found 1
Repaired extent references for 3111411712
ref mismatch on [3111436288 16384] extent item 0, found 1
data backref 3111436288 root 257 owner 488889 offset 4096 num_refs 0 not found in extent tree
incorrect local backref count on 3111436288 root 257 owner 488889 offset 4096 found 1 wanted 0 back 0x55d576d8e290
backpointer mismatch on [3111436288 16384]
adding new data backref on 3111436288 root 257 owner 488889 offset 4096 found 1
Repaired extent references for 3111436288
ref mismatch on [3111489536 8192] extent item 0, found 1
data backref 3111489536 root 257 owner 488964 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111489536 root 257 owner 488964 offset 0 found 1 wanted 0 back 0x55d5699f2700
backpointer mismatch on [3111489536 8192]
adding new data backref on 3111489536 root 257 owner 488964 offset 0 found 1
Repaired extent references for 3111489536
ref mismatch on [3111616512 638976] extent item 25, found 26
data backref 3111616512 root 257 owner 488965 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111616512 root 257 owner 488965 offset 0 found 1 wanted 0 back 0x55d56c17dc00
backref disk bytenr does not match extent record, bytenr=3111616512, ref bytenr=3112091648
backref bytes do not match extent backref, bytenr=3111616512, ref bytes=638976, backref bytes=8192
backpointer mismatch on [3111616512 638976]
attempting to repair backref discrepancy for bytenr 3111616512
ref mismatch on [3111260160 8192] extent item 0, found 1
data backref 3111260160 root 257 owner 488963 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111260160 root 257 owner 488963 offset 0 found 1 wanted 0 back 0x55d578005140
backpointer mismatch on [3111260160 8192]
adding new data backref on 3111260160 root 257 owner 488963 offset 0 found 1
Repaired extent references for 3111260160
ref mismatch on [3111411712 12288] extent item 0, found 1
data backref 3111411712 root 257 owner 488887 offset 4096 num_refs 0 not found in extent tree
incorrect local backref count on 3111411712 root 257 owner 488887 offset 4096 found 1 wanted 0 back 0x55d577576b70
backpointer mismatch on [3111411712 12288]
adding new data backref on 3111411712 root 257 owner 488887 offset 4096 found 1
Repaired extent references for 3111411712
ref mismatch on [3111436288 16384] extent item 0, found 1
data backref 3111436288 root 257 owner 488889 offset 4096 num_refs 0 not found in extent tree
incorrect local backref count on 3111436288 root 257 owner 488889 offset 4096 found 1 wanted 0 back 0x55d56a2e5c40
backpointer mismatch on [3111436288 16384]
adding new data backref on 3111436288 root 257 owner 488889 offset 4096 found 1
Repaired extent references for 3111436288
ref mismatch on [3111489536 8192] extent item 0, found 1
data backref 3111489536 root 257 owner 488964 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111489536 root 257 owner 488964 offset 0 found 1 wanted 0 back 0x55d56b770820
backpointer mismatch on [3111489536 8192]
adding new data backref on 3111489536 root 257 owner 488964 offset 0 found 1
Repaired extent references for 3111489536
ref mismatch on [3111616512 638976] extent item 25, found 26
data backref 3111616512 root 257 owner 488965 offset 18446744073709076480 num_refs 0 not found in extent tree
incorrect local backref count on 3111616512 root 257 owner 488965 offset 18446744073709076480 found 1 wanted 0 back 0x55d576f3cab0
backpointer mismatch on [3111616512 638976]
repair deleting extent record: key [3111616512,168,638976]
adding new data backref on 3111616512 root 257 owner 31924 offset 5496832 found 25
adding new data backref on 3111616512 root 257 owner 488965 offset 18446744073709076480 found 1
Repaired extent references for 3111616512
ref mismatch on [3123773440 8192] extent item 0, found 1
data backref 3123773440 root 257 owner 488966 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3123773440 root 257 owner 488966 offset 0 found 1 wanted 0 back 0x55d56bb7b6e0
backpointer mismatch on [3123773440 8192]
adding new data backref on 3123773440 root 257 owner 488966 offset 0 found 1
Repaired extent references for 3123773440
ref mismatch on [3124051968 12288] extent item 0, found 1
data backref 3124051968 root 257 owner 488895 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3124051968 root 257 owner 488895 offset 0 found 1 wanted 0 back 0x55d56ac11990
backpointer mismatch on [3124051968 12288]
adding new data backref on 3124051968 root 257 owner 488895 offset 0 found 1
Repaired extent references for 3124051968
ref mismatch on [3124080640 8192] extent item 0, found 1
data backref 3124080640 root 257 owner 488967 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3124080640 root 257 owner 488967 offset 0 found 1 wanted 0 back 0x55d577900d10
backpointer mismatch on [3124080640 8192]
adding new data backref on 3124080640 root 257 owner 488967 offset 0 found 1
Repaired extent references for 3124080640
ref mismatch on [3124252672 208896] extent item 12, found 13
data backref 3124252672 root 257 owner 488902 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3124252672 root 257 owner 488902 offset 0 found 1 wanted 0 back 0x55d56b005980
backref disk bytenr does not match extent record, bytenr=3124252672, ref bytenr=3124428800
backref bytes do not match extent backref, bytenr=3124252672, ref bytes=208896, backref bytes=12288
backpointer mismatch on [3124252672 208896]
attempting to repair backref discrepancy for bytenr 3124252672
ref mismatch on [3111260160 8192] extent item 0, found 1
data backref 3111260160 root 257 owner 488963 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111260160 root 257 owner 488963 offset 0 found 1 wanted 0 back 0x55d576dbdef0
backpointer mismatch on [3111260160 8192]
adding new data backref on 3111260160 root 257 owner 488963 offset 0 found 1
Repaired extent references for 3111260160
ref mismatch on [3111411712 12288] extent item 0, found 1
data backref 3111411712 root 257 owner 488887 offset 4096 num_refs 0 not found in extent tree
incorrect local backref count on 3111411712 root 257 owner 488887 offset 4096 found 1 wanted 0 back 0x55d56b68d090
backpointer mismatch on [3111411712 12288]
adding new data backref on 3111411712 root 257 owner 488887 offset 4096 found 1
Repaired extent references for 3111411712
ref mismatch on [3111436288 16384] extent item 0, found 1
data backref 3111436288 root 257 owner 488889 offset 4096 num_refs 0 not found in extent tree
incorrect local backref count on 3111436288 root 257 owner 488889 offset 4096 found 1 wanted 0 back 0x55d576c0fb70
backpointer mismatch on [3111436288 16384]
adding new data backref on 3111436288 root 257 owner 488889 offset 4096 found 1
Repaired extent references for 3111436288
ref mismatch on [3111489536 8192] extent item 0, found 1
data backref 3111489536 root 257 owner 488964 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3111489536 root 257 owner 488964 offset 0 found 1 wanted 0 back 0x55d56ab85320
backpointer mismatch on [3111489536 8192]
adding new data backref on 3111489536 root 257 owner 488964 offset 0 found 1
Repaired extent references for 3111489536
ref mismatch on [3123773440 8192] extent item 0, found 1
data backref 3123773440 root 257 owner 488966 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3123773440 root 257 owner 488966 offset 0 found 1 wanted 0 back 0x55d56ab937e0
backpointer mismatch on [3123773440 8192]
adding new data backref on 3123773440 root 257 owner 488966 offset 0 found 1
Repaired extent references for 3123773440
ref mismatch on [3124051968 12288] extent item 0, found 1
data backref 3124051968 root 257 owner 488895 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3124051968 root 257 owner 488895 offset 0 found 1 wanted 0 back 0x55d576c155b0
backpointer mismatch on [3124051968 12288]
adding new data backref on 3124051968 root 257 owner 488895 offset 0 found 1
Repaired extent references for 3124051968
ref mismatch on [3124080640 8192] extent item 0, found 1
data backref 3124080640 root 257 owner 488967 offset 0 num_refs 0 not found in extent tree
incorrect local backref count on 3124080640 root 257 owner 488967 offset 0 found 1 wanted 0 back 0x55d56b031700
backpointer mismatch on [3124080640 8192]
adding new data backref on 3124080640 root 257 owner 488967 offset 0 found 1
Repaired extent references for 3124080640
ref mismatch on [3124252672 208896] extent item 12, found 13
data backref 3124252672 root 257 owner 488902 offset 18446744073709375488 num_refs 0 not found in extent tree
incorrect local backref count on 3124252672 root 257 owner 488902 offset 18446744073709375488 found 1 wanted 0 back 0x55d5773b8b20
backpointer mismatch on [3124252672 208896]
repair deleting extent record: key [3124252672,168,208896]
adding new data backref on 3124252672 root 257 owner 31924 offset 7163904 found 12
adding new data backref on 3124252672 root 257 owner 488902 offset 18446744073709375488 found 1
Repaired extent references for 3124252672
No device size related problem found
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 31924 errors 1000, some csum missing
ERROR: errors found in fs roots
found 427087040512 bytes used, error(s) found
total csum bytes: 415797096
total tree bytes: 1277558784
total fs tree bytes: 682033152
total extent tree bytes: 89456640
btree space waste bytes: 252979190
file data blocks allocated: 516356227072
 referenced 424745533440


$ btrfs check /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
extent item 3109511168 has multiple extent items
ref mismatch on [3109511168 2105344] extent item 1, found 5
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
backpointer mismatch on [3109511168 2105344]
extent item 3121950720 has multiple extent items
ref mismatch on [3121950720 2220032] extent item 1, found 4
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
backpointer mismatch on [3121950720 2220032]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 31924 errors 1000, some csum missing
ERROR: errors found in fs roots
found 71181148160 bytes used, error(s) found
total csum bytes: 69299516
total tree bytes: 212942848
total fs tree bytes: 113672192
total extent tree bytes: 14925824
btree space waste bytes: 42179056
file data blocks allocated: 86059712512
 referenced 70790922240


Hm, doesn't look overly promising anymore. :/
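
As a side note, to compare what repeated check runs still complain about, I scraped the "backpointer mismatch" lines with a throwaway script (`mismatched_extents` is my own hypothetical helper, not part of btrfs-progs; it only parses the check output shown above):

```python
import re

# Throwaway helper (not part of btrfs-progs): collect the distinct extent
# start addresses that `btrfs check` flags with a backpointer mismatch,
# so the outputs of two runs can be diffed easily.
MISMATCH_RE = re.compile(r"backpointer mismatch on \[(\d+) (\d+)\]")

def mismatched_extents(check_output: str) -> list:
    """Return sorted, de-duplicated (bytenr, length) pairs."""
    found = {(int(m.group(1)), int(m.group(2)))
             for m in MISMATCH_RE.finditer(check_output)}
    return sorted(found)

sample = """\
backpointer mismatch on [3109511168 2105344]
backpointer mismatch on [3121950720 2220032]
backpointer mismatch on [3109511168 2105344]
"""
print(mismatched_extents(sample))
```

Run over the post-repair check output, it shows the same two extents (3109511168 and 3121950720) are still unhappy.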

> > > > How safe is it to continue using this particular filesystem after/if it's repaired on this drive?
> > > 
> > > It's safe to mount it read-only.
> > > It's not going to work well to read-write mount it, as btrfs will abort
> > > the transaction just as you already see in the dmesg.
> > 
> > Oh, even after repairing it?
> 
> I mean before repair.
> After repair, it depends on whether btrfs check reports any remaining error.
> If there are no more errors, an RW mount should be fine.
> 
> > Or is it yet to be seen if it can be repaired?
> > 
> > > Thankfully, with more and more sanity checks introduced in recent kernels,
> > > btrfs can handle such a corrupted fs without crashing the whole kernel.
> > > So at least you can try to grab the data without crashing the kernel.
> > 
> > Yeah, that's definitely very helpful, as I was able to back up all the important stuff seemingly with no problems.
> > 
> > > > How safe is it to keep using BTRFS on this drive going forward (even after creating a new filesystem)?
> > > 
> > > As long as you're using a v5.11 or newer kernel, btrfs is very strict about
> > > any data it writes back to disk, thus it can even detect quite some
> > > memory bitflips.
> > > So unless there is proof that the SSD is bad, you're pretty safe to
> > > continue using the disk.
> > > And I don't see anything special related to the SSD, thus you're pretty
> > > safe to go.
> > 
> > Good to hear!
> > 
> > I started suspecting something with TRIM/discard support in the SSD/driver after seeing these all-zero checksums,
> > but your explanation that it's just because of the corrupt tree makes more sense.
> > 
> > I also saw another thread on the mailing list (from Martin)
> > about quite a similar (from my point of view) issue on a similar system (AMD-based, Samsung 980 SSD) with similar usage (suspend-to-disk),
> > so I'm trying to figure out if it's a drive issue, a driver issue, a suspend-to-disk issue, or a combination of these.
> 
> I guess suspend-to-disk may be involved, but I can't say for sure.
> Although I'm also using an AMD-based CPU (a 3700X though) with a Samsung
> PM981 SSD, I never use suspend to disk/ram, just power off...
> So I can't say for sure.

I see.
Yeah, I'm going to stick to just powering off for the time being too.
Or using suspend-to-ram as the last resort.
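
For reference, since the all-zero expected csums made me suspect TRIM at first: btrfs defaults to CRC-32C (Castagnoli) for checksums, so "expected csum 0x00000000" means the stored csum item was zero/missing, while the nonzero "csum 0x..." values are computed from the actual data. A minimal bitwise CRC-32C sketch in Python, purely to illustrate how those values are produced (far slower than the kernel's hardware-accelerated implementation):

```python
def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78 --
    the default checksum algorithm btrfs applies per 4 KiB data block."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# Standard CRC-32C check value for the test vector "123456789":
print(hex(crc32c(b"123456789")))  # → 0xe3069283
```

Since CRC-32C of real file data is essentially never zero, a consistent expected value of 0x00000000 points at missing csum tree entries rather than zeroed-out data.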

> Thanks,
> Qu
> 
> > So far, suspend-to-disk seems to be the main suspect (or it somehow triggers issues elsewhere),
> > as I saw strange behavior upon resume from hibernation, but never after a clean reboot.
> > 
> > > > I've backed up important files,
> > > > so I'll be glad to try various suggestions.
> > > > Also, I'll keep using ext4 on this drive for now and will keep an eye on it.
> > > > 
> > > > I think I was able to resolve the "corrupt leaf" issue by deleting affected files
> > > 
> > > Nope, there is no way to solve it just using the btrfs kernel module.
> > > Btrfs refuses to read such a corrupted tree block at all, thus there is
> > > no way to modify it.
> > > Unless you're using a much older kernel, but then you lose all the new
> > > sanity checks in v5.11, thus not recommended.
> > 
> > Hm, I don't remember for sure now,
> > but chances are that I deleted those files when booting from a LiveUSB,
> > which used kernel 5.10.61.
> > (I didn't know that 5.11 is so much more strict, otherwise I would have found another bootable ISO)
> > This would explain how I could have deleted them.
> > 
> > But overall, I was mostly following the instructions here:
> > https://lore.kernel.org/linux-btrfs/75c522e9-88ff-0b9d-1ede-b524388d42d1@gmx.com/
> > 
> > > Thanks,
> > > Qu

> > > > (the Linux kernel sources I was unpacking when I hit the issue),
> > > > because "btrfs ins logical-resolve" can't find the file anymore:
> > > > 
> > > > $ btrfs ins logical-resolve 1376043008 /mnt/hippo/
> > > > ERROR: logical ino ioctl: No such file or directory
> > > > 
> > > > However, the checksum and "btrfs check" errors make me seriously worried.
> > > > 
> > > > This is the earliest BTRFS warning I see in the logs:
> > > > 
> > > > Sep 4 14:04:51 hippo-tuxedo kernel: [ 19.338196] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
> > > > Sep 4 14:04:51 hippo-tuxedo kernel: [ 19.338202] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
> > > > 
> > > > Here's the first "corrupt leaf" error:
> > > > 
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151911] BTRFS critical (device nvme0n1p4): corrupt leaf: root=2 block=1376043008 slot=7 bg_start=2169503744 bg_len=1073741824, invalid block group used, have 1073790976 expect [0, 1073741824)
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151925] BTRFS info (device nvme0n1p4): leaf 1376043008 gen 24254 total ptrs 121 free space 6994 owner 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151929] item 0 key (2169339904 169 0) itemoff 16250 itemsize 33
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151932] extent refs 1 gen 20692 flags 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151933] ref#0: tree block backref root 7
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151936] item 1 key (2169356288 169 0) itemoff 16217 itemsize 33
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151938] extent refs 1 gen 20692 flags 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151939] ref#0: tree block backref root 7
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151940] item 2 key (2169372672 169 0) itemoff 16184 itemsize 33
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151942] extent refs 1 gen 20692 flags 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151943] ref#0: tree block backref root 7
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151945] item 3 key (2169405440 169 0) itemoff 16151 itemsize 33
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151946] extent refs 1 gen 20692 flags 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151947] ref#0: tree block backref root 7
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151949] item 4 key (2169421824 169 0) itemoff 16118 itemsize 33
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151950] extent refs 1 gen 20692 flags 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151951] ref#0: tree block backref root 7
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151953] item 5 key (2169470976 169 0) itemoff 16085 itemsize 33
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151954] extent refs 1 gen 24164 flags 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151955] ref#0: tree block backref root 2
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151957] item 6 key (2169503744 168 16429056) itemoff 16032 itemsize 53
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151959] extent refs 1 gen 47 flags 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151960] ref#0: extent data backref root 257 objectid 20379 offset 0 count 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151962] item 7 key (2169503744 192 1073741824) itemoff 16008 itemsize 24
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151964] block group used 1073790976 chunk_objectid 256 flags 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151966] item 8 key (2185932800 168 241664) itemoff 15955 itemsize 53
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151968] extent refs 1 gen 47 flags 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151969] ref#0: extent data backref root 257 objectid 20417 offset 0 count 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151971] item 9 key (2186174464 168 299008) itemoff 15902 itemsize 53
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151973] extent refs 1 gen 47 flags 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151974] ref#0: extent data backref root 257 objectid 20418 offset 0 count 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151976] item 10 key (2186473472 168 135168) itemoff 15849 itemsize 53
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151977] extent refs 1 gen 47 flags 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.151978] ref#0: extent data backref root 257 objectid 20419 offset 0 count 1
> > > > ...
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152480] item 120 key (2195210240 168 4096) itemoff 10019 itemsize 53
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152481] extent refs 1 gen 47 flags 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152482] ref#0: extent data backref root 257 objectid 20558 offset 0 count 1
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152484] BTRFS error (device nvme0n1p4): block=1376043008 write time tree block corruption detected
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152661] BTRFS: error (device nvme0n1p4) in btrfs_commit_transaction:2339: errno=-5 IO failure (Error while writing out transaction)
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152663] BTRFS info (device nvme0n1p4): forced readonly
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152664] BTRFS warning (device nvme0n1p4): Skipping commit of aborted transaction.
> > > > Sep 4 23:44:25 hippo-tuxedo kernel: [ 9855.152665] BTRFS: error (device nvme0n1p4) in cleanup_transaction:1939: errno=-5 IO failure
> > > > 
> > > > I hit these csum 0x00000000 errors while trying to back up the files to the ext4 partition on the same disk:
> > > > 
> > > > Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.475516] BTRFS info (device nvme0n1p4): disk space caching is enabled
> > > > Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.475523] BTRFS info (device nvme0n1p4): has skinny extents
> > > > Sep 5 00:12:26 hippo-tuxedo kernel: [ 891.494832] BTRFS info (device nvme0n1p4): enabling ssd optimizations
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627577] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627805] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.627814] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628316] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112013824, 3112017920)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628931] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112058880, 3112062976)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.628943] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3112083456, 3112087552)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629210] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5894144 csum 0x45d7e010 expected csum 0x00000000 mirror 1
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629214] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 2, gen 0
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.629238] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5963776 csum 0x95b8b716 expected csum 0x00000000 mirror 1
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.630311] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 3, gen 0
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648130] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648226] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.648234] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 4, gen 0
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649275] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649353] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.649357] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 5, gen 0
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650397] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111845888, 3111849984)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650475] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 31924 off 5726208 csum 0x55271056 expected csum 0x00000000 mirror 1
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.650478] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 6, gen 0
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678142] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111124992, 3111129088)
> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678149] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111276544, 3111280640)
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.678151] BTRFS warning (device nvme0n1p4): csum hole found for disk bytenr range [3111346176, 3111350272)
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.680593] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.680604] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 7, gen 0
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.686438] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.686449] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 8, gen 0
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.687671] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.687683] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 9, gen 0
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.688871] BTRFS warning (device nvme0n1p4): csum failed root 257 ino 32063 off 39395328 csum 0xeeab29ce expected csum 0x00000000 mirror 1
> > 

> > > > Sep 5 00:16:42 hippo-tuxedo kernel: [ 1147.688876] BTRFS error (device nvme0n1p4): bdev /dev/nvme0n1p4 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
> > 

> > > > Sep 5 00:17:05 hippo-tuxedo kernel: [ 1170.527686] BTRFS warning (device nvme0n1p4): block group 2169503744 has wrong amount of free space
> > 

> > > > Sep 5 00:17:05 hippo-tuxedo kernel: [ 1170.527695] BTRFS warning (device nvme0n1p4): failed to load free space cache for block group 2169503744, rebuilding it now
> > 

> > > > $ uname -a
> > 

> > > > Linux hippo-tuxedo 5.11.0-27-generic #29~20.04.1-Ubuntu SMP Wed Aug 11 15:58:17 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
> > 

> > > > $ btrfs --version
> > 

> > > > btrfs-progs v5.4.1
> > 

> > > > $ btrfs fi show
> > 

> > > > Label: 'HIPPO' uuid: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > 

> > > > Total devices 1 FS bytes used 66.32GiB
> > 

> > > > devid 1 size 256.00GiB used 95.02GiB path /dev/nvme0n1p4
> > 

> > > > $ btrfs fi df /mnt/hippo/
> > 

> > > > Data, single: total=94.01GiB, used=66.12GiB
> > 

> > > > System, single: total=4.00MiB, used=16.00KiB
> > 

> > > > Metadata, single: total=1.01GiB, used=203.09MiB
> > 

> > > > GlobalReserve, single: total=94.59MiB, used=0.00B
> > 

> > > > $ cat /etc/lsb-release
> > 

> > > > DISTRIB_ID=Ubuntu
> > 

> > > > DISTRIB_RELEASE=20.04
> > 

> > > > DISTRIB_CODENAME=focal
> > 

> > > > DISTRIB_DESCRIPTION="Ubuntu 20.04.2 LTS"
> > 

> > > > Mount options:
> > 

> > > > relatime,ssd,space_cache,subvolid=5,subvol=andrey
> > 

> > > > $ btrfs check --readonly /dev/nvme0n1p4
> > 

> > > > Opening filesystem to check...
> > 

> > > > Checking filesystem on /dev/nvme0n1p4
> > 

> > > > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > 

> > > > [1/7] checking root items
> > 

> > > > [2/7] checking extents
> > 

> > > > extent item 3109511168 has multiple extent items
> > 

> > > > ref mismatch on [3109511168 2105344] extent item 1, found 5
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> > 

> > > > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> > 

> > > > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> > 

> > > > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> > 

> > > > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> > 

> > > > backpointer mismatch on [3109511168 2105344]
> > 

> > > > extent item 3111616512 has multiple extent items
> > 

> > > > ref mismatch on [3111616512 638976] extent item 25, found 26
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3111616512, ref bytenr=3112091648
> > 

> > > > backref bytes do not match extent backref, bytenr=3111616512, ref bytes=638976, backref bytes=8192
> > 

> > > > backpointer mismatch on [3111616512 638976]
> > 

> > > > extent item 3121950720 has multiple extent items
> > 

> > > > ref mismatch on [3121950720 2220032] extent item 1, found 4
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> > 

> > > > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> > 

> > > > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> > 

> > > > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > 

> > > > backpointer mismatch on [3121950720 2220032]
> > 

> > > > extent item 3124252672 has multiple extent items
> > 

> > > > ref mismatch on [3124252672 208896] extent item 12, found 13
> > 

> > > > backref disk bytenr does not match extent record, bytenr=3124252672, ref bytenr=3124428800
> > 

> > > > backref bytes do not match extent backref, bytenr=3124252672, ref bytes=208896, backref bytes=12288
> > 

> > > > backpointer mismatch on [3124252672 208896]
> > 

> > > > ERROR: errors found in extent allocation tree or chunk allocation
> > 

> > > > [3/7] checking free space cache
> > 

> > > > block group 2169503744 has wrong amount of free space, free space cache has 10440704 block group has 10346496
> > 

> > > > ERROR: free space cache has more free space than block group item, this could leads to serious corruption, please contact btrfs developers
> > 

> > > > failed to load free space cache for block group 2169503744
> > 

> > > > [4/7] checking fs roots
> > 

> > > > root 257 inode 31924 errors 1000, some csum missing
> > 

> > > > ERROR: errors found in fs roots
> > 

> > > > found 71205822464 bytes used, error(s) found
> > 

> > > > total csum bytes: 69299516
> > 

> > > > total tree bytes: 212975616
> > 

> > > > total fs tree bytes: 113672192
> > 

> > > > total extent tree bytes: 14909440
> > 

> > > > btree space waste bytes: 42172819
> > 

> > > > file data blocks allocated: 86083526656
> > 

> > > > referenced 70815563776
<snip>

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  3:05       ` ahipp0
@ 2021-09-06  3:36         ` Qu Wenruo
  2021-09-06  4:07           ` ahipp0
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2021-09-06  3:36 UTC (permalink / raw)
  To: ahipp0, Qu Wenruo; +Cc: linux-btrfs



On 2021/9/6 11:05 AM, ahipp0 wrote:
[...]
> 
>> see the result of regular --repair.
> 
> $ btrfs check --repair /dev/nvme0n1p4
> 
> enabling repair mode
> WARNING:
> 
> Do not use --repair unless you are advised to do so by a developer
> or an experienced user, and then only after having accepted that no
> fsck can successfully repair all types of filesystem corruption. Eg.
> some software or hardware bugs can fatally damage a volume.
> The operation will start in 10 seconds.
> Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting repair.
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> Fixed 0 roots.
> [2/7] checking extents
> ref mismatch on [3111260160 8192] extent item 0, found 1
> data backref 3111260160 root 257 owner 488963 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111260160 root 257 owner 488963 offset 0 found 1 wanted 0 back 0x55d56b1ea2f0
> backpointer mismatch on [3111260160 8192]
> adding new data backref on 3111260160 root 257 owner 488963 offset 0 found 1
> Repaired extent references for 3111260160
> ref mismatch on [3111411712 12288] extent item 0, found 1
> data backref 3111411712 root 257 owner 488887 offset 4096 num_refs 0 not found in extent tree
> incorrect local backref count on 3111411712 root 257 owner 488887 offset 4096 found 1 wanted 0 back 0x55d56c18ca50
> backpointer mismatch on [3111411712 12288]
> adding new data backref on 3111411712 root 257 owner 488887 offset 4096 found 1
> Repaired extent references for 3111411712
> ref mismatch on [3111436288 16384] extent item 0, found 1
> data backref 3111436288 root 257 owner 488889 offset 4096 num_refs 0 not found in extent tree
> incorrect local backref count on 3111436288 root 257 owner 488889 offset 4096 found 1 wanted 0 back 0x55d576d8e290
> backpointer mismatch on [3111436288 16384]
> adding new data backref on 3111436288 root 257 owner 488889 offset 4096 found 1
> Repaired extent references for 3111436288
> ref mismatch on [3111489536 8192] extent item 0, found 1
> data backref 3111489536 root 257 owner 488964 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111489536 root 257 owner 488964 offset 0 found 1 wanted 0 back 0x55d5699f2700
> backpointer mismatch on [3111489536 8192]
> adding new data backref on 3111489536 root 257 owner 488964 offset 0 found 1
> Repaired extent references for 3111489536
> ref mismatch on [3111616512 638976] extent item 25, found 26
> data backref 3111616512 root 257 owner 488965 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111616512 root 257 owner 488965 offset 0 found 1 wanted 0 back 0x55d56c17dc00
> backref disk bytenr does not match extent record, bytenr=3111616512, ref bytenr=3112091648
> backref bytes do not match extent backref, bytenr=3111616512, ref bytes=638976, backref bytes=8192
> backpointer mismatch on [3111616512 638976]
> attempting to repair backref discrepancy for bytenr 3111616512
> ref mismatch on [3111260160 8192] extent item 0, found 1
> data backref 3111260160 root 257 owner 488963 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111260160 root 257 owner 488963 offset 0 found 1 wanted 0 back 0x55d578005140
> backpointer mismatch on [3111260160 8192]
> adding new data backref on 3111260160 root 257 owner 488963 offset 0 found 1
> Repaired extent references for 3111260160
> ref mismatch on [3111411712 12288] extent item 0, found 1
> data backref 3111411712 root 257 owner 488887 offset 4096 num_refs 0 not found in extent tree
> incorrect local backref count on 3111411712 root 257 owner 488887 offset 4096 found 1 wanted 0 back 0x55d577576b70
> backpointer mismatch on [3111411712 12288]
> adding new data backref on 3111411712 root 257 owner 488887 offset 4096 found 1
> Repaired extent references for 3111411712
> ref mismatch on [3111436288 16384] extent item 0, found 1
> data backref 3111436288 root 257 owner 488889 offset 4096 num_refs 0 not found in extent tree
> incorrect local backref count on 3111436288 root 257 owner 488889 offset 4096 found 1 wanted 0 back 0x55d56a2e5c40
> backpointer mismatch on [3111436288 16384]
> adding new data backref on 3111436288 root 257 owner 488889 offset 4096 found 1
> Repaired extent references for 3111436288
> ref mismatch on [3111489536 8192] extent item 0, found 1
> data backref 3111489536 root 257 owner 488964 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111489536 root 257 owner 488964 offset 0 found 1 wanted 0 back 0x55d56b770820
> backpointer mismatch on [3111489536 8192]
> adding new data backref on 3111489536 root 257 owner 488964 offset 0 found 1
> Repaired extent references for 3111489536
> ref mismatch on [3111616512 638976] extent item 25, found 26
> data backref 3111616512 root 257 owner 488965 offset 18446744073709076480 num_refs 0 not found in extent tree
> incorrect local backref count on 3111616512 root 257 owner 488965 offset 18446744073709076480 found 1 wanted 0 back 0x55d576f3cab0
> backpointer mismatch on [3111616512 638976]
> repair deleting extent record: key [3111616512,168,638976]
> adding new data backref on 3111616512 root 257 owner 31924 offset 5496832 found 25
> adding new data backref on 3111616512 root 257 owner 488965 offset 18446744073709076480 found 1
> Repaired extent references for 3111616512
> ref mismatch on [3123773440 8192] extent item 0, found 1
> data backref 3123773440 root 257 owner 488966 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3123773440 root 257 owner 488966 offset 0 found 1 wanted 0 back 0x55d56bb7b6e0
> backpointer mismatch on [3123773440 8192]
> adding new data backref on 3123773440 root 257 owner 488966 offset 0 found 1
> Repaired extent references for 3123773440
> ref mismatch on [3124051968 12288] extent item 0, found 1
> data backref 3124051968 root 257 owner 488895 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3124051968 root 257 owner 488895 offset 0 found 1 wanted 0 back 0x55d56ac11990
> backpointer mismatch on [3124051968 12288]
> adding new data backref on 3124051968 root 257 owner 488895 offset 0 found 1
> Repaired extent references for 3124051968
> ref mismatch on [3124080640 8192] extent item 0, found 1
> data backref 3124080640 root 257 owner 488967 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3124080640 root 257 owner 488967 offset 0 found 1 wanted 0 back 0x55d577900d10
> backpointer mismatch on [3124080640 8192]
> adding new data backref on 3124080640 root 257 owner 488967 offset 0 found 1
> Repaired extent references for 3124080640
> ref mismatch on [3124252672 208896] extent item 12, found 13
> data backref 3124252672 root 257 owner 488902 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3124252672 root 257 owner 488902 offset 0 found 1 wanted 0 back 0x55d56b005980
> backref disk bytenr does not match extent record, bytenr=3124252672, ref bytenr=3124428800
> backref bytes do not match extent backref, bytenr=3124252672, ref bytes=208896, backref bytes=12288
> backpointer mismatch on [3124252672 208896]
> attempting to repair backref discrepancy for bytenr 3124252672
> ref mismatch on [3111260160 8192] extent item 0, found 1
> data backref 3111260160 root 257 owner 488963 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111260160 root 257 owner 488963 offset 0 found 1 wanted 0 back 0x55d576dbdef0
> backpointer mismatch on [3111260160 8192]
> adding new data backref on 3111260160 root 257 owner 488963 offset 0 found 1
> Repaired extent references for 3111260160
> ref mismatch on [3111411712 12288] extent item 0, found 1
> data backref 3111411712 root 257 owner 488887 offset 4096 num_refs 0 not found in extent tree
> incorrect local backref count on 3111411712 root 257 owner 488887 offset 4096 found 1 wanted 0 back 0x55d56b68d090
> backpointer mismatch on [3111411712 12288]
> adding new data backref on 3111411712 root 257 owner 488887 offset 4096 found 1
> Repaired extent references for 3111411712
> ref mismatch on [3111436288 16384] extent item 0, found 1
> data backref 3111436288 root 257 owner 488889 offset 4096 num_refs 0 not found in extent tree
> incorrect local backref count on 3111436288 root 257 owner 488889 offset 4096 found 1 wanted 0 back 0x55d576c0fb70
> backpointer mismatch on [3111436288 16384]
> adding new data backref on 3111436288 root 257 owner 488889 offset 4096 found 1
> Repaired extent references for 3111436288
> ref mismatch on [3111489536 8192] extent item 0, found 1
> data backref 3111489536 root 257 owner 488964 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3111489536 root 257 owner 488964 offset 0 found 1 wanted 0 back 0x55d56ab85320
> backpointer mismatch on [3111489536 8192]
> adding new data backref on 3111489536 root 257 owner 488964 offset 0 found 1
> Repaired extent references for 3111489536
> ref mismatch on [3123773440 8192] extent item 0, found 1
> data backref 3123773440 root 257 owner 488966 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3123773440 root 257 owner 488966 offset 0 found 1 wanted 0 back 0x55d56ab937e0
> backpointer mismatch on [3123773440 8192]
> adding new data backref on 3123773440 root 257 owner 488966 offset 0 found 1
> Repaired extent references for 3123773440
> ref mismatch on [3124051968 12288] extent item 0, found 1
> data backref 3124051968 root 257 owner 488895 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3124051968 root 257 owner 488895 offset 0 found 1 wanted 0 back 0x55d576c155b0
> backpointer mismatch on [3124051968 12288]
> adding new data backref on 3124051968 root 257 owner 488895 offset 0 found 1
> Repaired extent references for 3124051968
> ref mismatch on [3124080640 8192] extent item 0, found 1
> data backref 3124080640 root 257 owner 488967 offset 0 num_refs 0 not found in extent tree
> incorrect local backref count on 3124080640 root 257 owner 488967 offset 0 found 1 wanted 0 back 0x55d56b031700
> backpointer mismatch on [3124080640 8192]
> adding new data backref on 3124080640 root 257 owner 488967 offset 0 found 1
> Repaired extent references for 3124080640
> ref mismatch on [3124252672 208896] extent item 12, found 13
> data backref 3124252672 root 257 owner 488902 offset 18446744073709375488 num_refs 0 not found in extent tree
> incorrect local backref count on 3124252672 root 257 owner 488902 offset 18446744073709375488 found 1 wanted 0 back 0x55d5773b8b20
> backpointer mismatch on [3124252672 208896]
> repair deleting extent record: key [3124252672,168,208896]
> adding new data backref on 3124252672 root 257 owner 31924 offset 7163904 found 12
> adding new data backref on 3124252672 root 257 owner 488902 offset 18446744073709375488 found 1
> Repaired extent references for 3124252672
> No device size related problem found
> [3/7] checking free space cache
> [4/7] checking fs roots
> root 257 inode 31924 errors 1000, some csum missing
> ERROR: errors found in fs roots
> found 427087040512 bytes used, error(s) found
> total csum bytes: 415797096
> total tree bytes: 1277558784
> total fs tree bytes: 682033152
> total extent tree bytes: 89456640
> btree space waste bytes: 252979190
> file data blocks allocated: 516356227072
>   referenced 424745533440
> 
> 
> $ btrfs check /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> extent item 3109511168 has multiple extent items
> ref mismatch on [3109511168 2105344] extent item 1, found 5
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> backpointer mismatch on [3109511168 2105344]
> extent item 3121950720 has multiple extent items
> ref mismatch on [3121950720 2220032] extent item 1, found 4
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> backpointer mismatch on [3121950720 2220032]

Those offending blocks are some data extents.

Can you run the check again with a newer btrfs-progs (check only, not
--repair yet)? This time, please run it in both the original and the
lowmem mode.

Since the btrfs-progs involved here is pretty old, a newer btrfs-progs
(the newer the better) may report different results.
(Sorry, I should have mentioned this earlier.)
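For reference, the two requested runs would look roughly like this (flag names as documented in btrfs-check(8); the device path is the one from this thread, so adjust as needed):

```shell
# Read-only check using the default (original) mode -- makes no changes
btrfs check --readonly /dev/nvme0n1p4

# The same read-only check using the low-memory implementation; it walks
# the trees differently, so it may report issues the original mode misses
btrfs check --readonly --mode=lowmem /dev/nvme0n1p4
```

Both commands need root and an unmounted (or read-only mounted) filesystem; neither writes to the device.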

Thanks,
Qu



* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  3:36         ` Qu Wenruo
@ 2021-09-06  4:07           ` ahipp0
  2021-09-06  5:20             ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: ahipp0 @ 2021-09-06  4:07 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 19803 bytes --]

On Sunday, September 5th, 2021 at 11:36 PM, Qu wrote:

> On 2021/9/6 11:05 AM, ahipp0 wrote:
> [...]
>
> > > see the result of regular --repair.
> >
> > $ btrfs check --repair /dev/nvme0n1p4
<snip>
> > 

> > ref mismatch on [3123773440 8192] extent item 0, found 1
> > 

> > data backref 3123773440 root 257 owner 488966 offset 0 num_refs 0 not found in extent tree
> > 

> > incorrect local backref count on 3123773440 root 257 owner 488966 offset 0 found 1 wanted 0 back 0x55d56ab937e0
> > 

> > backpointer mismatch on [3123773440 8192]
> > 

> > adding new data backref on 3123773440 root 257 owner 488966 offset 0 found 1
> > 

> > Repaired extent references for 3123773440
> > 

> > ref mismatch on [3124051968 12288] extent item 0, found 1
> > 

> > data backref 3124051968 root 257 owner 488895 offset 0 num_refs 0 not found in extent tree
> > 

> > incorrect local backref count on 3124051968 root 257 owner 488895 offset 0 found 1 wanted 0 back 0x55d576c155b0
> > 

> > backpointer mismatch on [3124051968 12288]
> > 

> > adding new data backref on 3124051968 root 257 owner 488895 offset 0 found 1
> > 

> > Repaired extent references for 3124051968
> > 

> > ref mismatch on [3124080640 8192] extent item 0, found 1
> > 

> > data backref 3124080640 root 257 owner 488967 offset 0 num_refs 0 not found in extent tree
> > 

> > incorrect local backref count on 3124080640 root 257 owner 488967 offset 0 found 1 wanted 0 back 0x55d56b031700
> > 

> > backpointer mismatch on [3124080640 8192]
> > 

> > adding new data backref on 3124080640 root 257 owner 488967 offset 0 found 1
> > 

> > Repaired extent references for 3124080640
> > 

> > ref mismatch on [3124252672 208896] extent item 12, found 13
> > 

> > data backref 3124252672 root 257 owner 488902 offset 18446744073709375488 num_refs 0 not found in extent tree
> > 

> > incorrect local backref count on 3124252672 root 257 owner 488902 offset 18446744073709375488 found 1 wanted 0 back 0x55d5773b8b20
> > 

> > backpointer mismatch on [3124252672 208896]
> > 

> > repair deleting extent record: key [3124252672,168,208896]
> > 

> > adding new data backref on 3124252672 root 257 owner 31924 offset 7163904 found 12
> > 

> > adding new data backref on 3124252672 root 257 owner 488902 offset 18446744073709375488 found 1
> > 

> > Repaired extent references for 3124252672
> > 

> > No device size related problem found
> > 

> > [3/7] checking free space cache
> > 

> > [4/7] checking fs roots
> > 

> > root 257 inode 31924 errors 1000, some csum missing
> > 

> > ERROR: errors found in fs roots
> > 

> > found 427087040512 bytes used, error(s) found
> > 

> > total csum bytes: 415797096
> > 

> > total tree bytes: 1277558784
> > 

> > total fs tree bytes: 682033152
> > 

> > total extent tree bytes: 89456640
> > 

> > btree space waste bytes: 252979190
> > 

> > file data blocks allocated: 516356227072
> > 

> > referenced 424745533440
> > 

> > $ btrfs check /dev/nvme0n1p4
> > 

> > Opening filesystem to check...
> > 

> > Checking filesystem on /dev/nvme0n1p4
> > 

> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > 

> > [1/7] checking root items
> > 

> > [2/7] checking extents
> > 

> > extent item 3109511168 has multiple extent items
> > 

> > ref mismatch on [3109511168 2105344] extent item 1, found 5
> > 

> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> > 

> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > 

> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> > 

> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > 

> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> > 

> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> > 

> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> > 

> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> > 

> > backpointer mismatch on [3109511168 2105344]
> > 

> > extent item 3121950720 has multiple extent items
> > 

> > ref mismatch on [3121950720 2220032] extent item 1, found 4
> > 

> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> > 

> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > 

> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> > 

> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > 

> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> > 

> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> > 

> > backpointer mismatch on [3121950720 2220032]
> 

> Those offending blocks are some data extents.

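One detail in the quoted repair log stands out: the offset 18446744073709375488 on extent 3124252672. Reinterpreting that u64 as a signed value (my reading, not something stated in the thread: it looks like a wrapped negative offset rather than a real ~16 EiB file position) gives a small negative number:

```python
# Offset seen in the repair log above; values >= 2^63 printed as unsigned
# u64 correspond to negative numbers in two's complement.
raw = 18446744073709375488

# Reinterpret the unsigned 64-bit value as signed.
signed = raw - (1 << 64) if raw >= (1 << 63) else raw
print(signed)  # -176128
```

A small negative result like this is consistent with an underflowed extent-data offset in the damaged backref rather than a genuinely huge file offset.
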
$ sudo ./btrfs inspect-internal logical-resolve 3109511168 /mnt/hippo/
/mnt/hippo/home-andrey/.config/SpiderOakONE/tss_external_blocks_pandora_sqliite_database/00000011

$ sudo ./btrfs inspect-internal logical-resolve 3121950720 /mnt/hippo/
/mnt/hippo/home-andrey/.config/SpiderOakONE/tss_external_blocks_pandora_sqliite_database/00000011

I remember it was complaining about the file when I was backing things up.
This file can be easily dropped -- I had already rebuilt the SpiderOak database anyway, since I couldn't back it up.

> Can you use some newer btrfs-progs and run check on it again? (not yet
> repair)
> 

> This time in both original and lowmem mode.
> 

> As the involved btrfs-progs is pretty old, thus newer btrfs-progs (the
> newer the better) may cause some difference.
> (Sorry, I should mention it earlier)

No worries.

Just built the latest tag from btrfs-progs repository with
./configure --prefix="${PWD}/_install" --disable-documentation --disable-shared --disable-convert --disable-python --disable-zoned


$ ./btrfs --version
btrfs-progs v5.13.1


$ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
ERROR: errors found in fs roots
found 71181221888 bytes used, error(s) found
total csum bytes: 69299516
total tree bytes: 212942848
total fs tree bytes: 113672192
total extent tree bytes: 14925824
btree space waste bytes: 42179056
file data blocks allocated: 86059712512
 referenced 70790922240


$ sudo ./btrfs check /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
extent item 3109511168 has multiple extent items
ref mismatch on [3109511168 2105344] extent item 1, found 5
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
backpointer mismatch on [3109511168 2105344]
extent item 3121950720 has multiple extent items
ref mismatch on [3121950720 2220032] extent item 1, found 4
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
backpointer mismatch on [3121950720 2220032]
ERROR: errors found in extent allocation tree or chunk allocation
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 31924 errors 1000, some csum missing
ERROR: errors found in fs roots
found 71181148160 bytes used, error(s) found
total csum bytes: 69299516
total tree bytes: 212942848
total fs tree bytes: 113672192
total extent tree bytes: 14925824
btree space waste bytes: 42179056
file data blocks allocated: 86059712512
 referenced 70790922240


> Thanks,
> 

> Qu

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  4:07           ` ahipp0
@ 2021-09-06  5:20             ` Qu Wenruo
  2021-09-06  6:13               ` ahipp0
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2021-09-06  5:20 UTC (permalink / raw)
  To: ahipp0; +Cc: Qu Wenruo, linux-btrfs



On 2021/9/6 at 12:07 PM, ahipp0 wrote:
[...]
> 
>> Those offending blocks are some data extents.
> 
> $ sudo ./btrfs inspect-internal logical-resolve 3109511168 /mnt/hippo/
> /mnt/hippo/home-andrey/.config/SpiderOakONE/tss_external_blocks_pandora_sqliite_database/00000011
> 
> $ sudo ./btrfs inspect-internal logical-resolve 3121950720 /mnt/hippo/
> /mnt/hippo/home-andrey/.config/SpiderOakONE/tss_external_blocks_pandora_sqliite_database/00000011
> 
> I remember it was complaining about the file when I was backing things up.
> This file can be easily dropped -- I already rebuilt SpiderOak database anyway since I couldn't back it up.

You can try to delete them, but the problem is that if it doesn't go well,
it can cause btrfs to abort the transaction (i.e., the mount turns read-only).

Thus you may want to delete them, sync the fs, and check dmesg to make
sure the fs is still fine.

If that works, then btrfs-check again to make sure the problem is gone.


The csum missing problem is not a big deal; it can be easily resolved 
by finding inode 31924 of subvolume 257 and deleting it.
Or you can simply ignore it completely.

Thanks,
Qu

>
>> Can you use some newer btrfs-progs and run check on it again? (not yet
>> repair)
>>
>> This time in both original and lowmem mode.
>>
>> As the involved btrfs-progs is pretty old, thus newer btrfs-progs (the
>> newer the better) may cause some difference.
>> (Sorry, I should mention it earlier)
> 
> No worries.
> 
> Just built the latest tag from btrfs-progs repository with
> ./configure --prefix="${PWD}/_install" --disable-documentation --disable-shared --disable-convert --disable-python --disable-zoned
> 
> 
> $ ./btrfs --version
> btrfs-progs v5.13.1
> 
> 
> $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
> ERROR: errors found in fs roots
> found 71181221888 bytes used, error(s) found
> total csum bytes: 69299516
> total tree bytes: 212942848
> total fs tree bytes: 113672192
> total extent tree bytes: 14925824
> btree space waste bytes: 42179056
> file data blocks allocated: 86059712512
>   referenced 70790922240
> 
> 
> $ sudo ./btrfs check /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> extent item 3109511168 has multiple extent items
> ref mismatch on [3109511168 2105344] extent item 1, found 5
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> backpointer mismatch on [3109511168 2105344]
> extent item 3121950720 has multiple extent items
> ref mismatch on [3121950720 2220032] extent item 1, found 4
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> backpointer mismatch on [3121950720 2220032]
> ERROR: errors found in extent allocation tree or chunk allocation
> [3/7] checking free space cache
> [4/7] checking fs roots
> root 257 inode 31924 errors 1000, some csum missing
> ERROR: errors found in fs roots
> found 71181148160 bytes used, error(s) found
> total csum bytes: 69299516
> total tree bytes: 212942848
> total fs tree bytes: 113672192
> total extent tree bytes: 14925824
> btree space waste bytes: 42179056
> file data blocks allocated: 86059712512
>   referenced 70790922240
> 
> 
>> Thanks,
>>
> 
>> Qu


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  5:20             ` Qu Wenruo
@ 2021-09-06  6:13               ` ahipp0
  2021-09-06  6:28                 ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: ahipp0 @ 2021-09-06  6:13 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 14411 bytes --]

On Monday, September 6th, 2021 at 1:20 AM, Qu wrote:

> On 2021/9/6 at 12:07 PM, ahipp0 wrote:
> [...]
> > > Those offending blocks are some data extents.
> >
> > $ sudo ./btrfs inspect-internal logical-resolve 3109511168 /mnt/hippo/
> > /mnt/hippo/home-andrey/.config/SpiderOakONE/tss_external_blocks_pandora_sqliite_database/00000011
> >
> > $ sudo ./btrfs inspect-internal logical-resolve 3121950720 /mnt/hippo/
> > /mnt/hippo/home-andrey/.config/SpiderOakONE/tss_external_blocks_pandora_sqliite_database/00000011
> >
> > I remember it was complaining about the file when I was backing things up.
> > This file can be easily dropped -- I already rebuilt SpiderOak database anyway since I couldn't back it up.
>
> You can try to delete them, but the problem is, if it doesn't work well,
> it can cause btrfs to abort transaction (aka, turns into read-only mount).
>
> Thus you may want to delete them, sync the fs, check the dmesg to make
> sure the fs is still fine.

Hm, looks like it didn't complain.
(I just nuked the whole .config/SpiderOakONE directory)

> If that works, then btrfs-check again to make sure the problem is gone.

Looks much better now:

$ sudo ./btrfs check /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
root 257 inode 488887 errors 1000, some csum missing
root 257 inode 488889 errors 1000, some csum missing
root 257 inode 488895 errors 1000, some csum missing
root 257 inode 488963 errors 1000, some csum missing
root 257 inode 488964 errors 1000, some csum missing
root 257 inode 488966 errors 1000, some csum missing
root 257 inode 488967 errors 1000, some csum missing
ERROR: errors found in fs roots
found 70414278656 bytes used, error(s) found
total csum bytes: 68552088
total tree bytes: 209338368
total fs tree bytes: 111853568
total extent tree bytes: 14024704
btree space waste bytes: 41823418
file data blocks allocated: 73253691392
 referenced 70072770560


$ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
ERROR: root 257 EXTENT_DATA[488887 4096] csum missing, have: 0, expected: 12288
ERROR: root 257 EXTENT_DATA[488889 4096] csum missing, have: 0, expected: 16384
ERROR: root 257 EXTENT_DATA[488895 0] csum missing, have: 0, expected: 12288
ERROR: root 257 EXTENT_DATA[488963 0] csum missing, have: 0, expected: 8192
ERROR: root 257 EXTENT_DATA[488964 0] csum missing, have: 0, expected: 8192
ERROR: root 257 EXTENT_DATA[488966 0] csum missing, have: 0, expected: 8192
ERROR: root 257 EXTENT_DATA[488967 0] csum missing, have: 0, expected: 8192
ERROR: errors found in fs roots
found 70414278656 bytes used, error(s) found
total csum bytes: 68552088
total tree bytes: 209338368
total fs tree bytes: 111853568
total extent tree bytes: 14024704
btree space waste bytes: 41823418
file data blocks allocated: 73253691392
 referenced 70072770560
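The inode numbers flagged in output like the above can be collected mechanically before resolving them to paths; a small sketch (the sample lines are abridged from the check output in this message):

```python
import re

# Lines abridged from the `btrfs check` output above.
check_output = """\
root 257 inode 488887 errors 1000, some csum missing
root 257 inode 488889 errors 1000, some csum missing
root 257 inode 488895 errors 1000, some csum missing
"""

# Pull out the inode numbers reported with errors.
inodes = [int(m.group(1))
          for m in re.finditer(r"root \d+ inode (\d+) errors", check_output)]
print(inodes)  # [488887, 488889, 488895]
```

Each number can then be fed to `btrfs inspect-internal inode-resolve`, as done in the loop below.
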


Seems like these inodes with zero csums can all be removed too since it's some Steam (built-in browser?) cache.

$ for i in 488887 488889 488895 488963 488964 488966 488967 ; do sudo ./btrfs inspect-internal inode-resolve "$i" /mnt/hippo/ ; done
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/f3778f4fc6657764_0
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/fc05b030bc3ab2bc_0
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/aa9d1c627d0d4ae1_0
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/24ede0e2ab3e0575_0
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/5aa559bb0d57bd6a_0
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/da80b0a1607292bd_0
/mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/90b6c5585a06e357_0

$ for i in 488887 488889 488895 488963 488964 488966 488967 ; do stat $(sudo ./btrfs inspect-internal inode-resolve "$i" /mnt/hippo/) ; done
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/f3778f4fc6657764_0
Size: 15094           Blocks: 32         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488887      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:30.297522881 -0400
Modify: 2021-09-03 23:23:30.705560160 -0400
Change: 2021-09-03 23:23:30.705560160 -0400
Birth: -
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/fc05b030bc3ab2bc_0
Size: 19104           Blocks: 40         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488889      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:30.509542251 -0400
Modify: 2021-09-03 23:23:30.893577338 -0400
Change: 2021-09-03 23:23:30.893577338 -0400
Birth: -
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/aa9d1c627d0d4ae1_0
Size: 8406            Blocks: 24         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488895      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:35.802021943 -0400
Modify: 2021-09-03 23:23:37.138141842 -0400
Change: 2021-09-03 23:23:37.138141842 -0400
Birth: -
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/24ede0e2ab3e0575_0
Size: 7844            Blocks: 16         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488963      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:40.054401865 -0400
Modify: 2021-09-03 23:23:40.362429172 -0400
Change: 2021-09-03 23:23:40.362429172 -0400
Birth: -
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/5aa559bb0d57bd6a_0
Size: 7473            Blocks: 16         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488964      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:40.054401865 -0400
Modify: 2021-09-03 23:23:40.370429882 -0400
Change: 2021-09-03 23:23:40.370429882 -0400
Birth: -
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/da80b0a1607292bd_0
Size: 5808            Blocks: 16         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488966      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:40.054401865 -0400
Modify: 2021-09-03 23:23:40.226417115 -0400
Change: 2021-09-03 23:23:40.226417115 -0400
Birth: -
File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/90b6c5585a06e357_0
Size: 7110            Blocks: 16         IO Block: 4096   regular file
Device: 3bh/59d Inode: 488967      Links: 1
Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
Access: 2021-09-03 23:23:40.054401865 -0400
Modify: 2021-09-03 23:23:40.362429172 -0400
Change: 2021-09-03 23:23:40.362429172 -0400
Birth: -

Seems like these files were all created within 10 seconds of each other.
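That can be double-checked from the stat output above, taking the earliest and latest Modify timestamps (truncated to microseconds; the -0400 offset cancels out):

```python
from datetime import datetime

# Earliest and latest Modify timestamps from the stat output above.
first = datetime.fromisoformat("2021-09-03 23:23:30.705560")
last = datetime.fromisoformat("2021-09-03 23:23:40.370430")

span = (last - first).total_seconds()
print(round(span, 2))  # 9.66
```

So the whole batch of cache files was written in well under 10 seconds.
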

After deleting the whole /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache directory,
it seems the filesystem is clean.

$ sudo ./btrfs check /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs
[7/7] checking quota groups skipped (not enabled on this FS)
found 70097395712 bytes used, no error found
total csum bytes: 68235972
total tree bytes: 206290944
total fs tree bytes: 109363200
total extent tree bytes: 13598720
btree space waste bytes: 41683028
file data blocks allocated: 72939855872
 referenced 69761359872

$ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
[1/7] checking root items
[2/7] checking extents
[3/7] checking free space cache
[4/7] checking fs roots
[5/7] checking only csums items (without verifying data)
[6/7] checking root refs done with fs roots in lowmem mode, skipping
[7/7] checking quota groups skipped (not enabled on this FS)
found 70097395712 bytes used, no error found
total csum bytes: 68235972
total tree bytes: 206290944
total fs tree bytes: 109363200
total extent tree bytes: 13598720
btree space waste bytes: 41683028
file data blocks allocated: 72939855872
 referenced 69761359872

$ sudo ./btrfs scrub status /mnt/hippo/
UUID:             2b69016b-e03b-478a-84cd-f794eddfebd5
Scrub started:    Mon Sep  6 02:06:54 2021
Status:           finished
Duration:         0:00:22
Total to scrub:   65.28GiB
Rate:             2.97GiB/s
Error summary:    no errors found
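As a quick sanity check, the reported scrub rate is consistent with total size divided by duration:

```python
# Figures from the scrub status above: 65.28 GiB scrubbed in 22 seconds.
total_gib = 65.28
duration_s = 22

rate_gib_s = total_gib / duration_s
print(round(rate_gib_s, 2))  # 2.97
```
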


Can the filesystem now be considered clean as in "never corrupted"?
Or is there still a reason to reformat it?

Would using DUP profile for metadata and system help with this kind of corruption?
Would it be generally advisable to use it going forward?


>
> The csum missing problem is not a big deal, that can be easily deleted
> by finding inode 31924 of subvolume 257 and delete it.
> Or you can easily ignore it completely.

Seems like it's gone already:

$ sudo ./btrfs inspect-internal inode-resolve 31924 /mnt/hippo/
ERROR: ino paths ioctl: No such file or directory

> Thanks,
> Qu

> > > Can you use some newer btrfs-progs and run check on it again? (not yet
> > > repair)
> > >
> > > This time in both original and lowmem mode.
> > >
> > > As the involved btrfs-progs is pretty old, thus newer btrfs-progs (the
> > > newer the better) may cause some difference.
> > > (Sorry, I should mention it earlier)
> >
> > No worries.
> >
> > Just built the latest tag from btrfs-progs repository with
> > ./configure --prefix="${PWD}/_install" --disable-documentation --disable-shared --disable-convert --disable-python --disable-zoned
> >
> > $ ./btrfs --version
> > btrfs-progs v5.13.1
> >
> > $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
> > ERROR: errors found in fs roots
> > found 71181221888 bytes used, error(s) found
> > total csum bytes: 69299516
> > total tree bytes: 212942848
> > total fs tree bytes: 113672192
> > total extent tree bytes: 14925824
> > btree space waste bytes: 42179056
> > file data blocks allocated: 86059712512
> > referenced 70790922240
> >
> > $ sudo ./btrfs check /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> > extent item 3109511168 has multiple extent items
> > ref mismatch on [3109511168 2105344] extent item 1, found 5
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
> > backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
> > backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
> > backpointer mismatch on [3109511168 2105344]
> > extent item 3121950720 has multiple extent items
> > ref mismatch on [3121950720 2220032] extent item 1, found 4
> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
> > backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
> > backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
> > backpointer mismatch on [3121950720 2220032]
> > ERROR: errors found in extent allocation tree or chunk allocation
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > root 257 inode 31924 errors 1000, some csum missing
> > ERROR: errors found in fs roots
> > found 71181148160 bytes used, error(s) found
> > total csum bytes: 69299516
> > total tree bytes: 212942848
> > total fs tree bytes: 113672192
> > total extent tree bytes: 14925824
> > btree space waste bytes: 42179056
> > file data blocks allocated: 86059712512
> > referenced 70790922240
> >
> > > Thanks,
> > > Qu

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  6:13               ` ahipp0
@ 2021-09-06  6:28                 ` Qu Wenruo
  2021-09-06  7:00                   ` ahipp0
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2021-09-06  6:28 UTC (permalink / raw)
  To: ahipp0, Qu Wenruo; +Cc: linux-btrfs



On 2021/9/6 at 2:13 PM, ahipp0 wrote:
[...]
>> You can try to delete them, but the problem is, if it doesn't work well,
>> it can cause btrfs to abort transaction (aka, turns into read-only mount).
>>
>> Thus you may want to delete them, sync the fs, check the dmesg to make
>> sure the fs is still fine.
>
> Hm, looks like it didn't complain.
> (I just nuked the whole .config/SpiderOakONE directory)
>
>> If that works, then btrfs-check again to make sure the problem is gone.
>
> Looks much better now:
>
> $ sudo ./btrfs check /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> root 257 inode 488887 errors 1000, some csum missing
> root 257 inode 488889 errors 1000, some csum missing
> root 257 inode 488895 errors 1000, some csum missing
> root 257 inode 488963 errors 1000, some csum missing
> root 257 inode 488964 errors 1000, some csum missing
> root 257 inode 488966 errors 1000, some csum missing
> root 257 inode 488967 errors 1000, some csum missing
> ERROR: errors found in fs roots
> found 70414278656 bytes used, error(s) found
> total csum bytes: 68552088
> total tree bytes: 209338368
> total fs tree bytes: 111853568
> total extent tree bytes: 14024704
> btree space waste bytes: 41823418
> file data blocks allocated: 73253691392
>   referenced 70072770560
>
>
> $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> ERROR: root 257 EXTENT_DATA[488887 4096] csum missing, have: 0, expected: 12288
> ERROR: root 257 EXTENT_DATA[488889 4096] csum missing, have: 0, expected: 16384
> ERROR: root 257 EXTENT_DATA[488895 0] csum missing, have: 0, expected: 12288
> ERROR: root 257 EXTENT_DATA[488963 0] csum missing, have: 0, expected: 8192
> ERROR: root 257 EXTENT_DATA[488964 0] csum missing, have: 0, expected: 8192
> ERROR: root 257 EXTENT_DATA[488966 0] csum missing, have: 0, expected: 8192
> ERROR: root 257 EXTENT_DATA[488967 0] csum missing, have: 0, expected: 8192
> ERROR: errors found in fs roots
> found 70414278656 bytes used, error(s) found
> total csum bytes: 68552088
> total tree bytes: 209338368
> total fs tree bytes: 111853568
> total extent tree bytes: 14024704
> btree space waste bytes: 41823418
> file data blocks allocated: 73253691392
>   referenced 70072770560

Even at this stage, your fs is considered clean already.

The missing csum is really not a big deal.
>
>
> Seems like these inodes with zero csums can all be removed too since it's some Steam (built-in browser?) cache.
>
> $ for i in 488887 488889 488895 488963 488964 488966 488967 ; do sudo ./btrfs inspect-internal inode-resolve "$i" /mnt/hippo/ ; done
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/f3778f4fc6657764_0
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/fc05b030bc3ab2bc_0
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/aa9d1c627d0d4ae1_0
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/24ede0e2ab3e0575_0
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/5aa559bb0d57bd6a_0
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/da80b0a1607292bd_0
> /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/90b6c5585a06e357_0
>
> $ for i in 488887 488889 488895 488963 488964 488966 488967 ; do stat $(sudo ./btrfs inspect-internal inode-resolve "$i" /mnt/hippo/) ; done
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/f3778f4fc6657764_0
> Size: 15094           Blocks: 32         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488887      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:30.297522881 -0400
> Modify: 2021-09-03 23:23:30.705560160 -0400
> Change: 2021-09-03 23:23:30.705560160 -0400
> Birth: -
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/fc05b030bc3ab2bc_0
> Size: 19104           Blocks: 40         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488889      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:30.509542251 -0400
> Modify: 2021-09-03 23:23:30.893577338 -0400
> Change: 2021-09-03 23:23:30.893577338 -0400
> Birth: -
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/aa9d1c627d0d4ae1_0
> Size: 8406            Blocks: 24         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488895      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:35.802021943 -0400
> Modify: 2021-09-03 23:23:37.138141842 -0400
> Change: 2021-09-03 23:23:37.138141842 -0400
> Birth: -
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/24ede0e2ab3e0575_0
> Size: 7844            Blocks: 16         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488963      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:40.054401865 -0400
> Modify: 2021-09-03 23:23:40.362429172 -0400
> Change: 2021-09-03 23:23:40.362429172 -0400
> Birth: -
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/5aa559bb0d57bd6a_0
> Size: 7473            Blocks: 16         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488964      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:40.054401865 -0400
> Modify: 2021-09-03 23:23:40.370429882 -0400
> Change: 2021-09-03 23:23:40.370429882 -0400
> Birth: -
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/da80b0a1607292bd_0
> Size: 5808            Blocks: 16         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488966      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:40.054401865 -0400
> Modify: 2021-09-03 23:23:40.226417115 -0400
> Change: 2021-09-03 23:23:40.226417115 -0400
> Birth: -
> File: /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache/90b6c5585a06e357_0
> Size: 7110            Blocks: 16         IO Block: 4096   regular file
> Device: 3bh/59d Inode: 488967      Links: 1
> Access: (0600/-rw-------)  Uid: ( 1000/  andrey)   Gid: ( 1000/  andrey)
> Access: 2021-09-03 23:23:40.054401865 -0400
> Modify: 2021-09-03 23:23:40.362429172 -0400
> Change: 2021-09-03 23:23:40.362429172 -0400
> Birth: -
>
> Seems like these files were all created within 10 seconds of each other.
>
> After deleting the whole /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache directory,
> it seems the filesystem is clean.
>
> $ sudo ./btrfs check /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 70097395712 bytes used, no error found
> total csum bytes: 68235972
> total tree bytes: 206290944
> total fs tree bytes: 109363200
> total extent tree bytes: 13598720
> btree space waste bytes: 41683028
> file data blocks allocated: 72939855872
>   referenced 69761359872
>
> $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
> Opening filesystem to check...
> Checking filesystem on /dev/nvme0n1p4
> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> [1/7] checking root items
> [2/7] checking extents
> [3/7] checking free space cache
> [4/7] checking fs roots
> [5/7] checking only csums items (without verifying data)
> [6/7] checking root refs done with fs roots in lowmem mode, skipping
> [7/7] checking quota groups skipped (not enabled on this FS)
> found 70097395712 bytes used, no error found
> total csum bytes: 68235972
> total tree bytes: 206290944
> total fs tree bytes: 109363200
> total extent tree bytes: 13598720
> btree space waste bytes: 41683028
> file data blocks allocated: 72939855872
>   referenced 69761359872
>
> $ sudo ./btrfs scrub status /mnt/hippo/
> UUID:             2b69016b-e03b-478a-84cd-f794eddfebd5
> Scrub started:    Mon Sep  6 02:06:54 2021
> Status:           finished
> Duration:         0:00:22
> Total to scrub:   65.28GiB
> Rate:             2.97GiB/s
> Error summary:    no errors found
>
>
> Can the filesystem now be considered clean as in "never corrupted"?
> Or is there still a reason to reformat it?

It's completely clean now, congratulations.

BTW, you may want to migrate to v2 space cache.

The relation between the v1 cache problem and the block group item mismatch
problem is still unknown, but I guess the v1 space cache may cause the problem.

Thus going to v2 would be a little safer, and faster.
>
> Would using DUP profile for metadata and system help with this kind of corruption?

Nope, the original corruption looks like some bug in btrfs code,
DUP/RAID1 won't help to prevent it at all.

But v5.11 kernels (and newer) can prevent such problems, with their
boosted sanity checks.

> Would it be generally advisable to use it going forward?

You can use the fs without any problem.

Thanks,
Qu

>
>> The csum missing problem is not a big deal, that can be easily deleted
>> by finding inode 31924 of subvolume 257 and delete it.
>> Or you can easily ignore it completely.
>
> Seems like it's gone already:
>
> $ sudo ./btrfs inspect-internal inode-resolve 31924 /mnt/hippo/
> ERROR: ino paths ioctl: No such file or directory
>
>> Thanks,
>> Qu
>
>>>> Can you use some newer btrfs-progs and run check on it again? (not yet
>>>> repair)
>>>> This time in both original and lowmem mode.
>>>> As the involved btrfs-progs is pretty old, thus newer btrfs-progs (the
>>>> newer the better) may cause some difference.
>>>> (Sorry, I should mention it earlier)
>>>
>>> No worries.
>>>
>>> Just built the latest tag from btrfs-progs repository with
>>> ./configure --prefix="${PWD}/_install" --disable-documentation --disable-shared --disable-convert --disable-python --disable-zoned
>>>
>>> $ ./btrfs --version
>>> btrfs-progs v5.13.1
>>>
>>> $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
>>> Opening filesystem to check...
>>> Checking filesystem on /dev/nvme0n1p4
>>> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
>>> [1/7] checking root items
>>> [2/7] checking extents
>>> [3/7] checking free space cache
>>> [4/7] checking fs roots
>>> ERROR: root 257 EXTENT_DATA[31924 5689344] csum missing, have: 36864, expected: 40960
>>> ERROR: errors found in fs roots
>>> found 71181221888 bytes used, error(s) found
>>> total csum bytes: 69299516
>>> total tree bytes: 212942848
>>> total fs tree bytes: 113672192
>>> total extent tree bytes: 14925824
>>> btree space waste bytes: 42179056
>>> file data blocks allocated: 86059712512
>>> referenced 70790922240
>>>
>>> $ sudo ./btrfs check /dev/nvme0n1p4
>>> Opening filesystem to check...
>>> Checking filesystem on /dev/nvme0n1p4
>>> UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
>>> [1/7] checking root items
>>> [2/7] checking extents
>>> extent item 3109511168 has multiple extent items
>>> ref mismatch on [3109511168 2105344] extent item 1, found 5
>>> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111489536
>>> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
>>> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111260160
>>> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=8192
>>> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111411712
>>> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=12288
>>> backref disk bytenr does not match extent record, bytenr=3109511168, ref bytenr=3111436288
>>> backref bytes do not match extent backref, bytenr=3109511168, ref bytes=2105344, backref bytes=16384
>>> backpointer mismatch on [3109511168 2105344]
>>> extent item 3121950720 has multiple extent items
>>> ref mismatch on [3121950720 2220032] extent item 1, found 4
>>> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124080640
>>> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
>>> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3123773440
>>> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=8192
>>> backref disk bytenr does not match extent record, bytenr=3121950720, ref bytenr=3124051968
>>> backref bytes do not match extent backref, bytenr=3121950720, ref bytes=2220032, backref bytes=12288
>>> backpointer mismatch on [3121950720 2220032]
>>> ERROR: errors found in extent allocation tree or chunk allocation
>>> [3/7] checking free space cache
>>> [4/7] checking fs roots
>>> root 257 inode 31924 errors 1000, some csum missing
>>> ERROR: errors found in fs roots
>>> found 71181148160 bytes used, error(s) found
>>> total csum bytes: 69299516
>>> total tree bytes: 212942848
>>> total fs tree bytes: 113672192
>>> total extent tree bytes: 14925824
>>> btree space waste bytes: 42179056
>>> file data blocks allocated: 86059712512
>>> referenced 70790922240
>>>
>>>> Thanks,
>>>> Qu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  6:28                 ` Qu Wenruo
@ 2021-09-06  7:00                   ` ahipp0
  2021-09-06  7:20                     ` Qu Wenruo
  0 siblings, 1 reply; 13+ messages in thread
From: ahipp0 @ 2021-09-06  7:00 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 5190 bytes --]

On Monday, September 6th, 2021 at 2:28 AM, Qu wrote:

<snip>

> > $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > ERROR: root 257 EXTENT_DATA[488887 4096] csum missing, have: 0, expected: 12288
> > ERROR: root 257 EXTENT_DATA[488889 4096] csum missing, have: 0, expected: 16384
> > ERROR: root 257 EXTENT_DATA[488895 0] csum missing, have: 0, expected: 12288
> > ERROR: root 257 EXTENT_DATA[488963 0] csum missing, have: 0, expected: 8192
> > ERROR: root 257 EXTENT_DATA[488964 0] csum missing, have: 0, expected: 8192
> > ERROR: root 257 EXTENT_DATA[488966 0] csum missing, have: 0, expected: 8192
> > ERROR: root 257 EXTENT_DATA[488967 0] csum missing, have: 0, expected: 8192
> > ERROR: errors found in fs roots
> > found 70414278656 bytes used, error(s) found
> > total csum bytes: 68552088
> > total tree bytes: 209338368
> > total fs tree bytes: 111853568
> > total extent tree bytes: 14024704
> > btree space waste bytes: 41823418
> > file data blocks allocated: 73253691392
> > referenced 70072770560
>
> Even at this stage, your fs is considered clean already.
>
> The missing csum is really not a big deal.

Ah, ok.

<snip>

> >
> > After deleting the whole /mnt/hippo//home-andrey/.steam/debian-installation/config/htmlcache/Cache directory,
> > it seems the filesystem is clean.
> >
> > $ sudo ./btrfs check /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > [5/7] checking only csums items (without verifying data)
> > [6/7] checking root refs
> > [7/7] checking quota groups skipped (not enabled on this FS)
> > found 70097395712 bytes used, no error found
> > total csum bytes: 68235972
> > total tree bytes: 206290944
> > total fs tree bytes: 109363200
> > total extent tree bytes: 13598720
> > btree space waste bytes: 41683028
> > file data blocks allocated: 72939855872
> > referenced 69761359872
> >
> > $ sudo ./btrfs check --mode=lowmem /dev/nvme0n1p4
> > Opening filesystem to check...
> > Checking filesystem on /dev/nvme0n1p4
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > [1/7] checking root items
> > [2/7] checking extents
> > [3/7] checking free space cache
> > [4/7] checking fs roots
> > [5/7] checking only csums items (without verifying data)
> > [6/7] checking root refs done with fs roots in lowmem mode, skipping
> > [7/7] checking quota groups skipped (not enabled on this FS)
> > found 70097395712 bytes used, no error found
> > total csum bytes: 68235972
> > total tree bytes: 206290944
> > total fs tree bytes: 109363200
> > total extent tree bytes: 13598720
> > btree space waste bytes: 41683028
> > file data blocks allocated: 72939855872
> > referenced 69761359872
> >
> > $ sudo ./btrfs scrub status /mnt/hippo/
> > UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
> > Scrub started: Mon Sep 6 02:06:54 2021
> > Status: finished
> > Duration: 0:00:22
> > Total to scrub: 65.28GiB
> > Rate: 2.97GiB/s
> > Error summary: no errors found
> >
> > Can the filesystem now be considered clean as in "never corrupted"?
> > Or is there still a reason to reformat it?
>
> It's completely clean now, congratulations.

Woo-hoo, awesome, thank you so much for your help!!

BTRFS rocks!

> BTW, you may want to migrate to v2 space cache.

Ah, OK, migrated.
I guess I used v1 just because it's the default.

$ sudo ./btrfs check --clear-space-cache v1 /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
Free space cache cleared

$ sudo ./btrfs check --clear-space-cache v2 /dev/nvme0n1p4
Opening filesystem to check...
Checking filesystem on /dev/nvme0n1p4
UUID: 2b69016b-e03b-478a-84cd-f794eddfebd5
no free space cache v2 to clear

$ mount -o noatime,subvol=andrey,space_cache=v2 /dev/nvme0n1p4 /mnt/hippo/
$
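
For the record, to keep the v2 space cache (and the other options) across reboots, the same settings can go into /etc/fstab — a sketch only, reusing the device, subvolume, and mount point from the commands above; adjust to taste:

```shell
# Hypothetical /etc/fstab entry matching the manual mount above.
# Once a filesystem has been mounted with space_cache=v2, the free
# space tree persists, so the option mainly documents the intent.
/dev/nvme0n1p4  /mnt/hippo  btrfs  noatime,subvol=andrey,space_cache=v2  0  0
```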


> The relation between the v1 cache problem and the block group item mismatch
> problem is still unknown, but I guess the v1 space cache may cause the problem.
>
> Thus going to v2 would be a little safer, and faster.

Huh, ok, let's see how v2 performs now.

> > Would using DUP profile for metadata and system help with this kind of corruption?
>
> Nope, the original corruption looks like some bug in btrfs code,
> DUP/RAID1 won't help to prevent it at all.

Oh, even RAID1 wouldn't have helped?

> But v5.11 kernels (and newer) can prevent such problems, with their
> boosted sanity checks.

That's great!

> > Would it be generally advisable to use it going forward?
> You can use the fs without any problem.

Nice!

Is it generally advisable to use DUP profile for metadata and system going forward?

>
> Thanks,
> Qu

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  7:00                   ` ahipp0
@ 2021-09-06  7:20                     ` Qu Wenruo
  2021-09-06  7:36                       ` ahipp0
  0 siblings, 1 reply; 13+ messages in thread
From: Qu Wenruo @ 2021-09-06  7:20 UTC (permalink / raw)
  To: ahipp0; +Cc: Qu Wenruo, linux-btrfs



On 2021/9/6 3:00 PM, ahipp0 wrote:
[...]
>> Nope, the original corruption looks like some bug in btrfs code,
>> DUP/RAID1 won't help to prevent it at all.
>
> Oh, even RAID1 wouldn't have helped?

If the btrfs module determines to write some corrupted metadata back to
disk, both copies will contain the corruption, thus RAID1 won't help.

RAID can only help if the real problem is the disks (either a missing
device, or really rotten bits).
>
>> But v5.11 kernel (and newer) can prevent such problem, with their
>> boosted sanity check.
>
> That's great!
>
>>> Would it be generally advisable to use it going forward?
>> You can use the fs without any problem.
>
> Nice!
>
> Is it generally advisable to use DUP profile for metadata and system going forward?

As a generic idea, having more duplication for metadata is always good,
even if you're using an SSD.

Thus DUP/RAID1/RAID10 is always recommended for metadata.
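
On an existing single-device filesystem the conversion is a single balance. A sketch, with /mnt standing in for whatever your mount point is:

```shell
# Convert metadata chunks of an already-created btrfs to the DUP profile.
sudo btrfs balance start -mconvert=dup /mnt

# Converting the system chunks as well needs -sconvert plus --force:
sudo btrfs balance start -sconvert=dup -f /mnt

# Check the resulting profiles:
sudo btrfs filesystem df /mnt
```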

Thanks,
Qu

>
>> Thanks,
>> Qu

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus
  2021-09-06  7:20                     ` Qu Wenruo
@ 2021-09-06  7:36                       ` ahipp0
  0 siblings, 0 replies; 13+ messages in thread
From: ahipp0 @ 2021-09-06  7:36 UTC (permalink / raw)
  To: Qu Wenruo; +Cc: Qu Wenruo, linux-btrfs


[-- Attachment #1.1: Type: text/plain, Size: 935 bytes --]

On Monday, September 6th, 2021 at 3:20 AM, Qu wrote:

> On 2021/9/6 3:00 PM, ahipp0 wrote:
> [...]
> > > Nope, the original corruption looks like some bug in btrfs code,
> > > DUP/RAID1 won't help to prevent it at all.
> >
> > Oh, even RAID1 wouldn't have helped?
>
> If the btrfs module determines to write some corrupted metadata back to
> disk, both copies will contain the corruption, thus RAID1 won't help.
>
> RAID can only help if the real problem is the disks (either a missing
> device, or really rotten bits).

Yeah, fair enough.

<snip>

> > Is it generally advisable to use DUP profile for metadata and system going forward?
>
> As a generic idea, having more duplication for metadata is always good,
> even if you're using an SSD.
>
> Thus DUP/RAID1/RAID10 is always recommended for metadata.

Yeah, makes sense.
I enabled DUP for metadata.


Thank you again for your help!

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 249 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2021-09-06  7:36 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-05  7:34 BTRFS critical: corrupt leaf; BTRFS warning csum failed, expected csum 0x00000000 on AMD Ryzen 7 4800H, Samsung SSD 970 EVO Plus ahipp0
2021-09-06  1:08 ` Qu Wenruo
2021-09-06  2:35   ` ahipp0
2021-09-06  2:47     ` Qu Wenruo
2021-09-06  3:05       ` ahipp0
2021-09-06  3:36         ` Qu Wenruo
2021-09-06  4:07           ` ahipp0
2021-09-06  5:20             ` Qu Wenruo
2021-09-06  6:13               ` ahipp0
2021-09-06  6:28                 ` Qu Wenruo
2021-09-06  7:00                   ` ahipp0
2021-09-06  7:20                     ` Qu Wenruo
2021-09-06  7:36                       ` ahipp0
